Re: [HACKERS] PostgreSQL 8.0.6 crash
On Fri, Feb 10, 2006 at 12:16:04PM -0500, Mark Woodward wrote:
> > And even when PostgreSQL has the server all to itself, having a hashagg spill to disk is *way* better than pushing the machine into a swap storm. At least if you spill the hashagg you only have one backend running at a snail's pace; a swap storm means next to nothing gets done.
> > > This was/is an example of where the behavior of PostgreSQL is clearly unacceptable. OK, yes, this problem goes away with an ANALYZE, but it isn't clear how anyone could have known this, and unexpected behavior is bad in any product.
> > Care to submit a documentation patch before releases are bundled (I think on Sunday)? At least then people would be aware that work_mem is just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll have time before the release. :(
> I would be glad to. What's the process?

Well, find the appropriate file in doc/src/sgml, make a copy, edit the file, generate a diff with diff -u, and email that diff/patch to pgsql-patches.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] PostgreSQL 8.0.6 crash
Tom Lane [EMAIL PROTECTED] writes:
> Greg Stark [EMAIL PROTECTED] writes:
> > That's why merely allocating tons of swap doesn't necessarily protect you. It's still possible for a process (or several processes if you allocate more swap than you have address space) to mmap gigabytes of memory without touching it and then start touching those pages.
> So? If the swap exists to back that memory, there's no problem. It might be slow, but it will not fail.

Sure, but there's no way to know how much swap you need. No matter how much swap you allocate, these processes can allocate more pages of untouched RAM and then blow up.

Of course, realistically, allocating 4G of swap is enough to deal with something like Postgres where you're not being maliciously attacked. One process on ia32 can't accidentally allocate more than 4G of RAM.

I was just trying to clarify the situation, since someone made some comment about it having to do with memory being swapped out and then finding nowhere to swap in when needed. That's not exactly what's happening.

-- greg
Re: [HACKERS] PostgreSQL 8.0.6 crash
Greg Stark wrote:
> Tom Lane [EMAIL PROTECTED] writes:
> > Greg Stark [EMAIL PROTECTED] writes:
> > > That's why merely allocating tons of swap doesn't necessarily protect you. It's still possible for a process (or several processes if you allocate more swap than you have address space) to mmap gigabytes of memory without touching it and then start touching those pages.
> > So? If the swap exists to back that memory, there's no problem. It might be slow, but it will not fail.
> Sure, but there's no way to know how much swap you need. No matter how much swap you allocate these processes can allocate more pages of untouched RAM and then blow up. Of course realistically allocating 4G of swap is enough to deal with something like Postgres where you're not being maliciously attacked. One process on ia32 can't accidentally allocate more than 4G of RAM.

I guess the fundamental issue is whether Linux requires all mmap()'ed file contents to be in memory, or whether it pushes data to disk and unmaps it as it runs low on memory. I don't know the answer to that.

--
Bruce Momjian              | http://candle.pha.pa.us
pgman@candle.pha.pa.us     | (610) 359-1001
+ If your life is a hard drive,     | 13 Roberts Road
+ Christ can be your backup.        | Newtown Square, Pennsylvania 19073
Re: [HACKERS] PostgreSQL 8.0.6 crash
Greg Stark [EMAIL PROTECTED] writes:
> I was just trying to clarify the situation since someone made some comment about it having to do with memory being swapped out and then finding nowhere to swap in when needed. That's not exactly what's happening.

No. I believe the case that is actually hard for the kernel to predict comes from copy-on-write: when a process forks, you could potentially need twice its current memory image, but in reality you probably won't ever need that much, since many of the shared pages won't ever be written by either process. However, a non-overcommitting kernel must assume that worst case, and hence fail the fork() if it doesn't have enough swap space to cover both processes. If it does not fail the fork(), then the crunch comes when one process does touch a shared page. If there is no available swap space at that time, kill -9 is the only recourse, because there is no way in the Unix API to fail a write to valid memory.

The reason for having a lot more swap space than you really need is just to cover the potential demand from copy-on-write of pages that are currently shared. But given the price of disk these days, it's pretty cheap insurance.

regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
> Rick Gigger [EMAIL PROTECTED] writes:
> > However if hashagg truly does not obey the limit that is supposed to be imposed by work_mem then it really ought to be documented. Is there a misunderstanding here and it really does obey it? Or is hashagg an exception but the other work_mem associated operations work fine? Or is it possible for them all to go out of bounds?
> hashagg is the exception. It should be fixed, not documented, but no one's got round to that.

Well, it is clearly a pathological condition. Fixed? Sure, but someone should document it so that others don't stumble across it.

> One point to consider is that if the planner's estimate is as far off as exhibited in the OP's example, a hashagg that does spill to disk is likely to take so long that he'll be back here complaining that the query never terminates ;-).

That's not fair, now is it? This isn't about the OP (me), it is about PostgreSQL behaving badly.

> In most practical situations, I think exceeding work_mem is really the best solution, as long as it's not by more than 10x or 100x. It's when the estimate is off by many orders of magnitude that you've got a problem. Running out of memory is not necessarily the worst response ... as long as the system doesn't kill the process in response to that.

I don't agree with you here. Many PostgreSQL installations use PostgreSQL as part of a larger whole. Adjusting work_mem should give the admin some control over the memory footprint of the system. It is documented as the limit a specific function path will use before spilling to disk.

I set up a lot of systems and I write a lot of software that uses PostgreSQL. Periodically I run across features/problems/limitations of PostgreSQL and post them. This was/is an example of where the behavior of PostgreSQL is clearly unacceptable. OK, yes, this problem goes away with an ANALYZE, but it isn't clear how anyone could have known this, and unexpected behavior is bad in any product.

As for your statement that "he'll be back here complaining that the query never terminates": that's not true. A long query typically gets examined with EXPLAIN (or in Oracle, explain plan) and evaluated from there. When the process exhibits runaway memory use, that's a problem.
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Thu, Feb 09, 2006 at 03:13:22PM -0500, Greg Stark wrote:
> Stephen Frost [EMAIL PROTECTED] writes:
> > * Tom Lane ([EMAIL PROTECTED]) wrote:
> > > Greg Stark [EMAIL PROTECTED] writes:
> > > > It doesn't seem like a bad idea to have a max_memory parameter that, if a backend ever exceeded it, would immediately abort the current transaction.
> > > See ulimit (or local equivalent).
> > As much as setting ulimit in shell scripts is fun, I have to admit that I really don't see it happening very much. For one thing it requires admin access to the startup scripts to arrange this. And it's always cluster-wide. Having a GUC parameter would mean it could be set per-session.
> Even if the GUC parameter were just implemented by calling setrlimit it might be useful.

Trying to tune work_mem is extremely difficult in PostgreSQL, because you are constantly running the risk of sending the server into a swap-storm. Having a settable per-backend memory limit would allow a lot more flexibility in setting work_mem, because you could now ensure that you wouldn't push the server into serious swapping. Even better would be a means to set a cluster-wide memory limit, but of course that's substantially more work.

--
Jim C. Nasby, Sr. Engineering Consultant, Pervasive Software
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Thu, Feb 09, 2006 at 05:04:38PM -0500, Tom Lane wrote:
> Martijn van Oosterhout kleptog@svana.org writes:
> > When people talk about disabling the OOM killer, it doesn't stop the SIGKILL behaviour,
> Yes it does, because the situation will never arise.
> > it just causes the kernel to return -ENOMEM for malloc() much much earlier... (ie when you still actually have memory available).
> Given the current price of disk, there is no sane reason not to have enough swap space configured to make this not-a-problem. The OOM kill mechanism was a reasonable solution for running systems that were not expected to be too reliable anyway on small hardware, but if you're trying to run a 24/7 server you're simply incompetent if you don't disable it.

BTW, I was shocked when I found out that FreeBSD actually has an OOM killer itself. Yet I've never heard of anyone having problems with it. Granted, the FreeBSD OOM killer could be better designed to pick the right process to kill, but I'd bet that the real reason you never hear about it is because FreeBSD admins are clued enough to a) set up a reasonable amount of swap and b) do a better job of monitoring memory usage so that you don't start swapping in the first place.

--
Jim C. Nasby, Sr. Engineering Consultant, Pervasive Software
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Fri, Feb 10, 2006 at 09:57:12AM -0500, Mark Woodward wrote:
> > In most practical situations, I think exceeding work_mem is really the best solution, as long as it's not by more than 10x or 100x. It's when the estimate is off by many orders of magnitude that you've got a problem. Running out of memory is not necessarily the worst response ... as long as the system doesn't kill the process in response to that.
> I don't agree with you here. Many PostgreSQL installations use PostgreSQL as part of a larger whole. Adjusting work_mem should give the admin some control over the memory footprint of the system. It is documented as the limit a specific function path will use before spilling to disk.

And even when PostgreSQL has the server all to itself, having a hashagg spill to disk is *way* better than pushing the machine into a swap storm. At least if you spill the hashagg you only have one backend running at a snail's pace; a swap storm means next to nothing gets done.

> This was/is an example of where the behavior of PostgreSQL is clearly unacceptable. OK, yes, this problem goes away with an ANALYZE, but it isn't clear how anyone could have known this, and unexpected behavior is bad in any product.

Care to submit a documentation patch before releases are bundled (I think on Sunday)? At least then people would be aware that work_mem is just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll have time before the release. :(

--
Jim C. Nasby, Sr. Engineering Consultant, Pervasive Software
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Fri, Feb 10, 2006 at 10:01:18AM -0600, Jim C. Nasby wrote:
> BTW, I was shocked when I found out that FreeBSD actually has an OOM killer itself. Yet I've never heard of anyone having problems with it. Granted, the FreeBSD OOM could be better designed to pick the right process to kill, but I'd bet that the real reason you never hear about it is because FreeBSD admins are clued enough to a) set up a reasonable amount of swap and b) do a better job of monitoring memory usage so that you don't start swapping in the first place.

Hmm, I do wonder what FreeBSD's overcommit policy is. For example, on my computer right now the total allocated VM is approximately 3 times the actual memory in the computer, and about twice if you include swap. Under a strict no-overcommit policy my computer wouldn't complete the boot sequence, whereas currently it runs without using any swap.

Disabling overcommit has a serious cost in that most of your VM will never be used. Are people really suggesting that I can't run a few daemons, X and a web browser on FreeBSD without allocating 3 times my physical memory in swap?

However, my real question is: while trying to find info about FreeBSD's overcommit policy, I just get a lot of people complaining about FreeBSD killing random processes. Does anyone know a site that describes how it works? I understand Linux's overcommit policy just fine.

Disclaimer: The Linux OOM killer has never killed the wrong process for me, so I don't have any bad experiences with overcommit.

Have a nice day,
--
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
Re: [HACKERS] PostgreSQL 8.0.6 crash
> On Fri, Feb 10, 2006 at 09:57:12AM -0500, Mark Woodward wrote:
> > > In most practical situations, I think exceeding work_mem is really the best solution, as long as it's not by more than 10x or 100x. It's when the estimate is off by many orders of magnitude that you've got a problem. Running out of memory is not necessarily the worst response ... as long as the system doesn't kill the process in response to that.
> > I don't agree with you here. Many PostgreSQL installations use PostgreSQL as part of a larger whole. Adjusting work_mem should give the admin some control over the memory footprint of the system. It is documented as the limit a specific function path will use before spilling to disk.
> And even when PostgreSQL has the server all to itself, having a hashagg spill to disk is *way* better than pushing the machine into a swap storm. At least if you spill the hashagg you only have one backend running at a snail's pace; a swap storm means next to nothing gets done.
> > This was/is an example of where the behavior of PostgreSQL is clearly unacceptable. OK, yes, this problem goes away with an ANALYZE, but it isn't clear how anyone could have known this, and unexpected behavior is bad in any product.
> Care to submit a documentation patch before releases are bundled (I think on Sunday)? At least then people would be aware that work_mem is just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll have time before the release. :(

I would be glad to. What's the process?
Re: [HACKERS] PostgreSQL 8.0.6 crash
Martijn van Oosterhout kleptog@svana.org writes:
> Disclaimer: The Linux OOM killer has never killed the wrong process for me, so I don't have any bad experiences with overcommit.

You haven't tried real hard. What I've seen recently when I do something that makes a PG backend go overboard is that the kernel zaps both the misbehaving backend and the bgwriter process. No idea what it's got against the bgwriter, but the behavior's been pretty consistent under recent Fedora 4 kernels ...

(This is on a development machine, not a server, so I'm not particularly worried by leaving the default overcommit policy in place. I wouldn't do that on a server --- but it's not worth my time to change it on a devel machine.)

regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
Martijn van Oosterhout kleptog@svana.org writes:
> Disabling overcommit has a serious cost in that most of your VM will never be used. Are people really suggesting that I can't run a few daemons, X and a web browser on FreeBSD without allocating 3 times my physical memory in swap?

There's a possibility you're misreading your process info there. The X server maps large areas of graphics memory in its address space, which shows up as virtual memory used in ps on some OSes; I'm not sure if BSD is included there. That's not memory that can ever be swapped out, and it doesn't take away any non-video memory.

That said, it's true that some applications allocate much more memory than needed. There's a bit of a feedback loop here: because application writers know that OSes overcommit, they don't worry about avoiding unnecessary allocations. Sun's original Java memory management system just started off with allocating 20MB before it ran a single bytecode, for example.

That's why merely allocating tons of swap doesn't necessarily protect you. It's still possible for a process (or several processes, if you allocate more swap than you have address space) to mmap gigabytes of memory without touching it and then start touching those pages. Hopefully the OOM killer targets the offender, but there's no real way for it to guarantee it.

-- greg
Re: [HACKERS] PostgreSQL 8.0.6 crash
Greg Stark [EMAIL PROTECTED] writes:
> That's why merely allocating tons of swap doesn't necessarily protect you. It's still possible for a process (or several processes if you allocate more swap than you have address space) to mmap gigabytes of memory without touching it and then start touching those pages.

So? If the swap exists to back that memory, there's no problem. It might be slow, but it will not fail.

regards, tom lane
[HACKERS] PostgreSQL 8.0.6 crash
PostgreSQL promptly uses all available memory for the query and subsequently crashes. I'm sure it can be corrected with a setting, but should it crash?

freedb=# create table ucode as select distinct ucode from cdtitles group by ucode having count(ucode) > 1 ;
server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

(Table layout)

freedb=# \d cdtitles
            Table "public.cdtitles"
  Column   |       Type        |             Modifiers
-----------+-------------------+-----------------------------------
 cdid      | integer           | default nextval('seq_cdid'::text)
 cdcode    | character varying |
 ucode     | character varying |
 title     | character varying |
 artist    | character varying |
 genre     | character varying |
 dgenre    | character varying |
 year      | integer           |
 processed | character varying |
 submit    | character varying |
 revision  | integer           |
 disclen   | integer           |
 tracks    | integer           |
 extkey    | integer           |
 offsets   | integer[]         |
Indexes:
    "cdtitles_ucode" btree (ucode) CLUSTER

(Size of table)

freedb=# select count(*) from cdtitles ;
  count
---------
 1927912
(1 row)

(Sample row)

freedb=# select * from cdtitles where cdid = 11 limit 1;
 cdid |  cdcode  |         ucode         |            title             |   artist   | genre | dgenre | year | processed | submit | revision | disclen | tracks | extkey | offsets
 11 | 320e9e14 | 0ea0-14-3399-0006ad6a | Var Slutar V#65533;arna (CD1) | Streaplers | rock | | 0 | cddbd v1.5PL3 Copyright (c) Steve Scherf et al. | dBpowerAMP V4.0 | 0 | | 19 | | {21,150,13920,31180,43664,59907,73163,86629,98447,113684,130205,141682,154920,166783,179028,192689,205161,222164,234649,249692,264340,3744,0}
(1 row)
Re: [HACKERS] PostgreSQL 8.0.6 crash
More info: the machine has 512M RAM and 512M swap. Work mem is set to: work_mem = 1024. This shouldn't have crashed, should it?

> PostgreSQL promptly uses all available memory for the query and subsequently crashes. I'm sure it can be corrected with a setting, but should it crash?
>
> freedb=# create table ucode as select distinct ucode from cdtitles group by ucode having count(ucode) > 1 ;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
Re: [HACKERS] PostgreSQL 8.0.6 crash
Mark Woodward [EMAIL PROTECTED] writes:
> PostgreSQL promptly uses all available memory for the query and subsequently crashes.

I'll bet a nickel this is on a Linux machine with OOM kill enabled. What does the postmaster log show --- or look in the kernel log to see if it mentions anything about an out-of-memory kill.

> freedb=# create table ucode as select distinct ucode from cdtitles group by ucode having count(ucode) > 1 ;
> server closed the connection unexpectedly

What does EXPLAIN show as the plan for that? If it's a hash aggregate, try with enable_hashagg turned off. How many distinct ucode values are there in the table?

regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
> Mark Woodward [EMAIL PROTECTED] writes:
> > PostgreSQL promptly uses all available memory for the query and subsequently crashes.
> I'll bet a nickel this is on a Linux machine with OOM kill enabled. What does the postmaster log show --- or look in the kernel log to see if it mentions anything about an out-of-memory kill.

That's a no-brainer. Maybe I should have rephrased the condition; "crash" may be the wrong word, it was definitely killed by out of memory. Sorry.

> > freedb=# create table ucode as select distinct ucode from cdtitles group by ucode having count(ucode) > 1 ;
> > server closed the connection unexpectedly
> What does EXPLAIN show as the plan for that? If it's a hash aggregate, try with enable_hashagg turned off. How many distinct ucode values are there in the table?

There are over 1.7M distinct rows, about 200K non-distinct that I want to somehow remove. It does have a hash aggregate:

freedb=# explain select distinct ucode from cdtitles group by ucode having count(ucode) > 1 ;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Unique  (cost=106536.32..106537.32 rows=200 width=32)
   ->  Sort  (cost=106536.32..106536.82 rows=200 width=32)
         Sort Key: ucode
         ->  HashAggregate  (cost=106527.68..106528.68 rows=200 width=32)
               Filter: (count(ucode) > 1)
               ->  Seq Scan on cdtitles  (cost=0.00..96888.12 rows=1927912 width=32)
(6 rows)

Well, shouldn't hash aggregate respect work memory limits?
Re: [HACKERS] PostgreSQL 8.0.6 crash
Mark Woodward [EMAIL PROTECTED] writes:
> ->  HashAggregate  (cost=106527.68..106528.68 rows=200 width=32)
>       Filter: (count(ucode) > 1)
>       ->  Seq Scan on cdtitles  (cost=0.00..96888.12 rows=1927912 width=32)
> Well, shouldn't hash aggregate respect work memory limits?

If the planner thought there were 1.7M distinct values, it wouldn't have used hash aggregate ... but it only thinks there are 200, which IIRC is the default assumption. Have you ANALYZEd this table lately?

Meanwhile, I'd strongly recommend turning off OOM kill. That's got to be the single worst design decision in the entire Linux kernel.

regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
> Mark Woodward [EMAIL PROTECTED] writes:
> > ->  HashAggregate  (cost=106527.68..106528.68 rows=200 width=32)
> >       Filter: (count(ucode) > 1)
> >       ->  Seq Scan on cdtitles  (cost=0.00..96888.12 rows=1927912 width=32)
> > Well, shouldn't hash aggregate respect work memory limits?
> If the planner thought there were 1.7M distinct values, it wouldn't have used hash aggregate ... but it only thinks there are 200, which IIRC is the default assumption. Have you ANALYZEd this table lately?

I thought that I had, but I did CLUSTER at some point. Or maybe I didn't, I'm not sure. I have been working on a file reader/parser/importer program; I created and dropped the DB so many times it is hard to keep track.

Still, I would say that it is extremely bad behavior for not having stats, wouldn't you think?

> Meanwhile, I'd strongly recommend turning off OOM kill. That's got to be the single worst design decision in the entire Linux kernel.

How is this any different from FreeBSD having a default 512M process size limit? On FreeBSD, the process would have been killed earlier.
Re: [HACKERS] PostgreSQL 8.0.6 crash
Mark Woodward [EMAIL PROTECTED] writes:
> Still, I would say that it is extremely bad behavior for not having stats, wouldn't you think?

Think of it as a kernel bug.

> > Meanwhile, I'd strongly recommend turning off OOM kill. That's got to be the single worst design decision in the entire Linux kernel.
> How is this any different from FreeBSD having a default 512M process size limit? On FreeBSD, the process would have been killed earlier.

No, the process would have been politely told it was out of memory, and would have told you the same. If the kernel's way of notifying a process that it's out of memory is SIGKILL, there is not a damn thing that we can do to operate robustly.

regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
> Mark Woodward [EMAIL PROTECTED] writes:
> > Still, I would say that it is extremely bad behavior for not having stats, wouldn't you think?
> Think of it as a kernel bug.

While I respect your viewpoint that the Linux kernel should not kill an offending process if the system runs out of memory, I sort of disagree, in that OOM is a disaster preventer. It should be viewed as a last-ditch "him or me" choice the kernel needs to make, and it should not get into that position in the first place.

Regardless, it is troubling that failing to have current stats can cause the system, with a large data set, to exceed working memory limits. I think it is still a bug. While it may manifest itself as a pg crash on Linux because of a feature with which you have issue, the fact remains that PG is exceeding its working memory limit. Should failing to run ANALYZE cause this behavior? If so, how does this get clearly documented? If not, can it be prevented?

> > > Meanwhile, I'd strongly recommend turning off OOM kill. That's got to be the single worst design decision in the entire Linux kernel.
> > How is this any different from FreeBSD having a default 512M process size limit? On FreeBSD, the process would have been killed earlier.
> No, the process would have been politely told it was out of memory, and would have told you the same. If the kernel's way of notifying a process that it's out of memory is SIGKILL, there is not a damn thing that we can do to operate robustly.

Let's not waste time on a Linux discussion. Linux and FreeBSD have their strengths, and a debate on the dubious merits of either is a long and contentious one. Both systems are fine, just with some subtle differences in design goals.
Re: [HACKERS] PostgreSQL 8.0.6 crash
Tom Lane wrote:
> Mark Woodward [EMAIL PROTECTED] writes:
> > Still, I would say that it is extremely bad behavior for not having stats, wouldn't you think?
> Think of it as a kernel bug.
> > > Meanwhile, I'd strongly recommend turning off OOM kill. That's got to be the single worst design decision in the entire Linux kernel.
> > How is this any different from FreeBSD having a default 512M process size limit? On FreeBSD, the process would have been killed earlier.
> No, the process would have been politely told it was out of memory, and would have told you the same. If the kernel's way of notifying a process that it's out of memory is SIGKILL, there is not a damn thing that we can do to operate robustly.

And we have docco on it: http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN18105 which I assume is still current.

Back in October I mentioned the OOM killer to Andrew Morton - his reaction was a very pained look and a curt admonition: turn it off.

cheers

andrew
Re: [HACKERS] PostgreSQL 8.0.6 crash
Mark Woodward [EMAIL PROTECTED] writes:
> I think it is still a bug. While it may manifest itself as a pg crash on Linux because of a feature with which you have issue, the fact remains that PG is exceeding its working memory limit.

The problem is that *we have no way to know what that limit is* --- short of exceeding it and being summarily killed. (BTW, the kernel doesn't know what the limit is either.) There is simply not any way to operate robustly under the OOM-kill regime.

While I'll certainly acknowledge that it'd be nice if hashagg had spill-to-disk capability, that wouldn't alter the fundamental fact that if you want reliable behavior you MUST turn off OOM kill. There is not anything we can do at the database level to work around that kernel-level misdesign.

regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
> Mark Woodward [EMAIL PROTECTED] writes:
> > I think it is still a bug. While it may manifest itself as a pg crash on Linux because of a feature with which you have issue, the fact remains that PG is exceeding its working memory limit.
> The problem is that *we have no way to know what that limit is* --- short of exceeding it and being summarily killed. (BTW, the kernel doesn't know what the limit is either.) There is simply not any way to operate robustly under the OOM-kill regime.

No, you misunderstand what I said: the working memory as defined in postgresql.conf. I don't care about the OS debate.

> While I'll certainly acknowledge that it'd be nice if hashagg had spill-to-disk capability, that wouldn't alter the fundamental fact that if you want reliable behavior you MUST turn off OOM kill. There is not anything we can do at the database level to work around that kernel-level misdesign.

Again, regardless of OS used, hashagg will exceed working memory as defined in postgresql.conf. At issue is: would a lack of ANALYZE justify this behavior? If so, it should be documented.
Re: [HACKERS] PostgreSQL 8.0.6 crash
Mark Woodward [EMAIL PROTECTED] writes: Again, regardless of OS used, hashagg will exceed working memory as defined in postgresql.conf. So? If you've got OOM kill enabled, it can zap a process whether it's strictly adhered to work_mem or not. The OOM killer is entirely capable of choosing a victim process whose memory footprint hasn't changed materially since it started (eg, the postmaster). regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
* Tom Lane ([EMAIL PROTECTED]) wrote: Mark Woodward [EMAIL PROTECTED] writes: Again, regardless of OS used, hashagg will exceed working memory as defined in postgresql.conf. So? If you've got OOM kill enabled, it can zap a process whether it's strictly adhered to work_mem or not. The OOM killer is entirely capable of choosing a victim process whose memory footprint hasn't changed materially since it started (eg, the postmaster). Unless I've missed something here, disabling the OOM killer doesn't really solve the problem. What solves the problem is running ANALYZE, but it would still make sense for the Postmaster, upon realizing that it has gone well beyond its work_mem boundary, ideally to switch to a plan which isn't going to exceed the work_mem limit. Less ideally, it could give up and issue an error to the user instead of running the box out of memory. I appreciate that this is probably not very easy to implement, but I do believe the current situation could be improved in this regard. Thanks, Stephen
Re: [HACKERS] PostgreSQL 8.0.6 crash
Mark Woodward [EMAIL PROTECTED] writes: Again, regardless of OS used, hashagg will exceed working memory as defined in postgresql.conf. So? If you've got OOM kill enabled, it can zap a process whether it's strictly adhered to work_mem or not. The OOM killer is entirely capable of choosing a victim process whose memory footprint hasn't changed materially since it started (eg, the postmaster). Sorry, I must strongly disagree here. The postgresql.conf working mem is a VERY IMPORTANT setting; it is intended to limit the consumption of memory by the postgresql process. Oftentimes PostgreSQL will work alongside other application servers on the same system; in fact, it may be a sub-part of application servers on the same system. (This is, in fact, how it is used on one of my site servers.) Clearly, if the server will use 1000 times this number (set for 1024K, but exceeds 1G) this is broken, and it may cause other systems to fail or perform very poorly. If it is not something that can be fixed, it should be clearly documented.
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote: Mark Woodward [EMAIL PROTECTED] writes: Again, regardless of OS used, hashagg will exceed working memory as defined in postgresql.conf. So? If you've got OOM kill enabled, it can zap a process whether it's strictly adhered to work_mem or not. The OOM killer is entirely capable of choosing a victim process whose memory footprint hasn't changed materially since it started (eg, the postmaster). Sorry, I must strongly disagree here. The postgresql.conf working mem is a VERY IMPORTANT setting; it is intended to limit the consumption of memory by the postgresql process. Oftentimes PostgreSQL will work alongside other application servers on the same system. Actually, no, it's not designed for that at all. In fact, it may be a sub-part of application servers on the same system. (This is, in fact, how it is used on one of my site servers.) Clearly, if the server will use 1000 times this number (set for 1024K, but exceeds 1G) this is broken, and it may cause other systems to fail or perform very poorly. If it is not something that can be fixed, it should be clearly documented. work_mem (integer) Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value is specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.
So it says right there that it's very easy to exceed work_mem by a very large amount. Granted, this is a very painful problem to deal with and will hopefully be changed at some point, but it's pretty clear as to how this works. -- Jim C. Nasby, Pervasive Software
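The multiplication the quoted documentation warns about is easy to put in numbers. A minimal sketch; the session count, node count, and work_mem value below are invented for illustration, not taken from the thread:

```python
def worst_case_memory_mb(work_mem_kb: int, nodes_per_query: int, sessions: int) -> float:
    """Back-of-envelope total from the quoted docs: every concurrent
    sort/hash node in every session may use up to work_mem."""
    return work_mem_kb * nodes_per_query * sessions / 1024

# Hypothetical box: 50 sessions, each running a query with 4 sort/hash
# nodes, at work_mem = 32 MB.
print(worst_case_memory_mb(32 * 1024, 4, 50))  # 6400.0 (MB)
```

So even with every node strictly obeying work_mem, the aggregate can dwarf the configured value; the hashagg problem in this thread is on top of that.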
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Thu, Feb 09, 2006 at 11:42:57AM -0500, Mark Woodward wrote: Mark Woodward [EMAIL PROTECTED] writes: Still, I would say that it is extremely bad behavior for not having stats, wouldn't you think? Think of it as a kernel bug. While I respect your viewpoint that the Linux kernel should not kill an offending process if the system runs out of memory, I sort of disagree in that OOM is a disaster preventer. It should be viewed as a last-ditch him-or-me choice the kernel needs to make, and it should not get into that position in the first place. I've had processes run away on a FreeBSD box before, to the extent of running entirely out of swap and memory. Instead of random processes just dying for no apparent reason, I instead started getting a bunch of out-of-memory errors. No disaster, I just fixed the problem and life went on. Well, ok, the box did become rather unresponsive when my fix for the problem meant that all of a sudden there were about 950 perl processes trying to run at the same time. I wish I'd captured top showing 900+ runnable processes. But after a few minutes the processes started completing and exiting and everything was soon back to normal. I rather doubt Linux would survive that... -- Jim C. Nasby, Pervasive Software
Re: [HACKERS] PostgreSQL 8.0.6 crash
* Jim C. Nasby ([EMAIL PROTECTED]) wrote: On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote: If it is not something that can be fixed, it should be clearly documented. work_mem (integer) Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value is specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries. So it says right there that it's very easy to exceed work_mem by a very large amount. Granted, this is a very painful problem to deal with and will hopefully be changed at some point, but it's pretty clear as to how this works. It also says that when it goes over, it'll spill to disk. Additionally, we're talking about one hash here, not multiple ones. It seems at least misleading since, if I understand correctly, Postgres isn't actively checking whether the amount of memory used by an in-progress hash creation has gone over the limit; rather, it guesses at how much memory will be used during the planning stage to decide whether a hash plan is possible. That guess can certainly be wrong, but there's nothing in place to handle the situation where the guess is wrong... Thanks, Stephen
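Stephen's description — a plan-time guess with no run-time check — can be caricatured like this. The 200-group figure mirrors the planner's classic default distinct-value estimate when no statistics exist; the row width and group counts are invented:

```python
def planner_allows_hashagg(est_groups: int, row_bytes: int, work_mem_kb: int) -> bool:
    """Crude model of the plan-time decision: compare the *estimated*
    hash table size against work_mem. Nothing re-checks this while the
    hash table is actually being built."""
    return est_groups * row_bytes <= work_mem_kb * 1024

# Without ANALYZE the planner may assume only a couple hundred distinct
# groups, so the hash table looks like it fits comfortably in 1 MB...
print(planner_allows_hashagg(200, 64, 1024))
# ...but if the data really holds 20 million groups, the same table needs
# on the order of 1.2 GB, and the executor just keeps growing it.
print(planner_allows_hashagg(20_000_000, 64, 1024))
```

The second call returning False is exactly the check that never happens at run time: the plan was chosen using the first, optimistic estimate.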
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote: Mark Woodward [EMAIL PROTECTED] writes: Again, regardless of OS used, hashagg will exceed working memory as defined in postgresql.conf. So? If you've got OOM kill enabled, it can zap a process whether it's strictly adhered to work_mem or not. The OOM killer is entirely capable of choosing a victim process whose memory footprint hasn't changed materially since it started (eg, the postmaster). Sorry, I must strongly disagree here. The postgresql.conf working mem is a VERY IMPORTANT setting; it is intended to limit the consumption of memory by the postgresql process. Oftentimes PostgreSQL will work alongside other application servers on the same system. Actually, no, it's not designed for that at all. I guess that's a matter of opinion. In fact, it may be a sub-part of application servers on the same system. (This is, in fact, how it is used on one of my site servers.) Clearly, if the server will use 1000 times this number (set for 1024K, but exceeds 1G) this is broken, and it may cause other systems to fail or perform very poorly. If it is not something that can be fixed, it should be clearly documented. work_mem (integer) Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value is specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.
So it says right there that it's very easy to exceed work_mem by a very large amount. Granted, this is a very painful problem to deal with and will hopefully be changed at some point, but it's pretty clear as to how this works. Well, if you read that paragraph carefully, I'll admit that I was a little too literal in my statement, applying it to the process and not specific functions within the process, but in the documentation: each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. According to the documentation the behavior of hashagg is broken. It did not use up to this amount and then start to use temporary files; it used 1000 times this limit and was killed by the OS. I think it should be documented, as the behavior is unpredictable.
Re: [HACKERS] PostgreSQL 8.0.6 crash
Stephen Frost [EMAIL PROTECTED] writes: Unless I've missed something here, disabling the OOM killer doesn't really solve the problem here. Well in a way it does. Postgres would get an out-of-memory error from malloc, which it would handle properly, and the world would be happy. Except not quite, since I think an out-of-memory error still means that the backend exits instead of just that query failing. That means if you have an application running, such as Apache, then all subsequent transactions on that connection fail too, instead of just the transaction that misbehaved. And as the other poster mentioned, having Postgres use up every available byte of memory isn't really very friendly to anything else running on the box. It doesn't seem like a bad idea to have a max_memory parameter that, if a backend ever exceeded it, would immediately abort the current transaction. That would let an application continue operating normally after getting an error. -- greg
Re: [HACKERS] PostgreSQL 8.0.6 crash
Greg Stark [EMAIL PROTECTED] writes: Except not quite, since I think an out of memory error still means that backend exits instead of just that query failing. Not at all! PG will recover from this perfectly well ... if it's given the opportunity, rather than being SIGKILLed. It doesn't seem like a bad idea to have a max_memory parameter that if a backend ever exceeded it would immediately abort the current transaction. See ulimit (or local equivalent). regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
Greg Stark wrote: Well in a way it does. Postgres would get an out-of-memory error from malloc which it would handle properly and the world would be happy. Except not quite, since I think an out of memory error still means that backend exits instead of just that query failing. Not at all -- the transaction is aborted, but the backend can continue working perfectly fine. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] PostgreSQL 8.0.6 crash
* Tom Lane ([EMAIL PROTECTED]) wrote: Greg Stark [EMAIL PROTECTED] writes: It doesn't seem like a bad idea to have a max_memory parameter that if a backend ever exceeded it would immediately abort the current transaction. See ulimit (or local equivalent). As much as setting ulimit in shell scripts is fun, I have to admit that I really don't see it happening very much. Having Postgres set a ulimit for itself may not be a bad idea and would perhaps provide least surprise for new users. Perhaps shared_buffers + 10*work_mem + maintenance_work_mem + max_stack_depth? Then errors from running out of memory could provide a 'HINT: Memory consumption went well over allowed work_mem, perhaps you need to run ANALYZE or raise work_mem?'. Just some thoughts, Stephen
Re: [HACKERS] PostgreSQL 8.0.6 crash
Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: Greg Stark [EMAIL PROTECTED] writes: It doesn't seem like a bad idea to have a max_memory parameter that if a backend ever exceeded it would immediately abort the current transaction. See ulimit (or local equivalent). As much as setting ulimit in shell scripts is fun, I have to admit that I really don't see it happening very much. For one thing it requires admin access to the startup scripts to arrange this. And it's always cluster-wide. Having a GUC parameter would mean it could be set per-session. Even if the GUC parameter were just implemented by calling setrlimit it might be useful. -- greg
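The setrlimit behavior Greg and Tom are discussing can be demonstrated from userland. This is a hedged, Linux-oriented sketch (RLIMIT_AS is not honored on every platform, and Python here merely stands in for a backend; it is not anything Postgres itself does):

```python
import resource

def allocate_under_cap(cap_bytes: int, want_bytes: int) -> str:
    """Lower the soft address-space limit, attempt a large allocation, and
    show that the failure is a catchable out-of-memory error, not a SIGKILL."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (cap_bytes, hard))
    try:
        bytearray(want_bytes)  # stand-in for a hash table growing past its budget
        return "fit"
    except MemoryError:
        # This is the recoverable path: the process gets a normal error it
        # can turn into "transaction aborted" rather than being killed.
        return "clean ENOMEM"
    finally:
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))  # restore the old limit

print(allocate_under_cap(1 << 30, 1 << 31))  # cap at 1 GiB, ask for 2 GiB
```

This is the difference the thread keeps circling: under a ulimit (or with overcommit disabled), the allocation fails in a way the process can handle, instead of succeeding on paper and drawing a SIGKILL later.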
Re: [HACKERS] PostgreSQL 8.0.6 crash
Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: Greg Stark [EMAIL PROTECTED] writes: It doesn't seem like a bad idea to have a max_memory parameter that if a backend ever exceeded it would immediately abort the current transaction. See ulimit (or local equivalent). As much as setting ulimit in shell scripts is fun, I have to admit that I really don't see it happening very much. For one thing it requires admin access to the startup scripts to arrange this. And it's always cluster-wide. Having a GUC parameter would mean it could be set per-session. Even if the GUC parameter were just implemented by calling setrlimit it might be useful. I don't think it needs a new GUC parameter, just having hashagg respect work_mem would fix the problem.
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Thu, Feb 09, 2006 at 02:35:34PM -0500, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Except not quite, since I think an out of memory error still means that backend exits instead of just that query failing. Not at all! PG will recover from this perfectly well ... if it's given the opportunity, rather than being SIGKILLed. FWIW, the problem is mainly from the situation where some process accesses a piece of memory that has been swapped out, but there is no memory available to swap the page in. Or writes to a page marked copy-on-write. What do you do? There is no way to return -ENOMEM from a normal memory access, and PostgreSQL wouldn't handle that anyway. When people talk about disabling the OOM killer, it doesn't stop the SIGKILL behaviour; it just causes the kernel to return -ENOMEM for malloc() much, much earlier... (ie when you still actually have memory available). Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Feb 9, 2006, at 11:22 AM, Stephen Frost wrote: * Tom Lane ([EMAIL PROTECTED]) wrote: Mark Woodward [EMAIL PROTECTED] writes: Again, regardless of OS used, hashagg will exceed working memory as defined in postgresql.conf. So? If you've got OOM kill enabled, it can zap a process whether it's strictly adhered to work_mem or not. The OOM killer is entirely capable of choosing a victim process whose memory footprint hasn't changed materially since it started (eg, the postmaster). Unless I've missed something here, disabling the OOM killer doesn't really solve the problem here. What solves the problem is running ANALYZE but it's still certainly the case that it would make some sense for the Postmaster, upon realizing that it's gone well beyond its work_mem boundary, to ideally switch plans to one which isn't going to exceed its work_mem limit. Less ideally, it could give up and issue an error to the user instead of running the box out of memory. So is the work_mem parameter that is set not strictly enforced? Is it more like a suggestion as to what it SHOULD use and not a hard limit on how much memory each process is ALLOWED to use? If his work_mem is set to 1 MB and that process is using 500 MB for tasks that are supposed to stay in work_mem, then doesn't that mean the limit is not really a hard limit but rather a suggestion? Rick
Re: [HACKERS] PostgreSQL 8.0.6 crash
Martijn van Oosterhout kleptog@svana.org writes: When people talk about disabling the OOM killer, it doesn't stop the SIGKILL behaviour, Yes it does, because the situation will never arise. it just causes the kernel to return -ENOMEM for malloc() much much earlier... (ie when you still actually have memory available). Given the current price of disk, there is no sane reason not to have enough swap space configured to make this not-a-problem. The OOM kill mechanism was a reasonable solution for running systems that were not expected to be too reliable anyway on small hardware, but if you're trying to run a 24/7 server you're simply incompetent if you don't disable it. regards, tom lane
Re: [HACKERS] PostgreSQL 8.0.6 crash
Martijn van Oosterhout kleptog@svana.org writes: When people talk about disabling the OOM killer, it doesn't stop the SIGKILL behaviour, Yes it does, because the situation will never arise. it just causes the kernel to return -ENOMEM for malloc() much much earlier... (ie when you still actually have memory available). Given the current price of disk, there is no sane reason not to have enough swap space configured to make this not-a-problem. The OOM kill mechanism was a reasonable solution for running systems that were not expected to be too reliable anyway on small hardware, but if you're trying to run a 24/7 server you're simply incompetent if you don't disable it. And people say I have STRONG opinions. Don't hold back Tom, let us know what you really think.
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Friday 10 February 2006 00:53, Mark Woodward wrote: Martijn van Oosterhout kleptog@svana.org writes: When people talk about disabling the OOM killer, it doesn't stop the SIGKILL behaviour, Yes it does, because the situation will never arise. it just causes the kernel to return -ENOMEM for malloc() much much earlier... (ie when you still actually have memory available). Given the current price of disk, there is no sane reason not to have enough swap space configured to make this not-a-problem. The OOM kill mechanism was a reasonable solution for running systems that were not expected to be too reliable anyway on small hardware, but if you're trying to run a 24/7 server you're simply incompetent if you don't disable it. And people say I have STRONG opinions. Don't hold back Tom, let us know what you really think. Read http://linux-mm.org/OverCommitAccounting or file://usr/src/linux/Documentation/vm/overcommit-accounting It is a good idea to have enough swap space. If not, set vm.overcommit_memory=2
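For reference, the overcommit policy being argued about can be inspected directly from userland; a small probe, assuming a Linux /proc filesystem (the path comes from the kernel docs cited above):

```python
# Linux-only: read vm.overcommit_memory, the knob this subthread is about.
# 0 = heuristic overcommit (the OOM killer may later SIGKILL a victim),
# 1 = always overcommit,
# 2 = strict accounting (allocations fail early with ENOMEM instead).
with open("/proc/sys/vm/overcommit_memory") as f:
    mode = int(f.read().strip())
print("vm.overcommit_memory =", mode)
```

Setting it to 2 (plus adequate swap) is what turns the unrecoverable SIGKILL into the recoverable malloc failure discussed earlier in the thread.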
Re: [HACKERS] PostgreSQL 8.0.6 crash
On Feb 9, 2006, at 12:49 PM, Mark Woodward wrote: On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote: Mark Woodward [EMAIL PROTECTED] writes: Again, regardless of OS used, hashagg will exceed working memory as defined in postgresql.conf. So? If you've got OOM kill enabled, it can zap a process whether it's strictly adhered to work_mem or not. The OOM killer is entirely capable of choosing a victim process whose memory footprint hasn't changed materially since it started (eg, the postmaster). Sorry, I must strongly disagree here. The postgresql.conf working mem is a VERY IMPORTANT setting; it is intended to limit the consumption of memory by the postgresql process. Oftentimes PostgreSQL will work alongside other application servers on the same system. Actually, no, it's not designed for that at all. I guess that's a matter of opinion. In fact, it may be a sub-part of application servers on the same system. (This is, in fact, how it is used on one of my site servers.) Clearly, if the server will use 1000 times this number (set for 1024K, but exceeds 1G) this is broken, and it may cause other systems to fail or perform very poorly. If it is not something that can be fixed, it should be clearly documented. work_mem (integer) Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value is specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins.
Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries. So it says right there that it's very easy to exceed work_mem by a very large amount. Granted, this is a very painful problem to deal with and will hopefully be changed at some point, but it's pretty clear as to how this works. Well, if you read that paragraph carefully, I'll admit that I was a little too literal in my statement, applying it to the process and not specific functions within the process, but in the documentation: each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. According to the documentation the behavior of hashagg is broken. It did not use up to this amount and then start to use temporary files; it used 1000 times this limit and was killed by the OS. I think it should be documented, as the behavior is unpredictable. It seems to me that the solution for THIS INCIDENT is to run an analyze. That should fix the problem at hand. I have nothing to say about the OOM issue except that hopefully the analyze will prevent him from running out of memory at all. However if hashagg truly does not obey the limit that is supposed to be imposed by work_mem then it really ought to be documented. Is there a misunderstanding here and it really does obey it? Or is hashagg an exception but the other work_mem-associated operations work fine? Or is it possible for them all to go out of bounds? Even if you've got 100 terabytes of swap space, though, it seems like if your system is very heavy on reads then you would really want that single backend to start using up your disk space and leave your memory alone, so that most of your data can stay cached and largely unaffected by the problem of one backend. If your bottleneck is writing to the disk then it doesn't really seem to matter. You just need to make sure that a huge out-of-control hashagg never occurs.
If your disks get saturated with writes because of the hashagg of one backend, then all other processes that need to write a lot of info to disk are going to come to a grinding halt, queries are not going to complete quickly, and you will have a huge mess on your hands that will essentially prevent postgres from being able to do its job even if it doesn't actually die. In this situation disk bandwidth is a scarce commodity, and whether you let the OS handle it all with virtual memory or you let postgres swap everything out to disk for that one operation, you are still using disk to make up for a lack of RAM. At some point you've either got to stock up on enough RAM to run your queries properly or alter how your queries run to use less RAM. Having a process go out of control in resource usage is going to cause big problems one way or another.
Re: [HACKERS] PostgreSQL 8.0.6 crash
Rick Gigger [EMAIL PROTECTED] writes: However if hashagg truly does not obey the limit that is supposed to be imposed by work_mem then it really ought to be documented. Is there a misunderstanding here and it really does obey it? Or is hashagg an exception but the other work_mem associated operations work fine? Or is it possible for them all to go out of bounds? hashagg is the exception. It should be fixed, not documented, but no one's got round to that. One point to consider is that if the planner's estimate is as far off as exhibited in the OP's example, a hashagg that does spill to disk is likely to take so long that he'll be back here complaining that the query never terminates ;-). In most practical situations, I think exceeding work_mem is really the best solution, as long as it's not by more than 10x or 100x. It's when the estimate is off by many orders of magnitude that you've got a problem. Running out of memory is not necessarily the worst response ... as long as the system doesn't kill the process in response to that. regards, tom lane
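The spill-to-disk capability Tom says hashagg lacks can be sketched in miniature. Everything below is an invented simplification — a fixed per-entry cost guess and a single re-read of the spill file instead of recursive partitioning — not the real executor:

```python
import pickle
import tempfile
from collections import defaultdict

def hashagg_count(rows, work_mem_bytes, est_entry_bytes=64):
    """Toy hash aggregation (count per key) that stops growing its in-memory
    table at the budget and spills overflow tuples to a temp file for a
    second pass, instead of growing without bound."""
    table = {}
    spill = tempfile.TemporaryFile()
    spilled = 0
    for key in rows:
        if key in table:
            table[key] += 1
        elif len(table) * est_entry_bytes < work_mem_bytes:
            table[key] = 1              # still under budget: new in-memory entry
        else:
            pickle.dump(key, spill)     # over budget: defer the tuple to disk
            spilled += 1
    if spilled:                         # second pass over the spilled tuples
        spill.seek(0)
        overflow = defaultdict(int)
        for _ in range(spilled):
            overflow[pickle.load(spill)] += 1
        table.update(overflow)          # real code would partition recursively
    return dict(table)

print(hashagg_count(["a", "b", "c", "a", "b", "a"], work_mem_bytes=128))
# {'a': 3, 'b': 2, 'c': 1}
```

This also illustrates Tom's caveat: with a tiny budget and many groups, the work shifts to temp-file I/O, which is exactly why a badly mis-estimated query that spills might run "forever" instead of running out of memory.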