Re: [GENERAL] Searching for Duplicates and Hosed the System
On Aug 21, 2007, at 12:04 AM, Tom Lane wrote:
> If you need to deal with very large result sets, the standard advice is to
> use a cursor so you can pull a few hundred or thousand rows at a time via
> FETCH.

In case it's not obvious... in this case you might want to dump the output of
that query into another table, perhaps a temp table:

CREATE TEMP TABLE dupe_check AS SELECT ...

--
Decibel!, aka Jim Nasby        [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
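The two suggestions above can be sketched roughly as follows. This is a minimal sketch only: the cursor name dupe_cur is invented here, the table and column names (MyTable, fld1..fld4) are the placeholders from the original post, and the FETCH batch size of 1000 is arbitrary.

```sql
-- Pull the result a batch at a time instead of absorbing it all at once.
-- Cursors must live inside a transaction.
BEGIN;

DECLARE dupe_cur CURSOR FOR
    SELECT count(*) AS count, fld1, fld2, fld3, fld4
    FROM MyTable
    GROUP BY fld1, fld2, fld3, fld4
    ORDER BY 1 DESC;

FETCH 1000 FROM dupe_cur;   -- repeat until no rows come back
CLOSE dupe_cur;
COMMIT;

-- Alternatively, materialize the result once and page through it later:
CREATE TEMP TABLE dupe_check AS
    SELECT count(*) AS count, fld1, fld2, fld3, fld4
    FROM MyTable
    GROUP BY fld1, fld2, fld3, fld4;
```

Either way the client never has to hold the full 18-million-row-derived result set in memory at once, which is the failure mode discussed in this thread.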
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen <[EMAIL PROTECTED]> writes:
> I ran VACUUM ANALYZE just before I launched this and there were no other
> postgres jobs running. I'm the only user as well. I also ran EXPLAIN
> prior to the run and got this:

> Nested Loop  (cost=11.71..28800.34 rows=7219 width=584)
>   ->  Seq Scan on ers_regions e  (cost=0.00..71.90 rows=16 width=28)
>         Filter: (region = 1)
>   ->  Bitmap Heap Scan on compliance_2006 a  (cost=11.71..1788.76 rows=451 width=584)
>         Recheck Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
>         ->  Bitmap Index Scan on key_tract  (cost=0.00..11.71 rows=451 width=0)
>               Index Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
> (7 rows)

Do those estimated row counts look sane to you --- in particular the estimate
of 7219 rows out of the whole query?

AFAICS a plan of this shape simply cannot consume a huge amount of memory on
the server, no matter how badly off the rowcount estimates are. However, if it
sends enough rows to the client, the *client* side could be having a problem
absorbing the data. libpq is designed to try to absorb the full result rowset
--- the API it provides is not amenable to partial result sets.

> I looked in the log and saw this:
> LOG:  transaction ID wrap limit is 1073746500, limited by database "postgres"
> LOG:  transaction ID wrap limit is 1073746500, limited by database "postgres"
> LOG:  could not send data to client: Broken pipe

Those first two messages are unrelated --- they are just routine autovacuum
output. The third one says that a client process crashed. So now I'm thinking
that the memory-overrun problem was on the client side. If you need to deal
with very large result sets, the standard advice is to use a cursor so you can
pull a few hundred or thousand rows at a time via FETCH.

			regards, tom lane
Re: [GENERAL] Searching for Duplicates and Hosed the System
Something is really screwy here. I tried what looked like a simpler task than
the one I tried when I started this message thread. The only unusual issue
here is that the table compliance_2006 is rather big (over 18 million
records). The table ers_regions is pretty small (about 3,100 records) and all
the WHERE fields are indexed (except e.region). Here's the not-too-complicated
SQL:

SELECT a.* FROM compliance_2006 a, ers_regions e
  WHERE a.fips_st_cd=e.fips_st
    AND a.fips_cnty_cd=e.fips_cou
    AND e.region=1;

I ran VACUUM ANALYZE just before I launched this and there were no other
postgres jobs running. I'm the only user as well. I also ran EXPLAIN prior to
the run and got this:

Nested Loop  (cost=11.71..28800.34 rows=7219 width=584)
  ->  Seq Scan on ers_regions e  (cost=0.00..71.90 rows=16 width=28)
        Filter: (region = 1)
  ->  Bitmap Heap Scan on compliance_2006 a  (cost=11.71..1788.76 rows=451 width=584)
        Recheck Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
        ->  Bitmap Index Scan on key_tract  (cost=0.00..11.71 rows=451 width=0)
              Index Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
(7 rows)

So I launched it to see what would happen. This resulted in a very busy disk
drive for about an hour, and one by one things started dying on my FC6 Linux
machine. First the mouse disappeared, then the cursor in the terminal window
where I was running psql vanished, the clock stopped, and the keyboard stopped
responding. Meanwhile, the disk drive thrashed on. Finally the screen saver
kicked in, and shortly thereafter I logged back in only to see the word
"Killed" sitting there on the last line, with all my machine's services
running again. Just no data output.
I looked in the log and saw this:

LOG:  transaction ID wrap limit is 1073746500, limited by database "postgres"
LOG:  transaction ID wrap limit is 1073746500, limited by database "postgres"
LOG:  could not send data to client: Broken pipe

I guess this is what killed it, but does it make sense that such a simple
request wrapped around the transaction limit? Is the bad behavior of my
machine while running this query consistent with an error like this, or is the
machine broken, and is that what caused PostgreSQL to crater? What should I
set the wrap limit to? What suggestions would you make for tracking down the
exact problem and fixing it? Any help would be appreciated.

- Bill Thoen

Tom Lane wrote:
> Bill Thoen <[EMAIL PROTECTED]> writes:
> > Tom, here's the "explain" results: Does this help explain what went wrong?
> > (And yes, I think there will be a *lot* of groups.)

> > explain select count(*) as count, fips_st_cd, fips_cnty_cd, farm_nbr,
> > tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
> > from compliance_2006
> > group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,
> > crop_cd, crop_status_cd, practice_cd, seq_nbr
> > order by 1 desc;

> >                                  QUERY PLAN
> > Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
> >   Sort Key: count(*)
> >   ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
> >         ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
> >               Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
> >               ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
> > (6 rows)
>
> Hmm ... no, actually, that shows the planner doing the right thing for lotsa
> groups: picking GroupAggregate instead of HashAggregate. The estimated number
> of groups is 1804832, which might or might not have much to do with reality,
> but in any case seems enough to keep it away from HashAggregate. Do you have
> autovacuum running, or a scheduled cronjob that runs ANALYZE or VACUUM
> ANALYZE?
>
> The only theory I can think of at this point is that your database statistics
> are more correct now than they were when you had the problem. If you try the
> query again, does it behave more sanely?
>
> 			regards, tom lane
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen <[EMAIL PROTECTED]> writes:
> Tom, here's the "explain" results: Does this help explain what went wrong?
> (And yes, I think there will be a *lot* of groups.)

> explain select count(*) as count, fips_st_cd, fips_cnty_cd, farm_nbr,
> tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
> from compliance_2006
> group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,
> crop_cd, crop_status_cd, practice_cd, seq_nbr
> order by 1 desc;

>                                  QUERY PLAN
> Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
>   Sort Key: count(*)
>   ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
>         ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
>               Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
>               ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
> (6 rows)

Hmm ... no, actually, that shows the planner doing the right thing for lotsa
groups: picking GroupAggregate instead of HashAggregate. The estimated number
of groups is 1804832, which might or might not have much to do with reality,
but in any case seems enough to keep it away from HashAggregate. Do you have
autovacuum running, or a scheduled cronjob that runs ANALYZE or VACUUM
ANALYZE?

The only theory I can think of at this point is that your database statistics
are more correct now than they were when you had the problem. If you try the
query again, does it behave more sanely?

			regards, tom lane
Re: [GENERAL] Searching for Duplicates and Hosed the System
Tom, here's the "explain" results: Does this help explain what went wrong?
(And yes, I think there will be a *lot* of groups.)

explain select count(*) as count, fips_st_cd, fips_cnty_cd, farm_nbr,
tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
from compliance_2006
group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,
crop_cd, crop_status_cd, practice_cd, seq_nbr
order by 1 desc;

                                 QUERY PLAN
Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
  Sort Key: count(*)
  ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
        ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
              Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
              ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
(6 rows)

On Sun, Aug 19, 2007 at 01:19:51PM -0400, Tom Lane wrote:
> Bill Thoen <[EMAIL PROTECTED]> writes:
> > I knew this would take some time, but what I didn't expect was that about
> > an hour into the select, my mouse and keyboard locked up and also I
> > couldn't log in from another computer via SSH. This is a Linux machine
> > running Fedora Core 6 and PostgreSQL is 8.1.4. There's about 50GB free on
> > the disc too.
>
> > I finally had to shut the power off and reboot to regain control of my
> > computer (that wasn't a good idea, either, but eventually I got everything
> > working again.)
>
> I've seen Fedora go nuts like that when it ran out of memory. Once it starts
> to swap heavily, performance goes into the tank; and once the kernel realizes
> it's in memory trouble, it starts to kill processes more or less at random.
> That might explain why ssh stopped working.
>
> One thing to do to make it more robust is to disable memory overcommit. I
> suspect also that configuring it with lots of swap space is
> counterproductive, because that just encourages the kernel to allow lots of
> swapping. I haven't actually experimented with that part though.
>
> As for why PG ran the system out of memory, I suspect that the planner
> drastically underestimated the number of groups to be created by your GROUP
> BY, and thought it could get away with a hash aggregation. We don't currently
> have any provision for spilling hash aggregation to disk, so if there's a
> very large number of groups the table just gets very big :-(. The planner is
> not supposed to choose hash agg if the estimated table size exceeds work_mem
> ... but if it had out-of-date statistics to work with it might have gotten
> the wrong answer. Have you ANALYZEd this table recently? What does EXPLAIN
> show as the estimated number of result rows?
>
> 			regards, tom lane
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen <[EMAIL PROTECTED]> writes:
> I knew this would take some time, but what I didn't expect was that about an
> hour into the select, my mouse and keyboard locked up and also I couldn't
> log in from another computer via SSH. This is a Linux machine running Fedora
> Core 6 and PostgreSQL is 8.1.4. There's about 50GB free on the disc too.

> I finally had to shut the power off and reboot to regain control of my
> computer (that wasn't a good idea, either, but eventually I got everything
> working again.)

I've seen Fedora go nuts like that when it ran out of memory. Once it starts
to swap heavily, performance goes into the tank; and once the kernel realizes
it's in memory trouble, it starts to kill processes more or less at random.
That might explain why ssh stopped working.

One thing to do to make it more robust is to disable memory overcommit. I
suspect also that configuring it with lots of swap space is counterproductive,
because that just encourages the kernel to allow lots of swapping. I haven't
actually experimented with that part though.

As for why PG ran the system out of memory, I suspect that the planner
drastically underestimated the number of groups to be created by your GROUP
BY, and thought it could get away with a hash aggregation. We don't currently
have any provision for spilling hash aggregation to disk, so if there's a very
large number of groups the table just gets very big :-(. The planner is not
supposed to choose hash agg if the estimated table size exceeds work_mem ...
but if it had out-of-date statistics to work with it might have gotten the
wrong answer. Have you ANALYZEd this table recently? What does EXPLAIN show as
the estimated number of result rows?

			regards, tom lane
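The checks suggested above (refresh statistics, compare work_mem against the estimated group count) could be run along these lines. This is a sketch using the table and column names from this thread; the note about the default work_mem value is an assumption about a stock 8.1 configuration.

```sql
-- Refresh the planner's statistics for the big table before retrying.
ANALYZE compliance_2006;

-- See how much memory a hash aggregate is allowed to use
-- (on a stock 8.1 install this is believed to default to 1MB).
SHOW work_mem;

-- Re-check the estimated number of result rows (groups); with fresh stats
-- a large estimate should steer the planner away from HashAggregate.
EXPLAIN
SELECT count(*) AS count, fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
       field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
FROM compliance_2006
GROUP BY fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
         field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr;
```

If the EXPLAIN output shows GroupAggregate rather than HashAggregate, the unbounded-hash-table failure mode described above no longer applies.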
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen <[EMAIL PROTECTED]> wrote:
> I'm new to PostgreSQL and I ran into a problem I don't want to repeat. I
> have a database with a little more than 18 million records that takes up
> about 3GB. I need to check to see if there are duplicate records, so I
> tried a command like this:
>
> SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable
> GROUP BY fld1, fld2, fld3, fld4
> ORDER BY 1 DESC;
>
> I knew this would take some time, but what I didn't expect was that about an
> hour into the select, my mouse and keyboard locked up and also I couldn't
> log in from another computer via SSH. This is a Linux machine running Fedora
> Core 6 and PostgreSQL is 8.1.4. There's about 50GB free on the disc too.
>
> I finally had to shut the power off and reboot to regain control of my
> computer (that wasn't a good idea, either, but eventually I got everything
> working again.)
>
> Is this normal behavior by PG with large databases?

No. Something is wrong.

> Did I misconfigure something? Does anyone know what might be wrong?

Possibly, but I would be more inclined to guess that your hardware is faulty
and you encountered a RAM error, or the CPU overheated, or something along
those lines. I'm not familiar with Linux systems hard-locking like that
unless there is a hardware issue.

--
Bill Moran
http://www.potentialtech.com
[GENERAL] Searching for Duplicates and Hosed the System
I'm new to PostgreSQL and I ran into a problem I don't want to repeat. I have
a database with a little more than 18 million records that takes up about 3GB.
I need to check to see if there are duplicate records, so I tried a command
like this:

SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable
GROUP BY fld1, fld2, fld3, fld4
ORDER BY 1 DESC;

I knew this would take some time, but what I didn't expect was that about an
hour into the select, my mouse and keyboard locked up and also I couldn't log
in from another computer via SSH. This is a Linux machine running Fedora Core
6 and PostgreSQL is 8.1.4. There's about 50GB free on the disc too.

I finally had to shut the power off and reboot to regain control of my
computer (that wasn't a good idea, either, but eventually I got everything
working again.)

Is this normal behavior by PG with large databases? Did I misconfigure
something? Does anyone know what might be wrong?

- Bill Thoen
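Since only duplicated records are of interest here, a HAVING clause keeps the result set down to just the groups that occur more than once, which is usually a tiny fraction of 18 million rows. A sketch, using the placeholder names from the query above (note the aggregate itself still scans and groups every row, so the server-side memory considerations discussed in this thread still apply):

```sql
-- Return only groups that actually have duplicates.
SELECT count(*) AS count, fld1, fld2, fld3, fld4
FROM MyTable
GROUP BY fld1, fld2, fld3, fld4
HAVING count(*) > 1
ORDER BY 1 DESC;
```

With a small result set like this, the client is far less likely to run out of memory absorbing the rows.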