Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5
Mark Wong escribió:
> Hrm, tracking just the launcher process certainly doesn't help. Are
> the spawned processes short lived? I take a snapshot of /proc/pid/io
> data every 60 seconds.

The worker processes can be short-lived, but if they are, obviously they are not vacuuming the large tables. If you want to track all autovacuum actions, change autovacuum_log_min_messages to 0.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
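[A minimal postgresql.conf sketch of the suggestion above. The parameter name is reproduced as given in the message; in released 8.3 the corresponding setting is spelled log_autovacuum_min_duration, where 0 logs every autovacuum action and -1 disables the logging.]

```sql
-- postgresql.conf fragment (sketch): log all autovacuum activity
-- log_autovacuum_min_duration = 0      -- 8.3 spelling; 0 = log everything

-- Reload the configuration without a restart:
SELECT pg_reload_conf();
```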
[PERFORM] How to unique-ify HUGE table?
Hi everyone! I have a very large 2-column table (about 500M records) from which I want to remove duplicate records. I have tried many approaches, but they all take forever.

The table's definition consists of two short TEXT columns. It is a temporary table generated from a query:

CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;

Initially I tried

CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;

but after waiting for nearly an hour I aborted the query, and repeated it after getting rid of the DISTINCT clause.

Everything takes forever with this monster! It's uncanny. Even printing it out to a file takes forever, let alone creating an index for it.

Any words of wisdom on how to speed this up would be appreciated.

TIA!

Kynn
Re: [PERFORM] How to unique-ify HUGE table?
On Tue, Dec 23, 2008 at 10:25 AM, Kynn Jones kyn...@gmail.com wrote:
> Hi everyone! I have a very large 2-column table (about 500M records)
> from which I want to remove duplicate records. I have tried many
> approaches, but they all take forever. The table's definition
> consists of two short TEXT columns. It is a temporary table generated
> from a query:
>
> CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
>
> Initially I tried
>
> CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
>
> but after waiting for nearly an hour I aborted the query, and
> repeated it after getting rid of the DISTINCT clause. Everything
> takes forever with this monster! It's uncanny. Even printing it out
> to a file takes forever, let alone creating an index for it. Any
> words of wisdom on how to speed this up would be appreciated.

Did you try cranking up work_mem to something that's a large percentage (25 to 50%) of total memory?
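[A sketch of the work_mem suggestion, using the table and columns from the thread. The '2GB' value is only an illustration for a machine with ample RAM; work_mem is per sort/hash operation, so size it against your actual memory. The elided source query is left as "..." as in the original post.]

```sql
-- Raise the per-operation sort/hash memory budget for this session only,
-- then build the de-duplicated table. A large work_mem lets the sort or
-- hash run in memory instead of spilling to disk.
SET work_mem = '2GB';   -- illustrative; 25-50% of RAM per the suggestion

CREATE TEMP TABLE huge_table AS
SELECT DISTINCT x, y FROM ... ;
```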
Re: [PERFORM] How to unique-ify HUGE table?
On Tue, 23 Dec 2008 12:25:48 -0500 Kynn Jones kyn...@gmail.com wrote:
> Hi everyone! I have a very large 2-column table (about 500M records)
> from which I want to remove duplicate records. I have tried many
> approaches, but they all take forever. The table's definition
> consists of two short TEXT columns. It is a temporary table generated
> from a query:
>
> CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
>
> Initially I tried
>
> CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
>
> but after waiting for nearly an hour I aborted the query, and
> repeated it

Do you have an index on x and y? Also, does this work better?

CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... GROUP BY x, y;

What does EXPLAIN ANALYZE have to say?

-- 
D'Arcy J.M. Cain da...@druid.net         |  Democracy is three wolves
http://www.druid.net/darcy/              |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP) |  what's for dinner.
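[A sketch of the EXPLAIN ANALYZE check suggested above, run against the already-built huge_table since the original source query is elided. Note that EXPLAIN ANALYZE actually executes the statement, so it is applied to the bare SELECT rather than the full CREATE TABLE AS.]

```sql
-- Show the chosen plan and real execution times for the de-duplication.
-- Look for whether the planner picks a HashAggregate or an external Sort.
EXPLAIN ANALYZE
SELECT x, y
FROM huge_table
GROUP BY x, y;
```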
Re: [PERFORM] How to unique-ify HUGE table?
You don't say what PG version you are on, but just for kicks you may try using GROUP BY instead of DISTINCT. Yes, the two should perform the same, but with 8.1 (or maybe 8.0) I had seen situations where GROUP BY was faster (admittedly this happened with more complex queries). So, try this:

CREATE TEMP TABLE huge_table AS SELECT x, y FROM foo GROUP BY 1, 2;

Note that you may be tempted to add an index on foo(x,y), but I don't think that helps (or at least I have not been able to hit the index in similar situations).

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Kynn Jones
Sent: Tuesday, December 23, 2008 9:26 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How to unique-ify HUGE table?

> Hi everyone! I have a very large 2-column table (about 500M records)
> from which I want to remove duplicate records. I have tried many
> approaches, but they all take forever. The table's definition
> consists of two short TEXT columns. It is a temporary table generated
> from a query:
>
> CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
>
> Initially I tried
>
> CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
>
> but after waiting for nearly an hour I aborted the query, and
> repeated it after getting rid of the DISTINCT clause. Everything
> takes forever with this monster! It's uncanny. Even printing it out
> to a file takes forever, let alone creating an index for it. Any
> words of wisdom on how to speed this up would be appreciated.
>
> TIA!
>
> Kynn
Re: [PERFORM] How to unique-ify HUGE table?
On Tue, Dec 23, 2008 at 11:14 AM, George Pavlov gpav...@mynewplace.com wrote:
> You don't say what PG version you are on, but just for kicks you may
> try using GROUP BY instead of DISTINCT. Yes, the two should perform
> the same, but with 8.1 (or maybe 8.0) I had seen situations where
> GROUP BY was faster (admittedly this happened with more complex
> queries). So, try this:

Even in 8.3 it looks like group by is faster. Tested it on a decent sized table and group by used a hash agg and ran in ~600 ms, while distinct used a sort and ran in 1300 ms. That was on 500k rows. On a much larger table, one with about 10M rows, a similar statement runs in 1500 ms with group by and in 2390 ms when run with distinct.
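[A sketch of the comparison described above. The table name t and its columns are hypothetical stand-ins; the plan nodes in the comments reflect the behavior reported in the thread for 8.3, not guaranteed output.]

```sql
-- Same logical result, two plan shapes on 8.3:
EXPLAIN ANALYZE
SELECT x, y FROM t GROUP BY x, y;     -- can use a HashAggregate

EXPLAIN ANALYZE
SELECT DISTINCT x, y FROM t;          -- sort-based: Unique over Sort
```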
Re: [PERFORM] How to unique-ify HUGE table?
Scott Marlowe wrote:
> On Tue, Dec 23, 2008 at 11:14 AM, George Pavlov gpav...@mynewplace.com wrote:
>> You don't say what PG version you are on, but just for kicks you may
>> try using GROUP BY instead of DISTINCT. Yes, the two should perform
>> the same, but with 8.1 (or maybe 8.0) I had seen situations where
>> GROUP BY was faster (admittedly this happened with more complex
>> queries). So, try this:
>
> Even in 8.3 it looks like group by is faster. Tested it on a decent
> sized table and group by used a hash agg and ran in ~600 ms, while
> distinct used a sort and ran in 1300 ms. That was on 500k rows. On a
> much larger table, one with about 10M rows, a similar statement runs
> in 1500 ms with group by and in 2390 ms when run with distinct.

Not surprising - this is a known limitation in all released versions of postgresql (GROUP BY can use hashing and sorting - DISTINCT only sorting). 8.4 is going to improve that though.

Stefan