Re: [GENERAL] Help with slow table update
On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov <pawel.vese...@gmail.com> wrote:
> I found some dangling prepared transactions

How do you find and remove these?
Re: [GENERAL] Help with slow table update
On 4/19/15 9:53 PM, Tim Uckun wrote:
> On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov <pawel.vese...@gmail.com> wrote:
>> I found some dangling prepared transactions
>
> How do you find and remove these?

SELECT * FROM pg_prepared_xacts;
ROLLBACK PREPARED xid;

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
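One syntax note on the above: ROLLBACK PREPARED takes the gid string shown by pg_prepared_xacts rather than a numeric xid. A minimal sketch of the cleanup, where the gid value is hypothetical:

SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- 'stale_tx_1' is a made-up gid; use one returned by the query above
ROLLBACK PREPARED 'stale_tx_1';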
Re: [GENERAL] Help with slow table update
[skipped]

>>> But remember that if you update or delete a row, removing it from an index, the data will stay in that index until vacuum comes along. Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full rebuilds all the indexes for you.
>>
>> I was being desperate :) I still think there is something very wrong with this particular table. First, I have production systems that employ this function on a way larger data set, and there is no problem (so far, but still). This machine is part of a test deployment; there is no constant load, and the only data being written now is when I do these tests. Vacuuming should prune all that dead stuff, and if it's absent, it's unclear where the time is spent navigating/updating a table with 24 rows :)
>
> I think you definitely have a problem with dead rows, as evidenced by the huge improvement VACUUM FULL made. But it's not clear why (and it's not reasonable, IMHO, that) it wouldn't improve past the current point.

What I should've done is 'VACUUM FULL VERBOSE'. Once I did, it told me there were 800k dead rows that couldn't be removed. After digging around, I found some dangling prepared transactions going back months. Once I threw those away and re-vacuumed, things got back to normal.

Thanks for all your help and advice.
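For reference, a sketch of the diagnosis path described above; the quoted output line is paraphrased from memory and the gid is hypothetical:

VACUUM FULL VERBOSE r_agrio_total;
-- look for a DETAIL line like "800000 dead row versions cannot be removed yet"

SELECT gid, prepared FROM pg_prepared_xacts ORDER BY prepared;
ROLLBACK PREPARED 'stale_tx_1';  -- hypothetical gid from the query above

VACUUM ANALYZE r_agrio_total;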
Re: [GENERAL] Help with slow table update
From: Pawel Veselov
Sent: Tuesday, April 14, 2015 8:01 PM
To: Jim Nasby
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Help with slow table update

> [skipped]
>
> This is where using sets becomes really tedious, as Postgres severely lacks an upsert-like statement. I don't think there are joins allowed in UPDATE statements, so I will need to use a WITH query, right? Also, I'm not sure how a LEFT JOIN will help me isolate and insert the missed entries... Would it be OK to replace the upsert part with merging into a temp table, then deleting and inserting from the temp table? Is there any penalty for insert/delete compared to update?
>
> [skipped]

Yes, you can do UPDATE with joins (http://www.postgresql.org/docs/9.4/static/sql-update.html) like this:

UPDATE table1 A
   SET col1 = B.col2
  FROM table2 B
 WHERE A.col3 = B.col4;

Regards,
Igor Neyman
Re: [GENERAL] Help with slow table update
> [skipped]
>
> This is where using sets becomes really tedious, as Postgres severely lacks an upsert-like statement. I don't think there are joins allowed in UPDATE statements, so I will need to use a WITH query, right? Also, I'm not sure how a LEFT JOIN will help me isolate and insert the missed entries...
>
> [skipped]

>> Yes, you can do UPDATE with joins (http://www.postgresql.org/docs/9.4/static/sql-update.html) like this:
>>
>> UPDATE table1 A
>>    SET col1 = B.col2
>>   FROM table2 B
>>  WHERE A.col3 = B.col4;

I meant using the JOIN operator in the update. But it's still possible, though through a WITH query.
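On 9.4, the usual way to get upsert-like behavior out of a WITH query is a writeable CTE. A minimal sketch, assuming hypothetical tables target(key, val) and source(key, val) and a single writer (the pattern is not safe against concurrent inserts without extra locking):

WITH upd AS (
    -- update the rows that already exist, remembering which keys were hit
    UPDATE target t
       SET val = s.val
      FROM source s
     WHERE t.key = s.key
    RETURNING t.key
)
INSERT INTO target (key, val)
SELECT s.key, s.val
  FROM source s
 WHERE NOT EXISTS (SELECT 1 FROM upd WHERE upd.key = s.key);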
Re: [GENERAL] Help with slow table update
On Mon, Apr 13, 2015 at 6:03 PM, David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Mon, Apr 13, 2015 at 5:01 PM, Pawel Veselov <pawel.vese...@gmail.com> wrote:
>> r_agrio_hourly - good, r_agrio_total - bad.
>>
>> Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=2.248..2.248 rows=0 loops=1)
>>   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1 loops=1)
>>         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 3::numeric) AND (placement = 2::numeric))
>> Total runtime: 2.281 ms
>>
>> Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=106.766..106.766 rows=0 loops=1)
>>   ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626 rows=1 loops=1)
>>         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
>> Total runtime: 106.793 ms
>
> What is it you expect to see here? What are the results (count and times) for:
>
> SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;

Result: 8 (the whole table is 24 rows). It returns with somewhat of a stumble, but relatively quickly.

db=> explain analyze SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;

Aggregate  (cost=4.45..4.46 rows=1 width=0) (actual time=327.194..327.195 rows=1 loops=1)
  ->  Index Scan using tag_r_agrio_total on r_agrio_total  (cost=0.42..4.45 rows=1 width=0) (actual time=0.039..327.189 rows=8 loops=1)
        Index Cond: (tagid = 1002::numeric)
        Filter: (unitid = 1002::numeric)
Total runtime: 327.228 ms

> SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;

Result is 2869. Returns somewhat quickly. Explain analyze is crazy, though:

db=> explain analyze SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;

Aggregate  (cost=68134.68..68134.69 rows=1 width=0) (actual time=15177.211..15177.211 rows=1 loops=1)
  ->  Index Scan using adunit_r_agrio_hourly on r_agrio_hourly  (cost=0.42..67027.10 rows=443035 width=0) (actual time=0.096..15175.730 rows=2869 loops=1)
        Index Cond: (unitid = 1002::numeric)
        Filter: (tagid = 1002::numeric)
Total runtime: 15177.240 ms

> More queries along this line might be needed. The underlying question is how many index rows need to be skipped over on total to get the final result - or rather, are the columns in the index in descending order of cardinality?

The idea is: both tables have unique multi-field indices, each update hits exactly one row from that index (no more), and all fields from the index are locked with an equality condition in the update. All of the updates (within a transaction) would always work on a small subset of rows (max a few hundred, ever; in this case, it's maybe around 10). I expect it to be possible for the server to keep the active working set in the cache at all times. Since the index is unique, there shouldn't be a reason to re-scan the table if a cached row is found, no?

> Any chance you can perform a REINDEX - maybe there is some bloat present? There are queries to help discern if that may be the case; I do not know them off the top of my head, but just doing it might be acceptable and is definitely quicker if so.

That's the thing - I've done both vacuum full and re-index. The very first time I did vacuum full, things improved (60 seconds to 7 seconds). Re-index didn't improve anything (but it was done after vacuum full).

> I'm still not really following your presentation but maybe my thoughts will spark something.

Thank you! I hope I clarified this some :)
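On David's bloat question, one way to get a first approximation without dedicated bloat queries is the statistics collector. A sketch using the standard pg_stat_user_tables columns:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname IN ('r_agrio_total', 'r_agrio_hourly');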
Re: [GENERAL] Help with slow table update
On Mon, Apr 13, 2015 at 7:37 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 4/13/15 7:01 PM, Pawel Veselov wrote:
>>> Cursors tend to make things slow. Avoid them if you can.
>>
>> Is there an alternative to iterating over a number of rows, where a direct update query is not an option?
>>
>> I really doubt that either the actual processing logic, including the use of types, has anything to do with my problem. This is based on the fact that out of the tables that are being changed, only one is exhibiting the problem. All of the involved tables have nearly the same structure, and have the same logical operations performed on them. I thought maybe the bad table was slow because it was first in the list, and Postgres was caching the function results, but I moved things around, and the pattern is the same.
>
> I'm guessing that you're essentially processing a queue. Take a look at http://www.postgresql.org/message-id/552c750f.2010...@bluetreble.com for some ideas. Basically, not only do cursors have non-trivial overhead, doing a ton of single-row queries is going to have a non-trivial overhead itself.

Thank you for the pointers. PgQ sounds interesting; it has to be remote for RDS (I use RDS), but I'll try implementing a solution based on it.

However, of all the time being spent during this update, the breakdown is:

update total table: 10.773033
update hourly table: 00.179711
update daily table: 01.082467
update some other table (actually, it has cardinality similar to the total table): 00.168287
clean the queue table: 00.021733
overhead: 00.014922

The overhead is the time taken to run the whole procedure, minus all these other times that have been counted. (Some notes about the daily table below.)

> As for your specific question, I suggest you modify the plpgsql function so that it's doing an EXPLAIN ANALYZE on the slow table. EXPLAIN ANALYZE actually returns a recordset the same way a SELECT would, with a single column of type text. So you just need to do something with that output. The easiest thing would be to replace this in your function:
>
> UPDATE slow_table SET ...
>
> with this (untested):
>
> RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...
>
> and change the function so it returns SETOF text instead of whatever it returns now.

Thank you, that made it a lot easier to see what's really going on. But the outcome is somewhat the same. The bad table analysis shows a very high cost and thousands of rows, where the table contains only 24 rows. This time, however, the actual run time is shown, and one can see where the time is spent (I had been using just a sum of clock_time()s around the update statements to see where the problem was).

r_agrio_hourly - good, r_agrio_total - bad.

Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=2.248..2.248 rows=0 loops=1)
  ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1 loops=1)
        Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 3::numeric) AND (placement = 2::numeric))
Total runtime: 2.281 ms

Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=106.766..106.766 rows=0 loops=1)
  ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626 rows=1 loops=1)
        Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
Total runtime: 106.793 ms

> Keep in mind that the estimated cost is not terribly useful; it's the actual times that matter. I suspect what's happening here is a combination of things. First, the hourly table is basically living in cache, but the total table is not. That means that when you go to find a row in the total table you're actually hitting the disk instead of pulling the data from memory.
>
> Second, you may have a lot of dead rows in the total table. I suspect this because of the very large amount of time the index scan is taking. Unless you're running on an old 10MB MFM drive you'd be pretty hard pressed for even 2 IO operations (one for the index leaf page and one for the heap page) to take 32ms. I suspect the index scan is having to read many dead rows before it finds a live one, and is incurring multiple IOs. Switching to EXPLAIN (analyze, buffers) would help confirm that.

That looks most likely to me as well. Most of the updates in a single batch for the total table would be on the same record, while for the hourly table it's a lot less. Logically, the tables contain identical data, except that the hourly table breaks it down per hour, and the total table contains the data for all times. The daily table contains the same data per day. So, if I compared the tables, the total table has the
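The buffers option Jim mentions would look like the following against the statement from the plans above (a sketch reusing the thread's predicate). A large "read" count under the index scan, as opposed to "hit", means the pages are coming from disk rather than cache:

EXPLAIN (ANALYZE, BUFFERS)
UPDATE r_agrio_total
   SET unconfirmed = unconfirmed + 0
 WHERE tagid = 1002 AND unitid = 1002
   AND device_type = 3 AND placement = 2;
-- each plan node gains a line like: Buffers: shared hit=... read=...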
Re: [GENERAL] Help with slow table update
On Tue, Apr 14, 2015 at 3:29 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 4/14/15 4:44 PM, Pawel Veselov wrote:
>> On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
>>> On 4/14/15 1:28 PM, Pawel Veselov wrote:
>>>> I wonder if what I need to do, considering that I update a lot of the same rows as I process this queue, is to create a temp table, update the rows there, and then update the actual tables once at the end...
>>>
>>> That's what I'd do.
>>
>> Well, in short, I changed (repeat the body of the loop for however many tables there are):
>>
>> LOOP (item)
>>   UPDATE table with item;
>>   IF not found THEN
>>     INSERT item INTO table;
>>   END IF;
>> END LOOP;
>>
>> to:
>>
>> CREATE TEMP TABLE xq_table (LIKE table) ON COMMIT DROP;
>> LOOP (item)
>>   LOOP
>>     UPDATE xq_table with item;
>>     EXIT WHEN found;
>>     INSERT INTO xq_table SELECT * FROM table FOR UPDATE;
>>     CONTINUE WHEN found;
>>     INSERT item INTO xq_table;
>>     EXIT;
>>   END LOOP;
>> END LOOP;
>> UPDATE table a SET (rows) = (xq.rows) FROM xq_table xq WHERE (a.keys) = (xq.keys);
>>
>> That works significantly faster. The final update statement is very fast. The process is somewhat slow in the beginning as it sucks records from total into xq_total, but once all of that is moved into the temp table, it rushes through the rest.
>
> Databases like to think in sets. It will generally be more efficient to do set operations instead of a bunch of row-by-row stuff. Since you're pulling all of this from some other table, your best bet is probably something like:
>
> CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;
>
> CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY ...;
> UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
> INSERT INTO ar_hourly SELECT ... FROM hourly_v LEFT JOIN ar_hourly ON ...;
>
> -- Same thing for daily
> -- Same thing for total

In my previous post there was a problem with that pseudo-code: it's missing the inserts into the final table at the end of the loop, for those records that need to be inserted rather than updated. This is where using sets becomes really tedious, as Postgres severely lacks an upsert-like statement. I don't think there are joins allowed in UPDATE statements, so I will need to use a WITH query, right? Also, I'm not sure how a LEFT JOIN will help me isolate and insert the missed entries... Would it be OK to replace the upsert part with merging into a temp table, then deleting and inserting from the temp table? Is there any penalty for insert/delete compared to update?

[skipped]

>>> But remember that if you update or delete a row, removing it from an index, the data will stay in that index until vacuum comes along. Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full rebuilds all the indexes for you.
>>
>> I was being desperate :) I still think there is something very wrong with this particular table. First, I have production systems that employ this function on a way larger data set, and there is no problem (so far, but still). This machine is part of a test deployment; there is no constant load, and the only data being written now is when I do these tests. Vacuuming should prune all that dead stuff, and if it's absent, it's unclear where the time is spent navigating/updating a table with 24 rows :)
>
> I think you definitely have a problem with dead rows, as evidenced by the huge improvement VACUUM FULL made. But it's not clear why (and it's not reasonable, IMHO, that) it wouldn't improve past the current point.
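To make the LEFT JOIN step concrete, here is a hedged rendering of Jim's outline as runnable SQL. The schema (queue(key, val) feeding ar_hourly(key, val)) is invented for illustration, a single writer is assumed, and the DELETE ... RETURNING is routed through a writeable CTE because CREATE TABLE AS cannot take a bare DELETE:

-- drain the queue and aggregate it in one pass
CREATE TEMP TABLE raw (key int, val numeric) ON COMMIT DROP;

WITH drained AS (DELETE FROM queue RETURNING key, val)
INSERT INTO raw
SELECT key, sum(val) FROM drained GROUP BY key;

-- update the summary rows that already exist
UPDATE ar_hourly a
   SET val = a.val + r.val
  FROM raw r
 WHERE a.key = r.key;

-- insert the rows the update missed; the LEFT JOIN isolates them
INSERT INTO ar_hourly (key, val)
SELECT r.key, r.val
  FROM raw r
  LEFT JOIN ar_hourly a ON a.key = r.key
 WHERE a.key IS NULL;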
Re: [GENERAL] Help with slow table update
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 4/14/15 1:28 PM, Pawel Veselov wrote:
>> I wonder if what I need to do, considering that I update a lot of the same rows as I process this queue, is to create a temp table, update the rows there, and then update the actual tables once at the end...
>
> That's what I'd do.

Well, in short, I changed (repeat the body of the loop for however many tables there are):

LOOP (item)
  UPDATE table with item;
  IF not found THEN
    INSERT item INTO table;
  END IF;
END LOOP;

to:

CREATE TEMP TABLE xq_table (LIKE table) ON COMMIT DROP;
LOOP (item)
  LOOP
    UPDATE xq_table with item;
    EXIT WHEN found;
    INSERT INTO xq_table SELECT * FROM table FOR UPDATE;
    CONTINUE WHEN found;
    INSERT item INTO xq_table;
    EXIT;
  END LOOP;
END LOOP;
UPDATE table a SET (rows) = (xq.rows) FROM xq_table xq WHERE (a.keys) = (xq.keys);

That works significantly faster. The final update statement is very fast. The process is somewhat slow in the beginning as it sucks records from total into xq_total, but once all of that is moved into the temp table, it rushes through the rest.

> The other option would be to use a constraint trigger paired with a per-row trigger on the hourly table to drive the daily table, and on the daily table to drive the total table. The way that would work is the per-row trigger would simply keep track of all the unique records that were changed in a statement (presumably by putting them in a temp table). Once the statement is done, the constraint trigger would fire; it would summarize all the changed data and do a much smaller number of updates to the table being summarized into.

I'm not sure how I would be able to avoid the same number of changes on the total table; the trigger would fire on each update, won't it? So, same problem with a lot of changes on a table...

> BTW, you also made a comment about not having to hit the table if you look at something in an index. You can only do that if all the data you need is in the index, AND the page with the record is marked as being all-visible (google for Postgres Visibility Map). If that's not the case then you still have to pull in the row from the table in order to determine visibility. The only case where you can still avoid hitting the table is something like a NOT EXISTS; if you can't find any entries in the index for something then they definitely won't be in the table.

What I was saying is that if a table has a unique index, and there is a cached fact that a particular index value points to a particular row, there shouldn't be a need to re-scan the index again to search for any more matching values (which would be necessary if the index was not unique). Again, all considering the size of the index, the number of different index values being queried, etc.

> But remember that if you update or delete a row, removing it from an index, the data will stay in that index until vacuum comes along. Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full rebuilds all the indexes for you.

I was being desperate :) I still think there is something very wrong with this particular table. First, I have production systems that employ this function on a way larger data set, and there is no problem (so far, but still). This machine is part of a test deployment; there is no constant load, and the only data being written now is when I do these tests. Vacuuming should prune all that dead stuff, and if it's absent, it's unclear where the time is spent navigating/updating a table with 24 rows :)
Re: [GENERAL] Help with slow table update
On 4/14/15 1:28 PM, Pawel Veselov wrote:
> I wonder if what I need to do, considering that I update a lot of the same rows as I process this queue, is to create a temp table, update the rows there, and then update the actual tables once at the end...

That's what I'd do.

The other option would be to use a constraint trigger paired with a per-row trigger on the hourly table to drive the daily table, and on the daily table to drive the total table. The way that would work is the per-row trigger would simply keep track of all the unique records that were changed in a statement (presumably by putting them in a temp table). Once the statement is done, the constraint trigger would fire; it would summarize all the changed data and do a much smaller number of updates to the table being summarized into.

BTW, you also made a comment about not having to hit the table if you look at something in an index. You can only do that if all the data you need is in the index, AND the page with the record is marked as being all-visible (google for Postgres Visibility Map). If that's not the case then you still have to pull in the row from the table in order to determine visibility. The only case where you can still avoid hitting the table is something like a NOT EXISTS; if you can't find any entries in the index for something then they definitely won't be in the table.

But remember that if you update or delete a row, removing it from an index, the data will stay in that index until vacuum comes along. Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full rebuilds all the indexes for you.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
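A minimal sketch of the trigger pairing Jim describes, under heavy assumptions: all names are invented, the summary is reduced to one value column, insertion of missing daily rows is omitted, and the constraint trigger is deferred to commit, where its first firing does the one summarizing pass and empties the scratch table so the remaining per-row firings are no-ops:

CREATE TABLE hourly (key int, val numeric);
CREATE TABLE daily  (key int PRIMARY KEY, val numeric);
CREATE TABLE changed_keys (key int);  -- scratch list of touched keys

CREATE FUNCTION note_change() RETURNS trigger AS $$
BEGIN
  INSERT INTO changed_keys VALUES (NEW.key);  -- just remember the key
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION summarize_changes() RETURNS trigger AS $$
BEGIN
  -- recompute daily totals only for the keys that actually changed
  UPDATE daily d
     SET val = s.val
    FROM (SELECT h.key, sum(h.val) AS val
            FROM hourly h
           WHERE h.key IN (SELECT key FROM changed_keys)
           GROUP BY h.key) s
   WHERE d.key = s.key;
  DELETE FROM changed_keys;  -- later firings find nothing to do
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER hourly_note AFTER INSERT OR UPDATE ON hourly
  FOR EACH ROW EXECUTE PROCEDURE note_change();

CREATE CONSTRAINT TRIGGER hourly_summarize AFTER INSERT OR UPDATE ON hourly
  DEFERRABLE INITIALLY DEFERRED
  FOR EACH ROW EXECUTE PROCEDURE summarize_changes();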
Re: [GENERAL] Help with slow table update
On 4/14/15 4:44 PM, Pawel Veselov wrote:
> On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
>> On 4/14/15 1:28 PM, Pawel Veselov wrote:
>>> I wonder if what I need to do, considering that I update a lot of the same rows as I process this queue, is to create a temp table, update the rows there, and then update the actual tables once at the end...
>>
>> That's what I'd do.
>
> Well, in short, I changed (repeat the body of the loop for however many tables there are):
>
> LOOP (item)
>   UPDATE table with item;
>   IF not found THEN
>     INSERT item INTO table;
>   END IF;
> END LOOP;
>
> to:
>
> CREATE TEMP TABLE xq_table (LIKE table) ON COMMIT DROP;
> LOOP (item)
>   LOOP
>     UPDATE xq_table with item;
>     EXIT WHEN found;
>     INSERT INTO xq_table SELECT * FROM table FOR UPDATE;
>     CONTINUE WHEN found;
>     INSERT item INTO xq_table;
>     EXIT;
>   END LOOP;
> END LOOP;
> UPDATE table a SET (rows) = (xq.rows) FROM xq_table xq WHERE (a.keys) = (xq.keys);
>
> That works significantly faster. The final update statement is very fast. The process is somewhat slow in the beginning as it sucks records from total into xq_total, but once all of that is moved into the temp table, it rushes through the rest.

Databases like to think in sets. It will generally be more efficient to do set operations instead of a bunch of row-by-row stuff. Since you're pulling all of this from some other table, your best bet is probably something like:

CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY ...;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT ... FROM hourly_v LEFT JOIN ar_hourly ON ...;

-- Same thing for daily
-- Same thing for total

>> The other option would be to use a constraint trigger paired with a per-row trigger on the hourly table to drive the daily table, and on the daily table to drive the total table. The way that would work is the per-row trigger would simply keep track of all the unique records that were changed in a statement (presumably by putting them in a temp table). Once the statement is done, the constraint trigger would fire; it would summarize all the changed data and do a much smaller number of updates to the table being summarized into.
>
> I'm not sure how I would be able to avoid the same number of changes on the total table; the trigger would fire on each update, won't it? So, same problem with a lot of changes on a table...

The difference is that you'd be doing plain INSERTs into a temp table and then summarizing that. That's going to be a LOT more efficient than a slew of updates on an existing table.

>> BTW, you also made a comment about not having to hit the table if you look at something in an index. You can only do that if all the data you need is in the index, AND the page with the record is marked as being all-visible (google for Postgres Visibility Map). If that's not the case then you still have to pull in the row from the table in order to determine visibility. The only case where you can still avoid hitting the table is something like a NOT EXISTS; if you can't find any entries in the index for something then they definitely won't be in the table.
>
> What I was saying is that if a table has a unique index, and there is a cached fact that a particular index value points to a particular row, there shouldn't be a need to re-scan the index again to search for any more matching values (which would be necessary if the index was not unique). Again, all considering the size of the index, the number of different index values being queried, etc.

It still has to rescan because of visibility concerns.

>> But remember that if you update or delete a row, removing it from an index, the data will stay in that index until vacuum comes along. Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full rebuilds all the indexes for you.
>
> I was being desperate :) I still think there is something very wrong with this particular table. First, I have production systems that employ this function on a way larger data set, and there is no problem (so far, but still). This machine is part of a test deployment; there is no constant load, and the only data being written now is when I do these tests. Vacuuming should prune all that dead stuff, and if it's absent, it's unclear where the time is spent navigating/updating a table with 24 rows :)

I think you definitely have a problem with dead rows, as evidenced by the huge improvement VACUUM FULL made.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [GENERAL] Help with slow table update
On Mon, Apr 13, 2015 at 5:01 PM, Pawel Veselov <pawel.vese...@gmail.com> wrote:
> r_agrio_hourly - good, r_agrio_total - bad.
>
> Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=2.248..2.248 rows=0 loops=1)
>   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1 loops=1)
>         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 3::numeric) AND (placement = 2::numeric))
> Total runtime: 2.281 ms
>
> Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=106.766..106.766 rows=0 loops=1)
>   ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626 rows=1 loops=1)
>         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
> Total runtime: 106.793 ms

What is it you expect to see here? What are the results (count and times) for:

SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;
SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;

More queries along this line might be needed. The underlying question is how many index rows need to be skipped over on total to get the final result - or rather, are the columns in the index in descending order of cardinality?

Any chance you can perform a REINDEX - maybe there is some bloat present? There are queries to help discern if that may be the case; I do not know them off the top of my head, but just doing it might be acceptable and is definitely quicker if so.

I'm still not really following your presentation but maybe my thoughts will spark something.

David J.
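On the cardinality-ordering question, the planner statistics give a quick read. A sketch against the standard pg_stats view (negative n_distinct values are fractions of the row count):

SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'r_agrio_total'
   AND attname IN ('tagid', 'unitid', 'device_type', 'placement');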
Re: [GENERAL] Help with slow table update
On Sun, Apr 12, 2015 at 5:40 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 4/9/15 6:18 PM, Pawel Veselov wrote:
>> Hi.
>>
>> I have a plpgsql procedure that updates a few similar tables. For some reason, updates on one of the tables take a lot longer than the updates on the other ones. The difference is, say, 7 seconds vs. 80 milliseconds.
>>
>> The procedure uses cursors and record variables to do the updates. For example:
>>
>> update r_agrio_total set
>>   unserved = unserved + (agrow->>'unserved')::numeric(38),
>>   r_brkconn = mush_brk_conn(r_brkconn, q_item.r_brkconn)
>> where tagid = _tagid
>>   and unitid = (akey->>'unitid')::numeric
>>   and placement = (akey->>'placement')::numeric
>>   and device_type = (akey->>'device_type')::numeric;
>>
>> There is another table (xq_agr) that is read record by record, and for each of those records, such an update is executed.
>>
>> I was trying to EXPLAIN ANALYZE the updates to see where the time could be spent. There are only 24 rows in the bad table, and 3,400 rows in the good table. So, for the bad table, most of the updates will be on the same rows. The times were measured on processing 100 original records.
>>
>> When I'm analyzing pure update statements, I don't see anything strange.
>>
>> bad table:
>>
>> explain analyze update r_agrio_total set unconfirmed = unconfirmed + 0
>>   where tagid = 1000 and unitid = 1000 and placement = 0 and device_type = 100;
>>
>> RESULT:
>>
>> Update on r_agrio_total  (cost=0.42..4.46 rows=1 width=321) (actual time=0.253..0.253 rows=0 loops=1)
>>   ->  Index Scan using tag_r_agrio_total on r_agrio_total  (cost=0.42..4.46 rows=1 width=321) (actual time=0.037..0.041 rows=1 loops=1)
>>         Index Cond: (tagid = 1000::numeric)
>>         Filter: ((unitid = 1000::numeric) AND (placement = 0::numeric) AND (device_type = 100::numeric))
>>         Rows Removed by Filter: 7
>> Total runtime: 0.282 ms
>>
>> good table:
>>
>> explain analyze update r_agrio_hourly set unconfirmed = unconfirmed + 0
>>   where tagid = 1000 and unitid = 1000 and placement = 0 and device_type = 100 and rowdate = '2015-02-23T13';
>>
>> RESULT:
>>
>> Update on r_agrio_hourly  (cost=0.42..17.36 rows=6 width=329) (actual time=0.102..0.102 rows=0 loops=1)
>>   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..17.36 rows=6 width=329) (actual time=0.047..0.048 rows=1 loops=1)
>>         Index Cond: ((tagid = 1000::numeric) AND (unitid = 1000::numeric) AND ((rowdate)::text = '2015-02-23T13'::text) AND (device_type = 100::numeric) AND (placement = 0::numeric))
>> Total runtime: 0.135 ms
>>
>> When I try doing it with a WITH statement (really, to apply the actual data that the plpgsql function uses), there is something strange in the bad table.
>>
>> explain analyze with SRC as (select * from xq_agr where id = 914830)
>>   update r_agrio_total set
>>     unconfirmed = unconfirmed + (SRC.r_agrio->>'unconfirmed')::numeric(38)
>>   from SRC
>>   where tagid = (SRC.r_agrio->'key'->>'tagid')::numeric
>>     and unitid = (SRC.r_agrio->'key'->>'unit')::numeric
>>     and placement = (SRC.r_agrio->'key'->>'placement')::numeric
>>     and device_type = (SRC.r_agrio->'key'->>'device_type')::numeric;
>>
>> RESULT:
>>
>> Update on r_agrio_total  (cost=8.91..32777.51 rows=19331 width=409) (actual time=0.107..0.107 rows=0 loops=1)
>>   CTE src
>>     ->  Index Scan using xq_agr_pkey on xq_agr  (cost=0.42..8.44 rows=1 width=379) (actual time=0.026..0.027 rows=1 loops=1)
>>           Index Cond: (id = 914830)
>>   ->  Nested Loop  (cost=0.46..32769.07 rows=19331 width=409) (actual time=0.107..0.107 rows=0 loops=1)
>>         ->  CTE Scan on src  (cost=0.00..0.02 rows=1 width=88) (actual time=0.032..0.033 rows=1 loops=1)
>>         ->  Index Scan using u_r_agrio_total on r_agrio_total  (*cost=0.46..32285.78 rows=19331* width=321) (actual time=0.001..0.001 rows=0 loops=1)
>>               Index Cond: ((tagid = (((src.r_agrio -> 'key'::text) ->> 'tagid'::text))::numeric) AND (unitid = (((src.r_agrio -> 'key'::text) ->> 'unit'::text))::numeric) AND (device_type = (((src.r_agrio -> 'key'::text) ->> 'device_type'::text))::numeric) AND (placement = (((src.r_agrio -> 'key'::text) ->> 'placement'::text))::numeric))
>> Total runtime: 0.155 ms
>>
>> explain analyze with SRC as (select * from xq_agr where id = 914830)
>>   update r_agrio_hourly set
>>     unconfirmed = unconfirmed + (SRC.r_agrio->>'unconfirmed')::numeric(38)
>>   from SRC
>>   where tagid = (SRC.r_agrio->'key'->>'tagid')::numeric
>>     and unitid = (SRC.r_agrio->'key'->>'unit')::numeric
>>     and placement = (SRC.r_agrio->'key'->>'placement')::numeric
>>     and device_type = (SRC.r_agrio->'key'->>'device_type')::numeric
>>     and rowdate = (SRC.r_agrio->'key'->>'rowdate');
>>
>> RESULT:
>>
>> Update on r_agrio_hourly  (cost=8.91..52.91 rows=20 width=417) (actual time=0.123..0.123 rows=0 loops=1)
>>   CTE src
>>     ->  Index Scan using xq_agr_pkey on xq_agr
Re: [GENERAL] Help with slow table update
On 4/13/15 7:01 PM, Pawel Veselov wrote:
>>> Cursors tend to make things slow. Avoid them if you can.
>>
>> Is there an alternative to iterating over a number of rows, where a direct update query is not an option?
>>
>> I really doubt that either the actual processing logic, including the use of types, has anything to do with my problem. This is based on the fact that out of the tables that are being changed, only one is exhibiting the problem. All of the involved tables have nearly the same structure, and have the same logical operations performed on them. I thought maybe the bad table was slow because it was first in the list, and Postgres was caching the function results, but I moved things around, and the pattern is the same.

I'm guessing that you're essentially processing a queue. Take a look at http://www.postgresql.org/message-id/552c750f.2010...@bluetreble.com for some ideas. Basically, not only do cursors have non-trivial overhead, doing a ton of single-row queries is going to have a non-trivial overhead itself.

As for your specific question, I suggest you modify the plpgsql function so that it's doing an EXPLAIN ANALYZE on the slow table. EXPLAIN ANALYZE actually returns a recordset the same way a SELECT would, with a single column of type text. So you just need to do something with that output. The easiest thing would be to replace this in your function:

UPDATE slow_table SET ...

with this (untested):

RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...

and change the function so it returns SETOF text instead of whatever it returns now.

> Thank you, that made it a lot easier to see what's really going on. But the outcome is somewhat the same. The bad table analysis shows a very high cost and thousands of rows, where the table contains only 24 rows. This time, however, the actual run time is shown, and one can see where the time is spent (I had been using just a sum of clock_time()s around the update statements to see where the problem was).
>
> r_agrio_hourly - good, r_agrio_total - bad.
>
> Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=2.248..2.248 rows=0 loops=1)
>   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1 loops=1)
>         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 3::numeric) AND (placement = 2::numeric))
> Total runtime: 2.281 ms
>
> Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=106.766..106.766 rows=0 loops=1)
>   ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626 rows=1 loops=1)
>         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
> Total runtime: 106.793 ms

Keep in mind that the estimated cost is not terribly useful; it's the actual times that matter.

I suspect what's happening here is a combination of things. First, the hourly table is basically living in cache, but the total table is not. That means that when you go to find a row in the total table you're actually hitting the disk instead of pulling the data from memory.

Second, you may have a lot of dead rows in the total table. I suspect this because of the very large amount of time the index scan is taking. Unless you're running on an old 10MB MFM drive you'd be pretty hard pressed for even 2 IO operations (one for the index leaf page and one for the heap page) to take 32ms. I suspect the index scan is having to read many dead rows before it finds a live one, and is incurring multiple IOs. Switching to EXPLAIN (analyze, buffers) would help confirm that.

Third, I think something odd is happening with the update itself. I'm pretty sure that the index scan itself is visiting the heap pages, so each page should be in shared buffers by the time each tuple hits the update node. That makes me wonder what on earth is taking 60ms to update the tuple. I suspect it's going into either finding a free buffer to put the new tuple on, or waiting to try and extend the relation. Selecting ctid from the freshly updated rows and comparing the first number to the total number of pages in the heap would show if the new tuples are all ending up at the end of the heap.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
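A hedged sketch of both suggestions, using the thread's table and a predicate borrowed from it; RETURN QUERY simply forwards the text rows that EXPLAIN produces (untested, as Jim says, so treat it as a sketch):

CREATE OR REPLACE FUNCTION debug_update() RETURNS SETOF text AS $$
BEGIN
  RETURN QUERY EXPLAIN (ANALYZE, BUFFERS)
    UPDATE r_agrio_total SET unconfirmed = unconfirmed + 0
     WHERE tagid = 1000 AND unitid = 1000;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM debug_update();

-- the ctid check: if the page numbers (first component of ctid) are close
-- to relpages, new tuple versions are landing at the end of the heap
SELECT ctid FROM r_agrio_total WHERE tagid = 1000 AND unitid = 1000;
SELECT relpages FROM pg_class WHERE relname = 'r_agrio_total';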
Re: [GENERAL] Help with slow table update
On 4/9/15 6:18 PM, Pawel Veselov wrote:
> Hi.
>
> I have a plpgsql procedure that updates a few similar tables. For some reason, updates on one of the tables take a lot longer than the updates on the other ones. The difference is, say, 7 seconds vs. 80 milliseconds.
>
> The procedure uses cursors and record variables to do the updates. For example:
>
> update r_agrio_total set
>   unserved = unserved + (agrow->>'unserved')::numeric(38),
>   r_brkconn = mush_brk_conn(r_brkconn, q_item.r_brkconn)
> where tagid = _tagid
>   and unitid = (akey->>'unitid')::numeric
>   and placement = (akey->>'placement')::numeric
>   and device_type = (akey->>'device_type')::numeric;
>
> There is another table (xq_agr) that is read record by record, and for each of those records, such an update is executed.
>
> I was trying to EXPLAIN ANALYZE the updates to see where the time could be spent. There are only 24 rows in the bad table, and 3,400 rows in the good table. So, for the bad table, most of the updates will be on the same rows. The times were measured on processing 100 original records.
>
> When I'm analyzing pure update statements, I don't see anything strange.
>
> bad table:
>
> explain analyze update r_agrio_total set unconfirmed = unconfirmed + 0
>   where tagid = 1000 and unitid = 1000 and placement = 0 and device_type = 100;
>
> RESULT:
>
> Update on r_agrio_total  (cost=0.42..4.46 rows=1 width=321) (actual time=0.253..0.253 rows=0 loops=1)
>   ->  Index Scan using tag_r_agrio_total on r_agrio_total  (cost=0.42..4.46 rows=1 width=321) (actual time=0.037..0.041 rows=1 loops=1)
>         Index Cond: (tagid = 1000::numeric)
>         Filter: ((unitid = 1000::numeric) AND (placement = 0::numeric) AND (device_type = 100::numeric))
>         Rows Removed by Filter: 7
> Total runtime: 0.282 ms
>
> good table:
>
> explain analyze update r_agrio_hourly set unconfirmed = unconfirmed + 0
>   where tagid = 1000 and unitid = 1000 and placement = 0 and device_type = 100 and rowdate = '2015-02-23T13';
>
> RESULT:
>
> Update on r_agrio_hourly  (cost=0.42..17.36 rows=6 width=329) (actual time=0.102..0.102 rows=0 loops=1)
>   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..17.36 rows=6 width=329) (actual time=0.047..0.048 rows=1 loops=1)
>         Index Cond: ((tagid = 1000::numeric) AND (unitid = 1000::numeric) AND ((rowdate)::text = '2015-02-23T13'::text) AND (device_type = 100::numeric) AND (placement = 0::numeric))
> Total runtime: 0.135 ms
>
> When I try doing it with a WITH statement (really, to apply the actual data that the plpgsql function uses), there is something strange in the bad table.
>
> explain analyze with SRC as (select * from xq_agr where id = 914830)
>   update r_agrio_total set
>     unconfirmed = unconfirmed + (SRC.r_agrio->>'unconfirmed')::numeric(38)
>   from SRC
>   where tagid = (SRC.r_agrio->'key'->>'tagid')::numeric
>     and unitid = (SRC.r_agrio->'key'->>'unit')::numeric
>     and placement = (SRC.r_agrio->'key'->>'placement')::numeric
>     and device_type = (SRC.r_agrio->'key'->>'device_type')::numeric;
>
> RESULT:
>
> Update on r_agrio_total  (cost=8.91..32777.51 rows=19331 width=409) (actual time=0.107..0.107 rows=0 loops=1)
>   CTE src
>     ->  Index Scan using xq_agr_pkey on xq_agr  (cost=0.42..8.44 rows=1 width=379) (actual time=0.026..0.027 rows=1 loops=1)
>           Index Cond: (id = 914830)
>   ->  Nested Loop  (cost=0.46..32769.07 rows=19331 width=409) (actual time=0.107..0.107 rows=0 loops=1)
>         ->  CTE Scan on src  (cost=0.00..0.02 rows=1 width=88) (actual time=0.032..0.033 rows=1 loops=1)
>         ->  Index Scan using u_r_agrio_total on r_agrio_total  (*cost=0.46..32285.78 rows=19331* width=321) (actual time=0.001..0.001 rows=0 loops=1)
>               Index Cond: ((tagid = (((src.r_agrio -> 'key'::text) ->> 'tagid'::text))::numeric) AND (unitid = (((src.r_agrio -> 'key'::text) ->> 'unit'::text))::numeric) AND (device_type = (((src.r_agrio -> 'key'::text) ->> 'device_type'::text))::numeric) AND (placement = (((src.r_agrio -> 'key'::text) ->> 'placement'::text))::numeric))
> Total runtime: 0.155 ms
>
> explain analyze with SRC as (select * from xq_agr where id = 914830)
>   update r_agrio_hourly set
>     unconfirmed = unconfirmed + (SRC.r_agrio->>'unconfirmed')::numeric(38)
>   from SRC
>   where tagid = (SRC.r_agrio->'key'->>'tagid')::numeric
>     and unitid = (SRC.r_agrio->'key'->>'unit')::numeric
>     and placement = (SRC.r_agrio->'key'->>'placement')::numeric
>     and device_type = (SRC.r_agrio->'key'->>'device_type')::numeric
>     and rowdate = (SRC.r_agrio->'key'->>'rowdate');
>
> RESULT:
>
> Update on r_agrio_hourly  (cost=8.91..52.91 rows=20 width=417) (actual time=0.123..0.123 rows=0 loops=1)
>   CTE src
>     ->  Index Scan using xq_agr_pkey on xq_agr  (cost=0.42..8.44 rows=1 width=379) (actual time=0.023..0.024 rows=1 loops=1)
>           Index Cond: (id = 914830)
>   ->  Nested Loop  (cost=0.47..44.47 rows=20 width=417)