On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> On 4/14/15 1:28 PM, Pawel Veselov wrote:
>
>>
>> I wonder if what I need to do, considering that I update a lot of "the
>> same" rows as I process this queue, is to create a temp table, update
>> the rows there, and then update the actual tables once at the end...
>>
>
> That's what I'd do.
>

Well, in short, I changed (the loop body below is repeated once for each of the
tables involved)

LOOP (item)
  UPDATE table with item
  IF not found INSERT item INTO table; END IF;
END LOOP;

to:

CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
  LOOP
    UPDATE xq_table with item;
    exit when found;
    INSERT INTO xq_table select * from table where (keys) = (item.keys) for update;
    continue when found;
    INSERT item INTO xq_table;
    exit;
  END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
  FROM xq_table xq
  WHERE (a.keys) = (xq.keys)
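
For concreteness, here is roughly what the second version looks like as actual
PL/pgSQL. This is only a sketch: the "items" queue and the columns account_id,
ts, cnt are placeholders standing in for my real schema, and the real merge
logic is more involved.

CREATE OR REPLACE FUNCTION merge_items() RETURNS void AS $$
DECLARE
    item record;
BEGIN
    -- Buffer table with the same shape as the real summary table.
    CREATE TEMP TABLE xq_total (LIKE total) ON COMMIT DROP;

    FOR item IN SELECT * FROM items LOOP
        LOOP
            -- Try to merge the item into a row we already buffered.
            UPDATE xq_total
               SET cnt = cnt + item.cnt
             WHERE account_id = item.account_id AND ts = item.ts;
            EXIT WHEN FOUND;

            -- First time we see this key: pull (and lock) the current row
            -- from the real table, so later items accumulate locally.
            INSERT INTO xq_total
                SELECT * FROM total
                 WHERE account_id = item.account_id AND ts = item.ts
                   FOR UPDATE;
            CONTINUE WHEN FOUND;

            -- The key isn't in the real table either: start it in the buffer.
            INSERT INTO xq_total (account_id, ts, cnt)
                 VALUES (item.account_id, item.ts, item.cnt);
            EXIT;
        END LOOP;
    END LOOP;

    -- One bulk update of the real table at the end.
    UPDATE total t
       SET cnt = xq.cnt
      FROM xq_total xq
     WHERE t.account_id = xq.account_id
       AND t.ts = xq.ts;
    -- (keys that were new to total would also need an INSERT ... SELECT here)
END;
$$ LANGUAGE plpgsql;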

That works significantly faster. The final update statement is very fast.
The process is somewhat slow in the beginning as it sucks in records from
"total" into "xq_total", but once all of that is moved into the temp table,
it rushes through the rest.


> The other option would be to use a constraint trigger paired with a
> per-row trigger on the hourly table to drive the daily table, and on the
> daily table to drive the total table. The way that would work is the
> per-row trigger would simply keep track of all the unique records that were
> changed in a statement (presumably by putting them in a temp table). Once
> the statement is "done", the constraint trigger would fire; it would
> summarize all the changed data and do a much smaller number of updates to
> the table being summarized into.
>

I'm not sure how I would avoid the same number of changes on the total table,
though; the trigger would still fire on each update, wouldn't it? So it seems
like the same problem of making a lot of changes to one table...


> BTW, you also made a comment about not having to hit the table if you look
> at something in an index. You can only do that if all the data you need is
> in the index, AND the page with the record is marked as being all-visible
> (google for Postgres Visibility Map). If that's not the case then you still
> have to pull in the row from the table in order to determine visibility. The
> only case where you can still avoid hitting the table is something like a
> NOT EXISTS; if you can't find any entries in the index for something then
> they definitely won't be in the table.
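
For the record, one easy way to check whether a query is actually getting an
index-only scan; the index and column names below are just placeholders, not
my real schema:

CREATE INDEX total_keys_idx ON total (account_id, ts);  -- covers the query below

VACUUM total;  -- updates the visibility map so pages can be marked all-visible

EXPLAIN (ANALYZE)
SELECT account_id, ts
  FROM total
 WHERE account_id = 42;

-- An "Index Only Scan" node with "Heap Fetches: 0" means the table was never
-- touched; a non-zero Heap Fetches count is the visibility-map fallback
-- described above, where the heap still has to be visited.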


What I was saying is that if a table has a unique index, and there is a cached
fact that a particular index value points to a particular row, there shouldn't
be a need to re-scan the index to search for any more matching values (which
would be necessary if the index were not unique). Again, all of this depends on
the size of the index, the number of distinct index values being queried, etc.


> But remember that if you update or delete a row, removing it from an
> index, the data will stay in that index until vacuum comes along.
>
> Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full
> rebuilds all the indexes for you.


I was being desperate :)

I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on a much larger
data set, and there is no problem (so far, at least). This machine is part of
a test deployment; there is no constant load, and the only data being written
now is from these tests. Vacuuming should prune all the dead stuff, and if
it's gone, it's unclear where the time is spent navigating/updating a table
with 24 rows :)
