<< pgbouncer is more for making connections line up single-file when the
line is moving at a very fast clip, say 0.01 second per turn.  If I were
trying to make tasks that can each last for hours or days line up and take
turns, I don't think pgbouncer would be the way to go.
>>

 

The recommendation at the time assumed that write contention was
slowing things down and consuming resources, since I can't stop people from
creating big multi-threaded imports. Each import consists of about 50 writes.


 

>> Is the original query you posted part of the transform process, rather
than being the production query you run after the ETL is over?



Neither; it is part of our auditing and maintenance processes. It is not
called with any great frequency. The audit report generates rows describing
how a particular item (an "item" being a particular table/row) was
created: it returns the names of the import tables, the row ids, the write
operations and any transformation messages that may have been generated -
all in the order they occurred.
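
For illustration only, here is a minimal sketch of the shape of that
report query. The table and column names (audit_log, import_table,
occurred_at, etc.) are invented for the example and are not our real schema:

    -- Hypothetical audit report query; all identifiers are made up.
    SELECT a.import_table,
           a.import_row_id,
           a.write_operation,
           a.transform_message,
           a.occurred_at
    FROM   audit_log a
    WHERE  a.item_table = 'core_item'
      AND  a.item_id    = 12345
    ORDER  BY a.occurred_at;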

 

You can imagine how useful this is in creating a document describing what
happened and why.

 

The same data generated by the report is used to "resurrect" an item. If -
for example - our business logic has changed, but the change only affects a
small subset of our core data, then we perform a "rollback" (a logical
cascading delete) on the affected items. Then we create a "rebuild", a
generated script that re-imports ONLY the import table rows defined
in the audit report.
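
As a rough sketch of what one of those generated rebuild scripts might
contain - the table names and row ids below are placeholders, not our
actual objects:

    -- Re-import only the rows the audit report named for one item.
    -- import_orders / orders_transformed are invented example tables.
    INSERT INTO orders_transformed (order_id, customer_id, amount)
    SELECT i.order_id, i.customer_id, i.amount
    FROM   import_orders i
    WHERE  i.id IN (1001, 1002, 1047);  -- ids taken from the audit report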

 

So, this query is not called often, but if it takes over 30 seconds to load
an item (because the audit report takes so long to prepare the bitmap index
scan when passed new query parameters) then it severely restricts how much
data we can resurrect at any one time.
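
If anyone wants to see how I've been splitting the time between planning
and execution, this is roughly the check, using the same hypothetical
audit_log sketch as above (parameter values are placeholders). On any
reasonably recent PostgreSQL the output includes separate planning and
execution time lines:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT a.import_table, a.import_row_id, a.write_operation,
           a.transform_message, a.occurred_at
    FROM   audit_log a
    WHERE  a.item_table = 'core_item'
      AND  a.item_id    = 12345
    ORDER  BY a.occurred_at;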
