On 18.12.2018 1:28, Tomas Vondra wrote:
>> 4) There was a problem with marking the top-level transaction as having
>> catalog changes if one of its subtransactions has them. It was causing a
>> problem with DDL statements just after a subtransaction start (savepoint),
>> so data from new columns was not replicated.
>>
>> 5) Similar issue with schema send. You send the schema only once per each
>> sub/transaction (IIRC), while we have to update the schema on each catalog
>> change: invalidation execution, snapshot rebuild, adding new tuple cids.
>> So I ended up adding an is_schema_send flag to ReorderBufferTXN, since it
>> is easy to set it inside RB and read it in the output plugin. Probably we
>> have to choose a better place for this flag.
>
> Hmm. Can you share an example of how to trigger these issues?
Test cases inside 014_stream_tough_ddl.pl and the old ones (with the streaming=true option added) should reproduce all these issues. In general, it happens in a transaction like:
INSERT
SAVEPOINT
ALTER TABLE ... ADD COLUMN
INSERT

where the second INSERT may discover an old version of the catalog.
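For reference, a minimal transaction of that shape (table and column names here are just placeholders, not taken from the test file) would look like:

```sql
BEGIN;
INSERT INTO tbl (a) VALUES (1);
SAVEPOINT s1;
ALTER TABLE tbl ADD COLUMN b int;
-- if the subtransaction's catalog change is not propagated to the
-- top-level transaction, this insert may be decoded against the
-- pre-ALTER catalog, so column b is missing on the subscriber
INSERT INTO tbl (a, b) VALUES (2, 2);
COMMIT;
```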
> Interesting. Any idea where the extra overhead in this particular case
> comes from? It's hard to deduce that from the single flame graph when I
> don't have anything to compare it with (i.e. the flame graph for the
> "normal" case).
I guess that the bottleneck is in disk operations. You can check the logical_repl_worker_new_perf.svg flame graph: disk reads (~9%) and writes (~26%) together take around 35% of CPU time. For comparison, please see the attached flame graph for the following transaction:
INSERT INTO large_text
    SELECT (SELECT string_agg('x', ',') FROM generate_series(1, 2000))
    FROM generate_series(1, 1000000);

Execution Time: 44519.816 ms
Time: 98333,642 ms (01:38,334)

where disk IO is only ~7-8% in total. So we get very roughly the same ~x4-5 performance drop here. JFYI, I am using a machine with an SSD for tests.
Therefore, you could probably write changes on the receiver in bigger chunks, rather than each change separately.
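The batching idea can be illustrated with a toy sketch (plain Python, nothing to do with the actual apply worker code; all names are made up): buffer incoming changes and flush them in chunks, so a stream of N changes turns into roughly N / batch_size disk writes instead of N.

```python
class BatchingWriter:
    """Toy illustration: accumulate changes and flush them in chunks."""

    def __init__(self, flush_fn, batch_size=128):
        self.flush_fn = flush_fn      # e.g. a function doing one disk write
        self.batch_size = batch_size
        self.buffer = []
        self.flushes = 0              # how many disk writes we issued

    def write_change(self, change):
        self.buffer.append(change)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # one write per full batch instead of one write per change
        if self.buffer:
            self.flush_fn(self.buffer)
            self.flushes += 1
            self.buffer = []
```

With batch_size=128, applying 1000 changes costs 8 writes (7 full batches plus a final partial flush) instead of 1000.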
> So I'm not particularly worried, but I'll look into that. I'd be much
> more worried if there was measurable overhead in cases when there's no
> streaming happening (either because it's disabled or the memory limit
> was not hit).
What I have also just found is that if a table row is large enough to be TOASTed, e.g.:
INSERT INTO large_text
    SELECT (SELECT string_agg('x', ',') FROM generate_series(1, 1000000))
    FROM generate_series(1, 1000);

then the logical_work_mem limit is not hit and we neither stream nor spill this transaction to disk, although it is still large. In contrast, the transaction above (with 1000000 smaller rows), being comparable in size, is streamed. I am not sure that it is easy to add proper accounting of TOASTable columns, but it is worth it.
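As a side note, one way to see how much the stored datum differs from its logical size (just a sketch; pg_column_size() reports the on-disk, possibly compressed/out-of-line size, while octet_length() reports the logical length):

```sql
CREATE TABLE toast_demo (t text);
INSERT INTO toast_demo
    SELECT (SELECT string_agg('x', ',') FROM generate_series(1, 1000000));
-- compare logical vs stored size of the TOASTed value
SELECT octet_length(t) AS logical_bytes,
       pg_column_size(t) AS stored_bytes
FROM toast_demo;
```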
--
Alexey Kondratov
Postgres Professional: https://www.postgrespro.com
Russian Postgres Company
<<attachment: logical_repl_worker_text_stream_perf.zip>>