Hi, In <cad21aobkde4jwjpgclxsewqu3nn4vxjkys9vprqdwa2gwnq...@mail.gmail.com> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 4 Feb 2025 22:20:51 -0800, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>> I was just looking at bit at this series of patch labelled with v31, >> to see what is happening here. >> >> In 0001, we have that: >> >> + /* format-specific routines */ >> + const CopyToRoutine *routine; >> [...] >> - CopySendEndOfRow(cstate); >> + cstate->routine->CopyToOneRow(cstate, slot); >> >> Having a callback where the copy state is processed once per row is >> neat in terms of design for the callbacks and what extensions can do, >> and this is much better than what 2889fd23be5 has attempted (later >> reverted in 1aa8324b81fa) because we don't do indirect function calls >> for each attribute. Still, I have a question here: what happens for a >> COPY TO that involves one attribute, a short field size like an int2 >> and many rows (the more rows the more pronounced the effect, of >> course)? Could this level of indirection still be the cause of some >> regressions in a case like that? This is the worst case I can think >> about, on top of my mind, and I am not seeing tests with few >> attributes like this one, where we would try to make this callback as >> hot as possible. This is a performance-sensitive area. > > FYI when Sutou-san last measured the performance[1], it showed a > slight speed up even with fewer columns (5 columns) in both COPY TO > and COPY FROM cases. The callback design has not changed since then. > But it would be a good idea to run the benchmark with a table having a > single small size column. > > [1] > https://www.postgresql.org/message-id/20241114.161948.1677325020727842666.kou%40clear-code.com I measured v31 patch set with 1,6,11,16,21,26,31 int2 columns. See the attached PDF for 0001 and 0002 result. 0001 - to: It's faster than master when the number of rows are 1,000,000-5,000,000. It's almost same as master when the number of rows are 6,000,000-10,000,000. There is no significant slow down when the number of columns is 1. 0001 - from: 0001 doesn't change COPY FROM code. So the differences are not real difference. 0002 - to: 0002 doesn't change COPY TO code. So "0001 - to" and "0002 - to" must be the same result. But 0002 is faster than master for all cases. It shows that the CopyToOneRow() approach doesn't have significant slow down. 0002 - from: 0002 changes COPY FROM code. So this may have performance impact. It's almost same as master when data is smaller ((1,000,000-2,000,000 rows) or (3,000,000 rows and 1,6,11,16 columns)). It's faster than master when data is larger. There is no significant slow down by 0002. Thanks, -- kou
v31-intel-core-i7-3770-result-1-2.pdf
Description: Adobe PDF document