On Tue, Jan 6, 2026 at 2:05 PM Manni Wood <[email protected]>
wrote:

>
>
> On Wed, Dec 31, 2025 at 7:04 AM Nazir Bilal Yavuz <[email protected]>
> wrote:
>
>> Hi,
>>
>> On Wed, 24 Dec 2025 at 18:08, KAZAR Ayoub <[email protected]> wrote:
>> >
>> > Hello,
>> > Following the same path of optimizing COPY FROM using SIMD, i found
>> that COPY TO can also benefit from this.
>> >
>> > I attached a small patch that uses SIMD to skip data and advance as far
>> as the first special character is found, then fallback to scalar processing
>> for that character and re-enter the SIMD path again...
>> > There's two ways to do this:
>> > 1) Essentially we do SIMD until we find a special character, then
>> continue scalar path without re-entering SIMD again.
>> > - This gives from 10% to 30% speedups depending on the weight of
>> special characters in the attribute, we don't lose anything here since it
>> advances with SIMD until it can't (using the previous scripts: 1/3, 2/3
>> specials chars).
>> >
>> > 2) Do SIMD path, then use scalar path when we hit a special character,
>> keep re-entering the SIMD path each time.
>> > - This is equivalent to the COPY FROM story, we'll need to find the
>> same heuristic to use for both COPY FROM/TO to reduce the regressions (same
>> regressions: around from 20% to 30% with 1/3, 2/3 specials chars).
>> >
>> > Something else to note is that the scalar path for COPY TO isn't as
>> heavy as the state machine in COPY FROM.
>> >
>> > So if we find the sweet spot for the heuristic, doing the same for COPY
>> TO will be trivial and always beneficial.
>> > Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is
>> the second one.
>>
>> Patches look correct to me. I think we could move these SIMD code
>> portions into a shared function to remove duplication, although that
>> might have a performance impact. I have not benchmarked these patches
>> yet.
>>
>> Another consideration is that these patches might need their own
>> thread, though I am not completely sure about this yet.
>>
>> One question: what do you think about having a 0004-style approach for
>> COPY FROM? What I have in mind is running SIMD for each line & column,
>> stopping SIMD once it can no longer skip an entire chunk, and then
>> continuing with the next line & column.
>>
>> --
>> Regards,
>> Nazir Bilal Yavuz
>> Microsoft
>>
>
> Hello, Nazir, I tried your suggested cpupower commands as well as
> disabling turbo, and my results are indeed more uniform. (see attached
> screenshot of my spreadsheet).
>
> This time, I ran the tests on my Tower PC instead of on my laptop.
>
> I also followed Mark Wong's advice and used the taskset command to pin my
> postgres postmaster (and all of its children) to a single cpu core.
>
> So when I start postgres, I do this to pin it to core 27:
>
> ${PGHOME}/bin/pg_ctl -D ${PGHOME}/data -l ${PGHOME}/logfile.txt start
> PGPID=$(head -1 ${PGHOME}/data/postmaster.pid)
> taskset --cpu-list -p 27 ${PGPID}
>
>
> My results seem similar to yours:
>
> master: Nazir 85ddcc2f4c | Manni 877ae5db
>
> text, no special: 102294 | 302651
> text, 1/3 special: 108946 | 326208
> csv, no special: 121831 | 348930
> csv, 1/3 special: 140063 | 439786
>
> v3
>
> text, no special: 88890 (13.1% speedup) | 227874 (24.7% speedup)
> text, 1/3 special: 110463 (1.4% regression) | 322637 (1.1% speedup)
> csv, no special: 89781 (26.3% speedup) | 226525 (35.1% speedup)
> csv, 1/3 special: 147094 (5.0% regression) | 461501 (4.9% regression)
>
> v4.2
>
> text, no special: 87785 (14.2% speedup) | 225702 (25.4% speedup)
> text, 1/3 special: 127008 (16.6% regression) | 343480 (5.3% regression)
> csv, no special: 88093 (27.7% speedup) | 226633 (35.0% speedup)
> csv, 1/3 special: 164487 (17.4% regression) | 510954 (16.2% regression)
>
> It would seem that both your results and mine show a more serious
> worst-case regression for the v4.2 patches than for the v3 patches. It
> seems also that the speedups for v4.2 and v3 are similar.
>
> I'm currently working with Mark Wong to see if his results continue to be
> dissimilar (as they currently are now) and, if so, why.
> --
> -- Manni Wood EDB: https://www.enterprisedb.com
>

Hello, all.

Now that I am following Nazir's on how to configure my CPU for performance
test run, and now that I am following Mark's advice on pinning the
postmaster to a particular CPU core, I figured I would share the scripts I
have been using to build, run, and test Postges with various patches
applied: https://github.com/manniwood/copysimdperf

With Nazir and Mark's tips, I have seen more consistent numbers on my tower
PC, as shared in a previous e-mail. But Mark and I saw rather variable
results on a different Linux system he has access to. So this has inspired
me to spin up an AWS EC2 instance and test that when I find the time. And
maybe re-test on my Linux laptop.

If anybody else is inspired to test on different setups, that would be
great.
-- 
-- Manni Wood EDB: https://www.enterprisedb.com

Reply via email to