On Mon, Mar 9, 2026 at 1:25 PM Nathan Bossart <[email protected]>
wrote:

> On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> > +#ifndef USE_NO_SIMD
> > +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> > +                                       bool *temp_hit_eof, int *temp_input_buf_ptr);
> > +#endif
>
> Should we inline this, too?
>
> > +                             /*
> > +                              * Do not disable SIMD when we hit EOL or EOF characters. In
> > +                              * practice, it does not matter for EOF because parsing ends
> > +                              * there, but we keep the behavior consistent.
> > +                              */
> > +                             if (!(simd_hit_eof || simd_hit_eol))
> > +                                     cstate->simd_enabled = false;
>
> nitpick: I would personally avoid disabling it for EOF.  It probably
> doesn't amount to much, but I don't see any point in the extra
> complexity/work solely for consistency.
>
> > +                             /*
> > +                              * We encountered a EOL or EOF on the first vector. This means
> > +                              * lines are not long enough to skip fully sized vector. If
> > +                              * this happens two times consecutively, then disable the
> > +                              * SIMD.
> > +                              */
> > +                             if (first_vector)
> > +                             {
> > +                                     if (cstate->simd_failed_first_vector)
> > +                                             cstate->simd_enabled = false;
> > +
> > +                                     cstate->simd_failed_first_vector = true;
> > +                             }
>
> The first time I saw this, my mind immediately went to the extreme case
> where this likely regresses: alternating long and short lines.  We might
> just want to disable it the first time we see a short line, like we do for
> special characters.  This is another thing that we can improve
> independently later on.
>
> > +     /* First try to run SIMD, then continue with the scalar path */
> > +     if (cstate->simd_enabled)
> > +     {
> > +             int                     temp_input_buf_ptr = input_buf_ptr;
> > +             bool            temp_hit_eof = false;
> > +
> > +             result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> > +                                                 &temp_input_buf_ptr);
> > +             input_buf_ptr = temp_input_buf_ptr;
> > +             hit_eof = temp_hit_eof;
>
> Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
> moving the SIMD stuff to its own function is nice.  The temp variables seem
> a bit too magical to me, though.  If those really make a difference, IMHO
> there ought to be a big comment explaining why.
>
> --
> nathan
>

Here are some benchmarks showing what performance will look like for users
who continue to use default_toast_compression = pglz.

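All builds were compiled with meson using debugoptimized (-g -O2). Percentages
are relative to the matching "without inline" baseline on the same machine and
data shape.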

arm NARROW master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 10055.141000 ms
CSV :                 10549.174500 ms
TXT with 1/3 escapes: 10213.864750 ms
CSV with 1/3 quotes:  12188.039000 ms

arm NARROW master with inline with v11patch default_toast_compression = pglz
TXT :                 10070.153750 ms  -0.149304% regression
CSV :                 10161.348750 ms   3.676361% improvement
TXT with 1/3 escapes: 10618.005000 ms  -3.956781% regression
CSV with 1/3 quotes:  12279.366250 ms  -0.749319% regression

arm WIDE master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 11355.602750 ms
CSV :                 13893.110500 ms
TXT with 1/3 escapes: 12872.690500 ms
CSV with 1/3 quotes:  16722.262500 ms

arm WIDE master with inline with v11patch default_toast_compression = pglz
TXT :                 9001.007250 ms  20.735099% improvement
CSV :                 8988.679750 ms  35.301171% improvement
TXT with 1/3 escapes: 12191.137000 ms  5.294569% improvement
CSV with 1/3 quotes:  16297.541500 ms  2.539854% improvement


x86 NARROW master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 26243.084500 ms
CSV :                 27719.564000 ms
TXT with 1/3 escapes: 29578.192750 ms
CSV with 1/3 quotes:  34467.571250 ms

x86 NARROW master with inline with v11patch default_toast_compression = pglz
TXT :                 26371.996750 ms  -0.491224% regression
CSV :                 26137.186500 ms   5.708522% improvement
TXT with 1/3 escapes: 28080.201000 ms   5.064514% improvement
CSV with 1/3 quotes:  32557.377500 ms   5.542003% improvement

x86 WIDE master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 28734.774750 ms
CSV :                 35700.485000 ms
TXT with 1/3 escapes: 32376.878250 ms
CSV with 1/3 quotes:  47024.985750 ms

x86 WIDE master with inline with v11patch default_toast_compression = pglz
TXT :                 22753.755750 ms  20.814567% improvement
CSV :                 22977.195500 ms  35.638982% improvement
TXT with 1/3 escapes: 29526.887000 ms   8.802551% improvement
CSV with 1/3 quotes:  40298.196750 ms  14.304712% improvement
-- 
-- Manni Wood EDB: https://www.enterprisedb.com
