Hello! I ran some COPY FROM tests using master and then Nazir's v7-0001 and v7-0002 patches applied to master.

x86 master
TXT : 29222.524250 ms
CSV : 36162.588500 ms
TXT with 1/3 escapes: 32922.649750 ms
CSV with 1/3 quotes: 47631.423750 ms

x86 v7-0001
TXT : 23247.834250 ms 20.445496% improvement
CSV : 23162.711750 ms 35.948413% improvement
TXT with 1/3 escapes: 31786.386000 ms 3.451313% improvement
CSV with 1/3 quotes: 43330.475500 ms 9.029645% improvement

x86 v7-0002
TXT : 22394.812500 ms 23.364552% improvement
CSV : 22374.645750 ms 38.127643% improvement
TXT with 1/3 escapes: 32378.929750 ms 1.651507% improvement
CSV with 1/3 quotes: 47139.171750 ms 1.033461% improvement

arm master
TXT : 9448.900500 ms
CSV : 11135.871500 ms
TXT with 1/3 escapes: 10786.418750 ms
CSV with 1/3 quotes: 14115.335500 ms

arm v7-0001
TXT : 7271.170500 ms 23.047443% improvement
CSV : 7259.866750 ms 34.806479% improvement
TXT with 1/3 escapes: 10894.445500 ms -1.001507% regression
CSV with 1/3 quotes: 13398.444000 ms 5.078813% improvement

arm v7-0002
TXT : 7165.707250 ms 24.163587% improvement
CSV : 7140.497250 ms 35.878416% improvement
TXT with 1/3 escapes: 10308.782250 ms 4.428129% improvement
CSV with 1/3 quotes: 12576.179500 ms 10.904140% improvement

v7-0001 + v7-0002 applied to master certainly seems promising: nice to see
speed improvements across the board on both x86 and arm!

On Fri, Feb 13, 2026 at 5:09 PM Nathan Bossart <[email protected]> wrote:

> On Fri, Feb 13, 2026 at 02:45:30PM +0300, Nazir Bilal Yavuz wrote:
> > Also, if I change this code to:
> >
> > if (cstate->simd_enabled)
> > {
> >     if (is_csv)
> >         result = CopyReadLineText(cstate, true, true);
> >     else
> >         result = CopyReadLineText(cstate, false, true);
> > }
> > else
> > {
> >     if (is_csv)
> >         result = CopyReadLineText(cstate, true, false);
> >     else
> >         result = CopyReadLineText(cstate, false, false);
> > }
> >
> > then I see ~%5 performance improvement in scalar path compared to master.
>
> Hm.
> What difference do you see if you just do
>
>     if (is_csv)
>         result = CopyReadLineText(cstate, true);
>     else
>         result = CopyReadLineText(cstate, false);
>
> both with and without the SIMD stuff? IIUC this is allowing the compiler
> to remove several branches in CopyReadLineText(), which might be a nice
> improvement on its own. That being said, I'm less convinced that adding a
> simd_enabled parameter to CopyReadLineText() helps, because 1) it's
> involved in fewer branches and 2) we change it within the function, so the
> compiler can't remove the branches, anyway. But perhaps I'm missing
> something.
>
> Some other random thoughts:
>
> +    match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
>
> +    match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
>
> Since \n and \r are well below "normal" ASCII values, I wonder if we could
> simplify these to something like
>
>     match = vector8_gt(... vector with all lanes set to \r + 1 ..., chunk);
>
> +    /* Check if we found any special characters */
> +    mask = vector8_highbit_mask(match);
> +    if (mask != 0)
>
> vector8_highbit_mask() is somewhat expensive on AArch64, so I wonder if
> waiting until we enter the "if" block to calculate it has any benefit.
>
> +    simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
>
> If (is_csv && in_quote), we shouldn't have picked up \r or \n in the first
> place, right?
>
> +    simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
> +
> +    /*
> +     * Do not disable SIMD when we hit EOL or EOF characters. In
> +     * practice, it does not matter for EOF because parsing ends
> +     * there, but we keep the behavior consistent.
> +     */
> +    if (!(simd_hit_eof || simd_hit_eol))
>
> I'd think that doing less unnecessary work would outweigh the benefits of
> consistency for the EOF case.
>
> --
> nathan
>

--
-- Manni Wood EDB: https://www.enterprisedb.com
