Hi, On Wed, 11 Mar 2026 at 23:42, Nathan Bossart <[email protected]> wrote: > > On Wed, Mar 11, 2026 at 10:22:18PM +0300, Nazir Bilal Yavuz wrote: > > Here is v14 which is v13-0001 + v13-0002. > > Thanks! It's getting close. > > > + /* > > + * Temporary variables are used here instead of passing the > > actual > > + * variables (especially input_buf_ptr) directly to the > > helper. Taking > > + * the address of a local variable might force the compiler to > > + * allocate it on the stack rather than in a register. > > Because > > + * input_buf_ptr is used heavily in the hot scalar path > > below, keeping > > + * it in a register is important for performance. > > + */ > > + int temp_input_buf_ptr; > > + bool temp_hit_eof = hit_eof; > > A few notes: > > * Does using a temporary variable for hit_eof actually make a difference? > AFAICT that's only updated when loading more data. > > * Does inlining the function produce the same results? > > * Also, I'm curious what the usual benchmarks look like with and without > this hack for the latest patch.
I tried to benchmark all of these questions, here are the results: Old master means d841ca2d14 - inlining CopyReadLineText commit (dc592a4155). v14 means d841ca2d14 + v14. v14 + #1 means removing temporary variables. v14 + #2 means removing temp_hit_eof variable only. v14 + #3 means inlining CopyReadLineTextSIMDHelper(). v14 + #4 means inlining CopyReadLineTextSIMDHelper() + removing temporary variables (#1). ------------------------------------------------------------ Results for default_toast_compression = 'lz4': +-------------------------------------------+ | Optimization: -O2 | +------------+--------------+---------------+ | | Text | CSV | +------------+------+-------+-------+-------+ | WIDE | None | 1/3 | None | 1/3 | +------------+------+-------+-------+-------+ | Old master | 4260 | 4789 | 5930 | 8276 | +------------+------+-------+-------+-------+ | v14 | 2489 | 4439 | 2529 | 8098 | +------------+------+-------+-------+-------+ | v14 + #1 | 2472 | 5177 | 2479 | 9285 | +------------+------+-------+-------+-------+ | v14 + #2 | 2521 | 4252 | 2481 | 8050 | +------------+------+-------+-------+-------+ | v14 + #3 | 2632 | 4569 | 2458 | 8657 | +------------+------+-------+-------+-------+ | v14 + #4 | 2476 | 4239 | 2475 | 10544 | +------------+------+-------+-------+-------+ | | | | | | +------------+------+-------+-------+-------+ | | | | | | +------------+------+-------+-------+-------+ | | Text | CSV | +------------+------+-------+-------+-------+ | NARROW | None | 1/3 | None | 1/3 | +------------+------+-------+-------+-------+ | Old master | 9955 | 10056 | 10329 | 10872 | +------------+------+-------+-------+-------+ | v14 | 9917 | 10080 | 10104 | 10510 | +------------+------+-------+-------+-------+ | v14 + #1 | 9913 | 10090 | 10120 | 10532 | +------------+------+-------+-------+-------+ | v14 + #2 | 9937 | 10130 | 10072 | 10520 | +------------+------+-------+-------+-------+ | v14 + #3 | 9880 | 10258 | 10220 | 10604 | +------------+------+-------+-------+-------+ | v14 + #4 | 9827 | 10306 | 10308 | 10734 | +------------+------+-------+-------+-------+ ------------------------------------------------------------ Results for default_toast_compression = 'pglz': +-------------------------------------------+ | Optimization: -O2 | +------------+--------------+---------------+ | | Text | CSV | +------------+------+-------+-------+-------+ | WIDE | None | 1/3 | None | 1/3 | +------------+------+-------+-------+-------+ | Old master | 4260 | 4789 | 5930 | 8276 | +------------+------+-------+-------+-------+ | v14 | 2489 | 4439 | 2529 | 8098 | +------------+------+-------+-------+-------+ | v14 + #1 | 2472 | 5177 | 2479 | 9285 | +------------+------+-------+-------+-------+ | v14 + #2 | 2521 | 4252 | 2481 | 8050 | +------------+------+-------+-------+-------+ | v14 + #3 | 2632 | 4569 | 2458 | 8657 | +------------+------+-------+-------+-------+ | v14 + #4 | 2476 | 4239 | 2475 | 10544 | +------------+------+-------+-------+-------+ | | | | | | +------------+------+-------+-------+-------+ | | | | | | +------------+------+-------+-------+-------+ | | Text | CSV | +------------+------+-------+-------+-------+ | NARROW | None | 1/3 | None | 1/3 | +------------+------+-------+-------+-------+ | Old master | 9955 | 10056 | 10329 | 10872 | +------------+------+-------+-------+-------+ | v14 | 9917 | 10080 | 10104 | 10510 | +------------+------+-------+-------+-------+ | v14 + #1 | 9913 | 10090 | 10120 | 10532 | +------------+------+-------+-------+-------+ | v14 + #2 | 9937 | 10130 | 10072 | 10520 | +------------+------+-------+-------+-------+ | v14 + #3 | 9880 | 10258 | 10220 | 10604 | +------------+------+-------+-------+-------+ | v14 + #4 | 9827 | 10306 | 10308 | 10734 | +------------+------+-------+-------+-------+ ------------------------------------------------------------ By looking these results: v14 + #1 and v14 + #3 performs worse on wide & 1/3 cases. v14 + #4 performs worse on CSV & wide & 1/3 cases. v14 and v14 + #2 perform very similarly. They don't have regression. I think we can move forward with one of these. -- Regards, Nazir Bilal Yavuz Microsoft
