Hi,

On Sat, 31 Jan 2026 at 19:21, KAZAR Ayoub <[email protected]> wrote:
>
> On Wed, Jan 21, 2026 at 9:50 PM Neil Conway <[email protected]> wrote:
>>
>> * I'm curious if we'll see better performance on large inputs if we flush to
>> `line_buf` periodically (e.g., at least every few thousand bytes or so).
>> Otherwise we might see poor data cache behavior if large inputs with no
>> control characters get evicted before we've copied them over. See the
>> approach taken in escape_json_with_len() in utils/adt/json.c
>>
> So i gave this a try, attached is the small patch that has v3 + the
> suggestion added, here are the results with different threshold for line_buf
> refill:
>
> Execution time compared to master:
>
> Workload    v3      v3.1 (2k)  v3.1 (4k)  v3.1 (8k)  v3.1 (16k)  v3.1 (20k)  v3.1 (28k)
> text/none   -16.5%  -17.4%     -14.3%     -12.6%     -13.6%      -10.5%      -16.3%
> text/esc    +5.6%   +11.1%     +3.1%      +7.6%      +3.0%       +4.9%       +4.2%
> csv/none    -31.0%  -29.9%     -26.7%     -30.1%     -27.9%      -30.2%      -29.6%
> csv/quote   +0.2%   -0.6%      -0.4%      -1.0%      +0.1%       +2.5%       -1.0%
>
> L1d cache miss rates:
>
> Workload    Master  v3      v3.1 (2k)  v3.1 (4k)  v3.1 (8k)  v3.1 (16k)  v3.1 (20k)  v3.1 (28k)
> text/none   0.20%   0.23%   0.21%      0.22%      0.21%      0.21%       0.21%       0.22%
> text/esc    0.21%   0.22%   0.22%      0.22%      0.22%      0.21%       0.22%       0.22%
> csv/none    0.17%   0.22%   0.21%      0.22%      0.21%      0.21%       0.22%       0.22%
> csv/quote   0.18%   0.22%   0.19%      0.20%      0.20%      0.19%       0.20%       0.20%
>
> On my laptop I have 32KB L1 cache per core.
> Results are super close, it is hard to see in the cache misses numbers but
> execution times are saying other things, doing the periodic filling of
> line_buf seems good to do.
> If Manni can rerun the benchmarks on these too, it would be nice to confirm
> this.

I looked at this change and have a couple of points.

We already have a REFILL_LINEBUF at the start of the for loop in
CopyReadLineText() (let's call this refill #1). It runs when the
input_buf_ptr >= copy_buf_len check is true. On my end, copy_buf_len stays
at 8191 until the end of the input, where it becomes the remaining amount.
So when I set LINE_BUF_FLUSH_AFTER to 8192, the REFILL_LINEBUF you added
should never be reached; refill #1 should be triggered first. I verified
this manually by adding some logging, and the results seem to confirm this
behavior.

Based on that, there shouldn't be a performance difference when
LINE_BUF_FLUSH_AFTER >= 8k, since the added flush never fires for those
values. Could you please take a look and confirm whether you see the same
behavior?

Also, I noticed that json.c sets ESCAPE_JSON_FLUSH_AFTER to 512, so it
might be worth trying smaller values here as well.

--
Regards,
Nazir Bilal Yavuz
Microsoft
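
P.S. To make the argument about refill #1 concrete, here is a minimal
standalone sketch that only counts which flush fires. It is a simplified
model, not the code in CopyReadLineText(): the names count_flushes,
FlushCounts and CHUNK_LEN are hypothetical, the chunk size of 8191 is just
the copy_buf_len value I observed, and I am assuming the threshold check
sits after the chunk-exhausted check at the top of the loop.

```c
#include <stddef.h>

/* Assumed chunk size: the copy_buf_len value observed in the message above. */
#define CHUNK_LEN 8191

/* Hypothetical counters, for illustration only. */
typedef struct FlushCounts
{
    size_t      at_chunk_end;   /* models "refill #1": chunk exhausted */
    size_t      at_threshold;   /* models the flush added by the patch */
} FlushCounts;

/*
 * Walk total_len bytes in chunks of at most CHUNK_LEN and count how often
 * each kind of flush would fire for a given flush_after threshold.
 */
static FlushCounts
count_flushes(size_t total_len, size_t flush_after)
{
    FlushCounts counts = {0, 0};
    size_t      consumed = 0;

    while (consumed < total_len)
    {
        size_t      remaining = total_len - consumed;
        size_t      chunk_len = remaining < CHUNK_LEN ? remaining : CHUNK_LEN;
        size_t      start = 0;  /* first byte not yet copied out */
        size_t      ptr = 0;    /* next byte to scan */

        for (;;)
        {
            /* Checked first, like the existing check at the loop top. */
            if (ptr >= chunk_len)
            {
                counts.at_chunk_end++;
                break;
            }

            /* Assumed placement of the new threshold-based flush. */
            if (ptr - start >= flush_after)
            {
                counts.at_threshold++;
                start = ptr;
            }

            ptr++;
        }

        consumed += chunk_len;
    }

    return counts;
}
```

In this model, count_flushes(10 * 8191, 8192) gives at_threshold == 0 and
at_chunk_end == 10, while a threshold of 512 gives at_threshold == 150 for
the same input. That only shows which path is taken; it says nothing about
whether the smaller thresholds are actually faster.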
