Hi,

On Sat, 31 Jan 2026 at 19:21, KAZAR Ayoub <[email protected]> wrote:
>
> On Wed, Jan 21, 2026 at 9:50 PM Neil Conway <[email protected]> wrote:
>>
>> * I'm curious if we'll see better performance on large inputs if we flush to 
>> `line_buf` periodically (e.g., at least every few thousand bytes or so). 
>> Otherwise we might see poor data cache behavior if large inputs with no 
>> control characters get evicted before we've copied them over. See the 
>> approach taken in escape_json_with_len() in utils/adt/json.c
>>
> So I gave this a try. Attached is a small patch that has v3 plus the
> suggestion added. Here are the results with different thresholds for the
> line_buf refill:
>
> Execution time compared to master:
> Workload    v3      v3.1 (2k)  v3.1 (4k)  v3.1 (8k)  v3.1 (16k)  v3.1 (20k)  v3.1 (28k)
> text/none   -16.5%  -17.4%     -14.3%     -12.6%     -13.6%      -10.5%      -16.3%
> text/esc    +5.6%   +11.1%     +3.1%      +7.6%      +3.0%       +4.9%       +4.2%
> csv/none    -31.0%  -29.9%     -26.7%     -30.1%     -27.9%      -30.2%      -29.6%
> csv/quote   +0.2%   -0.6%      -0.4%      -1.0%      +0.1%       +2.5%       -1.0%
>
> L1d cache miss rates:
> Workload    Master  v3      v3.1 (2k)  v3.1 (4k)  v3.1 (8k)  v3.1 (16k)  v3.1 (20k)  v3.1 (28k)
> text/none   0.20%   0.23%   0.21%      0.22%      0.21%      0.21%       0.21%       0.22%
> text/esc    0.21%   0.22%   0.22%      0.22%      0.22%      0.21%       0.22%       0.22%
> csv/none    0.17%   0.22%   0.21%      0.22%      0.21%      0.21%       0.22%       0.22%
> csv/quote   0.18%   0.22%   0.19%      0.20%      0.20%      0.19%       0.20%       0.20%
>
> On my laptop I have 32KB L1 cache per core.
> The results are very close. It is hard to see in the cache-miss numbers,
> but the execution times suggest otherwise: periodically flushing to
> line_buf seems worth doing.
> If Manni can rerun the benchmarks on these too, it would be nice to confirm 
> this.

I looked at this change and had a couple of points.

We already have REFILL_LINEBUF at the start of the for loop in the
CopyReadLineText() function (let’s call this refill #1). It fires
whenever the input_buf_ptr >= copy_buf_len check is true. On my end,
copy_buf_len stays at 8191 until the last chunk of the input, where it
drops to the number of remaining bytes. So at most 8191 bytes can be
pending at any point, and with LINE_BUF_FLUSH_AFTER set to 8192 the
REFILL_LINEBUF you added should never be reached; refill #1 always
triggers first.

I verified this manually by adding some logging, and the results seem
to confirm this behavior. Based on that, there shouldn’t be a
performance difference when LINE_BUF_FLUSH_AFTER >= 8k.
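
To make the argument concrete, here is a toy model of the two flush
paths. It is only a sketch and not the actual COPY code: the names
FlushCounts and simulate_flushes are made up for illustration, and it
assumes a single long line with no special characters, delivered in
chunks of at most `chunk` bytes (8191 in my case).

```c
#include <assert.h>
#include <stddef.h>

/* Counters for the two flush paths in this toy model (hypothetical names). */
typedef struct FlushCounts
{
	size_t		at_buffer_end;	/* "refill #1": input chunk exhausted */
	size_t		at_threshold;	/* the periodic flush added in v3.1 */
} FlushCounts;

/*
 * Simulate scanning `total` bytes that contain no special characters.
 * The input arrives in chunks of at most `chunk` bytes; pending bytes are
 * flushed either when the chunk is exhausted or when `flush_after` bytes
 * have accumulated since the last flush.
 */
static FlushCounts
simulate_flushes(size_t total, size_t chunk, size_t flush_after)
{
	FlushCounts counts = {0, 0};
	size_t		remaining = total;	/* bytes not yet loaded */
	size_t		buf_len = 0;		/* plays the role of copy_buf_len */
	size_t		ptr = 0;			/* plays the role of input_buf_ptr */
	size_t		start = 0;			/* first byte not yet flushed */

	assert(chunk > 0 && flush_after > 0);

	for (;;)
	{
		/* Refill #1: the chunk is used up, flush what is pending. */
		if (ptr >= buf_len)
		{
			if (ptr > start)
				counts.at_buffer_end++;
			if (remaining == 0)
				break;
			buf_len = remaining < chunk ? remaining : chunk;
			remaining -= buf_len;
			ptr = start = 0;
		}

		ptr++;					/* consume one ordinary byte */

		/* Periodic flush: only reachable if flush_after <= buf_len. */
		if (ptr - start >= flush_after)
		{
			counts.at_threshold++;
			start = ptr;
		}
	}
	return counts;
}
```

In this model, simulate_flushes(3 * 8191, 8191, 8192) reports zero
threshold flushes and three buffer-end flushes, while a threshold of 512
gives 15 threshold flushes per full chunk.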

Could you please take a look and confirm whether you see the same behavior?

Also, I noticed that json.c uses ESCAPE_JSON_FLUSH_AFTER set to 512,
so it might be worth trying smaller values here as well.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

