I saw an interesting video on this topic. Was anyone at the conference?

https://youtu.be/h3UcecN5fvQ?si=PlhrwMIv8s_wxAF1

Antoine, given you clearly understand the topic, what exactly does the
content at 25:30 mean (especially in terms of parquet)?

FYI, the ASF Community over Code conference in Glasgow will have its CfP
announced before long, and I think some talks on code security would be
good. I've got a working title for one: "Open Source and CVEs: the forever
war"...
Something on fuzzing would be really good too.

On Mon, 9 Feb 2026 at 09:23, Antoine Pitrou <[email protected]> wrote:

>
> Hi Micah,
>
> Le 08/02/2026 à 21:08, Micah Kornfield a écrit :
> >>
> >> I am also toying with the idea of an encoding/decoding fuzzer that
> >> roundtrips data (see "function/inverse pairs" in
> >> https://blog.regehr.org/archives/856). The question becomes in which
> >> format the fuzzer would accept input data for the encoding step (as
> >> Parquet files, which would mean a decoding/encoding/decoding roundtrip?
> >> as Arrow IPC files, which are a simpler format?).
> >
> > Sorry for the late reply. It could also be the IPC JSON testing format?
>
> It could, but that introduces more overhead. The current Parquet full
> file fuzzer runs at around 100 iterations/second. Ideally a low-level
> Parquet encoding fuzzer should run at least 1-2 orders of magnitude
> faster so as to explore the search space more quickly.
>
> So my current inclination is to go with a custom fixed-size struct
> header indicating the physical type, encoding type and perhaps a couple
> other pieces of information.
>
> Regards
>
> Antoine.
>
>
>
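To make the fixed-size header idea above concrete, here is a rough sketch of how a roundtrip fuzz target might consume such an input. Everything in it is an assumption for illustration only: the header layout (`HEADER`), the type and encoding codes, the toy delta encoding (which is not Parquet's DELTA_BINARY_PACKED), and the `fuzz_one` entry point are hypothetical and are not taken from the Parquet or Arrow code bases.

```python
import struct

# Hypothetical 8-byte little-endian header (an assumption, not a real format):
# physical type (u8), encoding (u8), reserved (u16), value count (u32).
HEADER = struct.Struct("<BBHI")

PHYSICAL_INT32 = 0   # hypothetical code for a 32-bit signed integer column
ENCODING_PLAIN = 0   # hypothetical code: values stored as-is
ENCODING_DELTA = 1   # hypothetical code: toy delta encoding, not Parquet's


def _to_i32(x):
    """Wrap an integer into the signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def encode(values, encoding):
    """Encode a list of int32 values with the chosen toy encoding."""
    if encoding == ENCODING_PLAIN:
        return struct.pack(f"<{len(values)}i", *values)
    # Toy delta: store each difference modulo 2**32.
    deltas, prev = [], 0
    for v in values:
        deltas.append((v - prev) & 0xFFFFFFFF)
        prev = v
    return struct.pack(f"<{len(deltas)}I", *deltas)


def decode(payload, encoding, count):
    """Inverse of encode()."""
    if encoding == ENCODING_PLAIN:
        return list(struct.unpack(f"<{count}i", payload))
    values, prev = [], 0
    for d in struct.unpack(f"<{count}I", payload):
        prev = _to_i32(prev + d)
        values.append(prev)
    return values


def fuzz_one(data):
    """Return False if the input is rejected, True if the roundtrip held."""
    if len(data) < HEADER.size:
        return False
    ptype, enc, _reserved, count = HEADER.unpack_from(data)
    if ptype != PHYSICAL_INT32 or enc not in (ENCODING_PLAIN, ENCODING_DELTA):
        return False
    body = data[HEADER.size:]
    count = min(count, len(body) // 4)  # never read past the input
    values = list(struct.unpack_from(f"<{count}i", body))
    # Function/inverse pair: decoding the encoded values must give them back.
    assert decode(encode(values, enc), enc, count) == values
    return True
```

The point of the header is that the fuzzer's raw bytes map almost directly onto values to encode, so very little time is spent parsing the input before the encoder and decoder under test are exercised.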
