Hi Micah,

Thanks for sharing this; it will be really helpful for FSST and other encodings. For FSST, we found that per-page symbol tables give the best results, since the symbol tables are relatively small (~2 KB) and each one is isolated to its data page. The draft FSST PR implements this. We would love to extend FSST to use this once it is available; multiple dictionary pages sound very interesting!
Warm Regards,
Arnav

On Mon, Nov 24, 2025 at 10:41 AM Micah Kornfield <[email protected]> wrote:

> With the recent discussion on FSST, a few issues came up:
> 1. Potentially sharing the FSST dictionary across pages.
> 2. The current byte array value encodings aren't great for random access.
>
> I put together a doc [1] that provides a sketch of I think what the next
> iteration page layouts for byte array values might look like. Mostly I'd
> like to gather opinions if there is anything else people had in mind, any
> strong objections to what is proposed and if others have thoughts they like
> to pursue (ideally we can collect them on the doc). This work is somewhat
> orthogonal to FSST but I think as long as we are addressing issues with
> byte arrays via new encoding we should look holistically at the issues
> involved.
>
> I'll note that there is probably some complexity that might prove not
> useful, but I hope to assess this during evaluation.
>
> After I gather feedback (especially if people think there are hard
> blockers) I'll put some effort into prototyping and reporting back results.
>
> Thanks,
> Micah
>
> [1]
> https://docs.google.com/document/d/1FBcKzRa2XnZjuQbA8vA52-tu901z5K2szDQpXW9tbYU/edit?tab=t.0#heading=h.1k4n1nnvpb73
