On Wed, May 8, 2024 at 6:54 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> On Tue, May 07, 2024 at 10:55:28AM +1200, Thomas Munro wrote:
> > https://github.com/openzfs/zfs/issues/11641
> >
> > I don't know enough to say anything useful about that but it certainly
> > smells similar...
>
> Wow - I'd completely forgotten about that problem report.
> The symptoms are the same, even with a zfs version 3+ years newer.
> I wish the ZFS people would do more with their problem reports.

If I had to guess, my first idea would be that your 1MB or ginormous
16MB recordsize (a relatively new option), combined with PostgreSQL's
8KB block-at-a-time random-order I/O pattern, is tickling strange
corners and finding a bug that no one has seen before.  I would
imagine that almost everyone in the galaxy who uses very large records
does so with 'settled' data that gets streamed out once sequentially
(for example I think some of the OpenZFS maintainers are at Lawrence
Livermore National Lab where I guess they might pump around petabytes
of data produced by particle physics research or whatever it might be,
which is probably why they are also adding direct I/O to avoid caches
completely...).  But for us, if we have lots of backends reading,
writing and extending random 8KB fragments of a 16MB record concurrently
(2048 pages per record!), maybe we hit some broken edge...  I'd be
sure to include that sort of detail in any future reports.
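
To be concrete about the pattern, here's a minimal sketch (untested C;
the file path, process count and iteration count are made up, and it
only models the overwrite part, not relation extension) of what I mean
by lots of backends hitting random 8KB fragments of one 16MB record:

/* Sketch: NPROCS processes concurrently rewriting random 8KB blocks
 * inside a single 16MB region of one file, roughly the shape of many
 * PostgreSQL backends scribbling on one ZFS record.  The path and the
 * NPROCS/NITERS values below are arbitrary. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLCKSZ 8192                     /* PostgreSQL block size */
#define RECORDSIZE (16 * 1024 * 1024)   /* one ZFS record */
#define NPROCS 8
#define NITERS 100000

int
main(void)
{
    int fd = open("/tank/pgdata/testfile", O_RDWR | O_CREAT, 0644);

    if (fd < 0 || ftruncate(fd, RECORDSIZE) < 0)
    {
        perror("setup");
        return 1;
    }
    for (int p = 0; p < NPROCS; p++)
    {
        if (fork() == 0)
        {
            char buf[BLCKSZ];

            srandom(getpid());
            for (int i = 0; i < NITERS; i++)
            {
                off_t blkno = random() % (RECORDSIZE / BLCKSZ);

                memset(buf, i & 0xff, sizeof(buf));
                if (pwrite(fd, buf, BLCKSZ, blkno * BLCKSZ) != BLCKSZ)
                {
                    perror("pwrite");
                    _exit(1);
                }
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;
    return 0;
}

Something like that, pointed at a file on the affected recordsize=16M
dataset, would be where I'd start if trying to build a reproducer.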

Normally I suppress the urge to blame problems on kernels, file
systems, etc.; in the past, accusations that ZFS was buggy turned out
to be bugs in PostgreSQL, IIRC.  But user space sure seems to be off
the hook for this one...

