Hi,

We have an application where each rank needs to write data into non-overlapping regions of a pre-existing file. As the writes are for checkpoints, there is no need to read any of the data back before the file is closed.

A simple open(...), pwrite(...), pwrite(...), close(...) chain works fine here. However, as one might expect, the locking situation isn't ideal.

We are therefore looking into using group locks: after each rank opens the file, it issues the relevant ioctl(..., LL_IOC_GROUP_LOCK, ...). If we issue the same set of pwrites under a group lock, we find that data is occasionally missing. Given that our writes can straddle pages, this isn't surprising, as my understanding is that the page cache only tracks whether a page as a whole is dirty, not which bytes within it were written.
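For concreteness, the per-rank sequence looks roughly like the minimal sketch below. The helper names (pwrite_all, write_under_group_lock) and the gid parameter are made up for illustration, and the include of <lustre/lustreapi.h> assumes the Lustre client headers are installed; without them the function simply fails with ENOTSUP. Releasing the lock explicitly with LL_IOC_GROUP_UNLOCK before close is an assumption on my part, not something I know to be required.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Pull in the Lustre ioctl definitions only when the headers are present. */
#if defined(__has_include)
#  if __has_include(<lustre/lustreapi.h>)
#    include <lustre/lustreapi.h>
#  endif
#endif

/* Write exactly len bytes at offset, retrying on short writes and EINTR. */
static int pwrite_all(int fd, const char *buf, size_t len, off_t offset)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* no progress; give up rather than spin */
        buf += n;
        len -= (size_t)n;
        offset += n;
    }
    return 0;
}

/*
 * Open path, take the group lock, write one region, release, close.
 * gid is a hypothetical non-zero group id that all ranks agree on.
 * Returns 0 on success and -1 on failure.
 */
static int write_under_group_lock(const char *path, int gid,
                                  const void *buf, size_t len, off_t offset)
{
#ifdef LL_IOC_GROUP_LOCK
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, LL_IOC_GROUP_LOCK, gid);
    if (rc == 0) {
        rc = pwrite_all(fd, buf, len, offset);
        /* Explicit unlock: an assumption, see the note above. */
        if (ioctl(fd, LL_IOC_GROUP_UNLOCK, gid) != 0)
            rc = -1;
    }
    if (close(fd) != 0)
        rc = -1;
    return rc;
#else
    /* Lustre headers not available at build time. */
    (void)path; (void)gid; (void)buf; (void)len; (void)offset;
    errno = ENOTSUP;
    return -1;
#endif
}
```

In our case every rank calls something like write_under_group_lock(path, gid, buf, len, offset) with the same gid and its own non-overlapping offset.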

So, we reworked our code slightly to ensure that each page is only ever written to by a single rank. However, even then we find that data is occasionally missing from the file, at offsets corresponding to boundaries between hosts. We have even tried increasing this granularity from a page up to the stripe size of the file (so each N MiB stripe is only ever written to by a single rank), but to no avail.
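To illustrate what I mean by giving each page (or stripe) to a single rank, here is a minimal sketch of the kind of partitioning involved. It is not our actual code: region_t and aligned_region are hypothetical names, and align stands for whichever granularity is being tested (the page size, or the file's stripe size).

```c
#include <stdint.h>

/* Half-open byte range [start, end) owned by one rank. */
typedef struct {
    uint64_t start;
    uint64_t end;
} region_t;

/*
 * Split a file of `total` bytes among `nranks` ranks so that every
 * boundary between two ranks is a multiple of `align`. Blocks of
 * `align` bytes are dealt out as evenly as possible; the lower ranks
 * receive one extra block when the count does not divide evenly.
 */
static region_t aligned_region(uint64_t total, uint64_t align,
                               unsigned nranks, unsigned rank)
{
    /* Number of align-sized blocks; the last one may be partial. */
    uint64_t nblocks = (total + align - 1) / align;
    uint64_t base = nblocks / nranks;
    uint64_t extra = nblocks % nranks;

    uint64_t first = (uint64_t)rank * base + (rank < extra ? rank : extra);
    uint64_t count = base + (rank < extra ? 1 : 0);

    region_t r;
    r.start = first * align;
    r.end = (first + count) * align;

    /* Clamp to the file size; ranks past the end get an empty range. */
    if (r.start > total)
        r.start = total;
    if (r.end > total)
        r.end = total;
    return r;
}
```

With this scheme no align-sized block is ever touched by two ranks, which is the property we expected to be sufficient.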

Hence, I am wondering what the specific semantics are for writes under a group lock. Do we have to use O_DIRECT and bypass the page cache, or are there alignment requirements stricter than page alignment?

Regards, Freddie.
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
