Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon
On 02/08/2022 14.37, Ulrich Windl wrote:
> "Fabio M. Di Nitto" wrote on 02.08.2022 at 14:30 in message
> <0b26c097-1e21-3945-24ba-355cd0ccf...@fabbione.net>:
>> Hello Kazunori-san,
>>
>> On 02/08/2022 12.13, 井上和徳 wrote:
>>> Hi,
>>>
>>> Since O_DIRECT is not specified in open() [1], it reads the buffer cache and
>>> may result in a false negative. I fear that this possibility increases
>>> in environments with a large buffer cache and running disk-reading
>>> applications such as databases.
>>>
>>> So, I think it's better to specify O_RDONLY|O_DIRECT. What do you think?
>>> (In this case, lseek() processing is unnecessary.)
>>
>> I will have to defer to Christine (in CC on this email) on why we didn't
>> use O_DIRECT.
>>
>> I have a vague recollection that some storage devices didn't like the
>> O_DIRECT option, causing other issues later, but it's been a while since
>> I touched the code.
>
> The thing with O_DIRECT is that a "software block" has to be a multiple of a
> "hardware block"; i.e., you cannot read a partial block, and the buffer has
> some alignment requirements.

That was it! Thanks for refreshing my memory.

Fabio

>>> # I am ready to create a patch that works with O_DIRECT. Also, I wouldn't mind
>>> # a "change to add a new mode of inspection with O_DIRECT
>>> # (add an option to storage_mon) while keeping the current inspection process".
>>>
>>> [1] https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>>
>> It might be a very reasonable solution, though let's wait for an answer
>> from Chrissie.
>>
>> Cheers
>> Fabio

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
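The alignment constraint Ulrich mentions can be illustrated with a minimal sketch. It is not taken from storage_mon; the helper name `read_one_aligned_block` and the choice of reading at offset 0 are assumptions made for illustration only. It allocates a buffer whose address is aligned to the block size and reads exactly one whole block, which is the shape of request a descriptor opened with O_DIRECT generally expects.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of storage_mon): read exactly one block of
 * blksz bytes from the start of fd into a buffer aligned to blksz.
 * Returns 0 on success, -1 on error or short read.
 */
int read_one_aligned_block(int fd, size_t blksz)
{
    void *buf = NULL;
    ssize_t n;
    int rc;

    /* Assumption: blksz is a power of two, e.g. 512 or 4096. */
    assert(blksz >= sizeof(void *) && (blksz & (blksz - 1)) == 0);

    /* posix_memalign() returns an error number instead of setting errno. */
    rc = posix_memalign(&buf, blksz, blksz);
    if (rc != 0) {
        errno = rc;
        return -1;
    }

    /* Offset 0 is trivially block-aligned; the length is one whole block. */
    n = pread(fd, buf, blksz, 0);
    free(buf);

    return (n == (ssize_t)blksz) ? 0 : -1;
}
```

The same function works on a descriptor opened without O_DIRECT, so it can be exercised on an ordinary file; the alignment only becomes mandatory once O_DIRECT is in effect.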
Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon
On Wed, Aug 3, 2022 at 4:02 PM Ulrich Windl wrote:
>
> >>> Klaus Wenninger wrote on 03.08.2022 at 15:51:
> > On Tue, Aug 2, 2022 at 4:10 PM Ken Gaillot wrote:
> >>
> >> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
> >> > Hi,
> >> >
> >> > Since O_DIRECT is not specified in open() [1], it reads the buffer
> >> > cache and may result in a false negative. I fear that this possibility
> >> > increases in environments with a large buffer cache and running
> >> > disk-reading applications such as databases.
> >> >
> >> > So, I think it's better to specify O_RDONLY|O_DIRECT. What do you think?
> >> > (In this case, lseek() processing is unnecessary.)
> >> >
> >> > # I am ready to create a patch that works with O_DIRECT. Also, I
> >> > # wouldn't mind a "change to add a new mode of inspection with O_DIRECT
> >> > # (add an option to storage_mon) while keeping the current inspection
> >> > # process".
> >> >
> >> > [1] https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
> >> >
> >> > Best Regards,
> >> > Kazunori INOUE
> >>
> >> I agree, it makes sense to use O_DIRECT when available. I don't think
> >> an option is necessary.
> >
> > It might also be interesting to adjust block size/alignment to the
> > device.
> > Another consideration could be to additionally access the block layer
> > directly using aio.
>
> Again, AIO is POSIX; what it really does depends on the implementation.

I wasn't speaking of the Linux POSIX AIO implementation in userspace
(I guess that is still the case) but of what is available as syscalls
(io_submit, io_setup, io_cancel, io_destroy, io_getevents), which is,
afaik, Linux-specific and can't be wrapped into the POSIX interface.

> > Both are being done in sbd (storage-based-death), and yes, it also
> > adds Linux-specific stuff that might have to be conditional for other OSes.
> >
> > Klaus
> >
> >> However, O_DIRECT is not available on all OSes, so the configure script
> >> should detect support. Also, it is not supported by all filesystems, so
> >> if the open fails, we should retry without O_DIRECT.
> >> --
> >> Ken Gaillot
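Klaus's suggestion to adjust the block size to the device could look roughly like the sketch below. It is Linux-only and does not claim to show how sbd or storage_mon do it; the helper name `logical_block_size` and the 512-byte fallback are assumptions chosen for illustration. It asks a block device for its logical sector size with the `BLKSSZGET` ioctl and falls back to the default for anything that is not a block device.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>   /* BLKSSZGET (Linux only) */

/* Assumed default when the device cannot be queried. */
#define FALLBACK_BLOCK_SIZE 512

/*
 * Hypothetical helper: return the logical block size of the block device
 * behind fd, or FALLBACK_BLOCK_SIZE if fd is not a block device or the
 * query fails.
 */
size_t logical_block_size(int fd)
{
    struct stat st;
    int sz = 0;

    assert(fd >= 0);

    if (fstat(fd, &st) == 0 && S_ISBLK(st.st_mode) &&
        ioctl(fd, BLKSSZGET, &sz) == 0 && sz > 0) {
        return (size_t)sz;
    }
    return FALLBACK_BLOCK_SIZE;
}
```

On other operating systems the ioctl and header differ, so this part would have to be conditional, as Klaus notes.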
[ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon
>>> Klaus Wenninger wrote on 03.08.2022 at 15:51:
> On Tue, Aug 2, 2022 at 4:10 PM Ken Gaillot wrote:
>>
>> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
>> > Hi,
>> >
>> > Since O_DIRECT is not specified in open() [1], it reads the buffer
>> > cache and may result in a false negative. I fear that this possibility
>> > increases in environments with a large buffer cache and running
>> > disk-reading applications such as databases.
>> >
>> > So, I think it's better to specify O_RDONLY|O_DIRECT. What do you think?
>> > (In this case, lseek() processing is unnecessary.)
>> >
>> > # I am ready to create a patch that works with O_DIRECT. Also, I
>> > # wouldn't mind a "change to add a new mode of inspection with O_DIRECT
>> > # (add an option to storage_mon) while keeping the current inspection
>> > # process".
>> >
>> > [1] https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>> >
>> > Best Regards,
>> > Kazunori INOUE
>>
>> I agree, it makes sense to use O_DIRECT when available. I don't think
>> an option is necessary.
>
> It might also be interesting to adjust block size/alignment to the
> device.
> Another consideration could be to additionally access the block layer
> directly using aio.

Again, AIO is POSIX; what it really does depends on the implementation.

> Both are being done in sbd (storage-based-death), and yes, it also
> adds Linux-specific stuff that might have to be conditional for other OSes.
>
> Klaus
>
>> However, O_DIRECT is not available on all OSes, so the configure script
>> should detect support. Also, it is not supported by all filesystems, so
>> if the open fails, we should retry without O_DIRECT.
>> --
>> Ken Gaillot
Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon
On 03.08.2022 09:02, Ulrich Windl wrote:
> Ken Gaillot wrote on 02.08.2022 at 16:09 in message
> <0a2125a43bbfc09d2ca5bad1a693710f00e33731.ca...@redhat.com>:
>> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
>>> Hi,
>>>
>>> Since O_DIRECT is not specified in open() [1], it reads the buffer
>>> cache and may result in a false negative. I fear that this possibility
>>> increases in environments with a large buffer cache and running
>>> disk-reading applications such as databases.
>>>
>>> So, I think it's better to specify O_RDONLY|O_DIRECT. What do you think?
>>> (In this case, lseek() processing is unnecessary.)
>>>
>>> # I am ready to create a patch that works with O_DIRECT. Also, I
>>> # wouldn't mind a "change to add a new mode of inspection with O_DIRECT
>>> # (add an option to storage_mon) while keeping the current inspection
>>> # process".
>>>
>>> [1] https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>>>
>>> Best Regards,
>>> Kazunori INOUE
>>
>> I agree, it makes sense to use O_DIRECT when available. I don't think
>> an option is necessary.
>>
>> However, O_DIRECT is not available on all OSes, so the configure script
>> should detect support. Also, it is not supported by all filesystems, so
>> if the open fails, we should retry without O_DIRECT.
>
> I just looked it up: It seems POSIX has O_RSYNC, O_SYNC and O_DSYNC
> instead.

That is something entirely different. O_SYNC etc. are about the *file
system* level, while O_DIRECT is about the *device* level. O_DIRECT makes
the process talk directly to the device. It is unclear whether this is a
side effect of the implementation or intentional.

> The buffer cache handling may be different though.

Synchronous operation does not actually imply media access.

O_RSYNC: "the operation has been completed or diagnosed if unsuccessful.
The read is complete only when an image of the data has been successfully
transferred to the requesting process". Returning buffered data satisfies
this definition. Besides, Linux does not support O_RSYNC.

O_DSYNC: "the operation has been completed or diagnosed if unsuccessful.
The write is complete only when the data specified in the write request is
successfully transferred and all file system information required to
retrieve the data is successfully transferred". Writing to a journal
located on an external device seems to comply with this definition.

O_SYNC simply adds filesystem metadata update completion.

So no, O_SYNC & Co cannot replace O_DIRECT.
[ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon
>>> Ken Gaillot wrote on 02.08.2022 at 16:09 in message
<0a2125a43bbfc09d2ca5bad1a693710f00e33731.ca...@redhat.com>:
> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
>> Hi,
>>
>> Since O_DIRECT is not specified in open() [1], it reads the buffer
>> cache and may result in a false negative. I fear that this possibility
>> increases in environments with a large buffer cache and running
>> disk-reading applications such as databases.
>>
>> So, I think it's better to specify O_RDONLY|O_DIRECT. What do you think?
>> (In this case, lseek() processing is unnecessary.)
>>
>> # I am ready to create a patch that works with O_DIRECT. Also, I
>> # wouldn't mind a "change to add a new mode of inspection with O_DIRECT
>> # (add an option to storage_mon) while keeping the current inspection
>> # process".
>>
>> [1] https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>>
>> Best Regards,
>> Kazunori INOUE
>
> I agree, it makes sense to use O_DIRECT when available. I don't think
> an option is necessary.
>
> However, O_DIRECT is not available on all OSes, so the configure script
> should detect support. Also, it is not supported by all filesystems, so
> if the open fails, we should retry without O_DIRECT.

I just looked it up: It seems POSIX has O_RSYNC, O_SYNC and O_DSYNC
instead. The buffer cache handling may be different though.

Regards,
Ulrich

> --
> Ken Gaillot
[ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon
>>> "Fabio M. Di Nitto" wrote on 02.08.2022 at 14:30 in message
<0b26c097-1e21-3945-24ba-355cd0ccf...@fabbione.net>:
> Hello Kazunori-san,
>
> On 02/08/2022 12.13, 井上和徳 wrote:
>> Hi,
>>
>> Since O_DIRECT is not specified in open() [1], it reads the buffer cache and
>> may result in a false negative. I fear that this possibility increases
>> in environments with a large buffer cache and running disk-reading
>> applications such as databases.
>>
>> So, I think it's better to specify O_RDONLY|O_DIRECT. What do you think?
>> (In this case, lseek() processing is unnecessary.)
>
> I will have to defer to Christine (in CC on this email) on why we didn't
> use O_DIRECT.
>
> I have a vague recollection that some storage devices didn't like the
> O_DIRECT option, causing other issues later, but it's been a while since
> I touched the code.

The thing with O_DIRECT is that a "software block" has to be a multiple of a
"hardware block"; i.e., you cannot read a partial block, and the buffer has
some alignment requirements.

>> # I am ready to create a patch that works with O_DIRECT. Also, I wouldn't mind
>> # a "change to add a new mode of inspection with O_DIRECT
>> # (add an option to storage_mon) while keeping the current inspection process".
>>
>> [1] https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>
> It might be a very reasonable solution, though let's wait for an answer
> from Chrissie.
>
> Cheers
> Fabio