Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-04 Thread Fabio M. Di Nitto



On 02/08/2022 14.37, Ulrich Windl wrote:
> "Fabio M. Di Nitto" wrote on 02.08.2022 at 14:30 in
> message <0b26c097-1e21-3945-24ba-355cd0ccf...@fabbione.net>:
>> Hello Kazunori-san,
>>
>> On 02/08/2022 12.13, 井上和徳 wrote:
>>> Hi,
>>>
>>> Since O_DIRECT is not specified in open() [1], it reads the buffer cache
>>> and may result in a false negative. I fear that this possibility increases
>>> in environments with large buffer cache and running disk-reading
>>> applications such as database.
>>>
>>> So, I think it's better to specify O_RDONLY|O_DIRECT, but what about it?
>>> (in this case, lseek() processing is unnecessary.)
>>
>> I will have to defer to Christine (in CC) to this email on why we didn't
>> use O_DIRECT.
>>
>> I have a vague recollection that some storage devices didn't like the
>> O_DIRECT option causing other issues later, but it's been a while since
>> I touched the code.
>
> The thing with O_DIRECT is that a "software block" has to be a multiple of a
> "hardware block"; i.e: you cannot read a partial block, and the buffer has some
> alignment requirements.

that was it! thanks for refreshing my memory.

Fabio

>>> # I am ready to create a patch that works with O_DIRECT. Also, I wouldn't
>>> # mind a "change to add a new mode of inspection with O_DIRECT
>>> # (add an option to storage_mon) while keeping the current inspection
>>> # process".
>>>
>>> [1]
>>> https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>>
>> It might be a very reasonable solution, tho let's wait for an answer
>> from Chrissie.
>>
>> Cheers
>> Fabio
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/






Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-03 Thread Klaus Wenninger
On Wed, Aug 3, 2022 at 4:02 PM Ulrich Windl
 wrote:
>
> >>> Klaus Wenninger wrote on 03.08.2022 at 15:51:
> > On Tue, Aug 2, 2022 at 4:10 PM Ken Gaillot  wrote:
> >>
> >> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
> >> > Hi,
> >> >
> >> > Since O_DIRECT is not specified in open() [1], it reads the buffer
> >> > cache and
> >> > may result in a false negative. I fear that this possibility
> >> > increases
> >> > in environments with large buffer cache and running disk-reading
> >> > applications
> >> > such as database.
> >> >
> >> > So, I think it's better to specify O_RDONLY|O_DIRECT, but what about
> >> > it?
> >> > (in this case, lseek() processing is unnecessary.)
> >> >
> >> > # I am ready to create a patch that works with O_DIRECT. Also, I
> >> > wouldn't mind
> >> > # a "change to add a new mode of inspection with O_DIRECT
> >> > # (add a option to storage_mon) while keeping the current inspection
> >> > process".
> >> >
> >> > [1]
> >> > https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
> >> >
> >> > Best Regards,
> >> > Kazunori INOUE
> >>
> >> I agree, it makes sense to use O_DIRECT when available. I don't think
> >> an option is necessary.
> >
> > Might as well be interesting to adjust block-size/alignment to the
> > device.
> > Another consideration could be to on top directly access the block-layer
> > using aio.
>
> Again AIO is POSIX; it depends on the implementation what it really does.

I wasn't speaking of the POSIX AIO implementation that Linux provides in
userspace (I guess that is still the case) but of what is available as syscalls
(io_submit, io_setup, io_cancel, io_destroy, io_getevents), which is afaik
Linux-specific and can't be wrapped into the POSIX interface.
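
For illustration only, here is a minimal sketch of a single read issued through those syscalls. It is not taken from sbd or storage_mon; the helper name native_aio_pread and its timeout parameter are made up for this example, and it assumes a 64-bit Linux where glibc provides no wrappers, so syscall(2) is used directly. Note that the submission is only truly asynchronous when the file descriptor was opened with O_DIRECT; on a buffered descriptor io_submit may simply block.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read len bytes at offset off using the Linux native
 * AIO syscalls and wait at most timeout_sec seconds for completion.
 * Returns the number of bytes read, a negative errno value reported by the
 * kernel for the request, or -1 on setup/submit failure or timeout. */
ssize_t native_aio_pread(int fd, void *buf, size_t len, off_t off,
                         int timeout_sec)
{
    aio_context_t ctx = 0;              /* must be zero before io_setup */
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };
    long got;

    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    if (syscall(SYS_io_submit, ctx, 1, list) != 1) {
        syscall(SYS_io_destroy, ctx);
        return -1;
    }

    /* Wait for exactly one completion event, or give up after the timeout. */
    got = syscall(SYS_io_getevents, ctx, 1, 1, &ev, &ts);
    if (got != 1) {
        /* Try to cancel; io_destroy may still block until the request ends. */
        syscall(SYS_io_cancel, ctx, &cb, &ev);
        syscall(SYS_io_destroy, ctx);
        return -1;
    }

    syscall(SYS_io_destroy, ctx);
    return (ssize_t)ev.res;             /* bytes read, or -errno */
}
```

The timeout is the point of going this way for a health check: a plain read() on a dead device can hang indefinitely, while io_getevents returns after the given time.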

>
> > Both is being done in sbd (storage-based-death) and yes it as well
> > adds Linux specific stuff that might have to be conditional for other OSs.
> >
> > Klaus
> >
> >>
> >> However, O_DIRECT is not available on all OSes, so the configure script
> >> should detect support. Also, it is not supported by all filesystems, so
> >> if the open fails, we should retry without O_DIRECT.
> >> --
> >> Ken Gaillot 
> >>


[ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-03 Thread Ulrich Windl
>>> Klaus Wenninger wrote on 03.08.2022 at 15:51:
> On Tue, Aug 2, 2022 at 4:10 PM Ken Gaillot  wrote:
>>
>> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
>> > Hi,
>> >
>> > Since O_DIRECT is not specified in open() [1], it reads the buffer
>> > cache and
>> > may result in a false negative. I fear that this possibility
>> > increases
>> > in environments with large buffer cache and running disk-reading
>> > applications
>> > such as database.
>> >
>> > So, I think it's better to specify O_RDONLY|O_DIRECT, but what about
>> > it?
>> > (in this case, lseek() processing is unnecessary.)
>> >
>> > # I am ready to create a patch that works with O_DIRECT. Also, I
>> > wouldn't mind
>> > # a "change to add a new mode of inspection with O_DIRECT
>> > # (add a option to storage_mon) while keeping the current inspection
>> > process".
>> >
>> > [1]
>> > https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>> >
>> > Best Regards,
>> > Kazunori INOUE
>>
>> I agree, it makes sense to use O_DIRECT when available. I don't think
>> an option is necessary.
> 
> Might as well be interesting to adjust block-size/alignment to the
> device.
> Another consideration could be to on top directly access the block-layer
> using aio.

Again AIO is POSIX; it depends on the implementation what it really does.

> Both is being done in sbd (storage-based-death) and yes it as well
> adds Linux specific stuff that might have to be conditional for other OSs.
> 
> Klaus
> 
>>
>> However, O_DIRECT is not available on all OSes, so the configure script
>> should detect support. Also, it is not supported by all filesystems, so
>> if the open fails, we should retry without O_DIRECT.
>> --
>> Ken Gaillot 
>>


Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-03 Thread Andrei Borzenkov
On 03.08.2022 09:02, Ulrich Windl wrote:
> Ken Gaillot wrote on 02.08.2022 at 16:09 in
> message
> <0a2125a43bbfc09d2ca5bad1a693710f00e33731.ca...@redhat.com>:
>> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
>>> Hi,
>>>
>>> Since O_DIRECT is not specified in open() [1], it reads the buffer
>>> cache and
>>> may result in a false negative. I fear that this possibility
>>> increases
>>> in environments with large buffer cache and running disk-reading
>>> applications
>>> such as database.
>>>
>>> So, I think it's better to specify O_RDONLY|O_DIRECT, but what about
>>> it?
>>> (in this case, lseek() processing is unnecessary.)
>>>
>>> # I am ready to create a patch that works with O_DIRECT. Also, I
>>> wouldn't mind
>>> # a "change to add a new mode of inspection with O_DIRECT
>>> # (add a option to storage_mon) while keeping the current inspection
>>> process".
>>>
>>> [1]
>>> https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>>>
>>> Best Regards,
>>> Kazunori INOUE
>>
>> I agree, it makes sense to use O_DIRECT when available. I don't think
>> an option is necessary.
>>
>> However, O_DIRECT is not available on all OSes, so the configure script
>> should detect support. Also, it is not supported by all filesystems, so
>> if the open fails, we should retry without O_DIRECT.
> 
> I just looked it up: It seems POSIX has O_RSYNC, O_SYNC and O_DSYNC
> instead.

That is something entirely different. O_SYNC etc. are about the *file system
level*, while O_DIRECT is about the *device* level. O_DIRECT makes the process
talk directly to the device. It is unclear whether this is a side effect of
the implementation or intentional.

> The buffer cache handling may be different though.
> 

Synchronous operation does not actually imply media access.

O_RSYNC: "the operation has been completed or diagnosed if unsuccessful.
The read is complete only when an image of the data has been
successfully transferred to the requesting process". Returning buffered
data satisfies this definition. Besides, Linux does not support O_RSYNC.

O_DSYNC: "the operation has been completed or diagnosed if unsuccessful.
The write is complete only when the data specified in the write request
is successfully transferred and all file system information required to
retrieve the data is successfully transferred". Writing to a journal
located on an external device seems to comply with this definition.

O_SYNC simply adds filesystem metadata update completion.

So no, O_SYNC & Co cannot replace O_DIRECT.
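
As a rough sketch of the approach suggested earlier in the thread (use O_DIRECT when available, and retry without it if the open fails), something like the following could work. The helper name open_for_check and the used_direct output parameter are invented for this example and do not come from storage_mon.c; the #ifdef stands in for the configure-time check, and treating only EINVAL as "not supported here" is an assumption based on how Linux reports it.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: open path read-only, preferring O_DIRECT.
 * *used_direct is set to 1 if the returned descriptor bypasses the
 * buffer cache, 0 if we fell back to a normal buffered open.
 * Returns the file descriptor, or -1 with errno set. */
int open_for_check(const char *path, int *used_direct)
{
    int fd;

    *used_direct = 0;
#ifdef O_DIRECT
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        *used_direct = 1;
        return fd;
    }
    /* EINVAL is what Linux returns when the filesystem or device does not
     * support O_DIRECT; any other error (ENOENT, EACCES, ...) is real and
     * must not be masked by the retry. */
    if (errno != EINVAL)
        return -1;
#endif
    return open(path, O_RDONLY);
}
```

A caller would probably want to log when used_direct comes back 0, since in that case the read may again be served from the buffer cache.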


[ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-03 Thread Ulrich Windl
>>> Ken Gaillot wrote on 02.08.2022 at 16:09 in
message
<0a2125a43bbfc09d2ca5bad1a693710f00e33731.ca...@redhat.com>:
> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
>> Hi,
>> 
>> Since O_DIRECT is not specified in open() [1], it reads the buffer
>> cache and
>> may result in a false negative. I fear that this possibility
>> increases
>> in environments with large buffer cache and running disk-reading
>> applications
>> such as database.
>> 
>> So, I think it's better to specify O_RDONLY|O_DIRECT, but what about
>> it?
>> (in this case, lseek() processing is unnecessary.)
>> 
>> # I am ready to create a patch that works with O_DIRECT. Also, I
>> wouldn't mind
>> # a "change to add a new mode of inspection with O_DIRECT
>> # (add a option to storage_mon) while keeping the current inspection
>> process".
>> 
>> [1]
>> https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>> 
>> Best Regards,
>> Kazunori INOUE
> 
> I agree, it makes sense to use O_DIRECT when available. I don't think
> an option is necessary.
> 
> However, O_DIRECT is not available on all OSes, so the configure script
> should detect support. Also, it is not supported by all filesystems, so
> if the open fails, we should retry without O_DIRECT.

I just looked it up: It seems POSIX has O_RSYNC, O_SYNC and O_DSYNC
instead.
The buffer cache handling may be different though.

Regards,
Ulrich

> -- 
> Ken Gaillot 
> 


[ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-02 Thread Ulrich Windl
>>> "Fabio M. Di Nitto" wrote on 02.08.2022 at 14:30 in
message <0b26c097-1e21-3945-24ba-355cd0ccf...@fabbione.net>:
> Hello Kazunori-san,
>
> On 02/08/2022 12.13, 井上和徳 wrote:
>> Hi,
>>
>> Since O_DIRECT is not specified in open() [1], it reads the buffer cache
>> and may result in a false negative. I fear that this possibility increases
>> in environments with large buffer cache and running disk-reading
>> applications such as database.
>>
>> So, I think it's better to specify O_RDONLY|O_DIRECT, but what about it?
>> (in this case, lseek() processing is unnecessary.)
>
> I will have to defer to Christine (in CC) to this email on why we didn't
> use O_DIRECT.
>
> I have a vague recollection that some storage devices didn't like the
> O_DIRECT option causing other issues later, but it's been a while since
> I touched the code.

The thing with O_DIRECT is that a "software block" has to be a multiple of a
"hardware block"; i.e: you cannot read a partial block, and the buffer has some
alignment requirements.
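
To make the constraint concrete, here is a small sketch, not taken from storage_mon.c. The helper name read_one_block is invented for this example; it assumes Linux (the BLKSSZGET ioctl) and falls back to an assumed block size of 512 bytes when the descriptor is not a block device. The buffer address, the transfer length and the file offset are all multiples of the block size, which is what a descriptor opened with O_DIRECT generally expects.

```c
#define _GNU_SOURCE
#include <linux/fs.h>     /* BLKSSZGET */
#include <stdlib.h>       /* posix_memalign, free */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>       /* pread */

/* Hypothetical helper: read exactly one logical block, number block_index,
 * from fd into a temporary block-aligned buffer.
 * Returns the number of bytes read (0 at end of file) or -1 on error. */
ssize_t read_one_block(int fd, off_t block_index)
{
    int blksz = 0;
    void *buf = NULL;
    ssize_t n;

    /* Ask the block device for its logical sector size. Assumption: use
     * 512 bytes if the ioctl fails (e.g. fd is a regular file). */
    if (ioctl(fd, BLKSSZGET, &blksz) != 0 || blksz <= 0)
        blksz = 512;

    /* The buffer address must be aligned; malloc() does not guarantee it. */
    if (posix_memalign(&buf, (size_t)blksz, (size_t)blksz) != 0)
        return -1;

    /* Length and offset are whole multiples of the block size: no partial
     * block is requested. */
    n = pread(fd, buf, (size_t)blksz, block_index * (off_t)blksz);

    free(buf);
    return n;
}
```

Because pread() takes the offset explicitly, a separate lseek() call is not needed in this form.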

> 
>> 
>> # I am ready to create a patch that works with O_DIRECT. Also, I wouldn't
>> # mind a "change to add a new mode of inspection with O_DIRECT
>> # (add an option to storage_mon) while keeping the current inspection
>> # process".
>>
>> [1]
>> https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#L47-L90
>
> It might be a very reasonable solution, tho let's wait for an answer
> from Chrissie.
> 
> Cheers
> Fabio