Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-28 Thread Martin Wilck
On Tue, 2021-04-27 at 16:41 -0400, Ewan D. Milne wrote:
> On Tue, 2021-04-27 at 20:33 +, Martin Wilck wrote:
> > On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> > > 
> > > There's no way to do that, in principle.  Because there could be
> > > other I/Os in flight.  You might (somehow) avoid retrying an I/O
> > > that got a UA until you figured out if something changed, but other
> > > I/Os can already have been sent to the target, or issued before you
> > > get to look at the status.
> > 
> > Right. But in practice, a WWID change will hardly happen under full
> > IO
> > load. The storage side will probably have to block IO while this
> > happens, at least for a short time period. So blocking and quiescing
> > the queue upon an UA might still work, most of the time. Even if we
> > were too late already, the sooner we stop the queue, the better.
> > 
> > The current algorithm in multipath-tools needs to detect a path going
> > down and being reinstated. The time interval during which a WWID
> > change
> > will go unnoticed is one or more path checker intervals, typically on
> > the order of 5-30 seconds. If we could decrease this interval to a
> > sub-
> > second or even millisecond range by blocking the queue in the kernel
> > quickly, we'd have made a big step forward.
> 
> Yes, and in many situations this may help.  But in the general case
> we can't protect against a storage array misconfiguration,
> where something like this can happen.  So I worry about people
> believing the host software will protect them against a mistake,
> when we can't really do that.

I agree. I expressed a similar notion in the following thread about
multipathd's WWID change detection capabilities in the face of really
bad mistakes on the administrator's (or storage array's, FTM)  part:
https://listman.redhat.com/archives/dm-devel/2021-February/msg00248.html
But others stressed that nonetheless we should try our best to
avoid customer data corruption (which I agree with, too), and thus we
settled on the current algorithm, which suited the needs at least of
the affected user(s) in that specific case.

Personally I think that the current "5-30s" time period for WWID change
detection in multipathd is unsafe both theoretically and practially,
and may lure users into a false feeling of safety. Therefore I'd
strongly welcome a kernel-side solution that might still not be safe
theoretically, but cover most practical problem scenarios much better
than we currently do.

Regards
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-27 Thread Martin Wilck
On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> On Mon, 2021-04-26 at 13:16 +, Martin Wilck wrote:
> > On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > > > > 
> > > > 
> > > > While we're at it, I'd like to mention another issue: WWID
> > > > changes.
> > > > 
> > > > This is a big problem for multipathd. The gist is that the
> > > > device
> > > > identification attributes in sysfs only change after rescanning
> > > > the
> > > > device. Thus if a user changes LUN assignments on a storage
> > > > system,
> > > > it can happen that a direct INQUIRY returns a different WWID as
> > > > in
> > > > sysfs, which is fatal. If we plan to rely more on sysfs for
> > > > device
> > > > identification in the future, the problem gets worse. 
> > > 
> > > I think many devices rely on the fact that they are identified by
> > > Vendor/model/serial_nr, because in most professional SAN storage
> > > systems you
> > > can pre-set the serial number to a custom value; so if you want a
> > > new
> > > disk
> > > (maybe a snapshot) to be compatible with the old one, just assign
> > > the
> > > same
> > > serial number. I guess that's the idea behind.
> > 
> > What you are saying sounds dangerous to me. If a snapshot has the
> > same
> > WWID as the device it's a snapshot of, it must not be exposed to
> > any
> > host(s) at the same time with its origin, otherwise the host may
> > happily combine it with the origin into one multipath map, and data
> > corruption will almost certainly result. 
> > 
> > My argument is about how the host is supposed to deal with a WWID
> > change if it happens. Here, "WWID change" means that a given
> > H:C:T:L
> > suddenly exposes different device designators than it used to,
> > while
> > this device is in use by a host. Here, too, data corruption is
> > imminent, and can happen in a blink of an eye. To avoid this,
> > several
> > things are needed:
> > 
> >  1) the host needs to get notified about the change (likely by an
> > UA
> > of
> > some sort)
> >  2) the kernel needs to react to the notification immediately, e.g.
> > by
> > blocking IO to the device,
> 
> There's no way to do that, in principle.  Because there could be
> other I/Os in flight.  You might (somehow) avoid retrying an I/O
> that got a UA until you figured out if something changed, but other
> I/Os can already have been sent to the target, or issued before you
> get to look at the status.

Right. But in practice, a WWID change will hardly happen under full IO
load. The storage side will probably have to block IO while this
happens, at least for a short time period. So blocking and quiescing
the queue upon an UA might still work, most of the time. Even if we
were too late already, the sooner we stop the queue, the better.

The current algorithm in multipath-tools needs to detect a path going
down and being reinstated. The time interval during which a WWID change
will go unnoticed is one or more path checker intervals, typically on
the order of 5-30 seconds. If we could decrease this interval to a sub-
second or even millisecond range by blocking the queue in the kernel
quickly, we'd have made a big step forward.

Regards
Martin

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-27 Thread Ewan D. Milne
On Tue, 2021-04-27 at 20:33 +, Martin Wilck wrote:
> On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> > 
> > There's no way to do that, in principle.  Because there could be
> > other I/Os in flight.  You might (somehow) avoid retrying an I/O
> > that got a UA until you figured out if something changed, but other
> > I/Os can already have been sent to the target, or issued before you
> > get to look at the status.
> 
> Right. But in practice, a WWID change will hardly happen under full
> IO
> load. The storage side will probably have to block IO while this
> happens, at least for a short time period. So blocking and quiescing
> the queue upon an UA might still work, most of the time. Even if we
> were too late already, the sooner we stop the queue, the better.
> 
> The current algorithm in multipath-tools needs to detect a path going
> down and being reinstated. The time interval during which a WWID
> change
> will go unnoticed is one or more path checker intervals, typically on
> the order of 5-30 seconds. If we could decrease this interval to a
> sub-
> second or even millisecond range by blocking the queue in the kernel
> quickly, we'd have made a big step forward.

Yes, and in many situations this may help.  But in the general case
we can't protect against a storage array misconfiguration,
where something like this can happen.  So I worry about people
believing the host software will protect them against a mistake,
when we can't really do that.

All it takes is one I/O (a discard) to make a thorough mess of the LUN.

-Ewan

> 
> Regards
> Martin
> 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-27 Thread Ewan D. Milne
On Mon, 2021-04-26 at 13:16 +, Martin Wilck wrote:
> On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > > > 
> > > 
> > > While we're at it, I'd like to mention another issue: WWID
> > > changes.
> > > 
> > > This is a big problem for multipathd. The gist is that the device
> > > identification attributes in sysfs only change after rescanning
> > > the
> > > device. Thus if a user changes LUN assignments on a storage
> > > system,
> > > it can happen that a direct INQUIRY returns a different WWID as
> > > in
> > > sysfs, which is fatal. If we plan to rely more on sysfs for
> > > device
> > > identification in the future, the problem gets worse. 
> > 
> > I think many devices rely on the fact that they are identified by
> > Vendor/model/serial_nr, because in most professional SAN storage
> > systems you
> > can pre-set the serial number to a custom value; so if you want a
> > new
> > disk
> > (maybe a snapshot) to be compatible with the old one, just assign
> > the
> > same
> > serial number. I guess that's the idea behind.
> 
> What you are saying sounds dangerous to me. If a snapshot has the
> same
> WWID as the device it's a snapshot of, it must not be exposed to any
> host(s) at the same time with its origin, otherwise the host may
> happily combine it with the origin into one multipath map, and data
> corruption will almost certainly result. 
> 
> My argument is about how the host is supposed to deal with a WWID
> change if it happens. Here, "WWID change" means that a given H:C:T:L
> suddenly exposes different device designators than it used to, while
> this device is in use by a host. Here, too, data corruption is
> imminent, and can happen in a blink of an eye. To avoid this, several
> things are needed:
> 
>  1) the host needs to get notified about the change (likely by an UA
> of
> some sort)
>  2) the kernel needs to react to the notification immediately, e.g.
> by
> blocking IO to the device,

There's no way to do that, in principle.  Because there could be
other I/Os in flight.  You might (somehow) avoid retrying an I/O
that got a UA until you figured out if something changed, but other
I/Os can already have been sent to the target, or issued before you
get to look at the status.

-Ewan

>  3) userspace tooling such as udev or multipathd need to figure out
> how
> to  how to deal with the situation cleanly, and eventually unblock
> it.
> 
> Wrt 1), we can only hope that it's the case. But 2) and 3) need work,
> afaics.
> 
> Martin
> 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-26 Thread Martin Wilck
On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > > 
> > 
> > While we're at it, I'd like to mention another issue: WWID changes.
> > 
> > This is a big problem for multipathd. The gist is that the device
> > identification attributes in sysfs only change after rescanning the
> > device. Thus if a user changes LUN assignments on a storage system,
> > it can happen that a direct INQUIRY returns a different WWID as in
> > sysfs, which is fatal. If we plan to rely more on sysfs for device
> > identification in the future, the problem gets worse. 
> 
> I think many devices rely on the fact that they are identified by
> Vendor/model/serial_nr, because in most professional SAN storage
> systems you
> can pre-set the serial number to a custom value; so if you want a new
> disk
> (maybe a snapshot) to be compatible with the old one, just assign the
> same
> serial number. I guess that's the idea behind.

What you are saying sounds dangerous to me. If a snapshot has the same
WWID as the device it's a snapshot of, it must not be exposed to any
host(s) at the same time with its origin, otherwise the host may
happily combine it with the origin into one multipath map, and data
corruption will almost certainly result. 

My argument is about how the host is supposed to deal with a WWID
change if it happens. Here, "WWID change" means that a given H:C:T:L
suddenly exposes different device designators than it used to, while
this device is in use by a host. Here, too, data corruption is
imminent, and can happen in a blink of an eye. To avoid this, several
things are needed:

 1) the host needs to get notified about the change (likely by an UA of
some sort)
 2) the kernel needs to react to the notification immediately, e.g. by
blocking IO to the device,
 3) userspace tooling such as udev or multipathd need to figure out how
to  how to deal with the situation cleanly, and eventually unblock it.

Wrt 1), we can only hope that it's the case. But 2) and 3) need work,
afaics.

Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-23 Thread Martin Wilck
On Thu, 2021-04-22 at 21:40 -0400, Martin K. Petersen wrote:
> 
> Martin,
> 
> > I suppose 99.9% of users never bother with customizing the udev
> > rules.
> 
> Except for the other 99.9%, perhaps? :) We definitely have many users
> that tweak udev storage rules for a variety of reasons. Including
> being
> able to use RII for LUN naming purposes.
> 
> > But we can actually combine both approaches. If "wwid" yields a
> > good
> > value most of the time (which is true IMO), we could make user
> > space
> > rely on it by default, and make it possible to set an udev property
> > (e.g. ENV{ID_LEGACY}="1") to tell udev rules to determine WWID
> > differently. User-space apps like multipath could check the
> > ID_LEGACY
> > property to determine whether or not reading the "wwid" attribute
> > would
> > be consistent with udev. That would simplify matters a lot for us
> > (Ben,
> > do you agree?), without the need of adding endless BLIST entries.
> 
> That's fine with me.
> 
> > AFAICT, no major distribution uses "wwid" for this purpose (yet).
> 
> We definitely have users that currently rely on wwid, although
> probably
> not through standard distro scripts.
> 
> > In a recent discussion with Hannes, the idea came up that the
> > priority
> > of "SCSI name string" designators should actually depend on their
> > subtype. "naa." name strings should map to the respective NAA
> > descriptors, and "eui.", likewise (only "iqn." descriptors have no
> > binary counterpart; we thought they should rather be put below NAA,
> > prio-wise).
> 
> I like what NVMe did wrt. to exporting eui, nguid, uuid separately
> from
> the best-effort wwid. That's why I suggested separate sysfs files for
> the various page 0x83 descriptors. I like the idea of being able to
> explicitly ask for an eui if that's what I need. But that appears to
> be
> somewhat orthogonal to your request.
> 
> > I wonder if you'd agree with a change made that way for "wwid". I
> > suppose you don't. I'd then propose to add a new attribute
> > following
> > this logic. It could simply be an additional attribute with a
> > different
> > name. Or this new attribute could be a property of the block device
> > rather than the SCSI device, like NVMe does it
> > (/sys/block/nvme0n2/wwid).
> 
> That's fine. I am not a big fan of the idea that block/foo/wwid and
> block/foo/device/wwid could end up being different. But I do think
> that
> from a userland tooling perspective the consistency with NVMe is more
> important.

OK, then here's the plan: Change SCSI (block) device identification to
work similar to NVMe (in addition to what we have now).

 1. add a new sysfs attribute for SCSI block devices as
/sys/block/sd$X/wwid, the value derived similar to the current "wwid"
SCSI device attribute, but using the same prio for SCSI name strings as
for their binary counterparts, as described above.

 2. add "naa" and "eui" attributes, too, for user-space applications
that are interested in these specific attributes. 
Fixme: should we differentiate between different "naa" or eui subtypes,
like "naa_regext", "eui64" or similar? If the device defines multiple
"naa" designators, which one should we choose?

 3. Change udev rules such that they primarily look at the attribute in
1.) on new installments, and introduce a variable ID_LEGACY to tell the
rules to fall back to the current algorithm. I suppose it makes sense
to have at least ID_VENDOR and ID_PRODUCT available when making this
decision, so that it doesn't have to be a global setting on a given
host.

While we're at it, I'd like to mention another issue: WWID changes.

This is a big problem for multipathd. The gist is that the device
identification attributes in sysfs only change after rescanning the
device. Thus if a user changes LUN assignments on a storage system, 
it can happen that a direct INQUIRY returns a different WWID as in
sysfs, which is fatal. If we plan to rely more on sysfs for device
identification in the future, the problem gets worse. 

I wonder if there's a chance that future kernels would automatically
update the attributes if a corresponding UNIT ATTENTION condition such
as INQUIRY DATA HAS CHANGED is received (*), or if we can find some
other way to avoid data corruption resulting from writing to the wrong
device.

Regards,
Martin

(*) I've been told that WWID changes can happen even without receiving
an UA. But in that case I'm inclined to put the blame on the storage.

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-22 Thread Martin K. Petersen


Martin,

> I suppose 99.9% of users never bother with customizing the udev rules.

Except for the other 99.9%, perhaps? :) We definitely have many users
that tweak udev storage rules for a variety of reasons. Including being
able to use RII for LUN naming purposes.

> But we can actually combine both approaches. If "wwid" yields a good
> value most of the time (which is true IMO), we could make user space
> rely on it by default, and make it possible to set an udev property
> (e.g. ENV{ID_LEGACY}="1") to tell udev rules to determine WWID
> differently. User-space apps like multipath could check the ID_LEGACY
> property to determine whether or not reading the "wwid" attribute would
> be consistent with udev. That would simplify matters a lot for us (Ben,
> do you agree?), without the need of adding endless BLIST entries.

That's fine with me.

> AFAICT, no major distribution uses "wwid" for this purpose (yet).

We definitely have users that currently rely on wwid, although probably
not through standard distro scripts.

> In a recent discussion with Hannes, the idea came up that the priority
> of "SCSI name string" designators should actually depend on their
> subtype. "naa." name strings should map to the respective NAA
> descriptors, and "eui.", likewise (only "iqn." descriptors have no
> binary counterpart; we thought they should rather be put below NAA,
> prio-wise).

I like what NVMe did wrt. to exporting eui, nguid, uuid separately from
the best-effort wwid. That's why I suggested separate sysfs files for
the various page 0x83 descriptors. I like the idea of being able to
explicitly ask for an eui if that's what I need. But that appears to be
somewhat orthogonal to your request.

> I wonder if you'd agree with a change made that way for "wwid". I
> suppose you don't. I'd then propose to add a new attribute following
> this logic. It could simply be an additional attribute with a different
> name. Or this new attribute could be a property of the block device
> rather than the SCSI device, like NVMe does it
> (/sys/block/nvme0n2/wwid).

That's fine. I am not a big fan of the idea that block/foo/wwid and
block/foo/device/wwid could end up being different. But I do think that
from a userland tooling perspective the consistency with NVMe is more
important.

-- 
Martin K. Petersen  Oracle Linux Engineering
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-22 Thread Benjamin Marzinski
On Thu, Apr 22, 2021 at 09:07:15AM +, Martin Wilck wrote:
> On Wed, 2021-04-21 at 22:46 -0400, Martin K. Petersen wrote:
> > 
> > Martin,
> > 
> > > Hm, it sounds intriguing, but it has issues in its own right. For
> > > years to come, user space will have to probe whether these attribute
> > > exist, and fall back to the current ones ("wwid", "vpd_pg83")
> > > otherwise. So user space can't be simplified any time soon. Speaking
> > > for an important user space consumer of WWIDs (multipathd), I doubt
> > > that this would improve matters for us. We'd be happy if the kernel
> > > could just pick the "best" designator for us. But I understand that
> > > the kernel can't guarantee a good choice (user space can't either).
> > 
> > But user space can be adapted at runtime to pick one designator over
> > the
> > other (ha!).
> 
> And that's exactly the problem. Effectively, all user space relies on
> udev today, because that's where this "adaptation" is taking place. It
> happens
> 
>  1) either in systemd's scsi_id built-in 
>
> (https://github.com/systemd/systemd/blob/7feb1dd6544d1bf373dbe13dd33cf563ed16f891/src/udev/scsi_id/scsi_serial.c#L37)
>  2) or in the udev rules coming with sg3_utils 
>
> (https://github.com/hreinecke/sg3_utils/blob/master/scripts/55-scsi-sg3_id.rules)
> 
> 1) is just as opaque and un-"adaptable" as the kernel, and the logic is
> suboptimal. 2) is of course "adaptable", but that's a problem in
> practice, if udev fails to provide a WWID. multipath-tools go through
> various twists for this case to figure out "fallback" WWIDs, guessing
> whether that "fallback" matches what udev would have returned if it had
> worked.
> 
> That's the gist of it - the general frustration about udev among some
> of its heaviest users (talk to the LVM2 maintainers).
> 
> I suppose 99.9% of users never bother with customizing the udev rules.
> IOW, these users might as well just use a kernel-provided value. But
> the remaining 0.1% causes headaches for user-space applications, which
> can't make solid assumptions about the rules. Thus, in a way, the
> flexibility of the rules does more harm than it helps.
> 
> > We could do that in the kernel too, of course, but I'm afraid what
> > the
> > resulting BLIST changes would end up looking like over time.
> 
> That's something we want to avoid, sure.
> 
> But we can actually combine both approaches. If "wwid" yields a good
> value most of the time (which is true IMO), we could make user space
> rely on it by default, and make it possible to set an udev property
> (e.g. ENV{ID_LEGACY}="1") to tell udev rules to determine WWID
> differently. User-space apps like multipath could check the ID_LEGACY
> property to determine whether or not reading the "wwid" attribute would
> be consistent with udev. That would simplify matters a lot for us (Ben,
> do you agree?), without the need of adding endless BLIST entries.
> 

Yeah, as long as ID_LEGACY was changed in a careful manner, so WWIDs
didn't simply change without warning because of an upgrade, a path out
of this complexity is a definitely helpful.

-Ben

> 
> > I am also very concerned about changing what the kernel currently
> > exports in a given variable like "wwid". A seemingly innocuous change
> > to
> > the reported value could lead to a system no longer booting after
> > updating the kernel.
> 
> AFAICT, no major distribution uses "wwid" for this purpose (yet). I
> just recently realized that the kernel's ALUA code refers to it. (*)
> 
> In a recent discussion with Hannes, the idea came up that the priority
> of "SCSI name string" designators should actually depend on their
> subtype. "naa." name strings should map to the respective NAA
> descriptors, and "eui.", likewise (only "iqn." descriptors have no
> binary counterpart; we thought they should rather be put below NAA,
> prio-wise).
> 
> I wonder if you'd agree with a change made that way for "wwid". I
> suppose you don't. I'd then propose to add a new attribute following
> this logic. It could simply be an additional attribute with a different
> name. Or this new attribute could be a property of the block device
> rather than the SCSI device, like NVMe does it
> (/sys/block/nvme0n2/wwid).
> 
> I don't like the idea of having separate sysfs attributes for
> designators of different types, that's impractical for user space.
> 
> > But taking a step back: Other than "it's not what userland currently
> > does", what specifically is the problem with designator_prio()? We've
> > picked the priority list once and for all. If we promise never to
> > change
> > it, what is the issue?
> 
> If the prioritization in kernel and user space was the same, we could
> migrate away from udev more easily without risking boot failure.
> 
> Thanks,
> Martin
> 
> (*) which is an argument for using "wwid" in user space too - just to
> be consitent with the kernel's internal logic.
> 
> -- 
> Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
> SUSE Software Solutions Germa

Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-22 Thread Martin Wilck
On Wed, 2021-04-21 at 22:46 -0400, Martin K. Petersen wrote:
> 
> Martin,
> 
> > Hm, it sounds intriguing, but it has issues in its own right. For
> > years to come, user space will have to probe whether these attribute
> > exist, and fall back to the current ones ("wwid", "vpd_pg83")
> > otherwise. So user space can't be simplified any time soon. Speaking
> > for an important user space consumer of WWIDs (multipathd), I doubt
> > that this would improve matters for us. We'd be happy if the kernel
> > could just pick the "best" designator for us. But I understand that
> > the kernel can't guarantee a good choice (user space can't either).
> 
> But user space can be adapted at runtime to pick one designator over
> the
> other (ha!).

And that's exactly the problem. Effectively, all user space relies on
udev today, because that's where this "adaptation" is taking place. It
happens

 1) either in systemd's scsi_id built-in 
   
(https://github.com/systemd/systemd/blob/7feb1dd6544d1bf373dbe13dd33cf563ed16f891/src/udev/scsi_id/scsi_serial.c#L37)
 2) or in the udev rules coming with sg3_utils 
   
(https://github.com/hreinecke/sg3_utils/blob/master/scripts/55-scsi-sg3_id.rules)

1) is just as opaque and un-"adaptable" as the kernel, and the logic is
suboptimal. 2) is of course "adaptable", but that's a problem in
practice, if udev fails to provide a WWID. multipath-tools go through
various twists for this case to figure out "fallback" WWIDs, guessing
whether that "fallback" matches what udev would have returned if it had
worked.

That's the gist of it - the general frustration about udev among some
of its heaviest users (talk to the LVM2 maintainers).

I suppose 99.9% of users never bother with customizing the udev rules.
IOW, these users might as well just use a kernel-provided value. But
the remaining 0.1% causes headaches for user-space applications, which
can't make solid assumptions about the rules. Thus, in a way, the
flexibility of the rules does more harm than it helps.

> We could do that in the kernel too, of course, but I'm afraid what
> the
> resulting BLIST changes would end up looking like over time.

That's something we want to avoid, sure.

But we can actually combine both approaches. If "wwid" yields a good
value most of the time (which is true IMO), we could make user space
rely on it by default, and make it possible to set an udev property
(e.g. ENV{ID_LEGACY}="1") to tell udev rules to determine WWID
differently. User-space apps like multipath could check the ID_LEGACY
property to determine whether or not reading the "wwid" attribute would
be consistent with udev. That would simplify matters a lot for us (Ben,
do you agree?), without the need of adding endless BLIST entries.


> I am also very concerned about changing what the kernel currently
> exports in a given variable like "wwid". A seemingly innocuous change
> to
> the reported value could lead to a system no longer booting after
> updating the kernel.

AFAICT, no major distribution uses "wwid" for this purpose (yet). I
just recently realized that the kernel's ALUA code refers to it. (*)

In a recent discussion with Hannes, the idea came up that the priority
of "SCSI name string" designators should actually depend on their
subtype. "naa." name strings should map to the respective NAA
descriptors, and "eui.", likewise (only "iqn." descriptors have no
binary counterpart; we thought they should rather be put below NAA,
prio-wise).

I wonder if you'd agree with a change made that way for "wwid". I
suppose you don't. I'd then propose to add a new attribute following
this logic. It could simply be an additional attribute with a different
name. Or this new attribute could be a property of the block device
rather than the SCSI device, like NVMe does it
(/sys/block/nvme0n2/wwid).

I don't like the idea of having separate sysfs attributes for
designators of different types, that's impractical for user space.

> But taking a step back: Other than "it's not what userland currently
> does", what specifically is the problem with designator_prio()? We've
> picked the priority list once and for all. If we promise never to
> change
> it, what is the issue?

If the prioritization in kernel and user space was the same, we could
migrate away from udev more easily without risking boot failure.

Thanks,
Martin

(*) which is an argument for using "wwid" in user space too - just to
be consitent with the kernel's internal logic.

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-21 Thread Martin K. Petersen


Martin,

> Hm, it sounds intriguing, but it has issues in its own right. For
> years to come, user space will have to probe whether these attribute
> exist, and fall back to the current ones ("wwid", "vpd_pg83")
> otherwise. So user space can't be simplified any time soon. Speaking
> for an important user space consumer of WWIDs (multipathd), I doubt
> that this would improve matters for us. We'd be happy if the kernel
> could just pick the "best" designator for us. But I understand that
> the kernel can't guarantee a good choice (user space can't either).

But user space can be adapted at runtime to pick one designator over the
other (ha!).

We could do that in the kernel too, of course, but I'm afraid what the
resulting BLIST changes would end up looking like over time.

I am also very concerned about changing what the kernel currently
exports in a given variable like "wwid". A seemingly innocuous change to
the reported value could lead to a system no longer booting after
updating the kernel.

(Ignoring for a moment that some arrays will helpfully add a new ID
designator after a firmware upgrade and thus change what the kernel
reports. *sigh*)

> What is your idea how these new sysfs attributes should be named? Just
> enumerate, or name them by type somehow?

Up to you. Whatever you think would be easiest for userland to deal
with. I don't have a good feeling for how common vendor specific ones
are in practice. Things would obviously be easier if SCSI didn't have so
many choices :(

But taking a step back: Other than "it's not what userland currently
does", what specifically is the problem with designator_prio()? We've
picked the priority list once and for all. If we promise never to change
it, what is the issue?

-- 
Martin K. Petersen  Oracle Linux Engineering
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-16 Thread Martin Wilck
Hello Martin,

Sorry for the late response, still recovering from a week out of
office.

On Tue, 2021-04-06 at 00:47 -0400, Martin K. Petersen wrote:
> 
> Martin,
> 
> > The kernel's preference for type 8 designators (see below) is in
> > contrast with the established user space algorithms, which
> > determine
> > SCSI WWIDs on productive systems in practice. User space can try to
> > adapt to the kernel logic, but it will necessarily be a slow and
> > painful path if we want to avoid breaking user setups.
> 
> I was concerned when you changed the kernel prioritization a while
> back
> and I still don't think that we should tweak that code any further.

Ok.

> If the kernel picks one ID over another, that should be for the
> kernel's
> use. Letting the kernel decide which ID is best for userland is not a
> good approach.

Well, the kernel itself doesn't make any use of this property currently
(and user space doesn't much either, afaik).


> So I think my inclination would be to leave the current wwid as-is to
> avoid the risk of breaking things. And then export all ID descriptors
> reported in sysfs. Even though vpd83 is already exported in its
> entirety, I don't have any particular concerns about the individual
> values being exported separately. That makes many userland things so
> much easier. And I think the kernel is in a good position to
> disseminate
> information reported by the hardware.
> 
> This puts the prioritization entirely in the distro/udev/scripting
> domain. Taking the kernel out of the picture will make migration
> easier. And it allows a user to pick their descriptor of choice
> should a
> device report something completely unusable in type 8.

Hm, it sounds intriguing, but it has issues in its own right. For years
to come, user space will have to probe whether these attribute exist,
and fall back to the current ones ("wwid", "vpd_pg83") otherwise. So
user space can't be simplified any time soon. Speaking for an important
user space consumer of WWIDs (multipathd), I doubt that this would
improve matters for us. We'd be happy if the kernel could just pick the
"best" designator for us. But I understand that the kernel can't
guarantee a good choice (user space can't either).

What is your idea how these new sysfs attributes should be named? Just
enumerate, or name them by type somehow?

Thanks,
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: one more time: SCSI device identification

2021-04-05 Thread Martin K. Petersen


Martin,

> The kernel's preference for type 8 designators (see below) is in
> contrast with the established user space algorithms, which determine
> SCSI WWIDs on productive systems in practice. User space can try to
> adapt to the kernel logic, but it will necessarily be a slow and
> painful path if we want to avoid breaking user setups.

I was concerned when you changed the kernel prioritization a while back
and I still don't think that we should tweak that code any further.

If the kernel picks one ID over another, that should be for the kernel's
use. Letting the kernel decide which ID is best for userland is not a
good approach.

So while I originally liked the idea of exposing a transport and
protocol agnostic wwid for each block device, I think that all the
descriptors and ID formats available in both SCSI and NVMe have shown
that that approach is fraught with peril.

Descriptors that provide "good uniqueness" on one device may be a
completely sub-optimal choice for another (zero-padded values, full of
spaces, vendors getting things wrong in general).

So I think my inclination would be to leave the current wwid as-is to
avoid the risk of breaking things. And then export all ID descriptors
reported in sysfs. Even though vpd83 is already exported in its
entirety, I don't have any particular concerns about the individual
values being exported separately. That makes many userland things so
much easier. And I think the kernel is in a good position to disseminate
information reported by the hardware.

This puts the prioritization entirely in the distro/udev/scripting
domain. Taking the kernel out of the picture will make migration
easier. And it allows a user to pick their descriptor of choice should a
device report something completely unusable in type 8.

-- 
Martin K. Petersen  Oracle Linux Engineering
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] RFC: one more time: SCSI device identification

2021-03-29 Thread Martin Wilck
Hello,

[sorry for cross-posting, I think this is relevant to multiple
communities.]

I'm referring to the recent discussion about SCSI device identification
for multipath-tools 
(https://listman.redhat.com/archives/dm-devel/2021-March/msg00332.html)

As you all know, there are different designators to identify SCSI LUNs,
and the specs don't mandate priorities for devices that support
multiple designator types. There are various implementations for device
identification, which use different priorities (summarized below).

It's highly desirable to clean up this confusion and settle on a single
instance and a unique priority order. I believe this instance should be
the kernel.

OTOH, changing device WWIDs is highly dangerous for productive systems.
The WWID is prominently used in multipath-tools, but also in lots of
other important places such as fstab, grub.cfg, dracut, etc. No doubt
that we'll be stuck with the different algorithms for years, especially
for LTS distributions. But perhaps we can figure out a long-term exit
strategy?

The kernel's preference for type 8 designators (see below) is in
contrast with the established user space algorithms, which determine
SCSI WWIDs on productive systems in practice. User space can try to
adapt to the kernel logic, but it will necessarily be a slow and
painful path if we want to avoid breaking user setups.

In principle, I believe the kernel is "right" to prefer type 8. But
because the "wwid" attribute isn't actually used for device
identification today, changing the kernel logic would be less prone to
regressions than changing user space, even if it violates the principle
that the kernel's user space API must remain stable.

Would it be an option to modify the kernel logic?

If we can't, I think we should start with making the "wwid" attribute
part of the udev rule logic, and letting distros configure whether the
kernel logic or the traditional udev logic would be used.

Please tell me your thoughts on this matter.

Regards,
Martin

PS: Incomplete list of algorithms for SCSI designator priorities:

The kernel ("wwid" sysfs attribute) prefers "SCSI name string" (type 8)
designators over other types
(https://elixir.bootlin.com/linux/latest/A/ident/designator_prio).

The current set of udev rules in sg3_utils
(https://github.com/hreinecke/sg3_utils/blob/master/scripts/55-scsi-sg3_id.rules)
don't use the kernel's wwid attribute; they parse VPD 83 and 80
instead and prioritize types 36, 35, 32, and 2 over type 8.

udev's "scsi_id" tool, historically the first attempt to implement a
priority for this, doesn't look at the SCSI name attribute at all:
https://github.com/systemd/systemd/blob/main/src/udev/scsi_id/scsi_serial.c

There's a "fallback" logic in multipath-tools in case udev doesn't
provide a WWID:
https://github.com/opensvc/multipath-tools/blob/a41a61e8482def33e3ca8c9e3639ad2c37611551/libmultipath/discovery.c#L1040

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel