Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

2020-12-23 Thread Christoph Hellwig
On Tue, Dec 22, 2020 at 09:06:04PM +, Alasdair G Kergon wrote:
> I have not read the background about whatever the new problem is - I'm
> jumping in cold seeing this message - but from the very beginning of
> device-mapper we have strongly recommended that userspace supplies the
> block device in the form MAJOR:MINOR and all our own tools do that.  We
> guarantee not to deadlock in these places when this is done.
> 
> We also accept the device in the form of a path name as we know there
> are times when this is safe and convenient, but then we offer no
> guarantees - we place the responsibility upon userspace only to do this
> when it knows it is safe to do so i.e. no race and no deadlock.

644bda6f346038bce7ad3ed48f7044c10dde6d47 changes that by accepting all
kinds of weirdo formats :(


Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

2020-12-23 Thread Christoph Hellwig
On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote:
> Ok. The problem from my perspective is that device-mapper needs to
> a) ensure that the arbitrary string passed in with the table definition 
> refers to a valid block device
> and
> b) the block device can be opened with O_EXCL, so that device-mapper can 
> then use it.
>
> Originally (ie prior to commit 644bda6f3460) dm_get_device() just converted 
> the string to a 'dev_t' representation, and then the block device itself 
> was checked and opened in dm_get_table_device().
> 'lookup_bdev' was just being used to convert the path if the string was not 
> in the canonical major:minor format, as then it was assumed that it 
> referred to a block device node, and then lookup_bdev kinda makes sense.

Yes, 644bda6f3460 is the cause of all evil, as it uses an API in a place
where it should not be used.  It and the prep patch
(e6e20a7a5f3f49bfee518d5c6849107398d83912) which did the grave mistake
of making name_to_dev_t available outside of the early init code really
need to be reverted.

> However, lookup_bdev() now always recurses into the filesystem, causing 
> multipath to stall in an all-paths-down scenario.

lookup_bdev always did a file system lookup, and always only accepted
a valid name in the file system.

> Alternatively, if Mike says that only major:minor is the valid format for a 
> table definition we can kill that code completely. But clearly _I_ cannot 
> make the call here.

Before 644bda6f3460 the table definitions only accepted a valid name in
the file system.  Which is the proper interface.


Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

2020-12-22 Thread Alasdair G Kergon
On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote:
> However, lookup_bdev() now always recurses into the filesystem, causing 
> multipath to stall in an all-paths-down scenario.
 
I have not read the background about whatever the new problem is - I'm
jumping in cold seeing this message - but from the very beginning of
device-mapper we have strongly recommended that userspace supplies the
block device in the form MAJOR:MINOR and all our own tools do that.  We
guarantee not to deadlock in these places when this is done.

We also accept the device in the form of a path name as we know there
are times when this is safe and convenient, but then we offer no
guarantees - we place the responsibility upon userspace only to do this
when it knows it is safe to do so i.e. no race and no deadlock.

Alasdair



Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

2020-12-22 Thread Hannes Reinecke

On 12/22/20 3:53 PM, Mike Snitzer wrote:

[added linux-block and dm-devel, if someone replies to this email to
continue "proper discussion" _please_ at least drop sfr and linux-next
from Cc]

On Tue, Dec 22 2020 at  8:15am -0500,
Christoph Hellwig  wrote:


Mike, Hannes,

I think this patch is rather harmful.  Why does device mapper even
mix file system path with a dev_t and all the other weird forms
parsed by name_to_dev_t, which was supposed to be be for the early
init code where no file system is available.


OK, I'll need to revisit (unless someone beats me to it) because this
could've easily been a blind-spot for me when the dm-init code went in.
Any dm-init specific enabling interface shouldn't be used by more
traditional DM interfaces.  So Hannes' change might be treating symptom
rather than the core problem (which would be better treated by factoring
out dm-init requirements for a name_to_dev_t()-like interface?).

DM has supported passing maj:min and blockdev names on DM table lines
forever... so we'll need to be very specific about where/why things
regressed.



Ok. The problem from my perspective is that device-mapper needs to
a) ensure that the arbitrary string passed in with the table definition 
refers to a valid block device

and
b) the block device can be opened with O_EXCL, so that device-mapper can 
then use it.


Originally (ie prior to commit 644bda6f3460) dm_get_device() just 
converted the string to a 'dev_t' representation, and then the block 
device itself was checked and opened in dm_get_table_device().
'lookup_bdev' was just being used to convert the path if the string was 
not in the canonical major:minor format, as then it was assumed that it 
referred to a block device node, and then lookup_bdev kinda makes sense.


However, lookup_bdev() now always recurses into the filesystem, causing 
multipath to stall in an all-paths-down scenario.


So, the real issue is the table definiton; as it also accepts a device 
to be specified by the block device _node_ name, we need to have a way 
of converting that into a dev_t.


If lookup_bdev() is the wrong interface for that, by all means, please, 
do tell me. I'd be happy to draft up a patch.


Alternatively, if Mike says that only major:minor is the valid format 
for a table definition we can kill that code completely. But clearly _I_ 
cannot make the call here.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

2020-12-22 Thread Mike Snitzer
[added linux-block and dm-devel, if someone replies to this email to
continue "proper discussion" _please_ at least drop sfr and linux-next
from Cc]

On Tue, Dec 22 2020 at  8:15am -0500,
Christoph Hellwig  wrote:

> Mike, Hannes,
> 
> I think this patch is rather harmful.  Why does device mapper even
> mix file system path with a dev_t and all the other weird forms
> parsed by name_to_dev_t, which was supposed to be be for the early
> init code where no file system is available.

OK, I'll need to revisit (unless someone beats me to it) because this
could've easily been a blind-spot for me when the dm-init code went in.
Any dm-init specific enabling interface shouldn't be used by more
traditional DM interfaces.  So Hannes' change might be treating symptom
rather than the core problem (which would be better treated by factoring
out dm-init requirements for a name_to_dev_t()-like interface?).

DM has supported passing maj:min and blockdev names on DM table lines
forever... so we'll need to be very specific about where/why things
regressed.

> Can we please kick off a proper discussion for this on the linux-block
> list?

Sure, done. But I won't drive that discussion in the near-term. I need
to take some time off for a few weeks.

In the meantime I'll drop Hannes' patch for 5.11; I'm open to an
alternative fix that I'd pickup during 5.11-rcX.

Thanks,
Mike