Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
On Tue, Dec 22, 2020 at 09:06:04PM +, Alasdair G Kergon wrote: > I have not read the background about whatever the new problem is - I'm > jumping in cold seeing this message - but from the very beginning of > device-mapper we have strongly recommended that userspace supplies the > block device in the form MAJOR:MINOR and all our own tools do that. We > guarantee not to deadlock in these places when this is done. > > We also accept the device in the form of a path name as we know there > are times when this is safe and convenient, but then we offer no > guarantees - we place the responsibility upon userspace only to do this > when it knows it is safe to do so i.e. no race and no deadlock. 644bda6f346038bce7ad3ed48f7044c10dde6d47 changes that by accepting all kinds of weirdo formats :(
Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote: > Ok. The problem from my perspective is that device-mapper needs to > a) ensure that the arbitrary string passed in with the table definition > refers to a valid block device > and > b) the block device can be opened with O_EXCL, so that device-mapper can > then use it. > > Originally (ie prior to commit 644bda6f3460) dm_get_device() just converted > the string to a 'dev_t' representation, and then the block device itself > was checked and opened in dm_get_table_device(). > 'lookup_bdev' was just being used to convert the path if the string was not > in the canonical major:minor format, as then it was assumed that it > referred to a block device node, and then lookup_bdev kinda makes sense. Yes, 644bda6f3460 is the cause of all evil, as it uses an API in a place where it should not be used. It and the prep patch (e6e20a7a5f3f49bfee518d5c6849107398d83912) which did the grave mistake of making name_to_dev_t available outside of the early init code really need to be reverted. > However, lookup_bdev() now always recurses into the filesystem, causing > multipath to stall in an all-paths-down scenario. lookup_bdev always did a file system lookup, and always only accepted a valid name in the file system. > Alternatively, if Mike says that only major:minor is the valid format for a > table definition we can kill that code completely. But clearly _I_ cannot > make the call here. Before 644bda6f3460 the table definitions only accepted a valid name in the file system. Which is the proper interface.
Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote: > However, lookup_bdev() now always recurses into the filesystem, causing > multipath to stall in an all-paths-down scenario. I have not read the background about whatever the new problem is - I'm jumping in cold seeing this message - but from the very beginning of device-mapper we have strongly recommended that userspace supplies the block device in the form MAJOR:MINOR and all our own tools do that. We guarantee not to deadlock in these places when this is done. We also accept the device in the form of a path name as we know there are times when this is safe and convenient, but then we offer no guarantees - we place the responsibility upon userspace only to do this when it knows it is safe to do so i.e. no race and no deadlock. Alasdair
Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
On 12/22/20 3:53 PM, Mike Snitzer wrote: [added linux-block and dm-devel, if someone replies to this email to continue "proper discussion" _please_ at least drop sfr and linux-next from Cc] On Tue, Dec 22 2020 at 8:15am -0500, Christoph Hellwig wrote: Mike, Hannes, I think this patch is rather harmful. Why does device mapper even mix file system path with a dev_t and all the other weird forms parsed by name_to_dev_t, which was supposed to be be for the early init code where no file system is available. OK, I'll need to revisit (unless someone beats me to it) because this could've easily been a blind-spot for me when the dm-init code went in. Any dm-init specific enabling interface shouldn't be used by more traditional DM interfaces. So Hannes' change might be treating symptom rather than the core problem (which would be better treated by factoring out dm-init requirements for a name_to_dev_t()-like interface?). DM has supported passing maj:min and blockdev names on DM table lines forever... so we'll need to be very specific about where/why things regressed. Ok. The problem from my perspective is that device-mapper needs to a) ensure that the arbitrary string passed in with the table definition refers to a valid block device and b) the block device can be opened with O_EXCL, so that device-mapper can then use it. Originally (ie prior to commit 644bda6f3460) dm_get_device() just converted the string to a 'dev_t' representation, and then the block device itself was checked and opened in dm_get_table_device(). 'lookup_bdev' was just being used to convert the path if the string was not in the canonical major:minor format, as then it was assumed that it referred to a block device node, and then lookup_bdev kinda makes sense. However, lookup_bdev() now always recurses into the filesystem, causing multipath to stall in an all-paths-down scenario. So, the real issue is the table definiton; as it also accepts a device to be specified by the block device _node_ name, we need to have a way of converting that into a dev_t. If lookup_bdev() is the wrong interface for that, by all means, please, do tell me. I'd be happy to draft up a patch. Alternatively, if Mike says that only major:minor is the valid format for a table definition we can kill that code completely. But clearly _I_ cannot make the call here. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
[added linux-block and dm-devel, if someone replies to this email to continue "proper discussion" _please_ at least drop sfr and linux-next from Cc] On Tue, Dec 22 2020 at 8:15am -0500, Christoph Hellwig wrote: > Mike, Hannes, > > I think this patch is rather harmful. Why does device mapper even > mix file system path with a dev_t and all the other weird forms > parsed by name_to_dev_t, which was supposed to be be for the early > init code where no file system is available. OK, I'll need to revisit (unless someone beats me to it) because this could've easily been a blind-spot for me when the dm-init code went in. Any dm-init specific enabling interface shouldn't be used by more traditional DM interfaces. So Hannes' change might be treating symptom rather than the core problem (which would be better treated by factoring out dm-init requirements for a name_to_dev_t()-like interface?). DM has supported passing maj:min and blockdev names on DM table lines forever... so we'll need to be very specific about where/why things regressed. > Can we please kick off a proper discussion for this on the linux-block > list? Sure, done. But I won't drive that discussion in the near-term. I need to take some time off for a few weeks. In the meantime I'll drop Hannes' patch for 5.11; I'm open to an alternative fix that I'd pickup during 5.11-rcX. Thanks, Mike