Re: [PATCH v8 04/22] Change direct_access calling convention
On Thu, 2014-07-31 at 14:30 -0600, Ross Zwisler wrote: > On Thu, 2014-07-31 at 21:04 +0300, Boaz Harrosh wrote: > > On 07/31/2014 08:19 PM, Matthew Wilcox wrote: > > > On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote: > > >> Matthew what is your opinion about this, do we need to push for removal > > >> of the partition dead code which never worked for brd, or we need to push > > >> for fixing and implementing new partition support for brd? > > > > > > Fixing the code gets my vote. brd is useful for testing things ... and > > > sometimes we need to test things that involve partitions. > > > > > > > OK I'm on it, its what I'm doing today. > > > > rrr I manged to completely trash my vm by doing 'make install' of > > util-linux and after reboot it never recovered, I remember that > > mount complained about a now missing library and I forgot and rebooted, > > that was the end of that. Anyway I installed a new fc20 system wanted > > that for a long time over my old fc18 > > Ah, I'm already working on this as well. :) If you want you can wait for my > patches to BRD & test - they should be out this week. > > I'm planning on adding get_geo() and doing dynamic minors as is done in NVMe. Ugh, it turns out that if you remove the "*part = 0" bit from brd_probe(), attempts to create new BRD devices using mknod hit a deadlock. Removal of that code, ie: @@ -550,7 +549,6 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data) kobj = brd ? get_disk(brd->brd_disk) : NULL; mutex_unlock(&brd_devices_mutex); - *part = 0; return kobj; } is necessary if we want to do any sort of partitioning. This isn't a use case for PRD, so I'll move over to that and try to add dynamic minors there instead. If we really do want partitions to work in BRD, though, eventually we'll have to debug the deadlock. - Ross
Re: [PATCH v8 04/22] Change direct_access calling convention
On Thu, 2014-07-31 at 21:04 +0300, Boaz Harrosh wrote: > On 07/31/2014 08:19 PM, Matthew Wilcox wrote: > > On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote: > >> Matthew what is your opinion about this, do we need to push for removal > >> of the partition dead code which never worked for brd, or we need to push > >> for fixing and implementing new partition support for brd? > > > > Fixing the code gets my vote. brd is useful for testing things ... and > > sometimes we need to test things that involve partitions. > > > > OK I'm on it, its what I'm doing today. > > rrr I manged to completely trash my vm by doing 'make install' of > util-linux and after reboot it never recovered, I remember that > mount complained about a now missing library and I forgot and rebooted, > that was the end of that. Anyway I installed a new fc20 system wanted > that for a long time over my old fc18 Ah, I'm already working on this as well. :) If you want you can wait for my patches to BRD & test - they should be out this week. I'm planning on adding get_geo() and doing dynamic minors as is done in NVMe. - Ross N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH v8 04/22] Change direct_access calling convention
On 07/31/2014 08:19 PM, Matthew Wilcox wrote: > On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote: >> Matthew what is your opinion about this, do we need to push for removal >> of the partition dead code which never worked for brd, or we need to push >> for fixing and implementing new partition support for brd? > > Fixing the code gets my vote. brd is useful for testing things ... and > sometimes we need to test things that involve partitions. > OK I'm on it, its what I'm doing today. rrr I manged to completely trash my vm by doing 'make install' of util-linux and after reboot it never recovered, I remember that mount complained about a now missing library and I forgot and rebooted, that was the end of that. Anyway I installed a new fc20 system wanted that for a long time over my old fc18 >> Also another thing I saw is that if we leave the flag >> GENHD_FL_SUPPRESS_PARTITION_INFO >> >> then mount -U UUID stops to work, regardless of partitions or not, >> this is because Kernel will not put us on /proc/patitions. >> I'll submit another patch to remove it. > > Yes, we should probably fix that too. > Yes this is good stuff. I found out about the gpt option in fdisk that's really good stuff because it gives you a PARTUUID even before the mkfs, and the partitions are so mach more logical. But only without that flag >> BTW I hit another funny bug where the partition beginning was not >> 4K aligned apparently fdisk lets you do this if the total size is small >> enough (like 4096 which is default for brd) so I ended up with accessing >> sec zero, the supper-block, failing because of the alignment check at >> direct_access(). > > That's why I added on the partition start before doing the alignment > check :-) > Yes, exactly, I had very similar code to yours. I moved to your code now First patch in the set is your patch 4/22 squashed with the modifications you sent, then my fix, then the getgeo patch, then the remove of the flag. But I'm still fighting fdisk's sector math, I can't for the life of me figure out fdisk math, and it is all too easy to create a partition schema that has an unaligned first/last sector. I can observe and see the dis-alignment when the partitions are first created, I can detect that at prd_probe time. I can probably fix it by this logic: When first detecting a new partition ie if bd_part->start_sect is not aligned round-up to PAGE_SIZE. Then subtract from bd_part->nr_sects the fixed up size and round-down bd_part->nr_sects to PAGE_SIZE This way I still live inside the confined space that fdisk gave me but only IO within largest aligned space. The leftover sectors are just wasted space. >> Do you know of any API that brd/prd can do to not let fdisk do this? >> I'm looking at it right now I just thought it is worth asking. > > I think it's enough to refuse the mount. That feels like a patch to > ext2/4 (or maybe ext2/4 has a way to start the filesystem on a different > block boundary?) > We should not leave this to the FSs to do again and again all over. I wonder if there is some getgeo or some disk properties info somewhere that I can set to force the core block layer to do this for me, I'm surprised that this is the first place we have this problem? Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote: > Matthew what is your opinion about this, do we need to push for removal > of the partition dead code which never worked for brd, or we need to push > for fixing and implementing new partition support for brd? Fixing the code gets my vote. brd is useful for testing things ... and sometimes we need to test things that involve partitions. > Also another thing I saw is that if we leave the flag > GENHD_FL_SUPPRESS_PARTITION_INFO > > then mount -U UUID stops to work, regardless of partitions or not, > this is because Kernel will not put us on /proc/patitions. > I'll submit another patch to remove it. Yes, we should probably fix that too. > BTW I hit another funny bug where the partition beginning was not > 4K aligned apparently fdisk lets you do this if the total size is small > enough (like 4096 which is default for brd) so I ended up with accessing > sec zero, the supper-block, failing because of the alignment check at > direct_access(). That's why I added on the partition start before doing the alignment check :-) > Do you know of any API that brd/prd can do to not let fdisk do this? > I'm looking at it right now I just thought it is worth asking. I think it's enough to refuse the mount. That feels like a patch to ext2/4 (or maybe ext2/4 has a way to start the filesystem on a different block boundary?) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On 07/31/2014 05:13 PM, Matthew Wilcox wrote: > On Thu, Jul 31, 2014 at 01:11:42PM +0300, Boaz Harrosh wrote: > + if (size < 0) if(size < PAGE_SIZE), No? >>> >>> No, absolutely not. PAGE_SIZE is unsigned long, which (if I understand >>> my C integer promotions correctly) means that 'size' gets promoted to >>> an unsigned long, and we compare them unsigned, so errors will never be >>> caught by this check. >> >> Good point I agree that you need a cast ie. >> >> if(size < (long)PAGE_SIZE) >> >> The reason I'm saying this is because of a bug I actually hit when >> playing with partitioning and fdisk, it came out that the last partition's >> size was not page aligned, and code that checked for (< 0) crashed because >> prd returned the last two sectors of the partition, since your API is sector >> based this can happen for you here, before you are memseting a PAGE_SIZE >> you need to test there is space, No? > > Not in ext2/ext4. It requires block size == PAGE_SIZE, so it's never > going to request the last partial block in a partition. > OK cool. then. Matthew what is your opinion about this, do we need to push for removal of the partition dead code which never worked for brd, or we need to push for fixing and implementing new partition support for brd? Also another thing I saw is that if we leave the flag GENHD_FL_SUPPRESS_PARTITION_INFO then mount -U UUID stops to work, regardless of partitions or not, this is because Kernel will not put us on /proc/patitions. I'll submit another patch to remove it. BTW I hit another funny bug where the partition beginning was not 4K aligned apparently fdisk lets you do this if the total size is small enough (like 4096 which is default for brd) so I ended up with accessing sec zero, the supper-block, failing because of the alignment check at direct_access(). Do you know of any API that brd/prd can do to not let fdisk do this? I'm looking at it right now I just thought it is worth asking. Thanks for everything Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On Thu, Jul 31, 2014 at 01:11:42PM +0300, Boaz Harrosh wrote: > >>> + if (size < 0) > >> > >>if(size < PAGE_SIZE), No? > > > > No, absolutely not. PAGE_SIZE is unsigned long, which (if I understand > > my C integer promotions correctly) means that 'size' gets promoted to > > an unsigned long, and we compare them unsigned, so errors will never be > > caught by this check. > > Good point I agree that you need a cast ie. > > if(size < (long)PAGE_SIZE) > > The reason I'm saying this is because of a bug I actually hit when > playing with partitioning and fdisk, it came out that the last partition's > size was not page aligned, and code that checked for (< 0) crashed because > prd returned the last two sectors of the partition, since your API is sector > based this can happen for you here, before you are memseting a PAGE_SIZE > you need to test there is space, No? Not in ext2/ext4. It requires block size == PAGE_SIZE, so it's never going to request the last partial block in a partition. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On 07/30/2014 11:34 PM, Matthew Wilcox wrote: > On Wed, Jul 30, 2014 at 07:12:44PM +0300, Boaz Harrosh wrote: >> Off course I was wrong here size is in bytes not in sectors. Which points >> out that maybe this API needs to be in sectors. >> >> [Actually it needs to be in pages both size and offset, because of return of >> pfn, but its your call.] > > I considered a number of options here. The VM wants things to be in pages. > The filesystem wants things to be in block size. The block device wants > things to be in sectors. It's all a mess when you try and converge. > Everybody understands bytes. > > Here's what I've come up with on top of patch 4/22. Let me know what > you think. > > > diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c > index 3ee1c08..741293f 100644 > --- a/arch/powerpc/sysdev/axonram.c > +++ b/arch/powerpc/sysdev/axonram.c > @@ -146,11 +146,6 @@ axon_ram_direct_access(struct block_device *device, > sector_t sector, > struct axon_ram_bank *bank = device->bd_disk->private_data; > loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT; > > - if (offset >= bank->size) { > - dev_err(&bank->device->dev, "Access outside of address > space\n"); > - return -ERANGE; > - } > - > *kaddr = (void *)(bank->ph_addr + offset); > *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT; > > diff --git a/drivers/block/brd.c b/drivers/block/brd.c > index 33a39e7..3483458 100644 > --- a/drivers/block/brd.c > +++ b/drivers/block/brd.c > @@ -378,10 +378,6 @@ static long brd_direct_access(struct block_device *bdev, > sector_t sector, > > if (!brd) > return -ENODEV; > - if (sector & (PAGE_SECTORS-1)) > - return -EINVAL; > - if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk)) > - return -ERANGE; > page = brd_insert_page(brd, sector); > if (!page) > return -ENOSPC; > diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c > index 58958d1..2ee5556 100644 > --- a/drivers/s390/block/dcssblk.c > +++ b/drivers/s390/block/dcssblk.c > @@ -877,11 +877,7 @@ dcssblk_direct_access (struct block_device *bdev, > sector_t secnum, > if (!dev_info) > return -ENODEV; > dev_sz = dev_info->end - dev_info->start; > - if (secnum % (PAGE_SIZE/512)) > - return -EINVAL; > offset = secnum * 512; > - if (offset > dev_sz) > - return -ERANGE; > *kaddr = (void *) (dev_info->start + offset); > *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT; > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index f1a158e..93ebdd53 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -450,8 +450,14 @@ long bdev_direct_access(struct block_device *bdev, > sector_t sector, > const struct block_device_operations *ops = bdev->bd_disk->fops; > if (!ops->direct_access) > return -EOPNOTSUPP; > - return ops->direct_access(bdev, sector + get_start_sect(bdev), addr, > - pfn, size); > + if ((sector + DIV_ROUND_UP(size, 512)) > > + part_nr_sects_read(bdev->bd_part)) > + return -ERANGE; > + sector += get_start_sect(bdev); > + if (sector % (PAGE_SIZE / 512)) > + return -EINVAL; > + size = ops->direct_access(bdev, sector, addr, pfn, size); > + return size ? size : -ERANGE; > } > EXPORT_SYMBOL_GPL(bdev_direct_access); > > Very nice yes. Only the check at ext2 I think is missing. Once you send the complete patch I'll review it again. It would be very helpful if you send this patch a head of Q, even for 3.17. It will help with merging later I think Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On 07/30/2014 10:45 PM, Matthew Wilcox wrote: <> >> +if (sector & (PAGE_SECTORS-1)) >> +return -EINVAL; > > Mmm. PAGE_SECTORS is private to brd (and also private to bcache!) at > this point. We've got a real mess of defines of SECTOR_SIZE, SECTORSIZE, > SECTOR_SHIFT and so on, dotted throughout various random include files. > I am not the river to flush those Augean stables today. > > I'll go with this, from the dcssblk driver: > > if (sector % (PAGE_SIZE / 512)) > return -EINVAL; > Sigh, right, sure I did not mean to make that fight. Works as well <> >> Style: Need a space between declaration and code (have you check-patch) > > That's a bullshit check. I don't know why it's in checkpatch. > I did not invent the rules. But I do respect them. I think the merit of sticking to some common style is much higher then any particular style choice. Though this particular one I do like, because of the C rule that forces all declarations before code, so it makes it easier on the maintenance. In any way Maintainers are suppose to run checkpatch before submission, some do ;-) <> >>> + if (size < 0) >> >> if(size < PAGE_SIZE), No? > > No, absolutely not. PAGE_SIZE is unsigned long, which (if I understand > my C integer promotions correctly) means that 'size' gets promoted to > an unsigned long, and we compare them unsigned, so errors will never be > caught by this check. Good point I agree that you need a cast ie. if(size < (long)PAGE_SIZE) The reason I'm saying this is because of a bug I actually hit when playing with partitioning and fdisk, it came out that the last partition's size was not page aligned, and code that checked for (< 0) crashed because prd returned the last two sectors of the partition, since your API is sector based this can happen for you here, before you are memseting a PAGE_SIZE you need to test there is space, No? > > Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On Wed, Jul 30, 2014 at 07:12:44PM +0300, Boaz Harrosh wrote: > Off course I was wrong here size is in bytes not in sectors. Which points > out that maybe this API needs to be in sectors. > > [Actually it needs to be in pages both size and offset, because of return of > pfn, but its your call.] I considered a number of options here. The VM wants things to be in pages. The filesystem wants things to be in block size. The block device wants things to be in sectors. It's all a mess when you try and converge. Everybody understands bytes. Here's what I've come up with on top of patch 4/22. Let me know what you think. diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c index 3ee1c08..741293f 100644 --- a/arch/powerpc/sysdev/axonram.c +++ b/arch/powerpc/sysdev/axonram.c @@ -146,11 +146,6 @@ axon_ram_direct_access(struct block_device *device, sector_t sector, struct axon_ram_bank *bank = device->bd_disk->private_data; loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT; - if (offset >= bank->size) { - dev_err(&bank->device->dev, "Access outside of address space\n"); - return -ERANGE; - } - *kaddr = (void *)(bank->ph_addr + offset); *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT; diff --git a/drivers/block/brd.c b/drivers/block/brd.c index 33a39e7..3483458 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -378,10 +378,6 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector, if (!brd) return -ENODEV; - if (sector & (PAGE_SECTORS-1)) - return -EINVAL; - if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk)) - return -ERANGE; page = brd_insert_page(brd, sector); if (!page) return -ENOSPC; diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c index 58958d1..2ee5556 100644 --- a/drivers/s390/block/dcssblk.c +++ b/drivers/s390/block/dcssblk.c @@ -877,11 +877,7 @@ dcssblk_direct_access (struct block_device *bdev, sector_t secnum, if (!dev_info) return -ENODEV; dev_sz = dev_info->end - dev_info->start; - if (secnum % (PAGE_SIZE/512)) - return -EINVAL; offset = secnum * 512; - if (offset > dev_sz) - return -ERANGE; *kaddr = (void *) (dev_info->start + offset); *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT; diff --git a/fs/block_dev.c b/fs/block_dev.c index f1a158e..93ebdd53 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -450,8 +450,14 @@ long bdev_direct_access(struct block_device *bdev, sector_t sector, const struct block_device_operations *ops = bdev->bd_disk->fops; if (!ops->direct_access) return -EOPNOTSUPP; - return ops->direct_access(bdev, sector + get_start_sect(bdev), addr, - pfn, size); + if ((sector + DIV_ROUND_UP(size, 512)) > + part_nr_sects_read(bdev->bd_part)) + return -ERANGE; + sector += get_start_sect(bdev); + if (sector % (PAGE_SIZE / 512)) + return -EINVAL; + size = ops->direct_access(bdev, sector, addr, pfn, size); + return size ? size : -ERANGE; } EXPORT_SYMBOL_GPL(bdev_direct_access); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On Wed, Jul 30, 2014 at 07:03:24PM +0300, Boaz Harrosh wrote: > > +long bdev_direct_access(struct block_device *bdev, sector_t sector, > > + void **addr, unsigned long *pfn, long size) > > +{ > > + const struct block_device_operations *ops = bdev->bd_disk->fops; > > + if (!ops->direct_access) > > + return -EOPNOTSUPP; > > You need to check alignment on PAGE_SIZE since this API requires it, do > to pfn defined to a page_number. > > (Unless you want to define another output-param like page_offset. > but this exercise can be left to the caller) > > You also need to check against the partition boundary. so something like: > > + if (sector & (PAGE_SECTORS-1)) > + return -EINVAL; Mmm. PAGE_SECTORS is private to brd (and also private to bcache!) at this point. We've got a real mess of defines of SECTOR_SIZE, SECTORSIZE, SECTOR_SHIFT and so on, dotted throughout various random include files. I am not the river to flush those Augean stables today. I'll go with this, from the dcssblk driver: if (sector % (PAGE_SIZE / 512)) return -EINVAL; > + if (unlikely(sector + size > part_nr_sects_read(bdev->bd_part))) > + return -ERANGE; > > Then perhaps you can remove that check from drivers As noted in your followup, size is in terms of bytes. Perhaps it should be named 'length' to make it more clear that it's a byte count, not a sector count? In any case, this looks best to me: if ((sector + DIV_ROUND_UP(size, 512)) > part_nr_sects_read(bdev->bd_part)) return -ERANGE; > Style: Need a space between declaration and code (have you check-patch) That's a bullshit check. I don't know why it's in checkpatch. > > + if (size < 0) > > if(size < PAGE_SIZE), No? No, absolutely not. PAGE_SIZE is unsigned long, which (if I understand my C integer promotions correctly) means that 'size' gets promoted to an unsigned long, and we compare them unsigned, so errors will never be caught by this check. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On 07/30/2014 07:03 PM, Boaz Harrosh wrote: <> >> +long bdev_direct_access(struct block_device *bdev, sector_t sector, >> +void **addr, unsigned long *pfn, long size) >> +{ >> +const struct block_device_operations *ops = bdev->bd_disk->fops; >> +if (!ops->direct_access) >> +return -EOPNOTSUPP; > > You need to check alignment on PAGE_SIZE since this API requires it, do > to pfn defined to a page_number. > > (Unless you want to define another output-param like page_offset. > but this exercise can be left to the caller) > > You also need to check against the partition boundary. so something like: > > + if (sector & (PAGE_SECTORS-1)) > + return -EINVAL; > + if (unlikely(sector + size > part_nr_sects_read(bdev->bd_part))) Off course I was wrong here size is in bytes not in sectors. Which points out that maybe this API needs to be in sectors. [Actually it needs to be in pages both size and offset, because of return of pfn, but its your call.] Anyway my code above needs to be fixed with (size + 512 -1) / 512 Thanks Boaz > + return -ERANGE; > > Then perhaps you can remove that check from drivers > >> +return ops->direct_access(bdev, sector + get_start_sect(bdev), addr, >> +pfn, size); >> +} >> +EXPORT_SYMBOL_GPL(bdev_direct_access); <> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v8 04/22] Change direct_access calling convention
On 07/22/2014 10:47 PM, Matthew Wilcox wrote: > In order to support accesses to larger chunks of memory, pass in a > 'size' parameter (counted in bytes), and return the amount available at > that address. > > Support partitioning the underlying block device through a new helper > function, bdev_direct_access(), since partition handling should be done > in the block layer, not the filesystem, nor device driver. > > Signed-off-by: Matthew Wilcox > Reviewed-by: Jan Kara > --- > Documentation/filesystems/xip.txt | 15 +-- > arch/powerpc/sysdev/axonram.c | 12 > drivers/block/brd.c | 8 +--- > drivers/s390/block/dcssblk.c | 19 ++- > fs/block_dev.c| 28 > fs/ext2/xip.c | 31 +-- > include/linux/blkdev.h| 6 -- > 7 files changed, 73 insertions(+), 46 deletions(-) > > diff --git a/Documentation/filesystems/xip.txt > b/Documentation/filesystems/xip.txt > index 0466ee5..b62eabf 100644 > --- a/Documentation/filesystems/xip.txt > +++ b/Documentation/filesystems/xip.txt > @@ -28,12 +28,15 @@ Implementation > Execute-in-place is implemented in three steps: block device operation, > address space operation, and file operations. > > -A block device operation named direct_access is used to retrieve a > -reference (pointer) to a block on-disk. The reference is supposed to be > -cpu-addressable, physical address and remain valid until the release > operation > -is performed. A struct block_device reference is used to address the device, > -and a sector_t argument is used to identify the individual block. As an > -alternative, memory technology devices can be used for this. > +A block device operation named direct_access is used to translate the > +block device sector number to a page frame number (pfn) that identifies > +the physical page for the memory. It also returns a kernel virtual > +address that can be used to access the memory. > + > +The direct_access method takes a 'size' parameter that indicates the > +number of bytes being requested. The function should return the number > +of bytes that it can provide, although it must not exceed the number of > +bytes requested. It may also return a negative errno if an error occurs. > > The block device operation is optional, these block devices support it as of > today: > diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c > index 830edc8..3ee1c08 100644 > --- a/arch/powerpc/sysdev/axonram.c > +++ b/arch/powerpc/sysdev/axonram.c > @@ -139,17 +139,13 @@ axon_ram_make_request(struct request_queue *queue, > struct bio *bio) > * axon_ram_direct_access - direct_access() method for block device > * @device, @sector, @data: see block_device_operations method > */ > -static int > +static long > axon_ram_direct_access(struct block_device *device, sector_t sector, > -void **kaddr, unsigned long *pfn) > +void **kaddr, unsigned long *pfn, long size) > { > struct axon_ram_bank *bank = device->bd_disk->private_data; > - loff_t offset; > + loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT; > > - offset = sector; > - if (device->bd_part != NULL) > - offset += device->bd_part->start_sect; > - offset <<= AXON_RAM_SECTOR_SHIFT; > if (offset >= bank->size) { > dev_err(&bank->device->dev, "Access outside of address > space\n"); > return -ERANGE; > @@ -158,7 +154,7 @@ axon_ram_direct_access(struct block_device *device, > sector_t sector, > *kaddr = (void *)(bank->ph_addr + offset); > *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT; > > - return 0; > + return min_t(long, size, bank->size - offset); > } > > static const struct block_device_operations axon_ram_devops = { > diff --git a/drivers/block/brd.c b/drivers/block/brd.c > index c7d138e..96e4c96 100644 > --- a/drivers/block/brd.c > +++ b/drivers/block/brd.c > @@ -370,8 +370,8 @@ static int brd_rw_page(struct block_device *bdev, > sector_t sector, > } > > #ifdef CONFIG_BLK_DEV_XIP > -static int brd_direct_access(struct block_device *bdev, sector_t sector, > - void **kaddr, unsigned long *pfn) > +static long brd_direct_access(struct block_device *bdev, sector_t sector, > + void **kaddr, unsigned long *pfn, long size) > { > struct brd_device *brd = bdev->bd_disk->private_data; > struct page *page; Mathew hi You need to remove the wrong checks both alignment and size from this driver. (alignment is redundant but size is wrong) These checks should be guarantied by the blk wrapper. (see below) > @@ -388,7 +388,9 @@ static int brd_direct_access(struct block_device *bdev, > sector_t sector, > *kaddr = page_address(page); > *pfn = page_to_pfn(page); > > - return 0; > + /* Could optimistically check to
[PATCH v8 04/22] Change direct_access calling convention
In order to support accesses to larger chunks of memory, pass in a 'size' parameter (counted in bytes), and return the amount available at that address. Support partitioning the underlying block device through a new helper function, bdev_direct_access(), since partition handling should be done in the block layer, not the filesystem, nor device driver. Signed-off-by: Matthew Wilcox Reviewed-by: Jan Kara --- Documentation/filesystems/xip.txt | 15 +-- arch/powerpc/sysdev/axonram.c | 12 drivers/block/brd.c | 8 +--- drivers/s390/block/dcssblk.c | 19 ++- fs/block_dev.c| 28 fs/ext2/xip.c | 31 +-- include/linux/blkdev.h| 6 -- 7 files changed, 73 insertions(+), 46 deletions(-) diff --git a/Documentation/filesystems/xip.txt b/Documentation/filesystems/xip.txt index 0466ee5..b62eabf 100644 --- a/Documentation/filesystems/xip.txt +++ b/Documentation/filesystems/xip.txt @@ -28,12 +28,15 @@ Implementation Execute-in-place is implemented in three steps: block device operation, address space operation, and file operations. -A block device operation named direct_access is used to retrieve a -reference (pointer) to a block on-disk. The reference is supposed to be -cpu-addressable, physical address and remain valid until the release operation -is performed. A struct block_device reference is used to address the device, -and a sector_t argument is used to identify the individual block. As an -alternative, memory technology devices can be used for this. +A block device operation named direct_access is used to translate the +block device sector number to a page frame number (pfn) that identifies +the physical page for the memory. It also returns a kernel virtual +address that can be used to access the memory. + +The direct_access method takes a 'size' parameter that indicates the +number of bytes being requested. The function should return the number +of bytes that it can provide, although it must not exceed the number of +bytes requested. It may also return a negative errno if an error occurs. The block device operation is optional, these block devices support it as of today: diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c index 830edc8..3ee1c08 100644 --- a/arch/powerpc/sysdev/axonram.c +++ b/arch/powerpc/sysdev/axonram.c @@ -139,17 +139,13 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio) * axon_ram_direct_access - direct_access() method for block device * @device, @sector, @data: see block_device_operations method */ -static int +static long axon_ram_direct_access(struct block_device *device, sector_t sector, - void **kaddr, unsigned long *pfn) + void **kaddr, unsigned long *pfn, long size) { struct axon_ram_bank *bank = device->bd_disk->private_data; - loff_t offset; + loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT; - offset = sector; - if (device->bd_part != NULL) - offset += device->bd_part->start_sect; - offset <<= AXON_RAM_SECTOR_SHIFT; if (offset >= bank->size) { dev_err(&bank->device->dev, "Access outside of address space\n"); return -ERANGE; @@ -158,7 +154,7 @@ axon_ram_direct_access(struct block_device *device, sector_t sector, *kaddr = (void *)(bank->ph_addr + offset); *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT; - return 0; + return min_t(long, size, bank->size - offset); } static const struct block_device_operations axon_ram_devops = { diff --git a/drivers/block/brd.c b/drivers/block/brd.c index c7d138e..96e4c96 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -370,8 +370,8 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector, } #ifdef CONFIG_BLK_DEV_XIP -static int brd_direct_access(struct block_device *bdev, sector_t sector, - void **kaddr, unsigned long *pfn) +static long brd_direct_access(struct block_device *bdev, sector_t sector, + void **kaddr, unsigned long *pfn, long size) { struct brd_device *brd = bdev->bd_disk->private_data; struct page *page; @@ -388,7 +388,9 @@ static int brd_direct_access(struct block_device *bdev, sector_t sector, *kaddr = page_address(page); *pfn = page_to_pfn(page); - return 0; + /* Could optimistically check to see if the next page in the +* file is mapped to the next page of physical RAM */ + return min_t(long, PAGE_SIZE, size); } #endif diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c index 0f47175..58958d1 100644 --- a/drivers/s390/block/dcssblk.c +++ b/drivers/s390/block/dcssblk.c @@ -28,8 +28,8 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode); static