Re: [PATCH v8 04/22] Change direct_access calling convention

2014-08-01 Thread Zwisler, Ross
On Thu, 2014-07-31 at 14:30 -0600, Ross Zwisler wrote:
> On Thu, 2014-07-31 at 21:04 +0300, Boaz Harrosh wrote:
> > On 07/31/2014 08:19 PM, Matthew Wilcox wrote:
> > > On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote:
> > >> Matthew what is your opinion about this, do we need to push for removal
> > >> of the partition dead code which never worked for brd, or we need to push
> > >> for fixing and implementing new partition support for brd?
> > > 
> > > Fixing the code gets my vote.  brd is useful for testing things ... and
> > > sometimes we need to test things that involve partitions.
> > > 
> > 
> > OK I'm on it, its what I'm doing today.
> > 
> > rrr I manged to completely trash my vm by doing 'make install' of
> > util-linux and after reboot it never recovered, I remember that
> > mount complained about a now missing library and I forgot and rebooted,
> > that was the end of that. Anyway I installed a new fc20 system wanted
> > that for a long time over my old fc18
> 
> Ah, I'm already working on this as well.  :)  If you want you can wait for my
> patches to BRD & test - they should be out this week.
> 
> I'm planning on adding get_geo() and doing dynamic minors as is done in NVMe.

Ugh, it turns out that if you remove the "*part = 0" bit from brd_probe(),
attempts to create new BRD devices using mknod hit a deadlock.  Removal of
that code, ie:

@@ -550,7 +549,6 @@ static struct kobject *brd_probe(dev_t dev, int *part, void 
*data)
kobj = brd ? get_disk(brd->brd_disk) : NULL;
mutex_unlock(_devices_mutex);

-   *part = 0;
return kobj;
 }

is necessary if we want to do any sort of partitioning.

This isn't a use case for PRD, so I'll move over to that and try to add
dynamic minors there instead.  If we really do want partitions to work in BRD,
though, eventually we'll have to debug the deadlock.

- Ross



Re: [PATCH v8 04/22] Change direct_access calling convention

2014-08-01 Thread Zwisler, Ross
On Thu, 2014-07-31 at 14:30 -0600, Ross Zwisler wrote:
 On Thu, 2014-07-31 at 21:04 +0300, Boaz Harrosh wrote:
  On 07/31/2014 08:19 PM, Matthew Wilcox wrote:
   On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote:
   Matthew what is your opinion about this, do we need to push for removal
   of the partition dead code which never worked for brd, or we need to push
   for fixing and implementing new partition support for brd?
   
   Fixing the code gets my vote.  brd is useful for testing things ... and
   sometimes we need to test things that involve partitions.
   
  
  OK I'm on it, its what I'm doing today.
  
  rrr I manged to completely trash my vm by doing 'make install' of
  util-linux and after reboot it never recovered, I remember that
  mount complained about a now missing library and I forgot and rebooted,
  that was the end of that. Anyway I installed a new fc20 system wanted
  that for a long time over my old fc18
 
 Ah, I'm already working on this as well.  :)  If you want you can wait for my
 patches to BRD  test - they should be out this week.
 
 I'm planning on adding get_geo() and doing dynamic minors as is done in NVMe.

Ugh, it turns out that if you remove the *part = 0 bit from brd_probe(),
attempts to create new BRD devices using mknod hit a deadlock.  Removal of
that code, ie:

@@ -550,7 +549,6 @@ static struct kobject *brd_probe(dev_t dev, int *part, void 
*data)
kobj = brd ? get_disk(brd-brd_disk) : NULL;
mutex_unlock(brd_devices_mutex);

-   *part = 0;
return kobj;
 }

is necessary if we want to do any sort of partitioning.

This isn't a use case for PRD, so I'll move over to that and try to add
dynamic minors there instead.  If we really do want partitions to work in BRD,
though, eventually we'll have to debug the deadlock.

- Ross



Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Zwisler, Ross
On Thu, 2014-07-31 at 21:04 +0300, Boaz Harrosh wrote:
> On 07/31/2014 08:19 PM, Matthew Wilcox wrote:
> > On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote:
> >> Matthew what is your opinion about this, do we need to push for removal
> >> of the partition dead code which never worked for brd, or we need to push
> >> for fixing and implementing new partition support for brd?
> > 
> > Fixing the code gets my vote.  brd is useful for testing things ... and
> > sometimes we need to test things that involve partitions.
> > 
> 
> OK I'm on it, its what I'm doing today.
> 
> rrr I manged to completely trash my vm by doing 'make install' of
> util-linux and after reboot it never recovered, I remember that
> mount complained about a now missing library and I forgot and rebooted,
> that was the end of that. Anyway I installed a new fc20 system wanted
> that for a long time over my old fc18

Ah, I'm already working on this as well.  :)  If you want you can wait for my
patches to BRD & test - they should be out this week.

I'm planning on adding get_geo() and doing dynamic minors as is done in NVMe.

- Ross

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Boaz Harrosh
On 07/31/2014 08:19 PM, Matthew Wilcox wrote:
> On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote:
>> Matthew what is your opinion about this, do we need to push for removal
>> of the partition dead code which never worked for brd, or we need to push
>> for fixing and implementing new partition support for brd?
> 
> Fixing the code gets my vote.  brd is useful for testing things ... and
> sometimes we need to test things that involve partitions.
> 

OK I'm on it, its what I'm doing today.

rrr I manged to completely trash my vm by doing 'make install' of
util-linux and after reboot it never recovered, I remember that
mount complained about a now missing library and I forgot and rebooted,
that was the end of that. Anyway I installed a new fc20 system wanted
that for a long time over my old fc18

>> Also another thing I saw is that if we leave the flag 
>>  GENHD_FL_SUPPRESS_PARTITION_INFO
>>
>> then mount -U UUID stops to work, regardless of partitions or not,
>> this is because Kernel will not put us on /proc/patitions.
>> I'll submit another patch to remove it.
> 
> Yes, we should probably fix that too.
> 

Yes this is good stuff. I found out about the gpt option in fdisk
that's really good stuff because it gives you a PARTUUID even before
the mkfs, and the partitions are so mach more logical. 
But only without that flag

>> BTW I hit another funny bug where the partition beginning was not
>> 4K aligned apparently fdisk lets you do this if the total size is small
>> enough  (like 4096 which is default for brd) so I ended up with accessing
>> sec zero, the supper-block, failing because of the alignment check at
>> direct_access().
> 
> That's why I added on the partition start before doing the alignment
> check :-)
> 
Yes, exactly, I had very similar code to yours. I moved to your code
now First patch in the set is your patch 4/22 squashed with the modifications
you sent, then my fix, then the getgeo patch, then the remove of the flag.

But I'm still fighting fdisk's sector math, I can't for the life of me
figure out fdisk math, and it is all too easy to create a partition schema
that has an unaligned first/last sector.

I can observe and see the dis-alignment when the partitions are first
created, I can detect that at prd_probe time.

I can probably fix it by this logic:
  When first detecting a new partition ie if bd_part->start_sect
is not aligned round-up to PAGE_SIZE. Then subtract from bd_part->nr_sects the
fixed up size and round-down bd_part->nr_sects to PAGE_SIZE
This way I still live inside the confined space that fdisk gave me but only IO
within largest aligned space. The leftover sectors are just wasted space.


>> Do you know of any API that brd/prd can do to not let fdisk do this?
>> I'm looking at it right now I just thought it is worth asking.
> 
> I think it's enough to refuse the mount.  That feels like a patch to
> ext2/4 (or maybe ext2/4 has a way to start the filesystem on a different
> block boundary?)
> 

We should not leave this to the FSs to do again and again all over. I wonder
if there is some getgeo or some disk properties info somewhere that I can
set to force the core block layer to do this for me, I'm surprised that this
is the first place we have this problem?

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Matthew Wilcox
On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote:
> Matthew what is your opinion about this, do we need to push for removal
> of the partition dead code which never worked for brd, or we need to push
> for fixing and implementing new partition support for brd?

Fixing the code gets my vote.  brd is useful for testing things ... and
sometimes we need to test things that involve partitions.

> Also another thing I saw is that if we leave the flag 
>   GENHD_FL_SUPPRESS_PARTITION_INFO
> 
> then mount -U UUID stops to work, regardless of partitions or not,
> this is because Kernel will not put us on /proc/patitions.
> I'll submit another patch to remove it.

Yes, we should probably fix that too.

> BTW I hit another funny bug where the partition beginning was not
> 4K aligned apparently fdisk lets you do this if the total size is small
> enough  (like 4096 which is default for brd) so I ended up with accessing
> sec zero, the supper-block, failing because of the alignment check at
> direct_access().

That's why I added on the partition start before doing the alignment
check :-)

> Do you know of any API that brd/prd can do to not let fdisk do this?
> I'm looking at it right now I just thought it is worth asking.

I think it's enough to refuse the mount.  That feels like a patch to
ext2/4 (or maybe ext2/4 has a way to start the filesystem on a different
block boundary?)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Boaz Harrosh
On 07/31/2014 05:13 PM, Matthew Wilcox wrote:
> On Thu, Jul 31, 2014 at 01:11:42PM +0300, Boaz Harrosh wrote:
> + if (size < 0)

if(size < PAGE_SIZE), No?
>>>
>>> No, absolutely not.  PAGE_SIZE is unsigned long, which (if I understand
>>> my C integer promotions correctly) means that 'size' gets promoted to
>>> an unsigned long, and we compare them unsigned, so errors will never be
>>> caught by this check.
>>
>> Good point I agree that you need a cast ie.
>>
>>  if(size < (long)PAGE_SIZE)
>>
>> The reason I'm saying this is because of a bug I actually hit when
>> playing with partitioning and fdisk, it came out that the last partition's
>> size was not page aligned, and code that checked for (< 0) crashed because
>> prd returned the last two sectors of the partition, since your API is sector
>> based this can happen for you here, before you are memseting a PAGE_SIZE
>> you need to test there is space, No? 
> 
> Not in ext2/ext4.  It requires block size == PAGE_SIZE, so it's never
> going to request the last partial block in a partition.
> 

OK cool. then.

Matthew what is your opinion about this, do we need to push for removal
of the partition dead code which never worked for brd, or we need to push
for fixing and implementing new partition support for brd?

Also another thing I saw is that if we leave the flag 
GENHD_FL_SUPPRESS_PARTITION_INFO

then mount -U UUID stops to work, regardless of partitions or not,
this is because Kernel will not put us on /proc/patitions.
I'll submit another patch to remove it.

BTW I hit another funny bug where the partition beginning was not
4K aligned apparently fdisk lets you do this if the total size is small
enough  (like 4096 which is default for brd) so I ended up with accessing
sec zero, the supper-block, failing because of the alignment check at
direct_access().
Do you know of any API that brd/prd can do to not let fdisk do this?
I'm looking at it right now I just thought it is worth asking.

Thanks for everything
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Matthew Wilcox
On Thu, Jul 31, 2014 at 01:11:42PM +0300, Boaz Harrosh wrote:
> >>> + if (size < 0)
> >>
> >>if(size < PAGE_SIZE), No?
> > 
> > No, absolutely not.  PAGE_SIZE is unsigned long, which (if I understand
> > my C integer promotions correctly) means that 'size' gets promoted to
> > an unsigned long, and we compare them unsigned, so errors will never be
> > caught by this check.
> 
> Good point I agree that you need a cast ie.
> 
>   if(size < (long)PAGE_SIZE)
> 
> The reason I'm saying this is because of a bug I actually hit when
> playing with partitioning and fdisk, it came out that the last partition's
> size was not page aligned, and code that checked for (< 0) crashed because
> prd returned the last two sectors of the partition, since your API is sector
> based this can happen for you here, before you are memseting a PAGE_SIZE
> you need to test there is space, No? 

Not in ext2/ext4.  It requires block size == PAGE_SIZE, so it's never
going to request the last partial block in a partition.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Boaz Harrosh
On 07/30/2014 11:34 PM, Matthew Wilcox wrote:
> On Wed, Jul 30, 2014 at 07:12:44PM +0300, Boaz Harrosh wrote:
>> Off course I was wrong here size is in bytes not in sectors. Which points
>> out that maybe this API needs to be in sectors.
>>
>> [Actually it needs to be in pages both size and offset, because of return of
>> pfn, but its your call.]
> 
> I considered a number of options here.  The VM wants things to be in pages.
> The filesystem wants things to be in block size.  The block device wants
> things to be in sectors.  It's all a mess when you try and converge.
> Everybody understands bytes.
> 
> Here's what I've come up with on top of patch 4/22.  Let me know what
> you think.
> 
> 
> diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
> index 3ee1c08..741293f 100644
> --- a/arch/powerpc/sysdev/axonram.c
> +++ b/arch/powerpc/sysdev/axonram.c
> @@ -146,11 +146,6 @@ axon_ram_direct_access(struct block_device *device, 
> sector_t sector,
>   struct axon_ram_bank *bank = device->bd_disk->private_data;
>   loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
>  
> - if (offset >= bank->size) {
> - dev_err(>device->dev, "Access outside of address 
> space\n");
> - return -ERANGE;
> - }
> -
>   *kaddr = (void *)(bank->ph_addr + offset);
>   *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
>  
> diff --git a/drivers/block/brd.c b/drivers/block/brd.c
> index 33a39e7..3483458 100644
> --- a/drivers/block/brd.c
> +++ b/drivers/block/brd.c
> @@ -378,10 +378,6 @@ static long brd_direct_access(struct block_device *bdev, 
> sector_t sector,
>  
>   if (!brd)
>   return -ENODEV;
> - if (sector & (PAGE_SECTORS-1))
> - return -EINVAL;
> - if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk))
> - return -ERANGE;
>   page = brd_insert_page(brd, sector);
>   if (!page)
>   return -ENOSPC;
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index 58958d1..2ee5556 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -877,11 +877,7 @@ dcssblk_direct_access (struct block_device *bdev, 
> sector_t secnum,
>   if (!dev_info)
>   return -ENODEV;
>   dev_sz = dev_info->end - dev_info->start;
> - if (secnum % (PAGE_SIZE/512))
> - return -EINVAL;
>   offset = secnum * 512;
> - if (offset > dev_sz)
> - return -ERANGE;
>   *kaddr = (void *) (dev_info->start + offset);
>   *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
>  
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index f1a158e..93ebdd53 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -450,8 +450,14 @@ long bdev_direct_access(struct block_device *bdev, 
> sector_t sector,
>   const struct block_device_operations *ops = bdev->bd_disk->fops;
>   if (!ops->direct_access)
>   return -EOPNOTSUPP;
> - return ops->direct_access(bdev, sector + get_start_sect(bdev), addr,
> - pfn, size);
> + if ((sector + DIV_ROUND_UP(size, 512)) >
> + part_nr_sects_read(bdev->bd_part))
> + return -ERANGE;
> + sector += get_start_sect(bdev);
> + if (sector % (PAGE_SIZE / 512))
> + return -EINVAL;
> + size = ops->direct_access(bdev, sector, addr, pfn, size);
> + return size ? size : -ERANGE;
>  }
>  EXPORT_SYMBOL_GPL(bdev_direct_access);
>  
> 

Very nice yes.

Only the check at ext2 I think is missing. Once you send the complete patch
I'll review it again. It would be very helpful if you send this patch a head
of Q, even for 3.17. It will help with merging later I think

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Boaz Harrosh
On 07/30/2014 10:45 PM, Matthew Wilcox wrote:
<>
>> +if (sector & (PAGE_SECTORS-1))
>> +return -EINVAL;
> 
> Mmm.  PAGE_SECTORS is private to brd (and also private to bcache!) at
> this point.  We've got a real mess of defines of SECTOR_SIZE, SECTORSIZE,
> SECTOR_SHIFT and so on, dotted throughout various random include files.
> I am not the river to flush those Augean stables today.
> 
> I'll go with this, from the dcssblk driver:
> 
> if (sector % (PAGE_SIZE / 512))
> return -EINVAL;
> 

Sigh, right, sure I did not mean to make that fight. Works as well

<>
>> Style: Need a space between declaration and code (have you check-patch)
> 
> That's a bullshit check.  I don't know why it's in checkpatch.
> 

I did not invent the rules. But I do respect them. I think the merit
of sticking to some common style is much higher then any particular
style choice. Though this particular one I do like, because of the
C rule that forces all declarations before code, so it makes it easier
on the maintenance. In any way Maintainers are suppose to run checkpatch
before submission, some do ;-)

<>
>>> +   if (size < 0)
>>
>>  if(size < PAGE_SIZE), No?
> 
> No, absolutely not.  PAGE_SIZE is unsigned long, which (if I understand
> my C integer promotions correctly) means that 'size' gets promoted to
> an unsigned long, and we compare them unsigned, so errors will never be
> caught by this check.

Good point I agree that you need a cast ie.

if(size < (long)PAGE_SIZE)

The reason I'm saying this is because of a bug I actually hit when
playing with partitioning and fdisk, it came out that the last partition's
size was not page aligned, and code that checked for (< 0) crashed because
prd returned the last two sectors of the partition, since your API is sector
based this can happen for you here, before you are memseting a PAGE_SIZE
you need to test there is space, No? 

> 
> 

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Boaz Harrosh
On 07/30/2014 10:45 PM, Matthew Wilcox wrote:

 +if (sector  (PAGE_SECTORS-1))
 +return -EINVAL;
 
 Mmm.  PAGE_SECTORS is private to brd (and also private to bcache!) at
 this point.  We've got a real mess of defines of SECTOR_SIZE, SECTORSIZE,
 SECTOR_SHIFT and so on, dotted throughout various random include files.
 I am not the river to flush those Augean stables today.
 
 I'll go with this, from the dcssblk driver:
 
 if (sector % (PAGE_SIZE / 512))
 return -EINVAL;
 

Sigh, right, sure I did not mean to make that fight. Works as well


 Style: Need a space between declaration and code (have you check-patch)
 
 That's a bullshit check.  I don't know why it's in checkpatch.
 

I did not invent the rules. But I do respect them. I think the merit
of sticking to some common style is much higher then any particular
style choice. Though this particular one I do like, because of the
C rule that forces all declarations before code, so it makes it easier
on the maintenance. In any way Maintainers are suppose to run checkpatch
before submission, some do ;-)


 +   if (size  0)

  if(size  PAGE_SIZE), No?
 
 No, absolutely not.  PAGE_SIZE is unsigned long, which (if I understand
 my C integer promotions correctly) means that 'size' gets promoted to
 an unsigned long, and we compare them unsigned, so errors will never be
 caught by this check.

Good point I agree that you need a cast ie.

if(size  (long)PAGE_SIZE)

The reason I'm saying this is because of a bug I actually hit when
playing with partitioning and fdisk, it came out that the last partition's
size was not page aligned, and code that checked for ( 0) crashed because
prd returned the last two sectors of the partition, since your API is sector
based this can happen for you here, before you are memseting a PAGE_SIZE
you need to test there is space, No? 

 
 

Thanks
Boaz

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Boaz Harrosh
On 07/30/2014 11:34 PM, Matthew Wilcox wrote:
 On Wed, Jul 30, 2014 at 07:12:44PM +0300, Boaz Harrosh wrote:
 Off course I was wrong here size is in bytes not in sectors. Which points
 out that maybe this API needs to be in sectors.

 [Actually it needs to be in pages both size and offset, because of return of
 pfn, but its your call.]
 
 I considered a number of options here.  The VM wants things to be in pages.
 The filesystem wants things to be in block size.  The block device wants
 things to be in sectors.  It's all a mess when you try and converge.
 Everybody understands bytes.
 
 Here's what I've come up with on top of patch 4/22.  Let me know what
 you think.
 
 
 diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
 index 3ee1c08..741293f 100644
 --- a/arch/powerpc/sysdev/axonram.c
 +++ b/arch/powerpc/sysdev/axonram.c
 @@ -146,11 +146,6 @@ axon_ram_direct_access(struct block_device *device, 
 sector_t sector,
   struct axon_ram_bank *bank = device-bd_disk-private_data;
   loff_t offset = (loff_t)sector  AXON_RAM_SECTOR_SHIFT;
  
 - if (offset = bank-size) {
 - dev_err(bank-device-dev, Access outside of address 
 space\n);
 - return -ERANGE;
 - }
 -
   *kaddr = (void *)(bank-ph_addr + offset);
   *pfn = virt_to_phys(*kaddr)  PAGE_SHIFT;
  
 diff --git a/drivers/block/brd.c b/drivers/block/brd.c
 index 33a39e7..3483458 100644
 --- a/drivers/block/brd.c
 +++ b/drivers/block/brd.c
 @@ -378,10 +378,6 @@ static long brd_direct_access(struct block_device *bdev, 
 sector_t sector,
  
   if (!brd)
   return -ENODEV;
 - if (sector  (PAGE_SECTORS-1))
 - return -EINVAL;
 - if (sector + PAGE_SECTORS  get_capacity(bdev-bd_disk))
 - return -ERANGE;
   page = brd_insert_page(brd, sector);
   if (!page)
   return -ENOSPC;
 diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
 index 58958d1..2ee5556 100644
 --- a/drivers/s390/block/dcssblk.c
 +++ b/drivers/s390/block/dcssblk.c
 @@ -877,11 +877,7 @@ dcssblk_direct_access (struct block_device *bdev, 
 sector_t secnum,
   if (!dev_info)
   return -ENODEV;
   dev_sz = dev_info-end - dev_info-start;
 - if (secnum % (PAGE_SIZE/512))
 - return -EINVAL;
   offset = secnum * 512;
 - if (offset  dev_sz)
 - return -ERANGE;
   *kaddr = (void *) (dev_info-start + offset);
   *pfn = virt_to_phys(*kaddr)  PAGE_SHIFT;
  
 diff --git a/fs/block_dev.c b/fs/block_dev.c
 index f1a158e..93ebdd53 100644
 --- a/fs/block_dev.c
 +++ b/fs/block_dev.c
 @@ -450,8 +450,14 @@ long bdev_direct_access(struct block_device *bdev, 
 sector_t sector,
   const struct block_device_operations *ops = bdev-bd_disk-fops;
   if (!ops-direct_access)
   return -EOPNOTSUPP;
 - return ops-direct_access(bdev, sector + get_start_sect(bdev), addr,
 - pfn, size);
 + if ((sector + DIV_ROUND_UP(size, 512)) 
 + part_nr_sects_read(bdev-bd_part))
 + return -ERANGE;
 + sector += get_start_sect(bdev);
 + if (sector % (PAGE_SIZE / 512))
 + return -EINVAL;
 + size = ops-direct_access(bdev, sector, addr, pfn, size);
 + return size ? size : -ERANGE;
  }
  EXPORT_SYMBOL_GPL(bdev_direct_access);
  
 

Very nice yes.

Only the check at ext2 I think is missing. Once you send the complete patch
I'll review it again. It would be very helpful if you send this patch a head
of Q, even for 3.17. It will help with merging later I think

Thanks
Boaz

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Matthew Wilcox
On Thu, Jul 31, 2014 at 01:11:42PM +0300, Boaz Harrosh wrote:
  + if (size  0)
 
 if(size  PAGE_SIZE), No?
  
  No, absolutely not.  PAGE_SIZE is unsigned long, which (if I understand
  my C integer promotions correctly) means that 'size' gets promoted to
  an unsigned long, and we compare them unsigned, so errors will never be
  caught by this check.
 
 Good point I agree that you need a cast ie.
 
   if(size  (long)PAGE_SIZE)
 
 The reason I'm saying this is because of a bug I actually hit when
 playing with partitioning and fdisk, it came out that the last partition's
 size was not page aligned, and code that checked for ( 0) crashed because
 prd returned the last two sectors of the partition, since your API is sector
 based this can happen for you here, before you are memseting a PAGE_SIZE
 you need to test there is space, No? 

Not in ext2/ext4.  It requires block size == PAGE_SIZE, so it's never
going to request the last partial block in a partition.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Boaz Harrosh
On 07/31/2014 05:13 PM, Matthew Wilcox wrote:
 On Thu, Jul 31, 2014 at 01:11:42PM +0300, Boaz Harrosh wrote:
 + if (size  0)

if(size  PAGE_SIZE), No?

 No, absolutely not.  PAGE_SIZE is unsigned long, which (if I understand
 my C integer promotions correctly) means that 'size' gets promoted to
 an unsigned long, and we compare them unsigned, so errors will never be
 caught by this check.

 Good point I agree that you need a cast ie.

  if(size  (long)PAGE_SIZE)

 The reason I'm saying this is because of a bug I actually hit when
 playing with partitioning and fdisk, it came out that the last partition's
 size was not page aligned, and code that checked for ( 0) crashed because
 prd returned the last two sectors of the partition, since your API is sector
 based this can happen for you here, before you are memseting a PAGE_SIZE
 you need to test there is space, No? 
 
 Not in ext2/ext4.  It requires block size == PAGE_SIZE, so it's never
 going to request the last partial block in a partition.
 

OK cool. then.

Matthew what is your opinion about this, do we need to push for removal
of the partition dead code which never worked for brd, or we need to push
for fixing and implementing new partition support for brd?

Also another thing I saw is that if we leave the flag 
GENHD_FL_SUPPRESS_PARTITION_INFO

then mount -U UUID stops to work, regardless of partitions or not,
this is because Kernel will not put us on /proc/patitions.
I'll submit another patch to remove it.

BTW I hit another funny bug where the partition beginning was not
4K aligned apparently fdisk lets you do this if the total size is small
enough  (like 4096 which is default for brd) so I ended up with accessing
sec zero, the supper-block, failing because of the alignment check at
direct_access().
Do you know of any API that brd/prd can do to not let fdisk do this?
I'm looking at it right now I just thought it is worth asking.

Thanks for everything
Boaz

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Matthew Wilcox
On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote:
 Matthew what is your opinion about this, do we need to push for removal
 of the partition dead code which never worked for brd, or we need to push
 for fixing and implementing new partition support for brd?

Fixing the code gets my vote.  brd is useful for testing things ... and
sometimes we need to test things that involve partitions.

 Also another thing I saw is that if we leave the flag 
   GENHD_FL_SUPPRESS_PARTITION_INFO
 
 then mount -U UUID stops to work, regardless of partitions or not,
 this is because Kernel will not put us on /proc/patitions.
 I'll submit another patch to remove it.

Yes, we should probably fix that too.

 BTW I hit another funny bug where the partition beginning was not
 4K aligned apparently fdisk lets you do this if the total size is small
 enough  (like 4096 which is default for brd) so I ended up with accessing
 sec zero, the supper-block, failing because of the alignment check at
 direct_access().

That's why I added on the partition start before doing the alignment
check :-)

 Do you know of any API that brd/prd can do to not let fdisk do this?
 I'm looking at it right now I just thought it is worth asking.

I think it's enough to refuse the mount.  That feels like a patch to
ext2/4 (or maybe ext2/4 has a way to start the filesystem on a different
block boundary?)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Boaz Harrosh
On 07/31/2014 08:19 PM, Matthew Wilcox wrote:
 On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote:
 Matthew what is your opinion about this, do we need to push for removal
 of the partition dead code which never worked for brd, or we need to push
 for fixing and implementing new partition support for brd?
 
 Fixing the code gets my vote.  brd is useful for testing things ... and
 sometimes we need to test things that involve partitions.
 

OK I'm on it, its what I'm doing today.

rrr I manged to completely trash my vm by doing 'make install' of
util-linux and after reboot it never recovered, I remember that
mount complained about a now missing library and I forgot and rebooted,
that was the end of that. Anyway I installed a new fc20 system wanted
that for a long time over my old fc18

 Also another thing I saw is that if we leave the flag 
  GENHD_FL_SUPPRESS_PARTITION_INFO

 then mount -U UUID stops to work, regardless of partitions or not,
 this is because Kernel will not put us on /proc/patitions.
 I'll submit another patch to remove it.
 
 Yes, we should probably fix that too.
 

Yes this is good stuff. I found out about the gpt option in fdisk
that's really good stuff because it gives you a PARTUUID even before
the mkfs, and the partitions are so mach more logical. 
But only without that flag

 BTW I hit another funny bug where the partition beginning was not
 4K aligned apparently fdisk lets you do this if the total size is small
 enough  (like 4096 which is default for brd) so I ended up with accessing
 sec zero, the supper-block, failing because of the alignment check at
 direct_access().
 
 That's why I added on the partition start before doing the alignment
 check :-)
 
Yes, exactly, I had very similar code to yours. I moved to your code
now First patch in the set is your patch 4/22 squashed with the modifications
you sent, then my fix, then the getgeo patch, then the remove of the flag.

But I'm still fighting fdisk's sector math, I can't for the life of me
figure out fdisk math, and it is all too easy to create a partition schema
that has an unaligned first/last sector.

I can observe and see the dis-alignment when the partitions are first
created, I can detect that at prd_probe time.

I can probably fix it by this logic:
  When first detecting a new partition ie if bd_part-start_sect
is not aligned round-up to PAGE_SIZE. Then subtract from bd_part-nr_sects the
fixed up size and round-down bd_part-nr_sects to PAGE_SIZE
This way I still live inside the confined space that fdisk gave me but only IO
within largest aligned space. The leftover sectors are just wasted space.


 Do you know of any API that brd/prd can do to not let fdisk do this?
 I'm looking at it right now I just thought it is worth asking.
 
 I think it's enough to refuse the mount.  That feels like a patch to
 ext2/4 (or maybe ext2/4 has a way to start the filesystem on a different
 block boundary?)
 

We should not leave this to the FSs to do again and again all over. I wonder
if there is some getgeo or some disk properties info somewhere that I can
set to force the core block layer to do this for me, I'm surprised that this
is the first place we have this problem?

Thanks
Boaz

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-31 Thread Zwisler, Ross
On Thu, 2014-07-31 at 21:04 +0300, Boaz Harrosh wrote:
 On 07/31/2014 08:19 PM, Matthew Wilcox wrote:
  On Thu, Jul 31, 2014 at 06:28:37PM +0300, Boaz Harrosh wrote:
  Matthew what is your opinion about this, do we need to push for removal
  of the partition dead code which never worked for brd, or we need to push
  for fixing and implementing new partition support for brd?
  
  Fixing the code gets my vote.  brd is useful for testing things ... and
  sometimes we need to test things that involve partitions.
  
 
 OK I'm on it, its what I'm doing today.
 
 rrr I manged to completely trash my vm by doing 'make install' of
 util-linux and after reboot it never recovered, I remember that
 mount complained about a now missing library and I forgot and rebooted,
 that was the end of that. Anyway I installed a new fc20 system wanted
 that for a long time over my old fc18

Ah, I'm already working on this as well.  :)  If you want you can wait for my
patches to BRD  test - they should be out this week.

I'm planning on adding get_geo() and doing dynamic minors as is done in NVMe.

- Ross

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-30 Thread Matthew Wilcox
On Wed, Jul 30, 2014 at 07:12:44PM +0300, Boaz Harrosh wrote:
> Off course I was wrong here size is in bytes not in sectors. Which points
> out that maybe this API needs to be in sectors.
> 
> [Actually it needs to be in pages both size and offset, because of return of
> pfn, but its your call.]

I considered a number of options here.  The VM wants things to be in pages.
The filesystem wants things to be in block size.  The block device wants
things to be in sectors.  It's all a mess when you try and converge.
Everybody understands bytes.

Here's what I've come up with on top of patch 4/22.  Let me know what
you think.


diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 3ee1c08..741293f 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -146,11 +146,6 @@ axon_ram_direct_access(struct block_device *device, 
sector_t sector,
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
-   if (offset >= bank->size) {
-   dev_err(>device->dev, "Access outside of address 
space\n");
-   return -ERANGE;
-   }
-
*kaddr = (void *)(bank->ph_addr + offset);
*pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 33a39e7..3483458 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -378,10 +378,6 @@ static long brd_direct_access(struct block_device *bdev, 
sector_t sector,
 
if (!brd)
return -ENODEV;
-   if (sector & (PAGE_SECTORS-1))
-   return -EINVAL;
-   if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk))
-   return -ERANGE;
page = brd_insert_page(brd, sector);
if (!page)
return -ENOSPC;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 58958d1..2ee5556 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -877,11 +877,7 @@ dcssblk_direct_access (struct block_device *bdev, sector_t 
secnum,
if (!dev_info)
return -ENODEV;
dev_sz = dev_info->end - dev_info->start;
-   if (secnum % (PAGE_SIZE/512))
-   return -EINVAL;
offset = secnum * 512;
-   if (offset > dev_sz)
-   return -ERANGE;
*kaddr = (void *) (dev_info->start + offset);
*pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index f1a158e..93ebdd53 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -450,8 +450,14 @@ long bdev_direct_access(struct block_device *bdev, 
sector_t sector,
const struct block_device_operations *ops = bdev->bd_disk->fops;
if (!ops->direct_access)
return -EOPNOTSUPP;
-   return ops->direct_access(bdev, sector + get_start_sect(bdev), addr,
-   pfn, size);
+   if ((sector + DIV_ROUND_UP(size, 512)) >
+   part_nr_sects_read(bdev->bd_part))
+   return -ERANGE;
+   sector += get_start_sect(bdev);
+   if (sector % (PAGE_SIZE / 512))
+   return -EINVAL;
+   size = ops->direct_access(bdev, sector, addr, pfn, size);
+   return size ? size : -ERANGE;
 }
 EXPORT_SYMBOL_GPL(bdev_direct_access);
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-30 Thread Matthew Wilcox
On Wed, Jul 30, 2014 at 07:03:24PM +0300, Boaz Harrosh wrote:
> > +long bdev_direct_access(struct block_device *bdev, sector_t sector,
> > +   void **addr, unsigned long *pfn, long size)
> > +{
> > +   const struct block_device_operations *ops = bdev->bd_disk->fops;
> > +   if (!ops->direct_access)
> > +   return -EOPNOTSUPP;
> 
> You need to check alignment on PAGE_SIZE since this API requires it, do
> to pfn defined to a page_number.
> 
> (Unless you want to define another output-param like page_offset.
>  but this exercise can be left to the caller)
> 
> You also need to check against the partition boundary. so something like:
> 
> + if (sector & (PAGE_SECTORS-1))
> + return -EINVAL;

Mmm.  PAGE_SECTORS is private to brd (and also private to bcache!) at
this point.  We've got a real mess of defines of SECTOR_SIZE, SECTORSIZE,
SECTOR_SHIFT and so on, dotted throughout various random include files.
I am not the river to flush those Augean stables today.

I'll go with this, from the dcssblk driver:

if (sector % (PAGE_SIZE / 512))
return -EINVAL;

> + if (unlikely(sector + size > part_nr_sects_read(bdev->bd_part)))
> + return -ERANGE;
> 
> Then perhaps you can remove that check from drivers

As noted in your followup, size is in terms of bytes.  Perhaps it should
be named 'length' to make it more clear that it's a byte count, not a
sector count?

In any case, this looks best to me:

if ((sector + DIV_ROUND_UP(size, 512)) >
part_nr_sects_read(bdev->bd_part))
return -ERANGE;


> Style: Need a space between declaration and code (have you check-patch)

That's a bullshit check.  I don't know why it's in checkpatch.

> > +   if (size < 0)
> 
>   if(size < PAGE_SIZE), No?

No, absolutely not.  PAGE_SIZE is unsigned long, which (if I understand
my C integer promotions correctly) means that 'size' gets promoted to
an unsigned long, and we compare them unsigned, so errors will never be
caught by this check.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-30 Thread Boaz Harrosh
On 07/30/2014 07:03 PM, Boaz Harrosh wrote:
<>
>> +long bdev_direct_access(struct block_device *bdev, sector_t sector,
>> +void **addr, unsigned long *pfn, long size)
>> +{
>> +const struct block_device_operations *ops = bdev->bd_disk->fops;
>> +if (!ops->direct_access)
>> +return -EOPNOTSUPP;
> 
> You need to check alignment on PAGE_SIZE since this API requires it, do
> to pfn defined to a page_number.
> 
> (Unless you want to define another output-param like page_offset.
>  but this exercise can be left to the caller)
> 
> You also need to check against the partition boundary. so something like:
> 
> + if (sector & (PAGE_SECTORS-1))
> + return -EINVAL;
> + if (unlikely(sector + size > part_nr_sects_read(bdev->bd_part)))

Off course I was wrong here size is in bytes not in sectors. Which points
out that maybe this API needs to be in sectors.

[Actually it needs to be in pages both size and offset, because of return of
pfn, but its your call.]

Anyway my code above needs to be fixed with
(size + 512 -1) / 512

Thanks
Boaz

> + return -ERANGE;
> 
> Then perhaps you can remove that check from drivers
> 
>> +return ops->direct_access(bdev, sector + get_start_sect(bdev), addr,
>> +pfn, size);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_direct_access);
<>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-30 Thread Boaz Harrosh
On 07/22/2014 10:47 PM, Matthew Wilcox wrote:
> In order to support accesses to larger chunks of memory, pass in a
> 'size' parameter (counted in bytes), and return the amount available at
> that address.
> 
> Support partitioning the underlying block device through a new helper
> function, bdev_direct_access(), since partition handling should be done
> in the block layer, not the filesystem, nor device driver.
> 
> Signed-off-by: Matthew Wilcox 
> Reviewed-by: Jan Kara 
> ---
>  Documentation/filesystems/xip.txt | 15 +--
>  arch/powerpc/sysdev/axonram.c | 12 
>  drivers/block/brd.c   |  8 +---
>  drivers/s390/block/dcssblk.c  | 19 ++-
>  fs/block_dev.c| 28 
>  fs/ext2/xip.c | 31 +--
>  include/linux/blkdev.h|  6 --
>  7 files changed, 73 insertions(+), 46 deletions(-)
> 
> diff --git a/Documentation/filesystems/xip.txt 
> b/Documentation/filesystems/xip.txt
> index 0466ee5..b62eabf 100644
> --- a/Documentation/filesystems/xip.txt
> +++ b/Documentation/filesystems/xip.txt
> @@ -28,12 +28,15 @@ Implementation
>  Execute-in-place is implemented in three steps: block device operation,
>  address space operation, and file operations.
>  
> -A block device operation named direct_access is used to retrieve a
> -reference (pointer) to a block on-disk. The reference is supposed to be
> -cpu-addressable, physical address and remain valid until the release 
> operation
> -is performed. A struct block_device reference is used to address the device,
> -and a sector_t argument is used to identify the individual block. As an
> -alternative, memory technology devices can be used for this.
> +A block device operation named direct_access is used to translate the
> +block device sector number to a page frame number (pfn) that identifies
> +the physical page for the memory.  It also returns a kernel virtual
> +address that can be used to access the memory.
> +
> +The direct_access method takes a 'size' parameter that indicates the
> +number of bytes being requested.  The function should return the number
> +of bytes that it can provide, although it must not exceed the number of
> +bytes requested.  It may also return a negative errno if an error occurs.
>  
>  The block device operation is optional, these block devices support it as of
>  today:
> diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
> index 830edc8..3ee1c08 100644
> --- a/arch/powerpc/sysdev/axonram.c
> +++ b/arch/powerpc/sysdev/axonram.c
> @@ -139,17 +139,13 @@ axon_ram_make_request(struct request_queue *queue, 
> struct bio *bio)
>   * axon_ram_direct_access - direct_access() method for block device
>   * @device, @sector, @data: see block_device_operations method
>   */
> -static int
> +static long
>  axon_ram_direct_access(struct block_device *device, sector_t sector,
> -void **kaddr, unsigned long *pfn)
> +void **kaddr, unsigned long *pfn, long size)
>  {
>   struct axon_ram_bank *bank = device->bd_disk->private_data;
> - loff_t offset;
> + loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
>  
> - offset = sector;
> - if (device->bd_part != NULL)
> - offset += device->bd_part->start_sect;
> - offset <<= AXON_RAM_SECTOR_SHIFT;
>   if (offset >= bank->size) {
>   dev_err(>device->dev, "Access outside of address 
> space\n");
>   return -ERANGE;
> @@ -158,7 +154,7 @@ axon_ram_direct_access(struct block_device *device, 
> sector_t sector,
>   *kaddr = (void *)(bank->ph_addr + offset);
>   *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
>  
> - return 0;
> + return min_t(long, size, bank->size - offset);
>  }
>  
>  static const struct block_device_operations axon_ram_devops = {
> diff --git a/drivers/block/brd.c b/drivers/block/brd.c
> index c7d138e..96e4c96 100644
> --- a/drivers/block/brd.c
> +++ b/drivers/block/brd.c
> @@ -370,8 +370,8 @@ static int brd_rw_page(struct block_device *bdev, 
> sector_t sector,
>  }
>  
>  #ifdef CONFIG_BLK_DEV_XIP
> -static int brd_direct_access(struct block_device *bdev, sector_t sector,
> - void **kaddr, unsigned long *pfn)
> +static long brd_direct_access(struct block_device *bdev, sector_t sector,
> + void **kaddr, unsigned long *pfn, long size)
>  {
>   struct brd_device *brd = bdev->bd_disk->private_data;
>   struct page *page;

Mathew hi You need to remove the wrong checks both alignment and
size from this driver.
(alignment is redundant but size is wrong)

These checks should be guarantied by the blk wrapper. (see below)

> @@ -388,7 +388,9 @@ static int brd_direct_access(struct block_device *bdev, 
> sector_t sector,
>   *kaddr = page_address(page);
>   *pfn = page_to_pfn(page);
>  
> - return 0;
> + /* Could optimistically check to see 

Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-30 Thread Boaz Harrosh
On 07/22/2014 10:47 PM, Matthew Wilcox wrote:
 In order to support accesses to larger chunks of memory, pass in a
 'size' parameter (counted in bytes), and return the amount available at
 that address.
 
 Support partitioning the underlying block device through a new helper
 function, bdev_direct_access(), since partition handling should be done
 in the block layer, not the filesystem, nor device driver.
 
 Signed-off-by: Matthew Wilcox matthew.r.wil...@intel.com
 Reviewed-by: Jan Kara j...@suse.cz
 ---
  Documentation/filesystems/xip.txt | 15 +--
  arch/powerpc/sysdev/axonram.c | 12 
  drivers/block/brd.c   |  8 +---
  drivers/s390/block/dcssblk.c  | 19 ++-
  fs/block_dev.c| 28 
  fs/ext2/xip.c | 31 +--
  include/linux/blkdev.h|  6 --
  7 files changed, 73 insertions(+), 46 deletions(-)
 
 diff --git a/Documentation/filesystems/xip.txt 
 b/Documentation/filesystems/xip.txt
 index 0466ee5..b62eabf 100644
 --- a/Documentation/filesystems/xip.txt
 +++ b/Documentation/filesystems/xip.txt
 @@ -28,12 +28,15 @@ Implementation
  Execute-in-place is implemented in three steps: block device operation,
  address space operation, and file operations.
  
 -A block device operation named direct_access is used to retrieve a
 -reference (pointer) to a block on-disk. The reference is supposed to be
 -cpu-addressable, physical address and remain valid until the release 
 operation
 -is performed. A struct block_device reference is used to address the device,
 -and a sector_t argument is used to identify the individual block. As an
 -alternative, memory technology devices can be used for this.
 +A block device operation named direct_access is used to translate the
 +block device sector number to a page frame number (pfn) that identifies
 +the physical page for the memory.  It also returns a kernel virtual
 +address that can be used to access the memory.
 +
 +The direct_access method takes a 'size' parameter that indicates the
 +number of bytes being requested.  The function should return the number
 +of bytes that it can provide, although it must not exceed the number of
 +bytes requested.  It may also return a negative errno if an error occurs.
  
  The block device operation is optional, these block devices support it as of
  today:
 diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
 index 830edc8..3ee1c08 100644
 --- a/arch/powerpc/sysdev/axonram.c
 +++ b/arch/powerpc/sysdev/axonram.c
 @@ -139,17 +139,13 @@ axon_ram_make_request(struct request_queue *queue, 
 struct bio *bio)
   * axon_ram_direct_access - direct_access() method for block device
   * @device, @sector, @data: see block_device_operations method
   */
 -static int
 +static long
  axon_ram_direct_access(struct block_device *device, sector_t sector,
 -void **kaddr, unsigned long *pfn)
 +void **kaddr, unsigned long *pfn, long size)
  {
   struct axon_ram_bank *bank = device-bd_disk-private_data;
 - loff_t offset;
 + loff_t offset = (loff_t)sector  AXON_RAM_SECTOR_SHIFT;
  
 - offset = sector;
 - if (device-bd_part != NULL)
 - offset += device-bd_part-start_sect;
 - offset = AXON_RAM_SECTOR_SHIFT;
   if (offset = bank-size) {
   dev_err(bank-device-dev, Access outside of address 
 space\n);
   return -ERANGE;
 @@ -158,7 +154,7 @@ axon_ram_direct_access(struct block_device *device, 
 sector_t sector,
   *kaddr = (void *)(bank-ph_addr + offset);
   *pfn = virt_to_phys(*kaddr)  PAGE_SHIFT;
  
 - return 0;
 + return min_t(long, size, bank-size - offset);
  }
  
  static const struct block_device_operations axon_ram_devops = {
 diff --git a/drivers/block/brd.c b/drivers/block/brd.c
 index c7d138e..96e4c96 100644
 --- a/drivers/block/brd.c
 +++ b/drivers/block/brd.c
 @@ -370,8 +370,8 @@ static int brd_rw_page(struct block_device *bdev, 
 sector_t sector,
  }
  
  #ifdef CONFIG_BLK_DEV_XIP
 -static int brd_direct_access(struct block_device *bdev, sector_t sector,
 - void **kaddr, unsigned long *pfn)
 +static long brd_direct_access(struct block_device *bdev, sector_t sector,
 + void **kaddr, unsigned long *pfn, long size)
  {
   struct brd_device *brd = bdev-bd_disk-private_data;
   struct page *page;

Mathew hi You need to remove the wrong checks both alignment and
size from this driver.
(alignment is redundant but size is wrong)

These checks should be guarantied by the blk wrapper. (see below)

 @@ -388,7 +388,9 @@ static int brd_direct_access(struct block_device *bdev, 
 sector_t sector,
   *kaddr = page_address(page);
   *pfn = page_to_pfn(page);
  
 - return 0;
 + /* Could optimistically check to see if the next page in the
 +  * file is mapped to the next page of physical RAM */
 

Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-30 Thread Boaz Harrosh
On 07/30/2014 07:03 PM, Boaz Harrosh wrote:

 +long bdev_direct_access(struct block_device *bdev, sector_t sector,
 +void **addr, unsigned long *pfn, long size)
 +{
 +const struct block_device_operations *ops = bdev-bd_disk-fops;
 +if (!ops-direct_access)
 +return -EOPNOTSUPP;
 
 You need to check alignment on PAGE_SIZE since this API requires it, do
 to pfn defined to a page_number.
 
 (Unless you want to define another output-param like page_offset.
  but this exercise can be left to the caller)
 
 You also need to check against the partition boundary. so something like:
 
 + if (sector  (PAGE_SECTORS-1))
 + return -EINVAL;
 + if (unlikely(sector + size  part_nr_sects_read(bdev-bd_part)))

Off course I was wrong here size is in bytes not in sectors. Which points
out that maybe this API needs to be in sectors.

[Actually it needs to be in pages both size and offset, because of return of
pfn, but its your call.]

Anyway my code above needs to be fixed with
(size + 512 -1) / 512

Thanks
Boaz

 + return -ERANGE;
 
 Then perhaps you can remove that check from drivers
 
 +return ops-direct_access(bdev, sector + get_start_sect(bdev), addr,
 +pfn, size);
 +}
 +EXPORT_SYMBOL_GPL(bdev_direct_access);



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-30 Thread Matthew Wilcox
On Wed, Jul 30, 2014 at 07:03:24PM +0300, Boaz Harrosh wrote:
  +long bdev_direct_access(struct block_device *bdev, sector_t sector,
  +   void **addr, unsigned long *pfn, long size)
  +{
  +   const struct block_device_operations *ops = bdev-bd_disk-fops;
  +   if (!ops-direct_access)
  +   return -EOPNOTSUPP;
 
 You need to check alignment on PAGE_SIZE since this API requires it, do
 to pfn defined to a page_number.
 
 (Unless you want to define another output-param like page_offset.
  but this exercise can be left to the caller)
 
 You also need to check against the partition boundary. so something like:
 
 + if (sector  (PAGE_SECTORS-1))
 + return -EINVAL;

Mmm.  PAGE_SECTORS is private to brd (and also private to bcache!) at
this point.  We've got a real mess of defines of SECTOR_SIZE, SECTORSIZE,
SECTOR_SHIFT and so on, dotted throughout various random include files.
I am not the river to flush those Augean stables today.

I'll go with this, from the dcssblk driver:

if (sector % (PAGE_SIZE / 512))
return -EINVAL;

 + if (unlikely(sector + size  part_nr_sects_read(bdev-bd_part)))
 + return -ERANGE;
 
 Then perhaps you can remove that check from drivers

As noted in your followup, size is in terms of bytes.  Perhaps it should
be named 'length' to make it more clear that it's a byte count, not a
sector count?

In any case, this looks best to me:

if ((sector + DIV_ROUND_UP(size, 512)) 
part_nr_sects_read(bdev-bd_part))
return -ERANGE;


 Style: Need a space between declaration and code (have you check-patch)

That's a bullshit check.  I don't know why it's in checkpatch.

  +   if (size  0)
 
   if(size  PAGE_SIZE), No?

No, absolutely not.  PAGE_SIZE is unsigned long, which (if I understand
my C integer promotions correctly) means that 'size' gets promoted to
an unsigned long, and we compare them unsigned, so errors will never be
caught by this check.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v8 04/22] Change direct_access calling convention

2014-07-30 Thread Matthew Wilcox
On Wed, Jul 30, 2014 at 07:12:44PM +0300, Boaz Harrosh wrote:
 Off course I was wrong here size is in bytes not in sectors. Which points
 out that maybe this API needs to be in sectors.
 
 [Actually it needs to be in pages both size and offset, because of return of
 pfn, but its your call.]

I considered a number of options here.  The VM wants things to be in pages.
The filesystem wants things to be in block size.  The block device wants
things to be in sectors.  It's all a mess when you try and converge.
Everybody understands bytes.

Here's what I've come up with on top of patch 4/22.  Let me know what
you think.


diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 3ee1c08..741293f 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -146,11 +146,6 @@ axon_ram_direct_access(struct block_device *device, 
sector_t sector,
struct axon_ram_bank *bank = device-bd_disk-private_data;
loff_t offset = (loff_t)sector  AXON_RAM_SECTOR_SHIFT;
 
-   if (offset = bank-size) {
-   dev_err(bank-device-dev, Access outside of address 
space\n);
-   return -ERANGE;
-   }
-
*kaddr = (void *)(bank-ph_addr + offset);
*pfn = virt_to_phys(*kaddr)  PAGE_SHIFT;
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 33a39e7..3483458 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -378,10 +378,6 @@ static long brd_direct_access(struct block_device *bdev, 
sector_t sector,
 
if (!brd)
return -ENODEV;
-   if (sector  (PAGE_SECTORS-1))
-   return -EINVAL;
-   if (sector + PAGE_SECTORS  get_capacity(bdev-bd_disk))
-   return -ERANGE;
page = brd_insert_page(brd, sector);
if (!page)
return -ENOSPC;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 58958d1..2ee5556 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -877,11 +877,7 @@ dcssblk_direct_access (struct block_device *bdev, sector_t 
secnum,
if (!dev_info)
return -ENODEV;
dev_sz = dev_info-end - dev_info-start;
-   if (secnum % (PAGE_SIZE/512))
-   return -EINVAL;
offset = secnum * 512;
-   if (offset  dev_sz)
-   return -ERANGE;
*kaddr = (void *) (dev_info-start + offset);
*pfn = virt_to_phys(*kaddr)  PAGE_SHIFT;
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index f1a158e..93ebdd53 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -450,8 +450,14 @@ long bdev_direct_access(struct block_device *bdev, 
sector_t sector,
const struct block_device_operations *ops = bdev-bd_disk-fops;
if (!ops-direct_access)
return -EOPNOTSUPP;
-   return ops-direct_access(bdev, sector + get_start_sect(bdev), addr,
-   pfn, size);
+   if ((sector + DIV_ROUND_UP(size, 512)) 
+   part_nr_sects_read(bdev-bd_part))
+   return -ERANGE;
+   sector += get_start_sect(bdev);
+   if (sector % (PAGE_SIZE / 512))
+   return -EINVAL;
+   size = ops-direct_access(bdev, sector, addr, pfn, size);
+   return size ? size : -ERANGE;
 }
 EXPORT_SYMBOL_GPL(bdev_direct_access);
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v8 04/22] Change direct_access calling convention

2014-07-22 Thread Matthew Wilcox
In order to support accesses to larger chunks of memory, pass in a
'size' parameter (counted in bytes), and return the amount available at
that address.

Support partitioning the underlying block device through a new helper
function, bdev_direct_access(), since partition handling should be done
in the block layer, not the filesystem, nor device driver.

Signed-off-by: Matthew Wilcox 
Reviewed-by: Jan Kara 
---
 Documentation/filesystems/xip.txt | 15 +--
 arch/powerpc/sysdev/axonram.c | 12 
 drivers/block/brd.c   |  8 +---
 drivers/s390/block/dcssblk.c  | 19 ++-
 fs/block_dev.c| 28 
 fs/ext2/xip.c | 31 +--
 include/linux/blkdev.h|  6 --
 7 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/Documentation/filesystems/xip.txt 
b/Documentation/filesystems/xip.txt
index 0466ee5..b62eabf 100644
--- a/Documentation/filesystems/xip.txt
+++ b/Documentation/filesystems/xip.txt
@@ -28,12 +28,15 @@ Implementation
 Execute-in-place is implemented in three steps: block device operation,
 address space operation, and file operations.
 
-A block device operation named direct_access is used to retrieve a
-reference (pointer) to a block on-disk. The reference is supposed to be
-cpu-addressable, physical address and remain valid until the release operation
-is performed. A struct block_device reference is used to address the device,
-and a sector_t argument is used to identify the individual block. As an
-alternative, memory technology devices can be used for this.
+A block device operation named direct_access is used to translate the
+block device sector number to a page frame number (pfn) that identifies
+the physical page for the memory.  It also returns a kernel virtual
+address that can be used to access the memory.
+
+The direct_access method takes a 'size' parameter that indicates the
+number of bytes being requested.  The function should return the number
+of bytes that it can provide, although it must not exceed the number of
+bytes requested.  It may also return a negative errno if an error occurs.
 
 The block device operation is optional, these block devices support it as of
 today:
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 830edc8..3ee1c08 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -139,17 +139,13 @@ axon_ram_make_request(struct request_queue *queue, struct 
bio *bio)
  * axon_ram_direct_access - direct_access() method for block device
  * @device, @sector, @data: see block_device_operations method
  */
-static int
+static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-  void **kaddr, unsigned long *pfn)
+  void **kaddr, unsigned long *pfn, long size)
 {
struct axon_ram_bank *bank = device->bd_disk->private_data;
-   loff_t offset;
+   loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
-   offset = sector;
-   if (device->bd_part != NULL)
-   offset += device->bd_part->start_sect;
-   offset <<= AXON_RAM_SECTOR_SHIFT;
if (offset >= bank->size) {
dev_err(>device->dev, "Access outside of address 
space\n");
return -ERANGE;
@@ -158,7 +154,7 @@ axon_ram_direct_access(struct block_device *device, 
sector_t sector,
*kaddr = (void *)(bank->ph_addr + offset);
*pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
 
-   return 0;
+   return min_t(long, size, bank->size - offset);
 }
 
 static const struct block_device_operations axon_ram_devops = {
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index c7d138e..96e4c96 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -370,8 +370,8 @@ static int brd_rw_page(struct block_device *bdev, sector_t 
sector,
 }
 
 #ifdef CONFIG_BLK_DEV_XIP
-static int brd_direct_access(struct block_device *bdev, sector_t sector,
-   void **kaddr, unsigned long *pfn)
+static long brd_direct_access(struct block_device *bdev, sector_t sector,
+   void **kaddr, unsigned long *pfn, long size)
 {
struct brd_device *brd = bdev->bd_disk->private_data;
struct page *page;
@@ -388,7 +388,9 @@ static int brd_direct_access(struct block_device *bdev, 
sector_t sector,
*kaddr = page_address(page);
*pfn = page_to_pfn(page);
 
-   return 0;
+   /* Could optimistically check to see if the next page in the
+* file is mapped to the next page of physical RAM */
+   return min_t(long, PAGE_SIZE, size);
 }
 #endif
 
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 0f47175..58958d1 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -28,8 +28,8 @@
 static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void 

[PATCH v8 04/22] Change direct_access calling convention

2014-07-22 Thread Matthew Wilcox
In order to support accesses to larger chunks of memory, pass in a
'size' parameter (counted in bytes), and return the amount available at
that address.

Support partitioning the underlying block device through a new helper
function, bdev_direct_access(), since partition handling should be done
in the block layer, not the filesystem, nor device driver.

Signed-off-by: Matthew Wilcox matthew.r.wil...@intel.com
Reviewed-by: Jan Kara j...@suse.cz
---
 Documentation/filesystems/xip.txt | 15 +--
 arch/powerpc/sysdev/axonram.c | 12 
 drivers/block/brd.c   |  8 +---
 drivers/s390/block/dcssblk.c  | 19 ++-
 fs/block_dev.c| 28 
 fs/ext2/xip.c | 31 +--
 include/linux/blkdev.h|  6 --
 7 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/Documentation/filesystems/xip.txt 
b/Documentation/filesystems/xip.txt
index 0466ee5..b62eabf 100644
--- a/Documentation/filesystems/xip.txt
+++ b/Documentation/filesystems/xip.txt
@@ -28,12 +28,15 @@ Implementation
 Execute-in-place is implemented in three steps: block device operation,
 address space operation, and file operations.
 
-A block device operation named direct_access is used to retrieve a
-reference (pointer) to a block on-disk. The reference is supposed to be
-cpu-addressable, physical address and remain valid until the release operation
-is performed. A struct block_device reference is used to address the device,
-and a sector_t argument is used to identify the individual block. As an
-alternative, memory technology devices can be used for this.
+A block device operation named direct_access is used to translate the
+block device sector number to a page frame number (pfn) that identifies
+the physical page for the memory.  It also returns a kernel virtual
+address that can be used to access the memory.
+
+The direct_access method takes a 'size' parameter that indicates the
+number of bytes being requested.  The function should return the number
+of bytes that it can provide, although it must not exceed the number of
+bytes requested.  It may also return a negative errno if an error occurs.
 
 The block device operation is optional, these block devices support it as of
 today:
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 830edc8..3ee1c08 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -139,17 +139,13 @@ axon_ram_make_request(struct request_queue *queue, struct 
bio *bio)
  * axon_ram_direct_access - direct_access() method for block device
  * @device, @sector, @data: see block_device_operations method
  */
-static int
+static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-  void **kaddr, unsigned long *pfn)
+  void **kaddr, unsigned long *pfn, long size)
 {
struct axon_ram_bank *bank = device-bd_disk-private_data;
-   loff_t offset;
+   loff_t offset = (loff_t)sector  AXON_RAM_SECTOR_SHIFT;
 
-   offset = sector;
-   if (device-bd_part != NULL)
-   offset += device-bd_part-start_sect;
-   offset = AXON_RAM_SECTOR_SHIFT;
if (offset = bank-size) {
dev_err(bank-device-dev, Access outside of address 
space\n);
return -ERANGE;
@@ -158,7 +154,7 @@ axon_ram_direct_access(struct block_device *device, 
sector_t sector,
*kaddr = (void *)(bank-ph_addr + offset);
*pfn = virt_to_phys(*kaddr)  PAGE_SHIFT;
 
-   return 0;
+   return min_t(long, size, bank-size - offset);
 }
 
 static const struct block_device_operations axon_ram_devops = {
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index c7d138e..96e4c96 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -370,8 +370,8 @@ static int brd_rw_page(struct block_device *bdev, sector_t 
sector,
 }
 
 #ifdef CONFIG_BLK_DEV_XIP
-static int brd_direct_access(struct block_device *bdev, sector_t sector,
-   void **kaddr, unsigned long *pfn)
+static long brd_direct_access(struct block_device *bdev, sector_t sector,
+   void **kaddr, unsigned long *pfn, long size)
 {
struct brd_device *brd = bdev-bd_disk-private_data;
struct page *page;
@@ -388,7 +388,9 @@ static int brd_direct_access(struct block_device *bdev, 
sector_t sector,
*kaddr = page_address(page);
*pfn = page_to_pfn(page);
 
-   return 0;
+   /* Could optimistically check to see if the next page in the
+* file is mapped to the next page of physical RAM */
+   return min_t(long, PAGE_SIZE, size);
 }
 #endif
 
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 0f47175..58958d1 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -28,8 +28,8 @@
 static int dcssblk_open(struct block_device *bdev,