Re: svn commit: r292074 - in head/sys/dev: nvd nvme

2016-03-11 Thread Jim Harris
On Fri, Mar 11, 2016 at 9:31 AM, Warner Losh  wrote:

>
>
>
> And keep in mind the original description was this:
>
> Quote:
>
> Intel NVMe controllers have a slow path for I/Os that span
> a 128KB stripe boundary but ZFS limits ashift, which is derived
> from d_stripesize, to 13 (8KB) so we limit the stripesize
> reported to geom(8) to 4KB.
>
> This may result in a small number of additional I/Os
> to require splitting in nvme(4), however the NVMe I/O
> path is very efficient so these additional I/Os will cause
> very minimal (if any) difference in performance or
> CPU utilisation.
>
> unquote
>
> so the issue seems to be getting blown up a bit. It's better if you
> don't generate these I/Os, but the driver copes by splitting them on the
> affected drives, causing a small inefficiency because you're increasing
> the number of I/Os needed to complete the request, cutting into the IOPS
> budget.
>
> Warner
>
>

Warner is correct.  This is something specific to some of the Intel NVMe
controllers.  The core nvme(4) driver detects Intel controllers that
benefit from splitting I/O crossing 128KB stripe boundaries, and will do
the splitting internally in the driver.  Reporting this stripe size further
up the stack only serves to reduce the number of I/Os that require this
splitting.

In practice, there is no noticeable impact on performance or latency when
splitting I/O on 128KB boundaries.  Larger I/Os are more likely to require
splitting, but for larger I/Os you will hit overall bandwidth limitations
before getting close to IOPS limitations.
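
To put numbers on that, here is a minimal userland sketch (illustrative
only, not the nvme(4) driver code; the 128KB stripe size is the value
discussed in this thread) that counts how many child I/Os a request turns
into once it is split at stripe boundaries:

/*
 * Count the child I/Os produced by splitting [offset, offset + length)
 * at 128KB stripe boundaries.  Illustrative sketch only, not driver code.
 */
#include <stdint.h>
#include <stdio.h>

#define STRIPE_SIZE (128 * 1024)    /* boundary assumed from this thread */

static unsigned
split_count(uint64_t offset, uint64_t length)
{
    uint64_t first = offset / STRIPE_SIZE;
    uint64_t last = (offset + length - 1) / STRIPE_SIZE;

    return ((unsigned)(last - first) + 1);
}

int
main(void)
{
    /* A 1MB I/O starting 64KB past a boundary becomes 9 child I/Os. */
    printf("%u\n", split_count(64 * 1024, 1024 * 1024));
    /* A 4KB I/O that happens to straddle a boundary becomes 2. */
    printf("%u\n", split_count(126 * 1024, 4 * 1024));
    return (0);
}

So even a worst-case large request only turns into a handful of extra
submissions, which is consistent with the minimal impact described above.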

-Jim


Re: svn commit: r292074 - in head/sys/dev: nvd nvme

2016-03-11 Thread Warner Losh
On Fri, Mar 11, 2016 at 9:24 AM, Warner Losh  wrote:

>
>
> On Fri, Mar 11, 2016 at 9:15 AM, Alan Somers  wrote:
>
>> Interesting.  I didn't know about the alternate meaning of stripesize.  I
>> agree then that there's currently no way to tune ZFS to respect NVMe's
>> 128KB boundaries.  One could set vfs.zfs.vdev.aggregation_limit to 128KB,
>> but that would only halfway solve the problem, because allocations could
>> be unaligned.  Frankly, I'm surprised that NVMe drives should have such a
>> small limit when SATA and SAS devices commonly handle single commands
>> that span multiple MB.  I don't think there's any way to adapt ZFS to
>> this limit without hurting it in other ways; for example, by restricting
>> its ability to use large _or_ small record sizes.
>>
>> Hopefully the NVMe slow path isn't _too_ slow.
>>
>
> Let's be clear here: this is purely an Intel controller issue, not an NVMe
> issue. Most other NVMe drives don't have any issues with this at all. At
> least for the drives I've been testing from well-known NAND players (I'm
> unsure if they are released yet, so I can't name names, other than to say
> that they aren't OCZ). All these NVMe drives handle 1MB I/Os with
> approximately the same performance as 128k or 64k I/Os. The
> enterprise-grade drives are quite fast and quite nice. It's the lower-end,
> consumer drives that have more issues. Since those have been eliminated
> from our detailed consideration, I'm unsure if they have issues.
>
> And the Intel issue is a more subtle one, having more to do with PCIe burst
> sizes than with crossing the 128k boundary as such. I've asked my contacts
> inside Intel, who I don't think read these lists, for the exact details.
>

And keep in mind the original description was this:

Quote:

Intel NVMe controllers have a slow path for I/Os that span
a 128KB stripe boundary but ZFS limits ashift, which is derived
from d_stripesize, to 13 (8KB) so we limit the stripesize
reported to geom(8) to 4KB.

This may result in a small number of additional I/Os
to require splitting in nvme(4), however the NVMe I/O
path is very efficient so these additional I/Os will cause
very minimal (if any) difference in performance or
CPU utilisation.

unquote

so the issue seems to be getting blown up a bit. It's better if you
don't generate these I/Os, but the driver copes by splitting them on the
affected drives, causing a small inefficiency because you're increasing
the number of I/Os needed to complete the request, cutting into the IOPS
budget.

Warner


> Warner
>
>
>> On Fri, Mar 11, 2016 at 2:07 AM, Alexander Motin  wrote:
>>
>>> On 11.03.16 06:58, Alan Somers wrote:
>>> > Do they behave badly for writes that cross a 128KB boundary, but are
>>> > nonetheless aligned to 128KB boundaries?  Then I don't understand how
>>> > this change (or mav's replacement) is supposed to help.  The stripesize
>>> > is supposed to be the minimum write that the device can accept without
>>> > requiring a read-modify-write.  ZFS guarantees that it will never issue
>>> > a write smaller than the stripesize, nor will it ever issue a write that
>>> > is not aligned to a stripesize-boundary.  But even if ZFS worked with
>>> > 128KB stripesizes, it would still happily issue writes that are a multiple
>>> > of 128KB in size, and these would cross those boundaries.  Am I not
>>> > understanding something here?
>>>
>>> stripesize is not necessarily related to read-modify-write.  It reports
>>> "some" native boundaries of the device.  For example, a RAID0 array has
>>> stripes; crossing them does not cause read-modify-write cycles, but it
>>> does cause I/O splits and head seeks on extra disks.  This, as I
>>> understand it, is the case for some of Intel's NVMe device models here,
>>> and is the reason why a 128KB stripesize was originally reported.
>>>
>>> We cannot demand that all file systems never issue I/Os smaller than the
>>> stripesize, since it can be 128KB, 1MB or even more (if it were a hard
>>> minimum, it would be called sectorsize).  If ZFS (in this case) doesn't
>>> support allocation block sizes above 8K (and even that is very
>>> space-inefficient), and it has no other mechanism to optimize I/O
>>> alignment, then that is not a problem of the NVMe device or driver, but
>>> only of ZFS itself.  So what I have done here is move the workaround from
>>> the improper place (NVMe) to the proper one (ZFS): NVMe now correctly
>>> reports its native 128K boundaries, which will be respected, for example,
>>> by gpart, which in turn helps UFS align its 32K blocks, while ZFS will
>>> correctly ignore values it can't optimize for, falling back to efficient
>>> 512-byte allocations.
>>>
>>> PS about the meaning of stripesize not being limited to read-modify-write:
>>> for example, a RAID5 of 5 512e disks actually has three stripe sizes: 4K,
>>> 64K and 256K.  Aligned writes of 4K avoid read-modify-write inside the
>>> drive, I/Os that do not needlessly cross 64K boundaries improve parallel
>>> performance, and aligned writes of 256K avoid read-modify-write on the

Re: svn commit: r292074 - in head/sys/dev: nvd nvme

2016-03-11 Thread Warner Losh
On Fri, Mar 11, 2016 at 9:15 AM, Alan Somers  wrote:

> Interesting.  I didn't know about the alternate meaning of stripesize.  I
> agree then that there's currently no way to tune ZFS to respect NVMe's
> 128KB boundaries.  One could set vfs.zfs.vdev.aggregation_limit to 128KB,
> but that would only halfway solve the problem, because allocations could
> be unaligned.  Frankly, I'm surprised that NVMe drives should have such a
> small limit when SATA and SAS devices commonly handle single commands
> that span multiple MB.  I don't think there's any way to adapt ZFS to
> this limit without hurting it in other ways; for example, by restricting
> its ability to use large _or_ small record sizes.
>
> Hopefully the NVMe slow path isn't _too_ slow.
>

Let's be clear here: this is purely an Intel controller issue, not an NVMe
issue. Most other NVMe drives don't have any issues with this at all. At
least for the drives I've been testing from well-known NAND players (I'm
unsure if they are released yet, so I can't name names, other than to say
that they aren't OCZ). All these NVMe drives handle 1MB I/Os with
approximately the same performance as 128k or 64k I/Os. The
enterprise-grade drives are quite fast and quite nice. It's the lower-end,
consumer drives that have more issues. Since those have been eliminated
from our detailed consideration, I'm unsure if they have issues.

And the Intel issue is a more subtle one, having more to do with PCIe burst
sizes than with crossing the 128k boundary as such. I've asked my contacts
inside Intel, who I don't think read these lists, for the exact details.

Warner


> On Fri, Mar 11, 2016 at 2:07 AM, Alexander Motin  wrote:
>
>> On 11.03.16 06:58, Alan Somers wrote:
>> > Do they behave badly for writes that cross a 128KB boundary, but are
>> > nonetheless aligned to 128KB boundaries?  Then I don't understand how
>> > this change (or mav's replacement) is supposed to help.  The stripesize
>> > is supposed to be the minimum write that the device can accept without
>> > requiring a read-modify-write.  ZFS guarantees that it will never issue
>> > a write smaller than the stripesize, nor will it ever issue a write that
>> > is not aligned to a stripesize-boundary.  But even if ZFS worked with
>> > 128KB stripesizes, it would still happily issue writes that are a multiple
>> > of 128KB in size, and these would cross those boundaries.  Am I not
>> > understanding something here?
>>
>> stripesize is not necessarily related to read-modify-write.  It reports
>> "some" native boundaries of the device.  For example, a RAID0 array has
>> stripes; crossing them does not cause read-modify-write cycles, but it
>> does cause I/O splits and head seeks on extra disks.  This, as I
>> understand it, is the case for some of Intel's NVMe device models here,
>> and is the reason why a 128KB stripesize was originally reported.
>>
>> We cannot demand that all file systems never issue I/Os smaller than the
>> stripesize, since it can be 128KB, 1MB or even more (if it were a hard
>> minimum, it would be called sectorsize).  If ZFS (in this case) doesn't
>> support allocation block sizes above 8K (and even that is very
>> space-inefficient), and it has no other mechanism to optimize I/O
>> alignment, then that is not a problem of the NVMe device or driver, but
>> only of ZFS itself.  So what I have done here is move the workaround from
>> the improper place (NVMe) to the proper one (ZFS): NVMe now correctly
>> reports its native 128K boundaries, which will be respected, for example,
>> by gpart, which in turn helps UFS align its 32K blocks, while ZFS will
>> correctly ignore values it can't optimize for, falling back to efficient
>> 512-byte allocations.
>>
>> PS about the meaning of stripesize not being limited to read-modify-write:
>> for example, a RAID5 of 5 512e disks actually has three stripe sizes: 4K,
>> 64K and 256K.  Aligned writes of 4K avoid read-modify-write inside the
>> drive, I/Os that do not needlessly cross 64K boundaries improve parallel
>> performance, and aligned writes of 256K avoid read-modify-write on the
>> RAID5 level.  Obviously not all of those optimizations are achievable in
>> all environments, and the bigger the stripe size the harder it is to
>> optimize for, but that does not mean such optimization is impossible.  It
>> would be good to be able to report all of them, allowing each consumer to
>> use as many of them as it can.
>>
>> > On Thu, Mar 10, 2016 at 9:34 PM, Warner Losh  wrote:
>> >
>> > Some Intel NVMe drives behave badly when the LBA range crosses a 128k
>> > boundary.  Their performance is worse for those transactions than for
>> > ones that don't cross the 128k boundary.
>> >
>> > Warner
>> >
>> > On Thu, Mar 10, 2016 at 11:01 AM, Alan Somers  wrote:
>> >
>> > Are you saying that Intel NVMe controllers perform poorly for
>> > all I/Os that are less than 128KB, or just for I/Os of any size
>> > that cr

Re: svn commit: r292074 - in head/sys/dev: nvd nvme

2016-03-11 Thread Alan Somers
Interesting.  I didn't know about the alternate meaning of stripesize.  I
agree then that there's currently no way to tune ZFS to respect NVMe's
128KB boundaries.  One could set vfs.zfs.vdev.aggregation_limit to 128KB,
but that would only halfway solve the problem, because allocations could
be unaligned.  Frankly, I'm surprised that NVMe drives should have such a
small limit when SATA and SAS devices commonly handle single commands
that span multiple MB.  I don't think there's any way to adapt ZFS to
this limit without hurting it in other ways; for example, by restricting
its ability to use large _or_ small record sizes.

Hopefully the NVMe slow path isn't _too_ slow.

On Fri, Mar 11, 2016 at 2:07 AM, Alexander Motin  wrote:

> On 11.03.16 06:58, Alan Somers wrote:
> > Do they behave badly for writes that cross a 128KB boundary, but are
> > nonetheless aligned to 128KB boundaries?  Then I don't understand how
> > this change (or mav's replacement) is supposed to help.  The stripesize
> > is supposed to be the minimum write that the device can accept without
> > requiring a read-modify-write.  ZFS guarantees that it will never issue
> > a write smaller than the stripesize, nor will it ever issue a write that
> > is not aligned to a stripesize-boundary.  But even if ZFS worked with
> > 128KB stripesizes, it would still happily issue writes that are a multiple
> > of 128KB in size, and these would cross those boundaries.  Am I not
> > understanding something here?
>
> stripesize is not necessarily related to read-modify-write.  It reports
> "some" native boundaries of the device.  For example, a RAID0 array has
> stripes; crossing them does not cause read-modify-write cycles, but it
> does cause I/O splits and head seeks on extra disks.  This, as I
> understand it, is the case for some of Intel's NVMe device models here,
> and is the reason why a 128KB stripesize was originally reported.
>
> We cannot demand that all file systems never issue I/Os smaller than the
> stripesize, since it can be 128KB, 1MB or even more (if it were a hard
> minimum, it would be called sectorsize).  If ZFS (in this case) doesn't
> support allocation block sizes above 8K (and even that is very
> space-inefficient), and it has no other mechanism to optimize I/O
> alignment, then that is not a problem of the NVMe device or driver, but
> only of ZFS itself.  So what I have done here is move the workaround from
> the improper place (NVMe) to the proper one (ZFS): NVMe now correctly
> reports its native 128K boundaries, which will be respected, for example,
> by gpart, which in turn helps UFS align its 32K blocks, while ZFS will
> correctly ignore values it can't optimize for, falling back to efficient
> 512-byte allocations.
>
> PS about the meaning of stripesize not being limited to read-modify-write:
> for example, a RAID5 of 5 512e disks actually has three stripe sizes: 4K,
> 64K and 256K.  Aligned writes of 4K avoid read-modify-write inside the
> drive, I/Os that do not needlessly cross 64K boundaries improve parallel
> performance, and aligned writes of 256K avoid read-modify-write on the
> RAID5 level.  Obviously not all of those optimizations are achievable in
> all environments, and the bigger the stripe size the harder it is to
> optimize for, but that does not mean such optimization is impossible.  It
> would be good to be able to report all of them, allowing each consumer to
> use as many of them as it can.
>
> > On Thu, Mar 10, 2016 at 9:34 PM, Warner Losh  wrote:
> >
> > Some Intel NVMe drives behave badly when the LBA range crosses a 128k
> > boundary.  Their performance is worse for those transactions than for
> > ones that don't cross the 128k boundary.
> >
> > Warner
> >
> > On Thu, Mar 10, 2016 at 11:01 AM, Alan Somers  wrote:
> >
> > Are you saying that Intel NVMe controllers perform poorly for
> > all I/Os that are less than 128KB, or just for I/Os of any size
> > that cross a 128KB boundary?
> >
> > On Thu, Dec 10, 2015 at 7:06 PM, Steven Hartland  wrote:
> >
> > Author: smh
> > Date: Fri Dec 11 02:06:03 2015
> > New Revision: 292074
> > URL: https://svnweb.freebsd.org/changeset/base/292074
> >
> > Log:
> >   Limit stripesize reported from nvd(4) to 4K
> >
> >   Intel NVMe controllers have a slow path for I/Os that span
> > a 128KB stripe boundary but ZFS limits ashift, which is
> > derived from d_stripesize, to 13 (8KB) so we limit the
> > stripesize reported to geom(8) to 4KB.
> >
> >   This may result in a small number of additional I/Os to
> > require splitting in nvme(4), however the NVMe I/O path is
> > very efficient so these additional I/Os will cause very
> > minimal (if any) difference in performance or CPU utilisation.
> >
> > 

Re: svn commit: r292074 - in head/sys/dev: nvd nvme

2016-03-11 Thread Alexander Motin
On 11.03.16 06:58, Alan Somers wrote:
> Do they behave badly for writes that cross a 128KB boundary, but are
> nonetheless aligned to 128KB boundaries?  Then I don't understand how
> this change (or mav's replacement) is supposed to help.  The stripesize
> is supposed to be the minimum write that the device can accept without
> requiring a read-modify-write.  ZFS guarantees that it will never issue
> a write smaller than the stripesize, nor will it ever issue a write that
> is not aligned to a stripesize-boundary.  But even if ZFS worked with
> 128KB stripesizes, it would still happily issue writes that are a multiple
> of 128KB in size, and these would cross those boundaries.  Am I not
> understanding something here?

stripesize is not necessarily related to read-modify-write.  It reports
"some" native boundaries of the device.  For example, a RAID0 array has
stripes; crossing them does not cause read-modify-write cycles, but it
does cause I/O splits and head seeks on extra disks.  This, as I
understand it, is the case for some of Intel's NVMe device models here,
and is the reason why a 128KB stripesize was originally reported.

We cannot demand that all file systems never issue I/Os smaller than the
stripesize, since it can be 128KB, 1MB or even more (if it were a hard
minimum, it would be called sectorsize).  If ZFS (in this case) doesn't
support allocation block sizes above 8K (and even that is very
space-inefficient), and it has no other mechanism to optimize I/O
alignment, then that is not a problem of the NVMe device or driver, but
only of ZFS itself.  So what I have done here is move the workaround from
the improper place (NVMe) to the proper one (ZFS): NVMe now correctly
reports its native 128K boundaries, which will be respected, for example,
by gpart, which in turn helps UFS align its 32K blocks, while ZFS will
correctly ignore values it can't optimize for, falling back to efficient
512-byte allocations.
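
A simplified sketch of that fallback (illustrative only, not the actual
ZFS code; the cap of 13 is the one quoted from the commit message in this
thread):

#include <stdint.h>
#include <stdio.h>

#define ASHIFT_MAX 13                   /* 8K cap mentioned in this thread */

/* floor(log2(v)) for v > 0. */
static int
log2_floor(uint64_t v)
{
    int b = -1;

    while (v != 0) {
        v >>= 1;
        b++;
    }
    return (b);
}

/* Derive an ashift from the sector size and the reported stripe size. */
static int
choose_ashift(uint64_t sectorsize, uint64_t stripesize)
{
    int ashift = log2_floor(sectorsize);        /* 9 for 512-byte sectors */

    /* A 128K stripesize gives log2 = 17 > 13, so it is simply ignored. */
    if (stripesize > sectorsize && log2_floor(stripesize) <= ASHIFT_MAX)
        ashift = log2_floor(stripesize);
    return (ashift);
}

int
main(void)
{
    printf("%d\n", choose_ashift(512, 128 * 1024));     /* 9: 128K ignored */
    printf("%d\n", choose_ashift(512, 8 * 1024));       /* 13: 8K honoured */
    return (0);
}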

PS about the meaning of stripesize not being limited to read-modify-write:
for example, a RAID5 of 5 512e disks actually has three stripe sizes: 4K,
64K and 256K.  Aligned writes of 4K avoid read-modify-write inside the
drive, I/Os that do not needlessly cross 64K boundaries improve parallel
performance, and aligned writes of 256K avoid read-modify-write on the
RAID5 level.  Obviously not all of those optimizations are achievable in
all environments, and the bigger the stripe size the harder it is to
optimize for, but that does not mean such optimization is impossible.  It
would be good to be able to report all of them, allowing each consumer to
use as many of them as it can.
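
As a sketch of that last point (purely illustrative, not code from geom or
ZFS; the sizes follow the RAID5 example above, with 256K presumably being
the full stripe across the four data disks), each boundary gets its own
check and each consumer applies the ones it understands:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Aligned start and multiple length: only whole 'unit'-sized blocks touched. */
static bool
whole_units(uint64_t off, uint64_t len, uint64_t unit)
{
    return ((off % unit) == 0 && (len % unit) == 0);
}

/* The request never crosses a 'unit' boundary. */
static bool
within_one_unit(uint64_t off, uint64_t len, uint64_t unit)
{
    return (off / unit == (off + len - 1) / unit);
}

int
main(void)
{
    uint64_t off = 64 * 1024, len = 4 * 1024;   /* a 4K write at offset 64K */

    printf("avoids in-drive RMW (4K): %d\n", whole_units(off, len, 4 * 1024));
    printf("stays on one member disk (64K): %d\n",
        within_one_unit(off, len, 64 * 1024));
    printf("avoids RAID5-level RMW (256K): %d\n",
        whole_units(off, len, 256 * 1024));
    return (0);
}

A 4K write at offset 64K passes the first two checks but not the third,
matching the description above.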

> On Thu, Mar 10, 2016 at 9:34 PM, Warner Losh  wrote:
> 
> Some Intel NVMe drives behave badly when the LBA range crosses a 128k
> boundary.  Their performance is worse for those transactions than for ones
> that don't cross the 128k boundary.
> 
> Warner
> 
> On Thu, Mar 10, 2016 at 11:01 AM, Alan Somers  wrote:
> 
> Are you saying that Intel NVMe controllers perform poorly for
> all I/Os that are less than 128KB, or just for I/Os of any size
> that cross a 128KB boundary?
> 
> On Thu, Dec 10, 2015 at 7:06 PM, Steven Hartland  wrote:
> 
> Author: smh
> Date: Fri Dec 11 02:06:03 2015
> New Revision: 292074
> URL: https://svnweb.freebsd.org/changeset/base/292074
> 
> Log:
>   Limit stripesize reported from nvd(4) to 4K
> 
>   Intel NVMe controllers have a slow path for I/Os that span
> a 128KB stripe boundary but ZFS limits ashift, which is
> derived from d_stripesize, to 13 (8KB) so we limit the
> stripesize reported to geom(8) to 4KB.
> 
>   This may result in a small number of additional I/Os to
> require splitting in nvme(4), however the NVMe I/O path is
> very efficient so these additional I/Os will cause very
> minimal (if any) difference in performance or CPU utilisation.
> 
>   This can be controlled by the new sysctl
> kern.nvme.max_optimal_sectorsize.
> 
>   MFC after: 1 week
>   Sponsored by: Multiplay
>   Differential Revision: https://reviews.freebsd.org/D4446
> 
> Modified:
>   head/sys/dev/nvd/nvd.c
>   head/sys/dev/nvme/nvme.h
>   head/sys/dev/nvme/nvme_ns.c
>   head/sys/dev/nvme/nvme_sysctl.c
> 
> 
> 


-- 
Alexander Motin


Re: svn commit: r292074 - in head/sys/dev: nvd nvme

2016-03-10 Thread Alan Somers
Do they behave badly for writes that cross a 128KB boundary, but are
nonetheless aligned to 128KB boundaries?  Then I don't understand how this
change (or mav's replacement) is supposed to help.  The stripesize is
supposed to be the minimum write that the device can accept without
requiring a read-modify-write.  ZFS guarantees that it will never issue a
write smaller than the stripesize, nor will it ever issue a write that is
not aligned to a stripesize-boundary.  But even if ZFS worked with 128KB
stripesizes, it would still happily issue writes that are a multiple of
128KB in size, and these would cross those boundaries.  Am I not understanding
something here?
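
In numbers (illustrative arithmetic only): even a write aligned to a 128KB
stripesize crosses a 128KB boundary as soon as it is larger than 128KB, for
example an aligned 256KB write at offset 0:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint64_t stripe = 128 * 1024;
    uint64_t off = 0, len = 256 * 1024;         /* aligned 256KB write */

    printf("aligned to stripesize: %d\n", (off % stripe) == 0);
    printf("128KB boundaries crossed: %ju\n",
        (uintmax_t)((off + len - 1) / stripe - off / stripe));
    return (0);
}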

-Alan

On Thu, Mar 10, 2016 at 9:34 PM, Warner Losh  wrote:

> Some Intel NVMe drives behave badly when the LBA range crosses a 128k
> boundary.  Their performance is worse for those transactions than for ones
> that don't cross the 128k boundary.
>
> Warner
>
> On Thu, Mar 10, 2016 at 11:01 AM, Alan Somers  wrote:
>
>> Are you saying that Intel NVMe controllers perform poorly for all I/Os
>> that are less than 128KB, or just for I/Os of any size that cross a 128KB
>> boundary?
>>
>> On Thu, Dec 10, 2015 at 7:06 PM, Steven Hartland  wrote:
>>
>>> Author: smh
>>> Date: Fri Dec 11 02:06:03 2015
>>> New Revision: 292074
>>> URL: https://svnweb.freebsd.org/changeset/base/292074
>>>
>>> Log:
>>>   Limit stripesize reported from nvd(4) to 4K
>>>
>>>   Intel NVMe controllers have a slow path for I/Os that span a 128KB
>>> stripe boundary but ZFS limits ashift, which is derived from d_stripesize,
>>> to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.
>>>
>>>   This may result in a small number of additional I/Os to require
>>> splitting in nvme(4), however the NVMe I/O path is very efficient so these
>>> additional I/Os will cause very minimal (if any) difference in performance
>>> or CPU utilisation.
>>>
>>>   This can be controlled by the new sysctl
>>> kern.nvme.max_optimal_sectorsize.
>>>
>>>   MFC after: 1 week
>>>   Sponsored by: Multiplay
>>>   Differential Revision: https://reviews.freebsd.org/D4446
>>>
>>> Modified:
>>>   head/sys/dev/nvd/nvd.c
>>>   head/sys/dev/nvme/nvme.h
>>>   head/sys/dev/nvme/nvme_ns.c
>>>   head/sys/dev/nvme/nvme_sysctl.c
>>>
>>>
>


Re: svn commit: r292074 - in head/sys/dev: nvd nvme

2016-03-10 Thread Warner Losh
Some Intel NVMe drives behave badly when the LBA range crosses a 128k
boundary.  Their performance is worse for those transactions than for ones
that don't cross the 128k boundary.

Warner

On Thu, Mar 10, 2016 at 11:01 AM, Alan Somers  wrote:

> Are you saying that Intel NVMe controllers perform poorly for all I/Os
> that are less than 128KB, or just for I/Os of any size that cross a 128KB
> boundary?
>
> On Thu, Dec 10, 2015 at 7:06 PM, Steven Hartland  wrote:
>
>> Author: smh
>> Date: Fri Dec 11 02:06:03 2015
>> New Revision: 292074
>> URL: https://svnweb.freebsd.org/changeset/base/292074
>>
>> Log:
>>   Limit stripesize reported from nvd(4) to 4K
>>
>>   Intel NVMe controllers have a slow path for I/Os that span a 128KB
>> stripe boundary but ZFS limits ashift, which is derived from d_stripesize,
>> to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.
>>
>>   This may result in a small number of additional I/Os to require
>> splitting in nvme(4), however the NVMe I/O path is very efficient so these
>> additional I/Os will cause very minimal (if any) difference in performance
>> or CPU utilisation.
>>
>>   This can be controlled by the new sysctl
>> kern.nvme.max_optimal_sectorsize.
>>
>>   MFC after: 1 week
>>   Sponsored by: Multiplay
>>   Differential Revision: https://reviews.freebsd.org/D4446
>>
>> Modified:
>>   head/sys/dev/nvd/nvd.c
>>   head/sys/dev/nvme/nvme.h
>>   head/sys/dev/nvme/nvme_ns.c
>>   head/sys/dev/nvme/nvme_sysctl.c
>>
>>


Re: svn commit: r292074 - in head/sys/dev: nvd nvme

2016-03-10 Thread Alan Somers
Are you saying that Intel NVMe controllers perform poorly for all I/Os that
are less than 128KB, or just for I/Os of any size that cross a 128KB
boundary?

On Thu, Dec 10, 2015 at 7:06 PM, Steven Hartland  wrote:

> Author: smh
> Date: Fri Dec 11 02:06:03 2015
> New Revision: 292074
> URL: https://svnweb.freebsd.org/changeset/base/292074
>
> Log:
>   Limit stripesize reported from nvd(4) to 4K
>
>   Intel NVMe controllers have a slow path for I/Os that span a 128KB
> stripe boundary but ZFS limits ashift, which is derived from d_stripesize,
> to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.
>
>   This may result in a small number of additional I/Os to require
> splitting in nvme(4), however the NVMe I/O path is very efficient so these
> additional I/Os will cause very minimal (if any) difference in performance
> or CPU utilisation.
>
>   This can be controlled by the new sysctl
> kern.nvme.max_optimal_sectorsize.
>
>   MFC after: 1 week
>   Sponsored by: Multiplay
>   Differential Revision: https://reviews.freebsd.org/D4446
>
> Modified:
>   head/sys/dev/nvd/nvd.c
>   head/sys/dev/nvme/nvme.h
>   head/sys/dev/nvme/nvme_ns.c
>   head/sys/dev/nvme/nvme_sysctl.c
>
>