Re: mfi(4) IO performance regression, post 8.1

2012-07-20 Thread Steve McCoy

Hi Adrian,

I've submitted the PR as kern/170021.

Thanks!

On 7/19/12 11:29 AM, Adrian Chadd wrote:

Oh, and would you please file a PR for this? I've been looking into
ACPI related slowdowns for a while and I'm glad you found a culprit.



Adrian


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: mfi(4) IO performance regression, post 8.1

2012-07-20 Thread Steve McCoy

On 7/19/12 10:12 AM, Eric van Gyzen wrote:


You might simply try a different idle function.  See these sysctls:

machdep.idle: acpi
machdep.idle_available: spin, mwait, mwait_hlt, hlt, acpi,

Eric


I've tried your suggestion (with mwait) and the problem went away. Thanks a lot!
This seems like a good workaround, but I am worried that it could negatively
affect something I don't know about that also depends on this sysctl.
If you have any ideas about other areas I could test, I'd greatly appreciate
the info.

Thanks again!


Re: mfi(4) IO performance regression, post 8.1

2012-07-20 Thread Alexander Motin

On 19.07.2012 18:28, Adrian Chadd wrote:

Hm! A timer related bug?

I'll CC mav@ on this, as it was his commit (and work in his general area.)

I wonder what's going on - is it something to do with the two ACPI
calls inserted there, or is it something to do with the change in
event timer values?

mav? Any ideas?


I can only agree with the earlier guess that, for some reason, the ACPI
timer on that system is very slow. Unless the user has explicitly enabled
deeper C-states, the values returned by the timer are not really used
for anything, so there is simply no room for any other bug.


When making this change I expected that it might have a cost, but on
most systems that cost only matters at high interrupt rates, where it
is covered by the automatic fallback to the faster MWAIT idle method.
Unfortunately, that code has not yet been merged to 8-STABLE (only to 9).
I will check whether there is any problem with merging it now.
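To make the fallback described above concrete, here is a minimal sketch of the decision it implies: while a CPU has recently been busy (many wakeups, as under a heavy interrupt load), idle with the cheaper MWAIT method instead of the configured ACPI method, so the PM timer reads in acpi_cpu_idle() are skipped. The names `cpu_recently_busy()`, `choose_idle_method()` and the wakeup-count threshold are hypothetical; this is not the code merged to 9 or 8-STABLE, only an illustration of the idea.

```c
#include <stdbool.h>

/* Idle methods relevant to this thread (a subset of machdep.idle_available). */
enum idle_method { IDLE_ACPI, IDLE_MWAIT };

/* Hypothetical "busy" test: more wakeups in the last interval than the
 * threshold means the CPU is handling a high interrupt rate. */
static bool cpu_recently_busy(unsigned wakeups_last_interval, unsigned threshold)
{
    return wakeups_last_interval > threshold;
}

/* Prefer the cheap MWAIT path while busy, if the CPU supports it;
 * otherwise use whatever idle method the administrator configured. */
static enum idle_method choose_idle_method(bool busy, bool mwait_supported,
                                           enum idle_method configured)
{
    if (busy && mwait_supported)
        return IDLE_MWAIT;
    return configured;
}
```

Falling back only while busy keeps the C-state bookkeeping (and its timer reads) for mostly idle CPUs, where the extra cost per wakeup matters little.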


Manually switching to MWAIT via the sysctl is the correct workaround for
this situation. It may give slightly higher power consumption, but
probably the best possible performance for this workload with many
interrupts.



On 17 July 2012 13:39, Steve McCoy smc...@greatbaysoftware.com wrote:


Alright, I've finally narrowed it down to r209897, which only affects
acpi_cpu_idle():

--- stable/8/sys/dev/acpica/acpi_cpu.c  2010/06/23 17:04:42 209471
+++ stable/8/sys/dev/acpica/acpi_cpu.c  2010/07/11 11:58:46 209897
@@ -930,12 +930,16 @@

     /*
      * Execute HLT (or equivalent) and wait for an interrupt.  We can't
-     * calculate the time spent in C1 since the place we wake up is an
-     * ISR.  Assume we slept half of quantum and return.
+     * precisely calculate the time spent in C1 since the place we wake up
+     * is an ISR.  Assume we slept no more then half of quantum.
      */
     if (cx_next->type == ACPI_STATE_C1) {
-        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 + 500000 / hz) / 4;
+        AcpiHwRead(&start_time, &AcpiGbl_FADT.XPmTimerBlock);
         acpi_cpu_c1();
+        AcpiHwRead(&end_time, &AcpiGbl_FADT.XPmTimerBlock);
+        end_time = acpi_TimerDelta(end_time, start_time);
+        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 +
+            min(PM_USEC(end_time), 500000 / hz)) / 4;
         return;
     }

My current guess is that AcpiHwRead() is a problem on our hardware. It's an
isolated change and, to my desperate eyes, the commit message implies that
it isn't critical. Do you think we could buy ourselves some time by pulling
it out of our version of the kernel? Or is this essential for correctness?
Any thoughts are appreciated, thanks!



--
Alexander Motin


Re: mfi(4) IO performance regression, post 8.1

2012-07-20 Thread Adrian Chadd
Hi Alexander,

I'm worried that this won't be the only source of "FreeBSD is slower
than Linux" issues.

What can we add to the timer path to make identifying and root-causing
this kind of issue easy? I'd like to be absolutely sure not only that
we're doing the best job possible, but also that we can provide some
tools and statistics to the user/administrator to make debugging much
easier.

Thanks,



Adrian


Re: mfi(4) IO performance regression, post 8.1

2012-07-20 Thread Alexander Motin

On 20.07.2012 16:38, Alexander Motin wrote:

On 19.07.2012 18:28, Adrian Chadd wrote:

Hm! A timer related bug?

I'll CC mav@ on this, as it was his commit (and work in his general
area.)

I wonder what's going on - is it something to do with the two ACPI
calls inserted there, or is it something to do with the change in
event timer values?

mav? Any ideas?


I can only agree with the earlier guess that, for some reason, the ACPI
timer on that system is very slow. Unless the user has explicitly enabled
deeper C-states, the values returned by the timer are not really used
for anything, so there is simply no room for any other bug.

When making this change I expected that it might have a cost, but on
most systems that cost only matters at high interrupt rates, where it
is covered by the automatic fallback to the faster MWAIT idle method.
Unfortunately, that code has not yet been merged to 8-STABLE (only to 9).
I will check whether there is any problem with merging it now.


I've just merged that to 8-STABLE at r238658. Testers are welcome.


Manually switching to MWAIT via the sysctl is the correct workaround for
this situation. It may give slightly higher power consumption, but
probably the best possible performance for this workload with many
interrupts.


On 17 July 2012 13:39, Steve McCoy smc...@greatbaysoftware.com wrote:


Alright, I've finally narrowed it down to r209897, which only affects
acpi_cpu_idle():

--- stable/8/sys/dev/acpica/acpi_cpu.c  2010/06/23 17:04:42 209471
+++ stable/8/sys/dev/acpica/acpi_cpu.c  2010/07/11 11:58:46 209897
@@ -930,12 +930,16 @@

     /*
      * Execute HLT (or equivalent) and wait for an interrupt.  We can't
-     * calculate the time spent in C1 since the place we wake up is an
-     * ISR.  Assume we slept half of quantum and return.
+     * precisely calculate the time spent in C1 since the place we wake up
+     * is an ISR.  Assume we slept no more then half of quantum.
      */
     if (cx_next->type == ACPI_STATE_C1) {
-        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 + 500000 / hz) / 4;
+        AcpiHwRead(&start_time, &AcpiGbl_FADT.XPmTimerBlock);
         acpi_cpu_c1();
+        AcpiHwRead(&end_time, &AcpiGbl_FADT.XPmTimerBlock);
+        end_time = acpi_TimerDelta(end_time, start_time);
+        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 +
+            min(PM_USEC(end_time), 500000 / hz)) / 4;
         return;
     }

My current guess is that AcpiHwRead() is a problem on our hardware. It's an
isolated change and, to my desperate eyes, the commit message implies that
it isn't critical. Do you think we could buy ourselves some time by pulling
it out of our version of the kernel? Or is this essential for correctness?
Any thoughts are appreciated, thanks!

--
Alexander Motin




Re: mfi(4) IO performance regression, post 8.1

2012-07-20 Thread Alexander Motin

Hi.

On 20.07.2012 22:38, Adrian Chadd wrote:

I'm worried that this won't be the only source of "FreeBSD is slower
than Linux" issues.

What can we add to the timer path to make identifying and root-causing
this kind of issue easy? I'd like to be absolutely sure not only that
we're doing the best job possible, but also that we can provide some
tools and statistics to the user/administrator to make debugging much
easier.


The only tool I could propose for diagnosing this problem without
further input is hwpmc profiling. It should be able to show that we are
spending a lot of time in those timer routines. Once we have guessed
that the reason is a slow ACPI timer, it is easy to write a corresponding
benchmark, but we can't write tests for everything, and even if we could,
users wouldn't be able to run them or analyze their output without some
level of knowledge.
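As an illustration of the kind of benchmark meant here, the sketch below times repeated clock_gettime(CLOCK_MONOTONIC) calls and reports the average cost per call. It assumes (this is not stated in the thread) that on FreeBSD the timecounter behind clock_gettime() is selected by the kern.timecounter.hardware sysctl, with the candidates listed in kern.timecounter.choice; comparing the per-call cost with ACPI-fast versus TSC would then hint at how expensive each ACPI timer read is on a given machine. It measures the timecounter path, not the AcpiHwRead() calls in acpi_cpu_idle() themselves, so treat it as a rough proxy. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Average cost, in nanoseconds, of one clock_gettime(CLOCK_MONOTONIC)
 * call over `iterations` calls.  Returns -1.0 on bad input or error. */
static double clock_read_cost_ns(long iterations)
{
    struct timespec start, end, scratch;

    if (iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (long i = 0; i < iterations; i++)
        (void)clock_gettime(CLOCK_MONOTONIC, &scratch);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                        (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

For example, one could run it once with the timecounter set to TSC and once with ACPI-fast; a large difference between the two averages would support the "slow ACPI timer" theory on that hardware.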


I've spent a lot of time profiling this on the hardware I have, but in
the general case the only way I see to be sure is more testing and
feedback. For this specific area I use a very simple test that
effectively depends on interrupt latency and CPU wakeup times:
`dd if=/dev/ada0 of=/dev/null bs=512`. Depending on the device,
controller and other factors, it gives me about 20-30K IOPS.


If you have any ideas about what we could test automatically, and how, they are welcome.

--
Alexander Motin


Re: mfi(4) IO performance regression, post 8.1

2012-07-19 Thread Steve McCoy

On 7/13/12 9:39 AM, John Baldwin wrote:

On Thursday, July 12, 2012 11:47:28 pm Steve McCoy wrote:

On 7/12/12 4:34 PM, Steve McCoy wrote:

John Baldwin wrote:


Barring that, can you do a binary search of kernels from stable/8
between 8.1 and 8.2 on an 8.1 world to see which commit caused the
change in write performance?



Hi John, I'm working with Charles to narrow this down.

Looks like revision 212229 is the culprit (or at least something committed
around the same time, if this change isn't what slowed things down). The
change to sys/kern/vfs_bio.c modifies some synchronization in
dev_strategy():



Actually, hold that thought. I had a hunch that I wasn't thorough
enough, so I decided to try 212228 — the performance is the same as with
212229, so vfs_bio seems to be out of the picture. I'm going to binary
search between 209459 and 212229, and see what I find.
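The binary search suggested here can be sketched as a small state machine: each step names a revision to build, boot and benchmark, and the verdict (slow or not) halves the remaining range. This is a hypothetical helper, not a FreeBSD tool; it assumes the candidate revisions are sorted, that the first is known good and the last known bad, and that once the slowdown appears it persists in every later revision.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bisection state over a sorted list of candidate revisions. */
struct bisect {
    const long *revs;  /* candidate revisions, ascending */
    size_t good;       /* index of latest known-good revision */
    size_t bad;        /* index of earliest known-bad revision */
};

/* revs[0] must be known good and revs[n - 1] known bad (n >= 2). */
static void bisect_init(struct bisect *b, const long *revs, size_t n)
{
    b->revs = revs;
    b->good = 0;
    b->bad = n - 1;
}

/* Finished once the good and bad revisions are adjacent in the list. */
static bool bisect_done(const struct bisect *b)
{
    return b->bad - b->good <= 1;
}

/* Index of the revision to build, boot and benchmark next. */
static size_t bisect_next(const struct bisect *b)
{
    return b->good + (b->bad - b->good) / 2;
}

/* Record the benchmark verdict for revs[bisect_next(b)]. */
static void bisect_record(struct bisect *b, bool slow)
{
    size_t mid = bisect_next(b);

    if (slow)
        b->bad = mid;
    else
        b->good = mid;
}

/* First revision that shows the regression. */
static long bisect_culprit(const struct bisect *b)
{
    return b->revs[b->bad];
}
```

Each call to `bisect_record()` discards half of the remaining candidates, so a range of a few thousand commits needs only around a dozen kernel builds.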


Ok.  Please let me know what you find.  Thanks!



Alright, I've finally narrowed it down to r209897, which only affects 
acpi_cpu_idle():


--- stable/8/sys/dev/acpica/acpi_cpu.c  2010/06/23 17:04:42 209471
+++ stable/8/sys/dev/acpica/acpi_cpu.c  2010/07/11 11:58:46 209897
@@ -930,12 +930,16 @@

     /*
      * Execute HLT (or equivalent) and wait for an interrupt.  We can't
-     * calculate the time spent in C1 since the place we wake up is an
-     * ISR.  Assume we slept half of quantum and return.
+     * precisely calculate the time spent in C1 since the place we wake up
+     * is an ISR.  Assume we slept no more then half of quantum.
      */
     if (cx_next->type == ACPI_STATE_C1) {
-        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 + 500000 / hz) / 4;
+        AcpiHwRead(&start_time, &AcpiGbl_FADT.XPmTimerBlock);
         acpi_cpu_c1();
+        AcpiHwRead(&end_time, &AcpiGbl_FADT.XPmTimerBlock);
+        end_time = acpi_TimerDelta(end_time, start_time);
+        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 +
+            min(PM_USEC(end_time), 500000 / hz)) / 4;
         return;
     }

My current guess is that AcpiHwRead() is a problem on our hardware. It's
an isolated change and, to my desperate eyes, the commit message implies
that it isn't critical. Do you think we could buy ourselves some time
by pulling it out of our version of the kernel? Or is this essential for
correctness? Any thoughts are appreciated, thanks!
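To see what the new C1 path computes, here is a user-space model of its arithmetic. It is a sketch, not the kernel code: the helper names are hypothetical, it assumes a 24-bit PM timer (the ACPI PM timer may also be 32 bits wide) ticking at the ACPI-specified 3.579545 MHz, and it converts ticks to microseconds exactly rather than with the kernel's PM_USEC() macro. The averaging step follows the diff: weight the previous estimate 3/4 and the new sample 1/4, capping the sample at half a quantum, i.e. 500000 / hz microseconds.

```c
#include <stdint.h>

/* The ACPI PM timer runs at 3.579545 MHz (ACPI specification). */
#define PM_TIMER_FREQ_HZ 3579545u

/* Ticks elapsed between two reads of a 24-bit PM timer, allowing for one
 * wrap-around.  (A 32-bit timer would use a 32-bit mask instead.) */
static uint32_t pm_timer_delta24(uint32_t end, uint32_t start)
{
    return (end - start) & 0x00ffffffu;
}

/* Convert PM timer ticks to whole microseconds. */
static uint32_t pm_ticks_to_usec(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / PM_TIMER_FREQ_HZ);
}

/* Moving-average update modeled on r209897: keep 3/4 of the old estimate
 * and add 1/4 of the new C1 sleep time, capped at half a quantum. */
static uint32_t c1_prev_sleep_update(uint32_t prev_usec, uint32_t slept_usec,
                                     int hz)
{
    uint32_t half_quantum = (uint32_t)(500000 / hz);
    uint32_t sample = slept_usec < half_quantum ? slept_usec : half_quantum;

    return (prev_usec * 3 + sample) / 4;
}
```

The arithmetic itself is cheap; the cost Steve suspects lies in the two AcpiHwRead() calls that feed it, which read the timer from hardware on every C1 entry.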



Re: mfi(4) IO performance regression, post 8.1

2012-07-19 Thread Eric van Gyzen

On 07/17/12 15:39, Steve McCoy wrote:

On 7/13/12 9:39 AM, John Baldwin wrote:

On Thursday, July 12, 2012 11:47:28 pm Steve McCoy wrote:

On 7/12/12 4:34 PM, Steve McCoy wrote:

John Baldwin wrote:


Barring that, can you do a binary search of kernels from stable/8
between 8.1 and 8.2 on an 8.1 world to see which commit caused the
change in write performance?



Hi John, I'm working with Charles to narrow this down.

Looks like revision 212229 is the culprit (or at least something committed
around the same time, if this change isn't what slowed things down). The
change to sys/kern/vfs_bio.c modifies some synchronization in
dev_strategy():



Actually, hold that thought. I had a hunch that I wasn't thorough
enough, so I decided to try 212228 — the performance is the same as with
212229, so vfs_bio seems to be out of the picture. I'm going to binary
search between 209459 and 212229, and see what I find.


Ok. Please let me know what you find. Thanks!



Alright, I've finally narrowed it down to r209897, which only affects
acpi_cpu_idle():

--- stable/8/sys/dev/acpica/acpi_cpu.c  2010/06/23 17:04:42 209471
+++ stable/8/sys/dev/acpica/acpi_cpu.c  2010/07/11 11:58:46 209897
@@ -930,12 +930,16 @@

     /*
      * Execute HLT (or equivalent) and wait for an interrupt.  We can't
-     * calculate the time spent in C1 since the place we wake up is an
-     * ISR.  Assume we slept half of quantum and return.
+     * precisely calculate the time spent in C1 since the place we wake up
+     * is an ISR.  Assume we slept no more then half of quantum.
      */
     if (cx_next->type == ACPI_STATE_C1) {
-        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 + 500000 / hz) / 4;
+        AcpiHwRead(&start_time, &AcpiGbl_FADT.XPmTimerBlock);
         acpi_cpu_c1();
+        AcpiHwRead(&end_time, &AcpiGbl_FADT.XPmTimerBlock);
+        end_time = acpi_TimerDelta(end_time, start_time);
+        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 +
+            min(PM_USEC(end_time), 500000 / hz)) / 4;
         return;
     }

My current guess is that AcpiHwRead() is a problem on our hardware. It's
an isolated change and, to my desperate eyes, the commit message implies
that it isn't critical. Do you think we could buy ourselves some time
by pulling it out of our version of the kernel? Or is this essential for
correctness? Any thoughts are appreciated, thanks!


You might simply try a different idle function.  See these sysctls:

machdep.idle: acpi
machdep.idle_available: spin, mwait, mwait_hlt, hlt, acpi,
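Besides sysctl(8) on the command line, the switch can also be made programmatically. The sketch below is an illustration under stated assumptions: it uses FreeBSD's sysctlbyname(3) to read machdep.idle_available, checks that the requested method is listed, and only then writes machdep.idle (root is required). The helper names are hypothetical, and the FreeBSD-only part is guarded with `#ifdef __FreeBSD__` so the list parser can be built and checked elsewhere.

```c
#include <stdbool.h>
#include <string.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Return true if `method` appears as a whole entry in a comma-separated
 * list such as the value of machdep.idle_available ("spin, mwait, ..."). */
static bool idle_method_available(const char *list, const char *method)
{
    size_t mlen = strlen(method);
    const char *p = list;

    while (*p != '\0') {
        while (*p == ',' || *p == ' ')      /* skip separators */
            p++;
        const char *end = p;
        while (*end != '\0' && *end != ',' && *end != ' ')
            end++;
        if ((size_t)(end - p) == mlen && strncmp(p, method, mlen) == 0)
            return true;
        p = end;
    }
    return false;
}

#ifdef __FreeBSD__
/* Switch the idle method at run time, e.g. to "mwait".  Returns 0 on
 * success, -1 if the method is not offered or a sysctl call fails. */
static int set_idle_method(const char *method)
{
    char avail[256];
    size_t len = sizeof(avail);

    if (sysctlbyname("machdep.idle_available", avail, &len, NULL, 0) != 0)
        return -1;
    avail[sizeof(avail) - 1] = '\0';
    if (!idle_method_available(avail, method))
        return -1;
    return sysctlbyname("machdep.idle", NULL, NULL, method,
                        strlen(method) + 1);
}
#endif
```

Checking the available list first avoids writing a method the platform does not offer; to make the choice persistent across reboots, the same `machdep.idle=mwait` setting can go in /etc/sysctl.conf.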

Eric


Re: mfi(4) IO performance regression, post 8.1

2012-07-19 Thread Adrian Chadd
Hm! A timer related bug?

I'll CC mav@ on this, as it was his commit (and work in his general area.)

I wonder what's going on - is it something to do with the two ACPI
calls inserted there, or is it something to do with the change in
event timer values?

mav? Any ideas?


Adrian

On 17 July 2012 13:39, Steve McCoy smc...@greatbaysoftware.com wrote:

 Alright, I've finally narrowed it down to r209897, which only affects
 acpi_cpu_idle():

 --- stable/8/sys/dev/acpica/acpi_cpu.c  2010/06/23 17:04:42 209471
 +++ stable/8/sys/dev/acpica/acpi_cpu.c  2010/07/11 11:58:46 209897
 @@ -930,12 +930,16 @@

      /*
       * Execute HLT (or equivalent) and wait for an interrupt.  We can't
 -     * calculate the time spent in C1 since the place we wake up is an
 -     * ISR.  Assume we slept half of quantum and return.
 +     * precisely calculate the time spent in C1 since the place we wake up
 +     * is an ISR.  Assume we slept no more then half of quantum.
       */
      if (cx_next->type == ACPI_STATE_C1) {
 -        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 + 500000 / hz) / 4;
 +        AcpiHwRead(&start_time, &AcpiGbl_FADT.XPmTimerBlock);
          acpi_cpu_c1();
 +        AcpiHwRead(&end_time, &AcpiGbl_FADT.XPmTimerBlock);
 +        end_time = acpi_TimerDelta(end_time, start_time);
 +        sc->cpu_prev_sleep = (sc->cpu_prev_sleep * 3 +
 +            min(PM_USEC(end_time), 500000 / hz)) / 4;
          return;
      }

 My current guess is that AcpiHwRead() is a problem on our hardware. It's an
 isolated change and, to my desperate eyes, the commit message implies that
 it isn't critical. Do you think we could buy ourselves some time by pulling
 it out of our version of the kernel? Or is this essential for correctness?
 Any thoughts are appreciated, thanks!



Re: mfi(4) IO performance regression, post 8.1

2012-07-19 Thread Adrian Chadd
Oh, and would you please file a PR for this? I've been looking into
ACPI related slowdowns for a while and I'm glad you found a culprit.



Adrian


Re: mfi(4) IO performance regression, post 8.1

2012-07-13 Thread John Baldwin
On Thursday, July 12, 2012 11:47:28 pm Steve McCoy wrote:
 On 7/12/12 4:34 PM, Steve McCoy wrote:
  On 7/12/12 4:14 PM, Charles Owens wrote:
  On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
 
  On 6/15/12 8:04 AM, John Baldwin wrote:
   On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
   Hello FreeBSD folk,
  
   We're seeing what appears to be a storage performance regression as we
   try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
   appears that the regression happened between 8.1 and 8.2.

   Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
   Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
   configuration, using UFS + geom_journal for the filesystem.

   PostgreSQL performance, as seen via pgbench, dropped by approx 20%.
   This testing was done with our usual PAE-enabled kernels.  We then went
   back to GENERIC kernels and did comparisons using bonnie, results
   below.  Following that is a kernel boot log.

   Notably, we're seeing this regression only with our RAID mfi(4) based
   systems.  Also, from looking at FreeBSD source changelogs it appears
   that the mfi(4) code has seen some changes since 8.1.
   Between 8.1 and 8.2 mfi has not had any significant changes.  The only
   changes made to sys/dev/mfi were to add a new constant:
  
   svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
   svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
   Index: mfireg.h
   ===
   --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
   +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
   @@ -975,7 +975,9 @@
            MFI_PD_STATE_OFFLINE = 0x10,
            MFI_PD_STATE_FAILED = 0x11,
            MFI_PD_STATE_REBUILD = 0x14,
   -        MFI_PD_STATE_ONLINE = 0x18
   +        MFI_PD_STATE_ONLINE = 0x18,
   +        MFI_PD_STATE_COPYBACK = 0x20,
   +        MFI_PD_STATE_SYSTEM = 0x40
    };

    union mfi_ld_ref {
  
   The difference in write performance must be due to something else.  You
   mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH,
   so I wonder if this is related:

   r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines

   MFC 212160:

   Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
   Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

   The barrier semantics of bioq_insert_tail() were broken in two ways:

    o In bioq_disksort(), an added bio could be inserted at the head of
      the queue, even when a barrier was present, if the sort key for
      the new entry was less than that of the last queued barrier bio.

    o The last_offset used to generate the sort key for newly queued bios
      did not stay at the position of the barrier until either the
      barrier was de-queued, or a new barrier (which updates last_offset)
      was queued.  When a barrier is in effect, we know that the disk
      will pass through the barrier position just before the
      blocked bios are released, so using the barrier's offset for
      last_offset is the optimal choice.

   sys/geom/sched/subr_disk.c:
   sys/kern/subr_disk.c:
        o Update last_offset in bioq_insert_tail().

        o Only update last_offset in bioq_remove() if the removed bio is
          at the head of the queue (typically due to a call via
          bioq_takefirst()) and no barrier is active.

        o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
          set prev to the barrier and cur to it's next element.  Now that
          last_offset is kept at the barrier position, this change isn't
          strictly necessary, but since we have to take a decision branch
          anyway, it does avoid one, no-op, loop iteration in the while
          loop that immediately follows.

        o In bioq_disksort(), bypass the normal sort for bios with the
          BIO_ORDERED attribute and instead insert them into the queue
          with bioq_insert_tail().  bioq_insert_tail() not only gives
          the desired command order during insertion, but also provides
          barrier semantics so that commands disksorted in the future
          cannot pass the just enqueued transaction.

   sys/sys/bio.h:
        Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

   sys/cam/ata/ata_da.c:
   sys/cam/scsi/scsi_da.c
        Use an ordered command for SCSI/ATA-NCQ commands issued in
        response to bios with the BIO_ORDERED flag set.

   sys/cam/scsi/scsi_da.c
        Use an ordered tag when issuing a synchronize cache command.

        Wrap some lines to 80 columns.
  
   

Re: mfi(4) IO performance regression, post 8.1

2012-07-12 Thread Steve McCoy

On 7/12/12 4:14 PM, Charles Owens wrote:

On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:


On 6/15/12 8:04 AM, John Baldwin wrote:
 On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
 Hello FreeBSD folk,

 We're seeing what appears to be a storage performance regression as we
 try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
 appears that the regression happened between 8.1 and 8.2.

 Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
 Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
 configuration, using UFS + geom_journal for the filesystem.

 PostgreSQL performance, as seen via pgbench, dropped by approx 20%.
 This testing was done with our usual PAE-enabled kernels.  We then went
 back to GENERIC kernels and did comparisons using bonnie, results
 below.  Following that is a kernel boot log.

 Notably, we're seeing this regression only with our RAID mfi(4) based
 systems.  Also, from looking at FreeBSD source changelogs it appears
 that the mfi(4) code has seen some changes since 8.1.
 Between 8.1 and 8.2 mfi has not had any significant changes.  The only
 changes made to sys/dev/mfi were to add a new constant:

 svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
 svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
 Index: mfireg.h
 ===
 --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
 +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
 @@ -975,7 +975,9 @@
          MFI_PD_STATE_OFFLINE = 0x10,
          MFI_PD_STATE_FAILED = 0x11,
          MFI_PD_STATE_REBUILD = 0x14,
 -        MFI_PD_STATE_ONLINE = 0x18
 +        MFI_PD_STATE_ONLINE = 0x18,
 +        MFI_PD_STATE_COPYBACK = 0x20,
 +        MFI_PD_STATE_SYSTEM = 0x40
  };

  union mfi_ld_ref {

 The difference in write performance must be due to something else.  You
 mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH,
 so I wonder if this is related:

 r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines

 MFC 212160:

 Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
 Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

 The barrier semantics of bioq_insert_tail() were broken in two ways:

  o In bioq_disksort(), an added bio could be inserted at the head of
    the queue, even when a barrier was present, if the sort key for
    the new entry was less than that of the last queued barrier bio.

  o The last_offset used to generate the sort key for newly queued bios
    did not stay at the position of the barrier until either the
    barrier was de-queued, or a new barrier (which updates last_offset)
    was queued.  When a barrier is in effect, we know that the disk
    will pass through the barrier position just before the
    blocked bios are released, so using the barrier's offset for
    last_offset is the optimal choice.

 sys/geom/sched/subr_disk.c:
 sys/kern/subr_disk.c:
  o Update last_offset in bioq_insert_tail().

  o Only update last_offset in bioq_remove() if the removed bio is
    at the head of the queue (typically due to a call via
    bioq_takefirst()) and no barrier is active.

  o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
    set prev to the barrier and cur to it's next element.  Now that
    last_offset is kept at the barrier position, this change isn't
    strictly necessary, but since we have to take a decision branch
    anyway, it does avoid one, no-op, loop iteration in the while
    loop that immediately follows.

  o In bioq_disksort(), bypass the normal sort for bios with the
    BIO_ORDERED attribute and instead insert them into the queue
    with bioq_insert_tail().  bioq_insert_tail() not only gives
    the desired command order during insertion, but also provides
    barrier semantics so that commands disksorted in the future
    cannot pass the just enqueued transaction.

 sys/sys/bio.h:
  Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

 sys/cam/ata/ata_da.c:
 sys/cam/scsi/scsi_da.c
  Use an ordered command for SCSI/ATA-NCQ commands issued in
  response to bios with the BIO_ORDERED flag set.

 sys/cam/scsi/scsi_da.c
  Use an ordered tag when issuing a synchronize cache command.

  Wrap some lines to 80 columns.

 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
 sys/geom/geom_io.c
  Mark bios with the BIO_FLUSH command as BIO_ORDERED.

 Sponsored by:   Spectra Logic Corporation
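The first bug described in this log (a bio sorting ahead of a queued barrier) is easiest to see in a toy queue. The sketch below is a deliberately simplified model with hypothetical names: it sorts by plain ascending offset rather than implementing bioq_disksort()'s one-way elevator and last_offset tracking, but it shows the corrected rule that sorted inserts start after the last barrier while ordered requests go straight to the tail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a struct bio on a disksort queue. */
struct req {
    long offset;       /* sort key (block offset) */
    bool ordered;      /* plays the role of BIO_ORDERED: a barrier */
    struct req *next;
};

struct req_queue {
    struct req *head;
    struct req *last_barrier;  /* most recently queued ordered request */
};

/* Append at the tail; an ordered request becomes the new barrier. */
static void queue_insert_tail(struct req_queue *q, struct req *r)
{
    struct req **pp = &q->head;

    while (*pp != NULL)
        pp = &(*pp)->next;
    r->next = NULL;
    *pp = r;
    if (r->ordered)
        q->last_barrier = r;
}

/* Sorted insert by offset that never lets a request pass the last
 * barrier; ordered requests bypass sorting and go to the tail. */
static void queue_disksort(struct req_queue *q, struct req *r)
{
    struct req **pp;

    if (r->ordered) {
        queue_insert_tail(q, r);
        return;
    }
    pp = (q->last_barrier != NULL) ? &q->last_barrier->next : &q->head;
    while (*pp != NULL && (*pp)->offset <= r->offset)
        pp = &(*pp)->next;
    r->next = *pp;
    *pp = r;
}

/* Remove the head; once the last barrier is dequeued, none remains. */
static struct req *queue_takefirst(struct req_queue *q)
{
    struct req *r = q->head;

    if (r != NULL) {
        q->head = r->next;
        if (q->last_barrier == r)
            q->last_barrier = NULL;
    }
    return r;
}
```

For example, after queueing requests at offsets 100 and 50 and then an ordered request at 500, a later request at offset 10 lands behind the barrier instead of at the head of the queue.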



 Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
 changed in geom_io.c in 8.2?  That

Re: mfi(4) IO performance regression, post 8.1

2012-07-12 Thread Steve McCoy

On 7/12/12 4:34 PM, Steve McCoy wrote:

On 7/12/12 4:14 PM, Charles Owens wrote:

On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:


On 6/15/12 8:04 AM, John Baldwin wrote:
 On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
 Hello FreeBSD folk,

 We're seeing what appears to be a storage performance regression as we
 try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
 appears that the regression happened between 8.1 and 8.2.

 Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
 Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
 configuration, using UFS + geom_journal for the filesystem.

 PostgreSQL performance, as seen via pgbench, dropped by approx 20%.
 This testing was done with our usual PAE-enabled kernels.  We then went
 back to GENERIC kernels and did comparisons using bonnie, results
 below.  Following that is a kernel boot log.

 Notably, we're seeing this regression only with our RAID mfi(4) based
 systems.  Also, from looking at FreeBSD source changelogs it appears
 that the mfi(4) code has seen some changes since 8.1.
 Between 8.1 and 8.2 mfi has not had any significant changes.  The only
 changes made to sys/dev/mfi were to add a new constant:

 svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
 svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
 Index: mfireg.h
 ===
 --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
 +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
 @@ -975,7 +975,9 @@
          MFI_PD_STATE_OFFLINE = 0x10,
          MFI_PD_STATE_FAILED = 0x11,
          MFI_PD_STATE_REBUILD = 0x14,
 -        MFI_PD_STATE_ONLINE = 0x18
 +        MFI_PD_STATE_ONLINE = 0x18,
 +        MFI_PD_STATE_COPYBACK = 0x20,
 +        MFI_PD_STATE_SYSTEM = 0x40
  };

  union mfi_ld_ref {

 The difference in write performance must be due to something else.  You
 mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH,
 so I wonder if this is related:

 r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines

 MFC 212160:

 Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
 Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

 The barrier semantics of bioq_insert_tail() were broken in two ways:

  o In bioq_disksort(), an added bio could be inserted at the head of
    the queue, even when a barrier was present, if the sort key for
    the new entry was less than that of the last queued barrier bio.

  o The last_offset used to generate the sort key for newly queued bios
    did not stay at the position of the barrier until either the
    barrier was de-queued, or a new barrier (which updates last_offset)
    was queued.  When a barrier is in effect, we know that the disk
    will pass through the barrier position just before the
    blocked bios are released, so using the barrier's offset for
    last_offset is the optimal choice.

 sys/geom/sched/subr_disk.c:
 sys/kern/subr_disk.c:
  o Update last_offset in bioq_insert_tail().

  o Only update last_offset in bioq_remove() if the removed bio is
    at the head of the queue (typically due to a call via
    bioq_takefirst()) and no barrier is active.

  o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
    set prev to the barrier and cur to it's next element.  Now that
    last_offset is kept at the barrier position, this change isn't
    strictly necessary, but since we have to take a decision branch
    anyway, it does avoid one, no-op, loop iteration in the while
    loop that immediately follows.

  o In bioq_disksort(), bypass the normal sort for bios with the
    BIO_ORDERED attribute and instead insert them into the queue
    with bioq_insert_tail().  bioq_insert_tail() not only gives
    the desired command order during insertion, but also provides
    barrier semantics so that commands disksorted in the future
    cannot pass the just enqueued transaction.

 sys/sys/bio.h:
  Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

 sys/cam/ata/ata_da.c:
 sys/cam/scsi/scsi_da.c
  Use an ordered command for SCSI/ATA-NCQ commands issued in
  response to bios with the BIO_ORDERED flag set.

 sys/cam/scsi/scsi_da.c
  Use an ordered tag when issuing a synchronize cache command.

  Wrap some lines to 80 columns.

 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
 sys/geom/geom_io.c
  Mark bios with the BIO_FLUSH command as BIO_ORDERED.

 Sponsored by:   Spectra Logic Corporation



 Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
 

Re: mfi(4) IO performance regression, post 8.1

2012-06-22 Thread John Baldwin
On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
 
 On 6/15/12 8:04 AM, John Baldwin wrote:
  On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
  Hello FreeBSD folk,
 
  We're seeing what appears to be a storage performance regression as we
  try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
  appears that the regression happened between 8.1 and 8.2.
 
  Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
  Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
  configuration, using UFS + geom_journal for the filesystem.

  PostgreSQL performance, as seen via pgbench, dropped by approx 20%.
  This testing was done with our usual PAE-enabled kernels.  We then went
  back to GENERIC kernels and did comparisons using bonnie, results
  below.  Following that is a kernel boot log.

  Notably, we're seeing this regression only with our RAID mfi(4) based
  systems.  Also, from looking at FreeBSD source changelogs it appears
  that the mfi(4) code has seen some changes since 8.1.
  Between 8.1 and 8.2 mfi has not had any significant changes.  The only
  changes made to sys/dev/mfi were to add a new constant:
 
  svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
  svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
  Index: mfireg.h
  ===
  --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
  +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
  @@ -975,7 +975,9 @@
           MFI_PD_STATE_OFFLINE = 0x10,
           MFI_PD_STATE_FAILED = 0x11,
           MFI_PD_STATE_REBUILD = 0x14,
  -        MFI_PD_STATE_ONLINE = 0x18
  +        MFI_PD_STATE_ONLINE = 0x18,
  +        MFI_PD_STATE_COPYBACK = 0x20,
  +        MFI_PD_STATE_SYSTEM = 0x40
   };

   union mfi_ld_ref {
 
  The difference in write performance must be due to something else.  You
  mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH, so I
  wonder if this is related:
 
  
  r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
 
  MFC 212160:
 
  Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics.
  Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
 
  The barrier semantics of bioq_insert_tail() were broken in two ways:
 
o In bioq_disksort(), an added bio could be inserted at the head of
  the queue, even when a barrier was present, if the sort key for
  the new entry was less than that of the last queued barrier bio.
 
o The last_offset used to generate the sort key for newly queued bios
  did not stay at the position of the barrier until either the
  barrier was de-queued, or a new barrier (which updates last_offset)
  was queued.  When a barrier is in effect, we know that the disk
  will pass through the barrier position just before the
  blocked bios are released, so using the barrier's offset for
  last_offset is the optimal choice.
 
  sys/geom/sched/subr_disk.c:
  sys/kern/subr_disk.c:
   o Update last_offset in bioq_insert_tail().
 
   o Only update last_offset in bioq_remove() if the removed bio is
 at the head of the queue (typically due to a call via
 bioq_takefirst()) and no barrier is active.
 
   o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
 set prev to the barrier and cur to its next element.  Now that
 last_offset is kept at the barrier position, this change isn't
 strictly necessary, but since we have to take a decision branch
 anyway, it does avoid one, no-op, loop iteration in the while
 loop that immediately follows.
 
   o In bioq_disksort(), bypass the normal sort for bios with the
 BIO_ORDERED attribute and instead insert them into the queue
 with bioq_insert_tail().  bioq_insert_tail() not only gives
 the desired command order during insertion, but also provides
 barrier semantics so that commands disksorted in the future
 cannot pass the just enqueued transaction.
 
  sys/sys/bio.h:
   Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
 
  sys/cam/ata/ata_da.c:
  sys/cam/scsi/scsi_da.c
   Use an ordered command for SCSI/ATA-NCQ commands issued in
   response to bios with the BIO_ORDERED flag set.
 
  sys/cam/scsi/scsi_da.c
   Use an ordered tag when issuing a synchronize cache command.
 
   Wrap some lines to 80 columns.
 
  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
  sys/geom/geom_io.c
   Mark bios with the BIO_FLUSH command as BIO_ORDERED.
 
  Sponsored by:   Spectra Logic Corporation
  
 
  Can you try perhaps commenting out 

Re: mfi(4) IO performance regression, post 8.1

2012-06-21 Thread Charles Owens


On 6/15/12 8:04 AM, John Baldwin wrote:

On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:

Hello FreeBSD folk,

We're seeing what appears to be a storage performance regression as we
try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
appears that the regression happened between 8.1 and 8.2.

Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
configuration, using UFS + geom_journal for filesystem.

PostgreSQL performance, as seen via pgbench, dropped by approx 20%.
This testing was done with our usual PAE-enabled kernels.  We then went
back to GENERIC kernels and did comparisons using bonnie, results
below.  Following that is a kernel boot log.

Notably, we're seeing this regression only with our RAID mfi(4) based
systems.  Also, from looking at FreeBSD source changelogs it appears
that the mfi(4) code has seen some changes since 8.1.

Between 8.1 and 8.2 mfi has not had any significant changes.  The only changes
made to sys/dev/mfi were to add a new constant:


svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
Index: mfireg.h
===
--- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
+++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
@@ -975,7 +975,9 @@
 MFI_PD_STATE_OFFLINE = 0x10,
 MFI_PD_STATE_FAILED = 0x11,
 MFI_PD_STATE_REBUILD = 0x14,
-   MFI_PD_STATE_ONLINE = 0x18
+   MFI_PD_STATE_ONLINE = 0x18,
+   MFI_PD_STATE_COPYBACK = 0x20,
+   MFI_PD_STATE_SYSTEM = 0x40
  };
  
  union mfi_ld_ref {


The difference in write performance must be due to something else.  You
mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH, so I
wonder if this is related:


r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines

MFC 212160:

Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

  o In bioq_disksort(), an added bio could be inserted at the head of
the queue, even when a barrier was present, if the sort key for
the new entry was less than that of the last queued barrier bio.

  o The last_offset used to generate the sort key for newly queued bios
did not stay at the position of the barrier until either the
barrier was de-queued, or a new barrier (which updates last_offset)
was queued.  When a barrier is in effect, we know that the disk
will pass through the barrier position just before the
blocked bios are released, so using the barrier's offset for
last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
 o Update last_offset in bioq_insert_tail().

 o Only update last_offset in bioq_remove() if the removed bio is
   at the head of the queue (typically due to a call via
   bioq_takefirst()) and no barrier is active.

 o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
   set prev to the barrier and cur to its next element.  Now that
   last_offset is kept at the barrier position, this change isn't
   strictly necessary, but since we have to take a decision branch
   anyway, it does avoid one, no-op, loop iteration in the while
   loop that immediately follows.

 o In bioq_disksort(), bypass the normal sort for bios with the
   BIO_ORDERED attribute and instead insert them into the queue
   with bioq_insert_tail().  bioq_insert_tail() not only gives
   the desired command order during insertion, but also provides
   barrier semantics so that commands disksorted in the future
   cannot pass the just enqueued transaction.

sys/sys/bio.h:
 Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
 Use an ordered command for SCSI/ATA-NCQ commands issued in
 response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
 Use an ordered tag when issuing a synchronize cache command.

 Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
 Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by:   Spectra Logic Corporation


Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
changed in geom_io.c in 8.2?  That would be effectively reverting this
portion of the diff:

Index: geom_io.c
===
--- geom_io.c   (.../8.1/sys/geom)  

Re: mfi(4) IO performance regression, post 8.1

2012-06-15 Thread Charles Owens
Yes, of course. So far I can say that the major shift appears to have occurred 
between 8.1 and 8.2.

Thanks,

Charles Owens 
Great Bay Software,  Inc.

Sent from my phone

- Reply message -
From: Adrian Chadd adr...@freebsd.org
To: Charles Owens cow...@greatbaysoftware.com
Cc: sta...@freebsd.org
Subject: mfi(4) IO performance regression, post 8.1
Date: Fri, Jun 15, 2012 1:55 am


Hm, can you try different subversion checkouts of the kernel tree
between 8.1 and 8.3, to pinpoint which commit(s) broke things?



Adrian
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: mfi(4) IO performance regression, post 8.1

2012-06-15 Thread Brian W.
Curious why you are preferring i386 + PAE as opposed to amd64?
On Jun 15, 2012 4:09 AM, Charles Owens cow...@greatbaysoftware.com
wrote:

 Yes, of course. So far I can say that the major shift appears to have
 occurred between 8.1 and 8.2.

 Thanks,

 Charles Owens
 Great Bay Software,  Inc.

 Sent from my phone

 - Reply message -
 From: Adrian Chadd adr...@freebsd.org
 To: Charles Owens cow...@greatbaysoftware.com
 Cc: sta...@freebsd.org
 Subject: mfi(4) IO performance regression, post 8.1
 Date: Fri, Jun 15, 2012 1:55 am


 Hm, can you try different subversion checkouts of the kernel tree
 between 8.1 and 8.3, to pinpoint which commit(s) broke things?



 Adrian




Re: mfi(4) IO performance regression, post 8.1

2012-06-15 Thread John Baldwin
On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
 Hello FreeBSD folk,
 
 We're seeing what appears to be a storage performance regression as we 
 try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it 
 appears that the regression happened between 8.1 and 8.2.
 
 Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.  
 Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
 configuration, using UFS + geom_journal for filesystem.
 
 PostgreSQL performance, as seen via pgbench, dropped by approx 20%.
 This testing was done with our usual PAE-enabled kernels.  We then went 
 back to GENERIC kernels and did comparisons using bonnie, results 
 below.  Following that is a kernel boot log.
 
 Notably, we're seeing this regression only with our RAID mfi(4) based 
 systems.  Also, from looking at FreeBSD source changelogs it appears
 that the mfi(4) code has seen some changes since 8.1.

Between 8.1 and 8.2 mfi has not had any significant changes.  The only changes
made to sys/dev/mfi were to add a new constant:

 svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi 
svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
Index: mfireg.h
===
--- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
+++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
@@ -975,7 +975,9 @@
MFI_PD_STATE_OFFLINE = 0x10,
MFI_PD_STATE_FAILED = 0x11,
MFI_PD_STATE_REBUILD = 0x14,
-   MFI_PD_STATE_ONLINE = 0x18
+   MFI_PD_STATE_ONLINE = 0x18,
+   MFI_PD_STATE_COPYBACK = 0x20,
+   MFI_PD_STATE_SYSTEM = 0x40
 };
 
 union mfi_ld_ref {

The difference in write performance must be due to something else.  You 
mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH, so I 
wonder if this is related:


r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines

MFC 212160:

Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

 o In bioq_disksort(), an added bio could be inserted at the head of
   the queue, even when a barrier was present, if the sort key for
   the new entry was less than that of the last queued barrier bio.

 o The last_offset used to generate the sort key for newly queued bios
   did not stay at the position of the barrier until either the
   barrier was de-queued, or a new barrier (which updates last_offset)
   was queued.  When a barrier is in effect, we know that the disk
   will pass through the barrier position just before the
   blocked bios are released, so using the barrier's offset for
   last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
o Update last_offset in bioq_insert_tail().

o Only update last_offset in bioq_remove() if the removed bio is
  at the head of the queue (typically due to a call via
  bioq_takefirst()) and no barrier is active.

o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
  set prev to the barrier and cur to its next element.  Now that
  last_offset is kept at the barrier position, this change isn't
  strictly necessary, but since we have to take a decision branch
  anyway, it does avoid one, no-op, loop iteration in the while
  loop that immediately follows.

o In bioq_disksort(), bypass the normal sort for bios with the
  BIO_ORDERED attribute and instead insert them into the queue
  with bioq_insert_tail().  bioq_insert_tail() not only gives
  the desired command order during insertion, but also provides
  barrier semantics so that commands disksorted in the future
  cannot pass the just enqueued transaction.

sys/sys/bio.h:
Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
Use an ordered command for SCSI/ATA-NCQ commands issued in
response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
Use an ordered tag when issuing a synchronize cache command.

Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by:   Spectra Logic Corporation
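A minimal user-space sketch of the barrier semantics this log describes, for readers who want to see the mechanism rather than the prose. It follows the log's description of bioq_disksort(), bioq_insert_tail() and bioq_remove()/bioq_takefirst(), but it is not the code in sys/kern/subr_disk.c; the sbio/sbioq types, the SKETCH_BIO_ORDERED value and all function names are hypothetical stand-ins for struct bio, struct bio_queue_head and BIO_ORDERED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct bio and struct bio_queue_head. */
#define SKETCH_BIO_ORDERED 0x08  /* illustrative flag value only */

struct sbio {
	uint64_t offset;     /* starting byte offset on the disk */
	uint64_t length;     /* transfer length in bytes */
	int flags;
	struct sbio *next;
};

struct sbioq {
	struct sbio *head;
	struct sbio *tail;
	struct sbio *insert_point;  /* most recent barrier, or NULL */
	uint64_t last_offset;       /* assumed current head position */
};

/*
 * One-way elevator key: unsigned wrap-around makes offsets behind
 * last_offset sort after everything ahead of it.
 */
static uint64_t sbioq_key(const struct sbioq *q, const struct sbio *bp)
{
	return bp->offset - q->last_offset;
}

/* Append bp and make it a barrier that later sorted bios cannot pass. */
static void sbioq_insert_tail(struct sbioq *q, struct sbio *bp)
{
	bp->next = NULL;
	if (q->tail != NULL)
		q->tail->next = bp;
	else
		q->head = bp;
	q->tail = bp;
	q->insert_point = bp;
	q->last_offset = bp->offset;  /* keep the sort key at the barrier */
}

static void sbioq_disksort(struct sbioq *q, struct sbio *bp)
{
	struct sbio *prev = NULL, *cur = q->head;
	uint64_t key;

	assert(bp != NULL);
	/* Ordered bios bypass sorting and become the new barrier. */
	if (bp->flags & SKETCH_BIO_ORDERED) {
		sbioq_insert_tail(q, bp);
		return;
	}
	key = sbioq_key(q, bp);
	if (q->insert_point != NULL) {  /* never sort ahead of a barrier */
		prev = q->insert_point;
		cur = prev->next;
	}
	while (cur != NULL && key >= sbioq_key(q, cur)) {
		prev = cur;
		cur = cur->next;
	}
	if (prev == NULL) {
		bp->next = q->head;
		q->head = bp;
	} else {
		bp->next = prev->next;
		prev->next = bp;
	}
	if (bp->next == NULL)
		q->tail = bp;
}

/* Dequeue the head; last_offset only advances when no barrier is queued. */
static struct sbio *sbioq_takefirst(struct sbioq *q)
{
	struct sbio *bp = q->head;

	if (bp == NULL)
		return NULL;
	if (q->insert_point == NULL)
		q->last_offset = bp->offset + bp->length;
	else if (bp == q->insert_point)
		q->insert_point = NULL;
	q->head = bp->next;
	if (q->head == NULL)
		q->tail = NULL;
	return bp;
}
```

In this sketch an ordinary bio at offset 20 queued after an ordered bio has to wait behind it, whereas in the same example without the barrier it would be sorted to the head of the queue. Whether that extra ordering around flushes is what costs the mfi(4) setup its throughput is what John's suggested experiment is meant to test.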


Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
changed in geom_io.c in 8.2?  That would be effectively reverting this
portion of the diff:

Index: geom_io.c
===
--- geom_io.c   (.../8.1/sys/geom)  (revision 237134)
+++ geom_io.c   (.../8.2/sys/geom)  

Re: mfi(4) IO performance regression, post 8.1

2012-06-15 Thread Charles Owens

No reason other than history... will be changing over at some point.


Charles Owens
Great Bay Software, Inc.


On 6/15/12 10:38 AM, Brian W. wrote:


Curious why you are preferring i386 + PAE as opposed to amd64?

On Jun 15, 2012 4:09 AM, Charles Owens cow...@greatbaysoftware.com wrote:


Yes, of course. So far I can say that the major shift appears to
have occurred between 8.1 and 8.2.

Thanks,

Charles Owens
Great Bay Software,  Inc.

Sent from my phone

- Reply message -
From: Adrian Chadd adr...@freebsd.org
To: Charles Owens cow...@greatbaysoftware.com
Cc: sta...@freebsd.org
Subject: mfi(4) IO performance regression, post 8.1
Date: Fri, Jun 15, 2012 1:55 am


Hm, can you try different subversion checkouts of the kernel tree
between 8.1 and 8.3, to pinpoint which commit(s) broke things?



Adrian





mfi(4) IO performance regression, post 8.1

2012-06-14 Thread Charles Owens

Hello FreeBSD folk,

We're seeing what appears to be a storage performance regression as we 
try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it 
appears that the regression happened between 8.1 and 8.2.


Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.  
Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
configuration, using UFS + geom_journal for filesystem.


PostgreSQL performance, as seen via pgbench, dropped by approx 20%.
This testing was done with our usual PAE-enabled kernels.  We then went 
back to GENERIC kernels and did comparisons using bonnie, results 
below.  Following that is a kernel boot log.


Notably, we're seeing this regression only with our RAID mfi(4) based 
systems.  Also, from looking at FreeBSD source changelogs it appears
that the mfi(4) code has seen some changes since 8.1.


How can I investigate further?  Assistance with sorting this out would 
be greatly appreciated.


Thanks much,

Charles



Bonnie comparison

                    ----------Sequential Output-----------    ----Sequential Input----      --Random--
                    -Per Char-    --Block---    -Rewrite--    -Per Char-    --Block---      --Seeks---
Kernel        MB   K/sec  %CPU   K/sec  %CPU   K/sec  %CPU   K/sec  %CPU   K/sec  %CPU      /sec  %CPU
8.3 GENERIC  100   93580  98.3   82136  26.6  113709  78.8  152081  98.8 3223876 100.0  233590.3 240.4
8.2 GENERIC  100   95923  99.1   84042  56.7  110568  69.8  152088 100.6 4290802  82.8  239779.4 234.5
8.1 GENERIC  100  140708 100.0  164261  44.4  208553  48.5  153472 100.0 3298756 100.0  270325.1 238.4
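To put the gap in proportion, here is a small helper that computes the relative change of the sequential-output columns between the 8.1 and 8.2 runs, since the thread places the shift there. The struct and function names are illustrative; the figures are copied from the bonnie results above. With these inputs block writes drop by a little under half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relative change, in percent, from a baseline to a new measurement. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

struct bonnie_row {
	const char *name;
	double k81;  /* 8.1 GENERIC, K/sec */
	double k82;  /* 8.2 GENERIC, K/sec */
};

/* Sequential-output columns copied from the bonnie results above. */
static const struct bonnie_row seq_out[] = {
	{ "per-char write", 140708.0, 95923.0 },
	{ "block write", 164261.0, 84042.0 },
	{ "rewrite", 208553.0, 110568.0 },
};

/* Print one line per metric with the signed percentage change. */
static void print_seq_out_changes(void)
{
	for (size_t i = 0; i < sizeof(seq_out) / sizeof(seq_out[0]); i++)
		printf("%-15s %+6.1f%%\n", seq_out[i].name,
		    pct_change(seq_out[i].k81, seq_out[i].k82));
}
```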



Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-RELEASE-p8 #1: Fri Jan  6 12:13:34 EST 2012

cow...@newcastle.greatbaysoftware.com:/usr/obj/usr/relbuild/os/RELENG_8_1/sys/GENERIC
 i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU   E5530  @ 2.40GHz (2394.28-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x106a5  Family = 6  Model = 1a  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x2810<NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 12884901888 (12288 MB)
avail memory = 2289147904 (2183 MB)
ACPI APIC Table: <INTEL  S5520UR>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
 cpu8 (AP): APIC ID: 16
 cpu9 (AP): APIC ID: 17
 cpu10 (AP): APIC ID: 18
 cpu11 (AP): APIC ID: 19
 cpu12 (AP): APIC ID: 20
 cpu13 (AP): APIC ID: 21
 cpu14 (AP): APIC ID: 22
 cpu15 (AP): APIC ID: 23
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
lapic0: Forcing LINT1 to edge trigger
kbd0 at kbdmux0
acpi0: <INTEL S5520UR> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
cpu8: <ACPI CPU> on acpi0
cpu9: <ACPI CPU> on acpi0
cpu10: <ACPI CPU> on acpi0
cpu11: <ACPI CPU> on acpi0
cpu12: <ACPI CPU> on acpi0
cpu13: <ACPI CPU> on acpi0
cpu14: <ACPI CPU> on acpi0
cpu15: <ACPI CPU> on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 28 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
igb0: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0x4020-0x403f mem 0xb1f2-0xb1f3,0xb1f44000-0xb1f47fff irq 40 at device 0.0 on pci1
igb0: Using MSIX interrupts with 5 vectors
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: Ethernet address: 00:15:17:f2:1b:a0
igb1:Intel(R) PRO/1000 Network Connection version 

Re: mfi(4) IO performance regression, post 8.1

2012-06-14 Thread Adrian Chadd
Hm, can you try different subversion checkouts of the kernel tree
between 8.1 and 8.3, to pinpoint which commit(s) broke things?



Adrian
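A sketch of the bisection Adrian suggests, assuming the slowdown comes from a single commit. Each probe (checking out the tree at a revision, building and booting that kernel, then running bonnie or pgbench) is left to a hypothetical regressed callback; only the search logic is shown, together with a stand-in predicate for exercising it.

```c
#include <assert.h>

/*
 * Hypothetical callback: check out and build a kernel at svn revision
 * rev, boot it, run the benchmark, and return nonzero if it is slow.
 */
typedef int (*regressed_fn)(long rev, void *ctx);

/*
 * Binary search for the first bad revision.  Preconditions: good < bad,
 * revision good is known fast and revision bad is known slow.
 */
static long bisect_first_bad(long good, long bad, regressed_fn regressed,
    void *ctx)
{
	assert(good < bad);
	while (bad - good > 1) {
		long mid = good + (bad - good) / 2;

		if (regressed(mid, ctx))
			bad = mid;   /* culprit is at or before mid */
		else
			good = mid;  /* culprit is after mid */
	}
	return bad;
}

/* Stand-in predicate: "slow" from a given revision onwards. */
static int regressed_since(long rev, void *ctx)
{
	return rev >= *(const long *)ctx;
}
```

The search needs about log2(bad - good) probes, so a range of 20,000 revisions comes down to roughly 15 kernel builds. Revision numbers that do not touch the branch under test simply reproduce the previous tree, which wastes a build but does not mislead the search.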