RE: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-28 Thread Kuan Luo
Robert wrote:
> Kuan Luo wrote:
> > Robert wrote:
> >> Kuan, does this patch (using the notifiers to see if the command is
> >> really done) still work if one port on the controller has ADMA
> >> disabled because it's in ATAPI mode? I seem to recall Allen Martin
> >> mentioning that notifiers wouldn't work in this case.
> >
> > I just tried the 2.6.24-rc7 sata_nv driver with one hd and one cdrom
> > on the same controller.
> > I ran mkfs on the hd and mounted the cdrom, and no error happened.
> >
> > Allen, is there anything about the notifier that we should pay
> > attention to?
>
> Assuming not, then this patch should be applied.

The patch should be applied.
We use the notifier register, and ATAPI mode has nothing to do with our
notifier register.

Allen wrote:
I think that's one of the cases where memory notifiers don't work (one
of the drives is not in ADMA mode either because it's ATAPI or it's in
legacy mode).  There's no issue with the notifier registers though. 
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-28 Thread Kuan Luo
Robert wrote:
> Kuan Luo wrote:
> > Robert wrote:
> >> Kuan, does this patch (using the notifiers to see if the command is
> >> really done) still work if one port on the controller has ADMA
> >> disabled because it's in ATAPI mode? I seem to recall Allen Martin
> >> mentioning that notifiers wouldn't work in this case.
> >
> > I just tried the 2.6.24-rc7 sata_nv driver with one hd and one cdrom
> > on the same controller.
> > I ran mkfs on the hd and mounted the cdrom, and no error happened.
> >
> > Allen, is there anything about the notifier that we should pay
> > attention to?
>
> Assuming not, then this patch should be applied.

I am asking someone about the issue.
I should have a concrete response soon.


Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-28 Thread Robert Hancock

Kuan Luo wrote:
> Robert wrote:
>> Kuan, does this patch (using the notifiers to see if the command is
>> really done) still work if one port on the controller has ADMA
>> disabled because it's in ATAPI mode? I seem to recall Allen Martin
>> mentioning that notifiers wouldn't work in this case.
>
> I just tried the 2.6.24-rc7 sata_nv driver with one hd and one cdrom
> on the same controller.
> I ran mkfs on the hd and mounted the cdrom, and no error happened.
>
> Allen, is there anything about the notifier that we should pay
> attention to?

Assuming not, then this patch should be applied.



RE: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-23 Thread Kuan Luo
Robert wrote:
> Kuan, does this patch (using the notifiers to see if the command is
> really done) still work if one port on the controller has ADMA
> disabled because it's in ATAPI mode? I seem to recall Allen Martin
> mentioning that notifiers wouldn't work in this case.

I just tried the 2.6.24-rc7 sata_nv driver with one hd and one cdrom
on the same controller.
I ran mkfs on the hd and mounted the cdrom, and no error happened.

Allen, is there anything about the notifier that we should pay
attention to?

> > * it sure seems like there are other open sata_nv ADMA issues -- can
> > we hard-confirm or deny this?  bugzilla wasn't very helpful for me.
> > It doesn't seem like we can disable ADMA (to solve those issues) and
> > get enough test time in (which is what I said a week (or more?) ago
> > too...)
>
> The NCQ/non-NCQ command switching issue is still hitting some people
> (last I heard Kuan was looking into this), also there's a hotplug
> issue that Tejun reported..

I have not yet reproduced the switching issue, even after removing the
udelay call according to your method.
I tried 2.6.24-rc7.
I don't know which kernel version reproduces the issue easily, or
maybe I omitted some steps during the test.



Re: sata_nv and 2.6.24 (was Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.)

2008-01-23 Thread Robert Hancock

Jeff Garzik wrote:
> Robert Hancock wrote:
>> Jeff Garzik wrote:
>>> Ping...  sata_nv status is still a bit open for 2.6.24, and I would
>>> like to move us forward a bit.
>>>
>>> * Kuan's patch...  it has been confirmed (and is needed), correct?
>>> can someone work up a good patch for 2.6.24?  The only one I ever
>>> received was badly word-wrapped, and at the time, Robert seemed
>>> uncertain of it, so I waited.
>>
>> I can get you one later today hopefully.

A question came up on this patch, whether it will cause problems with
ATAPI mode - waiting for a response from the NVIDIA guys.

>>> * ADMA ATAPI 4GB issues...  playing tricks with the ordering of
>>> allocations and DMA masks is just way too fragile.  We just cannot
>>> guarantee that all allocators work that way.  The obvious solution to
>>> me seems to be hardcoding the consistent DMA mask to 32-bit, but
>>> using 64-bit for regular dma mask if-and-only-if ADMA is enabled.
>>
>> That's not enough to fix the problem since there's issues with actual
>> transfer data being allocated above 4GB as well, not just the
>> consistent allocations (it appears that blk_queue_bounce_limit
>> setting to 32-bit doesn't prevent this on x86_64). Either we play some
>> funky games with changing the DMA mask of the entire device to 32-bit
>> if either port is in ATAPI mode (which blew up when I tried it) or we
>> add the ability to set the DMA mask independently on each port (like
>> by setting the mask on the SCSI device and using that for DMA mapping
>> instead) which requires core changes.
>
> It's all funky games that no other driver is doing...  There is one
> guaranteed to work scenario -- set all masks and bounce limits etc. to
> 32-bit.  There is also one highly-likely-to-work scenario, disabling
> ADMA by default.

Sure, if you don't mind a potentially significant performance
regression. All the DMA mask problems are due to the fact that the mask
settings for both ports are ganged together on the PCI device. If we
could set the DMA masks on the SCSI device or something else that was
port-specific, and do the command DMA mapping against that device, then
most of the weirdness goes away.

It does seem like we're starting to get a bit of NVIDIA interest in
looking into ADMA issues, which is definitely welcome.

>>> * it sure seems like there are other open sata_nv ADMA issues -- can
>>> we hard-confirm or deny this?  bugzilla wasn't very helpful for me.
>>> It doesn't seem like we can disable ADMA (to solve those issues) and
>>> get enough test time in (which is what I said a week (or more?) ago
>>> too...)
>>
>> The NCQ/non-NCQ command switching issue is still hitting some people
>> (last I heard Kuan was looking into this), also there's a hotplug
>> issue that Tejun reported..
>
> The former implies we need to disable swncq for 2.6.24, if it's not
> stable yet.

Huh? Nothing to do with SWNCQ, which last I checked was still off by
default.
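
The ganging problem Robert describes can be modeled in a few lines of
standalone C. This is only an illustration under assumptions: the function
names and mask macros are hypothetical, no kernel DMA API is called, and it
assumes a port can address 64 bits only while it is in ADMA mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_32BIT UINT64_C(0x00000000FFFFFFFF)
#define MASK_64BIT UINT64_C(0xFFFFFFFFFFFFFFFF)

/* Mask a single port could use on its own (hypothetical helper):
 * 64-bit only while the port is in ADMA mode, 32-bit otherwise
 * (for example when it is in ATAPI or legacy mode). */
uint64_t port_dma_mask(bool adma_enabled)
{
    return adma_enabled ? MASK_64BIT : MASK_32BIT;
}

/* One mask shared by every port on the same PCI device: the most
 * restrictive port decides for all of them. */
uint64_t shared_dma_mask(const bool *adma_enabled, size_t nports)
{
    assert(adma_enabled != NULL);

    uint64_t mask = MASK_64BIT;
    for (size_t i = 0; i < nports; i++)
        mask &= port_dma_mask(adma_enabled[i]);
    return mask;
}
```

With one ADMA port and one ATAPI port, the shared mask drops to 32 bits for
both, which is the performance cost being discussed; a per-port mask would
let the ADMA port keep its 64-bit mask.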



Re: sata_nv and 2.6.24 (was Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.)

2008-01-23 Thread Jeff Garzik

Robert Hancock wrote:
> Jeff Garzik wrote:
>> Ping...  sata_nv status is still a bit open for 2.6.24, and I would
>> like to move us forward a bit.
>>
>> * Kuan's patch...  it has been confirmed (and is needed), correct?
>> can someone work up a good patch for 2.6.24?  The only one I ever
>> received was badly word-wrapped, and at the time, Robert seemed
>> uncertain of it, so I waited.
>
> I can get you one later today hopefully.
>
>> * ADMA ATAPI 4GB issues...  playing tricks with the ordering of
>> allocations and DMA masks is just way too fragile.  We just cannot
>> guarantee that all allocators work that way.  The obvious solution to
>> me seems to be hardcoding the consistent DMA mask to 32-bit, but using
>> 64-bit for regular dma mask if-and-only-if ADMA is enabled.
>
> That's not enough to fix the problem since there's issues with actual
> transfer data being allocated above 4GB as well, not just the
> consistent allocations (it appears that blk_queue_bounce_limit setting
> to 32-bit doesn't prevent this on x86_64). Either we play some funky
> games with changing the DMA mask of the entire device to 32-bit if
> either port is in ATAPI mode (which blew up when I tried it) or we add
> the ability to set the DMA mask independently on each port (like by
> setting the mask on the SCSI device and using that for DMA mapping
> instead) which requires core changes.

It's all funky games that no other driver is doing...  There is one
guaranteed to work scenario -- set all masks and bounce limits etc. to
32-bit.  There is also one highly-likely-to-work scenario, disabling
ADMA by default.

>> * it sure seems like there are other open sata_nv ADMA issues -- can
>> we hard-confirm or deny this?  bugzilla wasn't very helpful for me.
>> It doesn't seem like we can disable ADMA (to solve those issues) and
>> get enough test time in (which is what I said a week (or more?) ago
>> too...)
>
> The NCQ/non-NCQ command switching issue is still hitting some people
> (last I heard Kuan was looking into this), also there's a hotplug issue
> that Tejun reported..

The former implies we need to disable swncq for 2.6.24, if it's not
stable yet.


Jeff




Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-23 Thread Robert Hancock

Kuan Luo wrote:
> First, thanks to davide for helping to send the attachment.
>
> Robert,
> The patch is to solve the error message "ata1: CPB flags CMD err,
> flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
> I tested this hd in 2.6.24-rc7, which needed the mask in the blacklist
> removed to run ncq, and the same error also showed up.
>
> I traced the bug and found that the interrupt finished a command (for
> example, tag=0) when the driver saw that the adma status was
> NV_ADMA_STAT_DONE and cpb->resp_flags was NV_CPB_RESP_DONE.
> However, for this hd, the drive may not have cleared bit 0 at this
> moment, which meant the hardware had not completely finished the
> command. If at the same time the driver freed the command (tag 0) and
> sent another command (tag 0), the error happened.
>
> The notifier register is a 32-bit register containing the notifier
> value. The value is a bit vector containing one bit per tag number
> (0-31) in corresponding bit positions (bit 0 is for tag 0, etc). When a
> bit is set, ADMA indicates that the command with the corresponding tag
> number has completed execution.
>
> So I added the notifier check code. Sometimes I saw that the notifier
> reg had some bits set, but the adma status was
> NV_ADMA_STAT_CMD_COMPLETE, not NV_ADMA_STAT_DONE. So I added the
> NV_ADMA_STAT_CMD_COMPLETE check code.

Kuan, does this patch (using the notifiers to see if the command is
really done) still work if one port on the controller has ADMA disabled
because it's in ATAPI mode? I seem to recall Allen Martin mentioning
that notifiers wouldn't work in this case.
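
A minimal sketch of the completion check Kuan describes, written as a
standalone C model rather than the actual sata_nv code. The struct
port_model and both function names are hypothetical; only the
one-bit-per-tag layout of the notifier register comes from the message
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one port's completion state; these are not the
 * sata_nv driver structures. */
struct port_model {
    uint32_t notifier;  /* bit N set: controller reports tag N completed */
    uint32_t cpb_done;  /* bit N set: CPB response flags for tag N say done */
};

/* A tag counts as finished only when the CPB flag and the notifier bit
 * agree. Relying on the CPB flag alone is the race described above. */
bool tag_really_done(const struct port_model *p, unsigned int tag)
{
    assert(tag < 32);

    uint32_t bit = UINT32_C(1) << tag;
    return (p->cpb_done & bit) != 0 && (p->notifier & bit) != 0;
}

/* Mask of all tags that are safe to complete and reuse right now. */
uint32_t completable_tags(const struct port_model *p)
{
    return p->cpb_done & p->notifier;
}
```

With cpb_done = 0x7 and notifier = 0x5, tags 0 and 2 can be completed,
while tag 1 has to wait because its notifier bit has not been set yet.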



Re: sata_nv and 2.6.24 (was Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.)

2008-01-23 Thread Robert Hancock

Jeff Garzik wrote:
> Ping...  sata_nv status is still a bit open for 2.6.24, and I would like
> to move us forward a bit.
>
> * Kuan's patch...  it has been confirmed (and is needed), correct?  can
> someone work up a good patch for 2.6.24?  The only one I ever received
> was badly word-wrapped, and at the time, Robert seemed uncertain of it,
> so I waited.

I can get you one later today hopefully.

> * ADMA ATAPI 4GB issues...  playing tricks with the ordering of
> allocations and DMA masks is just way too fragile.  We just cannot
> guarantee that all allocators work that way.  The obvious solution to me
> seems to be hardcoding the consistent DMA mask to 32-bit, but using
> 64-bit for regular dma mask if-and-only-if ADMA is enabled.

That's not enough to fix the problem since there's issues with actual
transfer data being allocated above 4GB as well, not just the consistent
allocations (it appears that blk_queue_bounce_limit setting to 32-bit
doesn't prevent this on x86_64). Either we play some funky games with
changing the DMA mask of the entire device to 32-bit if either port is
in ATAPI mode (which blew up when I tried it) or we add the ability to
set the DMA mask independently on each port (like by setting the mask on
the SCSI device and using that for DMA mapping instead) which requires
core changes.

> * it sure seems like there are other open sata_nv ADMA issues -- can we
> hard-confirm or deny this?  bugzilla wasn't very helpful for me.  It
> doesn't seem like we can disable ADMA (to solve those issues) and get
> enough test time in (which is what I said a week (or more?) ago too...)

The NCQ/non-NCQ command switching issue is still hitting some people
(last I heard Kuan was looking into this), also there's a hotplug issue
that Tejun reported..

> It seems like we should be able to tackle the first two issues promptly,
> at least.
>
> Jeff







sata_nv and 2.6.24 (was Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.)

2008-01-23 Thread Jeff Garzik

Robert Hancock wrote:
> Kuan Luo wrote:
>> Robert Hancock wrote:
>>> What problem does this resolve? I tested it against the cache
>>> flush/NCQ write switching problem we've been trying to solve, and it
>>> doesn't look like it fixes that one - if I apply this patch and then
>>> remove the udelay(20) in sata_nv.c that I added which prevented me
>>> from seeing this problem before, it shows up.
>>
>> First, thanks to davide for helping to send the attachment.
>>
>> Robert,
>> The patch is to solve the error message "ata1: CPB flags CMD err,
>> flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
>> I tested this hd in 2.6.24-rc7, which needed the mask in the blacklist
>> removed to run ncq, and the same error also showed up.
>> I traced the bug and found that the interrupt finished a command (for
>> example, tag=0) when the driver saw that the adma status was
>> NV_ADMA_STAT_DONE and cpb->resp_flags was NV_CPB_RESP_DONE.
>> However, for this hd, the drive may not have cleared bit 0 at this
>> moment, which meant the hardware had not completely finished the
>> command. If at the same time the driver freed the command (tag 0) and
>> sent another command (tag 0), the error happened.
>>
>> The notifier register is a 32-bit register containing the notifier
>> value. The value is a bit vector containing one bit per tag number
>> (0-31) in corresponding bit positions (bit 0 is for tag 0, etc). When a
>> bit is set, ADMA indicates that the command with the corresponding tag
>> number has completed execution.
>>
>> So I added the notifier check code. Sometimes I saw that the notifier
>> reg had some bits set, but the adma status was
>> NV_ADMA_STAT_CMD_COMPLETE, not NV_ADMA_STAT_DONE. So I added the
>> NV_ADMA_STAT_CMD_COMPLETE check code.
>
> That looks like a good fix then. (Though a possible optimization would
> be to AND the check_commands value with the notifier clear value rather
> than testing against the notifier on each loop. That's fairly minor
> though.)
>
> As I mentioned, this doesn't seem to resolve the problem we're seeing
> with rapidly intermixed NCQ commands and cache flushes (at least, if I
> take out the arbitrary 20usec delay from the driver and add this patch,
> the problem still shows up). It could be a similar problem, though, of
> commands being issued before the controller is really ready for them. If
> you or others at NVIDIA could assist in tracking down that problem it
> would be appreciated..

Ping...  sata_nv status is still a bit open for 2.6.24, and I would like
to move us forward a bit.


* Kuan's patch...  it has been confirmed (and is needed), correct?  can
someone work up a good patch for 2.6.24?  The only one I ever received
was badly word-wrapped, and at the time, Robert seemed uncertain of it,
so I waited.


* ADMA ATAPI 4GB issues...  playing tricks with the ordering of
allocations and DMA masks is just way too fragile.  We just cannot
guarantee that all allocators work that way.  The obvious solution to me
seems to be hardcoding the consistent DMA mask to 32-bit, but using
64-bit for regular dma mask if-and-only-if ADMA is enabled.


* it sure seems like there are other open sata_nv ADMA issues -- can we
hard-confirm or deny this?  bugzilla wasn't very helpful for me.  It
doesn't seem like we can disable ADMA (to solve those issues) and get
enough test time in (which is what I said a week (or more?) ago too...)


It seems like we should be able to tackle the first two issues promptly,
at least.


Jeff
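
The optimization Robert suggests in the quoted text above can be sketched
as follows. This is a standalone illustration, not the patch itself: the
function names are hypothetical, and the loop bodies only count tags at
the point where a driver would inspect the corresponding CPB.

```c
#include <assert.h>
#include <stdint.h>

/* Per-loop variant: visit every tag and test its notifier bit on each
 * iteration. */
unsigned int count_checked_per_loop(uint32_t check_commands, uint32_t notifier)
{
    unsigned int checked = 0;

    for (unsigned int tag = 0; tag < 32; tag++) {
        uint32_t bit = UINT32_C(1) << tag;
        if ((check_commands & bit) && (notifier & bit))
            checked++;  /* a driver would check the CPB for this tag here */
    }
    return checked;
}

/* Masked variant: AND check_commands with the notifier clear value once,
 * then walk only the bits that survive. */
unsigned int count_checked_masked(uint32_t check_commands,
                                  uint32_t notifier_clear)
{
    unsigned int checked = 0;
    uint32_t todo = check_commands & notifier_clear;

    while (todo) {
        todo &= todo - 1;  /* clear the lowest set bit */
        checked++;         /* a driver would check the CPB for this tag here */
    }
    return checked;
}
```

Both variants visit the same set of tags; the masked one simply does the
notifier test once up front instead of once per tag, which matches the
"fairly minor" nature of the suggestion.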





sata_nv and 2.6.24 (was Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.)

2008-01-23 Thread Jeff Garzik

Robert Hancock wrote:

Kuan Luo wrote:

Robert hancock wrote:
What problem does this resolve? I tested it against the cache 
flush/NCQ write switching problem we've been trying to solve, and it 
doesn't look like it fixes that one - if I apply this patch and then 
remove the udelay(20) in sata_nv.c that I added which prevented me 
from seeing this problem before, it shows up.




First thank  davide to help to send the attachment.

Robert,
The patch is to solve the error message ata1: CPB flags CMD err,
flags=0x11 when testing HDS7250SASUN500G in rhel4u5.
I tested this hd in 2.6.24-rc7 which needed to remove the mask in
blacklist to run the ncq and the same error also showed up.
I traced the  bug and found that the interrupt finished a command (for
example, tag=0) when the driver got that adma status is
NV_ADMA_STAT_DONE  and  cpb-resp_flags is NV_CPB_RESP_DONE.
However, For this hd, the drive maybe didn't clear bit 0 at this moment.
It meaned the hardware  had not completely finished the command.
If at the same time  the driver freed the command(tag 0) and sended
another command (tag 0), the error happened.

The notifier register is 32-bit register containing notifier value.
Value is bit vector containing one bit per tag number (0-31) in
corresponding bit positions (bit 0 is for tag 0, etc). When bit is set
then ADMA indicates that command with corresponding tag number completed
execution.

So i added the check notifier code. Sometimes i saw that the notifier
reg set some bits  , but the adma status set NV_ADMA_STAT_CMD_COMPLETE
,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE check
code.


That looks like a good fix then. (Though a possible optimization would 
be to and the check_commands value with the notifier clear value rather 
than testing against the notifier on each loop. That's fairly minor 
though.)


As I mentioned, this doesn't seem to resolve the problem we're seeing 
with rapidly intermixed NCQ commands and cache flushes (at least, if I 
take out the arbitrary 20usec delay from the driver and add this patch, 
the problem still shows up). It could be a similar problem, though, of 
commands being issued before the controller is really ready for them. If 
you or others at NVIDIA could assist in tracking down that problem it 
would be appreciated..


Ping...  sata_nv status is still a bit open for 2.6.24, and I would like 
to move us forward a bit.


* Kuan's patch...  it has been confirmed (and is needed), correct?  can 
someone work up a good patch for 2.6.24?  The only one I ever received 
was badly word-wrapped, and at the time, Robert seemed uncertain of it, 
so I waited.


* ADMA ATAPI 4GB issues...  playing tricks with the ordering of 
allocations and DMA masks is just way too fragile.  We just cannot 
guarantee that all allocators work that way.  The obvious solution to me 
seems to be hardcoding the consistent DMA mask to 32-bit, but using 
64-bit for regular dma mask if-and-only-if ADMA is enabled.


* it sure seems like there are other open sata_nv ADMA issues -- can we 
hard-confirm or deny this?  bugzilla wasn't very helpful for me.  It 
doesn't seem like we can disable ADMA (to solve those issues) and get 
enough test time in (which is what I said a week (or more?) ago too...)


It seems like we should be able to tackle the first two issues promptly, 
at least.


Jeff



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sata_nv and 2.6.24 (was Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.)

2008-01-23 Thread Robert Hancock

Jeff Garzik wrote:
Ping...  sata_nv status is still a bit open for 2.6.24, and I would like 
to move us forward a bit.


* Kuan's patch...  it has been confirmed (and is needed), correct?  can 
someone work up a good patch for 2.6.24?  The only one I ever received 
was badly word-wrapped, and at the time, Robert seemed uncertain of it, 
so I waited.


I can get you one later today hopefully.



* ADMA ATAPI 4GB issues...  playing tricks with the ordering of 
allocations and DMA masks is just way too fragile.  We just cannot 
guarantee that all allocators work that way.  The obvious solution to me 
seems to be hardcoding the consistent DMA mask to 32-bit, but using 
64-bit for regular dma mask if-and-only-if ADMA is enabled.


That's not enough to fix the problem since there's issues with actual 
transfer data being allocated above 4GB as well, not just the consistent 
 allocations (it appears that blk_queue_bounce_limit setting to 32-bit 
doesn't prevent this on x86_64). Either we play some funky games with 
changing the DMA mask of the entire device to 32-bit if either port is 
in ATAPI mode (which blew up when I tried it) or we add the ability to 
set the DMA mask independently on each port (like by setting the mask on 
the SCSI device and using that for DMA mapping instead) which requires 
core changes.




* it sure seems like there are other open sata_nv ADMA issues -- can we 
hard-confirm or deny this?  bugzilla wasn't very helpful for me.  It 
doesn't seem like we can disable ADMA (to solve those issues) and get 
enough test time in (which is what I said a week (or more?) ago too...)


The NCQ/non-NCQ command switching issue is still hitting some people 
(last I heard Kuan was looking into this), also there's a hotplug issue 
that Tejun reported..




It seems like we should be able to tackle the first two issues promptly, 
at least.


Jeff





--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-23 Thread Robert Hancock

Kuan Luo wrote:

First thank  davide to help to send the attachment.

Robert,
The patch is to solve the error message ata1: CPB flags CMD err,
flags=0x11 when testing HDS7250SASUN500G in rhel4u5.
I tested this hd in 2.6.24-rc7 which needed to remove the mask in
blacklist to run the ncq and the same error also showed up. 


I traced the  bug and found that the interrupt finished a command (for
example, tag=0) when the driver got that adma status is
NV_ADMA_STAT_DONE  and  cpb-resp_flags is NV_CPB_RESP_DONE.
However, For this hd, the drive maybe didn't clear bit 0 at this moment.
It meaned the hardware  had not completely finished the command.
If at the same time  the driver freed the command(tag 0) and sended
another command (tag 0), the error happened.

The notifier register is 32-bit register containing notifier value.
Value is bit vector containing one bit per tag number (0-31) in
corresponding bit positions (bit 0 is for tag 0, etc). When bit is set
then ADMA indicates that command with corresponding tag number completed
execution.

So i added the check notifier code. Sometimes i saw that the notifier
reg set some bits  , but the adma status set NV_ADMA_STAT_CMD_COMPLETE
,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE check
code.


Kuan, does this patch (using the notifiers to see if the command is 
really done) still work if one port on the controller has ADMA disabled 
because it's in ATAPI mode? I seem to recall Allen Martin mentioning 
that notifiers wouldn't work in this case.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sata_nv and 2.6.24 (was Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.)

2008-01-23 Thread Jeff Garzik

Robert Hancock wrote:
> Jeff Garzik wrote:
>> Ping...  sata_nv status is still a bit open for 2.6.24, and I would
>> like to move us forward a bit.
>>
>> * Kuan's patch...  it has been confirmed (and is needed), correct?
>> Can someone work up a good patch for 2.6.24?  The only one I ever
>> received was badly word-wrapped, and at the time, Robert seemed
>> uncertain of it, so I waited.
>
> I can get you one later today hopefully.
>
>> * ADMA ATAPI 4GB issues...  playing tricks with the ordering of
>> allocations and DMA masks is just way too fragile.  We just cannot
>> guarantee that all allocators work that way.  The obvious solution to
>> me seems to be hardcoding the consistent DMA mask to 32-bit, but using
>> 64-bit for regular dma mask if-and-only-if ADMA is enabled.
>
> That's not enough to fix the problem since there's issues with actual
> transfer data being allocated above 4GB as well, not just the consistent
> allocations (it appears that blk_queue_bounce_limit setting to 32-bit
> doesn't prevent this on x86_64). Either we play some funky games with
> changing the DMA mask of the entire device to 32-bit if either port is
> in ATAPI mode (which blew up when I tried it) or we add the ability to
> set the DMA mask independently on each port (like by setting the mask on
> the SCSI device and using that for DMA mapping instead) which requires
> core changes.

It's all funky games that no other driver is doing...  There is one
guaranteed-to-work scenario -- set all masks and bounce limits etc. to
32-bit.  There is also one highly-likely-to-work scenario, disabling
ADMA by default.

>> * it sure seems like there are other open sata_nv ADMA issues -- can
>> we hard-confirm or deny this?  bugzilla wasn't very helpful for me.
>> It doesn't seem like we can disable ADMA (to solve those issues) and
>> get enough test time in (which is what I said a week (or more?) ago
>> too...)
>
> The NCQ/non-NCQ command switching issue is still hitting some people
> (last I heard Kuan was looking into this), also there's a hotplug issue
> that Tejun reported..

The former implies we need to disable swncq for 2.6.24, if it's not
stable yet.

Jeff




Re: sata_nv and 2.6.24 (was Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.)

2008-01-23 Thread Robert Hancock

Jeff Garzik wrote:
> Robert Hancock wrote:
>> Jeff Garzik wrote:
>>> Ping...  sata_nv status is still a bit open for 2.6.24, and I would
>>> like to move us forward a bit.
>>>
>>> * Kuan's patch...  it has been confirmed (and is needed), correct?
>>> Can someone work up a good patch for 2.6.24?  The only one I ever
>>> received was badly word-wrapped, and at the time, Robert seemed
>>> uncertain of it, so I waited.
>>
>> I can get you one later today hopefully.

A question came up on this patch, whether it will cause problems with
ATAPI mode - waiting for a response from the NVIDIA guys.

>>> * ADMA ATAPI 4GB issues...  playing tricks with the ordering of
>>> allocations and DMA masks is just way too fragile.  We just cannot
>>> guarantee that all allocators work that way.  The obvious solution to
>>> me seems to be hardcoding the consistent DMA mask to 32-bit, but
>>> using 64-bit for regular dma mask if-and-only-if ADMA is enabled.
>>
>> That's not enough to fix the problem since there's issues with actual
>> transfer data being allocated above 4GB as well, not just the
>> consistent allocations (it appears that blk_queue_bounce_limit
>> setting to 32-bit doesn't prevent this on x86_64). Either we play some
>> funky games with changing the DMA mask of the entire device to 32-bit
>> if either port is in ATAPI mode (which blew up when I tried it) or we
>> add the ability to set the DMA mask independently on each port (like
>> by setting the mask on the SCSI device and using that for DMA mapping
>> instead) which requires core changes.
>
> It's all funky games that no other driver is doing...  There is one
> guaranteed-to-work scenario -- set all masks and bounce limits etc. to
> 32-bit.  There is also one highly-likely-to-work scenario, disabling
> ADMA by default.

Sure, if you don't mind a potentially significant performance
regression. All the DMA mask problems are due to the fact that the mask
settings for both ports are ganged together on the PCI device. If we
could set the DMA masks on the SCSI device or something else that was
port-specific, and do the command DMA mapping against that device, then
most of the weirdness goes away.
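
The effect of ganging the mask on the PCI device can be modelled in a few lines of plain C. This is only a sketch under stated assumptions, not sata_nv code: the function names and the two-port model are hypothetical, and it assumes a port in ATAPI mode must stay below 4GB while an ADMA port can use 64-bit addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK_32BIT 0x00000000FFFFFFFFull
#define MASK_64BIT 0xFFFFFFFFFFFFFFFFull

/* Hypothetical per-port requirement: a port in ATAPI mode is limited to
 * 32-bit DMA addresses, an ADMA port may use the full 64 bits. */
static uint64_t port_dma_mask(bool atapi)
{
    return atapi ? MASK_32BIT : MASK_64BIT;
}

/* One mask shared by both ports: the most restrictive port wins. */
static uint64_t shared_dma_mask(bool port0_atapi, bool port1_atapi)
{
    uint64_t mask = port_dma_mask(port0_atapi) & port_dma_mask(port1_atapi);

    assert(mask >= MASK_32BIT);     /* never narrower than 32 bits */
    return mask;
}
```

With a shared mask, one ATAPI device forces the hard disk on the other port down to 32-bit as well; with a port-specific mask, each port would simply use port_dma_mask() for itself.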


It does seem like we're starting to get a bit of NVIDIA interest in 
looking into ADMA issues, which is definitely welcome.





>>> * it sure seems like there are other open sata_nv ADMA issues -- can
>>> we hard-confirm or deny this?  bugzilla wasn't very helpful for me.
>>> It doesn't seem like we can disable ADMA (to solve those issues) and
>>> get enough test time in (which is what I said a week (or more?) ago
>>> too...)
>>
>> The NCQ/non-NCQ command switching issue is still hitting some people
>> (last I heard Kuan was looking into this), also there's a hotplug
>> issue that Tejun reported..
>
> The former implies we need to disable swncq for 2.6.24, if it's not
> stable yet.


Huh? Nothing to do with SWNCQ, which last I checked was still off by 
default.



RE: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-23 Thread Kuan Luo
Robert wrote:
> Kuan, does this patch (using the notifiers to see if the command is
> really done) still work if one port on the controller has ADMA disabled
> because it's in ATAPI mode? I seem to recall Allen Martin mentioning
> that notifiers wouldn't work in this case.

I just tried the 2.6.24-rc7 sata_nv driver with one hd and one cdrom on
the same controller.
I ran mkfs on the hd and mounted the cdrom, and no error happened.

Allen, is there anything about the notifier that we should pay attention
to?

>> * it sure seems like there are other open sata_nv ADMA issues -- can we
>> hard-confirm or deny this?  bugzilla wasn't very helpful for me.  It
>> doesn't seem like we can disable ADMA (to solve those issues) and get
>> enough test time in (which is what I said a week (or more?) ago too...)
>
> The NCQ/non-NCQ command switching issue is still hitting some people
> (last I heard Kuan was looking into this), also there's a hotplug issue
> that Tejun reported..

I have not yet reproduced the switching issue, even after removing the
udelay call according to your method.
I tried 2.6.24-rc7.
I don't know which kernel version can easily reproduce the issue, or
maybe I omitted some steps during the test.



RE: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-13 Thread Kuan Luo
Robert Hancock wrote:
> As I mentioned, this doesn't seem to resolve the problem we're seeing
> with rapidly intermixed NCQ commands and cache flushes (at least, if I
> take out the arbitrary 20usec delay from the driver and add this patch,
> the problem still shows up). It could be a similar problem, though, of
> commands being issued before the controller is really ready for them. If
> you or others at NVIDIA could assist in tracking down that problem it
> would be appreciated..

OK, I will track down that problem.


Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-13 Thread Robert Hancock

Kuan Luo wrote:
> Robert hancock wrote:
>> What problem does this resolve? I tested it against the cache flush/NCQ
>> write switching problem we've been trying to solve, and it doesn't look
>> like it fixes that one - if I apply this patch and then remove the
>> udelay(20) in sata_nv.c that I added which prevented me from seeing this
>> problem before, it shows up.
>
> First, thanks to Davide for helping to send the attachment.
>
> Robert,
> The patch is to solve the error message "ata1: CPB flags CMD err,
> flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
> I also tested this hd in 2.6.24-rc7, which required removing the mask in
> the blacklist to run NCQ, and the same error showed up.
>
> I traced the bug and found that the interrupt handler finished a command
> (for example, tag 0) when the driver saw that the ADMA status was
> NV_ADMA_STAT_DONE and cpb->resp_flags was NV_CPB_RESP_DONE.
> However, for this hd, the drive may not have cleared bit 0 at that moment.
> That meant the hardware had not completely finished the command.
> If at that moment the driver freed the command (tag 0) and sent
> another command with tag 0, the error occurred.
>
> The notifier register is a 32-bit register containing the notifier value.
> The value is a bit vector with one bit per tag number (0-31) in the
> corresponding bit position (bit 0 is for tag 0, etc.). When a bit is set,
> ADMA indicates that the command with the corresponding tag number has
> completed execution.
>
> So I added the notifier check code. Sometimes I saw that the notifier
> register had some bits set, but the ADMA status was
> NV_ADMA_STAT_CMD_COMPLETE, not NV_ADMA_STAT_DONE. So I also added a check
> for NV_ADMA_STAT_CMD_COMPLETE.


That looks like a good fix then. (Though a possible optimization would
be to AND the check_commands value with the notifier clear value rather
than testing against the notifier on each loop iteration. That's fairly
minor though.)
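
To make the suggested optimization concrete, here is a minimal sketch in plain C. It is not taken from the patch: the function names are hypothetical, and check_commands and notifier_clear are modelled as ordinary 32-bit values with one bit per tag. Both variants return the same set of completed tags; the second does a single AND up front instead of testing the notifier inside the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Per-iteration variant: test the notifier bit inside the loop. */
static uint32_t done_mask_per_tag(uint32_t check_commands,
                                  uint32_t notifier_clear)
{
    uint32_t done = 0;

    for (unsigned int tag = 0; tag < 32; tag++) {
        if (!(check_commands & (1u << tag)))
            continue;
        if (notifier_clear & (1u << tag))   /* tested on every pass */
            done |= 1u << tag;
    }
    return done;
}

/* Pre-masked variant: AND once, then walk only the surviving bits. */
static uint32_t done_mask_premasked(uint32_t check_commands,
                                    uint32_t notifier_clear)
{
    uint32_t pending = check_commands & notifier_clear;
    uint32_t done = 0;

    while (pending) {
        uint32_t rest = pending & (pending - 1u);  /* drop lowest set bit */

        done |= pending ^ rest;                    /* the bit just dropped */
        pending = rest;
    }
    assert((done & ~check_commands) == 0);  /* only tags we asked about */
    return done;
}
```

In a real completion loop the body would complete each command rather than rebuild a mask; the mask is used here only so the two variants can be compared.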


As I mentioned, this doesn't seem to resolve the problem we're seeing 
with rapidly intermixed NCQ commands and cache flushes (at least, if I 
take out the arbitrary 20usec delay from the driver and add this patch, 
the problem still shows up). It could be a similar problem, though, of 
commands being issued before the controller is really ready for them. If 
you or others at NVIDIA could assist in tracking down that problem it 
would be appreciated..



RE: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-13 Thread Kuan Luo
Robert hancock wrote:
> What problem does this resolve? I tested it against the cache flush/NCQ
> write switching problem we've been trying to solve, and it doesn't look
> like it fixes that one - if I apply this patch and then remove the
> udelay(20) in sata_nv.c that I added which prevented me from seeing this
> problem before, it shows up.

First, thanks to Davide for helping to send the attachment.

Robert,
The patch is to solve the error message "ata1: CPB flags CMD err,
flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
I also tested this hd in 2.6.24-rc7, which required removing the mask in
the blacklist to run NCQ, and the same error showed up.

I traced the bug and found that the interrupt handler finished a command
(for example, tag 0) when the driver saw that the ADMA status was
NV_ADMA_STAT_DONE and cpb->resp_flags was NV_CPB_RESP_DONE.
However, for this hd, the drive may not have cleared bit 0 at that moment.
That meant the hardware had not completely finished the command.
If at that moment the driver freed the command (tag 0) and sent
another command with tag 0, the error occurred.

The notifier register is a 32-bit register containing the notifier value.
The value is a bit vector with one bit per tag number (0-31) in the
corresponding bit position (bit 0 is for tag 0, etc.). When a bit is set,
ADMA indicates that the command with the corresponding tag number has
completed execution.

So I added the notifier check code. Sometimes I saw that the notifier
register had some bits set, but the ADMA status was
NV_ADMA_STAT_CMD_COMPLETE, not NV_ADMA_STAT_DONE. So I also added a check
for NV_ADMA_STAT_CMD_COMPLETE.
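
The notifier check described above can be sketched roughly as follows. This is not the patch itself (that was sent as an attachment); the function names and the plain-integer model of the notifier register are assumptions made for illustration, and no real register offsets or status bit values are used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the notifier bit for `tag` is set.
 * The 32-bit notifier value is modelled as a plain integer
 * (bit N corresponds to tag N). */
static bool notifier_tag_done(uint32_t notifier, unsigned int tag)
{
    assert(tag < 32);               /* one bit per tag, tags 0-31 */
    return (notifier >> tag) & 1u;
}

/* Hypothetical completion test: treat a command as finished only when
 * the CPB reports done AND the notifier confirms the same tag, so the
 * tag is not reused while the hardware is still busy with it. */
static bool command_really_done(bool cpb_resp_done, uint32_t notifier,
                                unsigned int tag)
{
    return cpb_resp_done && notifier_tag_done(notifier, tag);
}
```

For example, with notifier = 0x4 only tag 2 is confirmed, so a "done" response for tag 0 would be held back until bit 0 appears.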





Re: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-11 Thread Robert Hancock

Kuan Luo wrote:
> hi robert,
> I have fixed a bug in rhel4u5 2.6.9-55 when running adma mode
> with HDS7250SASUN500G.
> Could you check this code and, if there is no problem, help me
> submit it to the newest kernel?



What problem does this resolve? I tested it against the cache flush/NCQ 
write switching problem we've been trying to solve, and it doesn't look 
like it fixes that one - if I apply this patch and then remove the 
udelay(20) in sata_nv.c that I added which prevented me from seeing this 
problem before, it shows up.


If you want to try and reproduce that problem, you can take out this 
udelay(20) from the current version:


	if (curr_ncq != pp->last_issue_ncq) {
		/* Seems to need some delay before switching between NCQ and
		   non-NCQ commands, else we get command timeouts and such. */
		udelay(20);
		pp->last_issue_ncq = curr_ncq;
	}

then run 2 instances of this C program, with different output files as 
the argument:


#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char* argv[])
{
    int i;
    int fd = open( argv[1], O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if(fd == -1)
    {
        perror("open");
        return 1;
    }
    /* write one byte and force it to disk, 100 times over */
    for(i=0;i<100;i++)
    {
        int rc = write(fd, "0", 1);
        if( rc != 1 )
        {
            perror("write");
            return 2;
        }
        rc = fsync(fd);
        if(rc)
        {
            perror("fsync");
            return 2;
        }
    }
    return 0;
}

and one instance of this:

dd if=/dev/zero of=blankfile bs=512 count=10 oflag=direct

and one of this:

while /bin/true; do sdparm --command=sync /dev/sdb; done

all at the same time. In my experience, it helps to disable cpufreq (on 
Red Hat/Fedora, /sbin/service cpuspeed stop) to force the CPU to run at 
max frequency all the time. After a few minutes I got this:


ata4: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 
status 0x400 next cpb count 0x2 next cpb idx 0x0

ata4: CPB 0: ctl_flags 0x1f, resp_flags 0x0
ata4: CPB 1: ctl_flags 0x1f, resp_flags 0x0
ata4: CPB 2: ctl_flags 0x1f, resp_flags 0x0
ata4: timeout waiting for ADMA IDLE, stat=0x400
ata4: timeout waiting for ADMA LEGACY, stat=0x400
ata4.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x2 frozen
ata4.00: cmd 61/08:00:e0:74:64/00:00:0a:00:00/40 tag 0 ncq 4096 out
 res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4.00: cmd 61/08:08:30:5b:76/00:00:0c:00:00/40 tag 1 ncq 4096 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4.00: cmd 61/01:10:ba:51:77/00:00:0c:00:00/40 tag 2 ncq 512 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
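
When reading a log like this, it helps to recall that SAct is a per-tag bitmask: 0x7 has bits 0, 1 and 2 set, which matches the three timed-out commands with tags 0, 1 and 2 listed above. A small C sketch of that decoding follows; the helper name is hypothetical and is not a libata function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the tags marked active in a SAct-style bitmask and, if `tags`
 * is non-NULL, store their numbers in ascending order (the caller must
 * provide room for 32 entries). */
static int active_tags(uint32_t sact, unsigned int *tags)
{
    int n = 0;

    for (unsigned int tag = 0; tag < 32; tag++) {
        if (sact & (1u << tag)) {
            if (tags != NULL)
                tags[n] = tag;
            n++;
        }
    }
    assert(n <= 32);                /* at most one entry per tag */
    return n;
}
```

Calling active_tags(0x7, tags) fills tags with 0, 1, 2 and returns 3.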

