On Tue, 15 Mar 2016, Hannes Reinecke wrote:

> On 03/15/2016 04:27 AM, Finn Thain wrote:
> > 
> > On Mon, 14 Mar 2016, Hannes Reinecke wrote:
> > 
> >> On 03/14/2016 05:27 AM, Finn Thain wrote:
> >>> This setting does not need to be conditional on Atari ST or TT.
> >>>
> >>> Without TCQ support, cmd_per_lun == 2 is probably reasonable...
> >>>
> >>> Signed-off-by: Finn Thain <fth...@telegraphics.com.au>
> >>>
> >>> ---
> >>>  drivers/scsi/atari_scsi.c |    3 +--
> >>>  1 file changed, 1 insertion(+), 2 deletions(-)
> >>>
> >>> Index: linux/drivers/scsi/atari_scsi.c
> >>> ===================================================================
> >>> --- linux.orig/drivers/scsi/atari_scsi.c  2016-03-14 15:26:45.000000000 
> >>> +1100
> >>> +++ linux/drivers/scsi/atari_scsi.c       2016-03-14 15:26:55.000000000 
> >>> +1100
> >>> @@ -750,6 +750,7 @@ static struct scsi_host_template atari_s
> >>>   .eh_abort_handler       = atari_scsi_abort,
> >>>   .eh_bus_reset_handler   = atari_scsi_bus_reset,
> >>>   .this_id                = 7,
> >>> + .cmd_per_lun            = 2,
> >>>   .use_clustering         = DISABLE_CLUSTERING,
> >>>   .cmd_size               = NCR5380_CMD_SIZE,
> >>>  };
> >> _2_ ? Are you being overly cheeky here?
> >> I sincerely doubt the driver is capable of submitting two 
> >> simultaneous commands ...
> > 
> > Right. The LLD has LU busy flags to prevent a LU from being issued 
> > more than one command.
> > 
> >> Care to explain?
> > 
> > It seemed harmless and it is consistent with the all of the other 5380 
> > drivers.
> > 
> > I don't know why it was done that way. Perhaps it was done to create a 
> > pipeline. That is, to keep a small number of commands in the LLD issue 
> > queue so that the NCR5380_main() work item does not have to terminate 
> > and then get requeued needlessly.
> > 
> Like I suspected.
> While I'm aware of the reasoning, I sincerely doubt whether it makes any 
> difference in real life.
> After all, a 'BUSY' return value still relies on someone kicking the 
> queue so that the next command can be submitted.

Well, it is not queuecommand returning SCSI_MLQUEUE_HOST_BUSY. I assume it 
is scsi_request_fn() bailing out when !scsi_dev_queue_ready().

> So it's not much different from using a queuedepth of '1' and use the 
> 'official' way.
> 
> Have you done any benchmarking here?

I have now.

> Would be very interesting to check if it makes a difference in real
> life ...

It seems that the work item startup and shutdown overhead does make a 
difference on machines where cycles are scarce.

Using mac_scsi on a 25 MHz 68030 I made some test runs of
# dd if=/dev/sda of=/dev/null count=4096
and the difference is measurable (effect size is 9+ standard deviations).

cmd_per_lun = 2 is about 3.0% faster than cmd_per_lun = 1.
cmd_per_lun = 4 is about 3.7% faster than cmd_per_lun = 1.

Increasing cmd_per_lun to 16 (which equals can_queue) doesn't improve the 
timing.

I think the 'official' way (the default cmd_per_lun) would not hurt if the 
CPU was a bit faster.

Now that you've got me to test it I think 4 is probably the best for 
mac_scsi and atari_scsi. When I send v2 I will change patch 20 and 22 
accordingly.

-- 

> 
> Cheers,
> 
> Hannes
> 

Reply via email to