subject:"\"\\\[PATCH v2 0\\\/5\\\] g_NCR5380\\\: PDMA fixes and cleanup\""

Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

2017-06-26 Thread Ondrej Zary

On Monday 26 June 2017, Finn Thain wrote:
> On Sun, 25 Jun 2017, Ondrej Zary wrote:
> > It mostly works, but there are some problems:
> >
> > It's not reliable - we continue the data transfer after poll_politely2
> > returns zero but we don't know if it returned because of host buffer
> > being ready of because of an IRQ. So if a device disconnects during
> > write, we continue to fill the buffer and only then find out that wait
> > for 53c80 registers timed out. Then PDMA gets disabled:
> > [ 3458.774251] sd 2:0:1:0: [sdb] tag#0 53c80 registers not accessible,
> > device will be reset [ 3458.774251] sd 2:0:1:0: [sdb] tag#0 switching to
> > slow handshake
>
> Sorry about that. I messed up the Gated-IRQ-is-asserted case when I
> changed the algorithm to avoid adding more polling loops.
>
> > We can just reset and continue with a new PDMA transfer.
>
> We should only reset the 53c400 logic when there is a real failure (like
> no access to 53c80 registers). If a reset is needed to handle normal
> target behaviour (like disconnection during a slow transfer) then we are
> doing something wrong.

The reset is probably the only way to terminate a PDMA transfer.

> > Found no problems with reads. But when this happens during a write, we
> > might have lost some data buffers that we need to transfer again. The
> > chip's PDMA block counter does not seem to be very helpful here -
> > testing shows that either one buffer is missing in the file or is
> > duplicated.
> >
> > That's why my code had separate host buffer ready and IRQ checks. Host
> > buffer first - if it's ready, transfer the data. If not, check for IRQ -
> > if it was an error, rollback 2 buffers (the same if the host buffer is
> > not ready in time).
>
> In v3 of the patch series I've fixed the Gated IRQ logic so the transfer
> loop will terminate early.
>
> > There's also a performance regression on DTC436 - the sg_tablesize limit
> > affects performance badly.
> > Before:
> > # hdparm -t --direct /dev/sdb
> >
> > /dev/sdb:
> >  Timing O_DIRECT disk reads:   4 MB in  3.21 seconds =   1.25 MB/sec
> >
> > Now:
> > # hdparm -t --direct /dev/sdb
> >
> > /dev/sdb:
> >  Timing O_DIRECT disk reads:   4 MB in  4.89 seconds = 837.69 kB/sec
>
> The lost throughput can perhaps be explained by extra kernel-mode CPU
> overhead. Next time maybe check for that with "time hdparm". Anyway, since
> you have a patch, we should try to avoid this regression.

I don't think the CPU is that slow. It probably decreases the SCSI read 
command length and we must wait for the drive to process each command (it 
probably does no read-ahead).

> > We should limit the transfer size instead:
> > --- a/drivers/scsi/g_NCR5380.c
> > +++ b/drivers/scsi/g_NCR5380.c
> > @@ -45,7 +45,8 @@
> > int c400_blk_cnt; \
> > int c400_host_buf; \
> > int io_width; \
> > -   int pdma_residual
> > +   int pdma_residual; \
> > +   int board;
> >
> >  #define NCR5380_dma_xfer_lengeneric_NCR5380_dma_xfer_len
> >  #define NCR5380_dma_recv_setup  generic_NCR5380_pread
> > @@ -247,7 +248,6 @@ static int generic_NCR5380_init_one(struct
> > scsi_host_template *tpnt, case BOARD_DTC3181E:
> > ports = dtc_3181e_ports;
> > magic = ncr_53c400a_magic;
> > -   tpnt->sg_tablesize = 1;
> > break;
> > }
>
> I've dropped my "sg_tablesize = 1" patch from the v3 series.
>
> > @@ -317,6 +317,7 @@ static int generic_NCR5380_init_one(struct
> > scsi_host_template *tpnt, }
> > hostdata = shost_priv(instance);
> >
> > +   hostdata->board = board;
> > hostdata->io = iomem;
> > hostdata->region_size = region_size;
>
> I've added the hostdata->board variable in v3. It looks like we are going
> to need it one way or another.
>
> > @@ -625,6 +626,9 @@ static int generic_NCR5380_dma_xfer_len(struct
> > NCR5380_hostdata *hostdata, /* 53C400 datasheet: non-modulo-128-byte
> > transfers should use PIO */ if (transfersize % 128)
> > transfersize = 0;
> > +   /* Limit transfers to 512B to prevent random write corruption on
> > DTC */ +   if (hostdata->board == BOARD_DTC3181E && transfersize >
> > 512) +   transfersize = 512;
> >
> > return min(transfersize, DMA_MAX_SIZE);
> >  }
> >
> >
> > No data corruption
>
> How did you confirm that? What about a 1024 byte limit? 2048? 4096? Does
> it make any difference?
>
> Do you have any theories that might explain the need for a 512 B limit?

Tested it by copying a large file (230 MB of random data) to a slow Quantum 
240MB HDD, then read it back and compare. No corruption with 512B, corruption 
with anything more.

My theory is that with 512 B size, no disconnect occurs during a transfer 
(because the drive has 512 B sectors).

> My theory about the "read overrun" issue doesn't seem applicable, given
> that this is actually the CMOS version, which has different errata than
> the NMOS version.
>
> The

Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

2017-06-26 Thread Finn Thain

On Sun, 25 Jun 2017, Ondrej Zary wrote:
> 
> It mostly works, but there are some problems:
> 
> It's not reliable - we continue the data transfer after poll_politely2
> returns zero but we don't know if it returned because of host buffer
> being ready of because of an IRQ. So if a device disconnects during write,
> we continue to fill the buffer and only then find out that wait for
> 53c80 registers timed out. Then PDMA gets disabled:
> [ 3458.774251] sd 2:0:1:0: [sdb] tag#0 53c80 registers not accessible, device 
> will be reset
> [ 3458.774251] sd 2:0:1:0: [sdb] tag#0 switching to slow handshake
> 

Sorry about that. I messed up the Gated-IRQ-is-asserted case when I 
changed the algorithm to avoid adding more polling loops.

> We can just reset and continue with a new PDMA transfer.

We should only reset the 53c400 logic when there is a real failure (like 
no access to 53c80 registers). If a reset is needed to handle normal 
target behaviour (like disconnection during a slow transfer) then we are 
doing something wrong.

> Found no problems with reads. But when this happens during a write, we 
> might have lost some data buffers that we need to transfer again. The 
> chip's PDMA block counter does not seem to be very helpful here - 
> testing shows that either one buffer is missing in the file or is 
> duplicated.
> 
> That's why my code had separate host buffer ready and IRQ checks. Host 
> buffer first - if it's ready, transfer the data. If not, check for IRQ - 
> if it was an error, rollback 2 buffers (the same if the host buffer is 
> not ready in time).
> 

In v3 of the patch series I've fixed the Gated IRQ logic so the transfer 
loop will terminate early.

> 
> There's also a performance regression on DTC436 - the sg_tablesize limit
> affects performance badly.
> Before:
> # hdparm -t --direct /dev/sdb
> 
> /dev/sdb:
>  Timing O_DIRECT disk reads:   4 MB in  3.21 seconds =   1.25 MB/sec
> 
> Now:
> # hdparm -t --direct /dev/sdb
> 
> /dev/sdb:
>  Timing O_DIRECT disk reads:   4 MB in  4.89 seconds = 837.69 kB/sec
> 

The lost throughput can perhaps be explained by extra kernel-mode CPU 
overhead. Next time maybe check for that with "time hdparm". Anyway, since 
you have a patch, we should try to avoid this regression.

> We should limit the transfer size instead:
> --- a/drivers/scsi/g_NCR5380.c
> +++ b/drivers/scsi/g_NCR5380.c
> @@ -45,7 +45,8 @@
> int c400_blk_cnt; \
> int c400_host_buf; \
> int io_width; \
> -   int pdma_residual
> +   int pdma_residual; \
> +   int board;
> 
>  #define NCR5380_dma_xfer_lengeneric_NCR5380_dma_xfer_len
>  #define NCR5380_dma_recv_setup  generic_NCR5380_pread
> @@ -247,7 +248,6 @@ static int generic_NCR5380_init_one(struct 
> scsi_host_template *tpnt,
> case BOARD_DTC3181E:
> ports = dtc_3181e_ports;
> magic = ncr_53c400a_magic;
> -   tpnt->sg_tablesize = 1;
> break;
> }
> 

I've dropped my "sg_tablesize = 1" patch from the v3 series.

> @@ -317,6 +317,7 @@ static int generic_NCR5380_init_one(struct 
> scsi_host_template *tpnt,
> }
> hostdata = shost_priv(instance);
> 
> +   hostdata->board = board;
> hostdata->io = iomem;
> hostdata->region_size = region_size;
> 

I've added the hostdata->board variable in v3. It looks like we are going 
to need it one way or another.

> @@ -625,6 +626,9 @@ static int generic_NCR5380_dma_xfer_len(struct 
> NCR5380_hostdata *hostdata,
> /* 53C400 datasheet: non-modulo-128-byte transfers should use PIO */
> if (transfersize % 128)
> transfersize = 0;
> +   /* Limit transfers to 512B to prevent random write corruption on DTC 
> */
> +   if (hostdata->board == BOARD_DTC3181E && transfersize > 512)
> +   transfersize = 512;
> 
> return min(transfersize, DMA_MAX_SIZE);
>  }
> 
> 
> No data corruption

How did you confirm that? What about a 1024 byte limit? 2048? 4096? Does 
it make any difference?

Do you have any theories that might explain the need for a 512 B limit?

My theory about the "read overrun" issue doesn't seem applicable, given 
that this is actually the CMOS version, which has different errata than 
the NMOS version.

The need for udelay(4) on this board does suggest timing issues. What 
happens if you add a udelay into the generic_NCR5380_pwrite() loop?

> and no performance regression:
> # hdparm -t --direct /dev/sdb
> 

Well, I'd still expect some added CPU overhead. The driver is being handed 
a 4096 byte transfer and is now splitting that up into eight separate 
transfers. That means eight times as many DMA setup and completion calls 
and presumably a similar increase in interrupts.

> /dev/sdb:
>  Timing O_DIRECT disk reads:   4 MB in  3.25 seconds =   1.23 MB/sec
> 
> As the data corruption affects only writes, we could keep transfersize
> unlimited for reads:
> +   /* Limit write transfe

Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

2017-06-25 Thread Ondrej Zary

On Saturday 24 June 2017 08:37:36 Finn Thain wrote:
> Ondrej, would you please test this new series?
>
> Changed since v1:
> - PDMA transfer residual is calculated earlier.
> - End of DMA flag check is now polled (if there is any residual).
>
>
> Finn Thain (2):
>   g_NCR5380: Limit sg_tablesize to avoid PDMA read overruns on DTC436
>   g_NCR5380: Cleanup comments and whitespace
>
> Ondrej Zary (3):
>   g_NCR5380: Fix PDMA transfer size
>   g_NCR5380: End PDMA transfer correctly on target disconnection
>   g_NCR5380: Re-work PDMA loops
>
>  drivers/scsi/g_NCR5380.c | 231
> ++- 1 file changed, 107
> insertions(+), 124 deletions(-)

It mostly works, but there are some problems:

It's not reliable - we continue the data transfer after poll_politely2
returns zero but we don't know if it returned because of host buffer
being ready of because of an IRQ. So if a device disconnects during write,
we continue to fill the buffer and only then find out that wait for
53c80 registers timed out. Then PDMA gets disabled:
[ 3458.774251] sd 2:0:1:0: [sdb] tag#0 53c80 registers not accessible, device 
will be reset
[ 3458.774251] sd 2:0:1:0: [sdb] tag#0 switching to slow handshake

We can just reset and continue with a new PDMA transfer. Found no problems
with reads. But when this happens during a write, we might have lost some
data buffers that we need to transfer again. The chip's PDMA block counter
does not seem to be very helpful here - testing shows that either one buffer
is missing in the file or is duplicated.

That's why my code had separate host buffer ready and IRQ checks. Host
buffer first - if it's ready, transfer the data. If not, check for IRQ - 
if it was an error, rollback 2 buffers (the same if the host buffer is not
ready in time).

There's also a performance regression on DTC436 - the sg_tablesize limit
affects performance badly.
Before:
# hdparm -t --direct /dev/sdb

/dev/sdb:
 Timing O_DIRECT disk reads:   4 MB in  3.21 seconds =   1.25 MB/sec

Now:
# hdparm -t --direct /dev/sdb

/dev/sdb:
 Timing O_DIRECT disk reads:   4 MB in  4.89 seconds = 837.69 kB/sec

We should limit the transfer size instead:
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -45,7 +45,8 @@
int c400_blk_cnt; \
int c400_host_buf; \
int io_width; \
-   int pdma_residual
+   int pdma_residual; \
+   int board;

 #define NCR5380_dma_xfer_lengeneric_NCR5380_dma_xfer_len
 #define NCR5380_dma_recv_setup  generic_NCR5380_pread
@@ -247,7 +248,6 @@ static int generic_NCR5380_init_one(struct 
scsi_host_template *tpnt,
case BOARD_DTC3181E:
ports = dtc_3181e_ports;
magic = ncr_53c400a_magic;
-   tpnt->sg_tablesize = 1;
break;
}

@@ -317,6 +317,7 @@ static int generic_NCR5380_init_one(struct 
scsi_host_template *tpnt,
}
hostdata = shost_priv(instance);

+   hostdata->board = board;
hostdata->io = iomem;
hostdata->region_size = region_size;

@@ -625,6 +626,9 @@ static int generic_NCR5380_dma_xfer_len(struct 
NCR5380_hostdata *hostdata,
/* 53C400 datasheet: non-modulo-128-byte transfers should use PIO */
if (transfersize % 128)
transfersize = 0;
+   /* Limit transfers to 512B to prevent random write corruption on DTC */
+   if (hostdata->board == BOARD_DTC3181E && transfersize > 512)
+   transfersize = 512;

return min(transfersize, DMA_MAX_SIZE);
 }

No data corruption and no performance regression:
# hdparm -t --direct /dev/sdb

/dev/sdb:
 Timing O_DIRECT disk reads:   4 MB in  3.25 seconds =   1.23 MB/sec

As the data corruption affects only writes, we could keep transfersize
unlimited for reads:
+   /* Limit write transfers to 512B to prevent random corruption on DTC */
+   if (hostdata->board == BOARD_DTC3181E &&
+   cmd->sc_data_direction == DMA_TO_DEVICE && transfersize > 512)
+   transfersize = 512;

So we can get some performance gain at least for reads:
# hdparm -t --direct /dev/sdb

/dev/sdb:
 Timing O_DIRECT disk reads:   6 MB in  4.17 seconds =   1.44 MB/sec

-- 
Ondrej Zary

[PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

2017-06-23 Thread Finn Thain

Ondrej, would you please test this new series?

Changed since v1:
- PDMA transfer residual is calculated earlier.
- End of DMA flag check is now polled (if there is any residual).


Finn Thain (2):
  g_NCR5380: Limit sg_tablesize to avoid PDMA read overruns on DTC436
  g_NCR5380: Cleanup comments and whitespace

Ondrej Zary (3):
  g_NCR5380: Fix PDMA transfer size
  g_NCR5380: End PDMA transfer correctly on target disconnection
  g_NCR5380: Re-work PDMA loops

 drivers/scsi/g_NCR5380.c | 231 ++-
 1 file changed, 107 insertions(+), 124 deletions(-)

-- 
2.13.0

Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

Re: [PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

[PATCH v2 0/5] g_NCR5380: PDMA fixes and cleanup

4 matches

Site Navigation

Mail list logo

Footer information