On Sun, Jul 08, 2018 at 02:25:38PM -0400, jacob feder wrote:
> On Sun, Jul 8, 2018 at 9:28 AM Greg KH <gre...@linuxfoundation.org> wrote:
> 
>     On Sat, Jul 07, 2018 at 10:19:40PM -0400, Jacob Feder wrote:
>     > Hi all,
>     > I have developed this driver for a Xilinx-provided IP block for
>     > their Zynq SoC. I fixed all of the checkpatch.pl problems I knew
>     > how to. If someone could chime in on how to fix the remaining few
>     > it would be appreciated.
>     >
>     > Also looking for feedback on general structure. It's my first
>     > driver (and patch submission) so I'm sure there are plenty of
>     > things to be improved on :).
>     >
>     > Functionally everything works OK except the performance isn't as
>     > good as I was hoping for. I have been testing it by operating the
>     > FIFO in loopback mode (AXI-Stream TX interface feeding back into
>     > the RX interface) running on the XC7Z020 (7000 series) Zynq
>     > device. I am getting anything between 3-16 MB/s depending on the
>     > amount of data transferred. The performance claimed by the PG080
>     > datasheet is ~65 MB/s. The CPU isn't under significant load
>     > (~12%) as reported by top, so I don't think that's the bottleneck.
>     >
>     > Please +CC in responses as I'm not on the mailing list.
>     >
>     > Cheers
>     >
>     >
>     > This IP core has read and write AXI-Stream FIFOs, the contents of
>     > which can be accessed from the AXI4 memory-mapped interface. This
>     > is useful for transferring data from a processor into the FPGA
>     > fabric. The driver creates a character device that can be read
>     > from and written to with the standard open/read/write/close calls.
> 
>     Why not use the uio api, which allows userspace to mmap the memory of
>     the device and access it directly from userspace?  That should make
>     things a lot faster, right?
> 
> 
> 
> I thought about the UIO method but based on what I read it seemed like more of
> a hack (and also doesn't expose interrupts?). Whether it would make
> things faster I have no idea.

Direct mmap access should be faster than doing read/write calls in a
serial way, right?

> 
>     Or if that doesn't work, what about the fpga API the kernel now has?
>     Would that work for this hardware?
> 
> 
> 
> I'm not totally sure what you're referring to here, but I think the
> FPGA kernel drivers are for downloading bitstreams to the FPGA (a
> bitstream is the FPGA equivalent of asm), which isn't what I'm trying
> to do.

Ah, ok.

>     >
>     > See Xilinx PG080 document for IP details.
> 
>     Do you have a link to that?  If so, can you put it in here?
> 
> 
> 
> https://www.xilinx.com/support/documentation/ip_documentation/axi_fifo_mm_s/v4_1/pg080-axi-fifo-mm-s.pdf

Can you put this in the changelog message please?

>     > Currently supports only store-forward mode with a 32-bit
>     > AXI4-Lite interface. DOES NOT support:
>     >       - cut-through mode
>     >       - AXI4 (non-lite)
>     >
>     > Signed-off-by: Jacob Feder <jacobsfe...@gmail.com>
>     > ---
>     >  drivers/staging/axisfifo/axis-fifo.c | 1296 ++++++++++++++++++++++++++++++++++
>     >  drivers/staging/axisfifo/axis-fifo.h |  119 ++++
>     >  2 files changed, 1415 insertions(+)
>     >  create mode 100644 drivers/staging/axisfifo/axis-fifo.c
>     >  create mode 100644 drivers/staging/axisfifo/axis-fifo.h
> 
>     Why does a single .c file need a .h file?
> 
> 
> 
> Good point... this can be consolidated :)
>  
> 
>     I'll be glad to take this driver, as others can clean it up in-tree (I
>     think your locking is crazy and is probably causing a lot of performance
>     issues), but I need a TODO file for it listing what you think needs to
>     be done in order to get this out of the staging tree.
> 
> 
> 
> I'm confused about why you don't like the locking - all I'm doing is
> locking on open() calls to prevent multiple userspace apps from
> reading/writing to the fifo simultaneously. This shouldn't reduce
> performance because the mutexes are only tested on open(), not on
> read() or write(). Presumably the user is only opening once.

Trying to "protect" userspace from itself is not really a good idea.
Grabbing a lock over open/close is ripe for problems.  You really are
not keeping multiple owners from accessing the device at the same time,
think about passing around a file descriptor to different processes or
threads.  Your kernel code does not protect from that, so it is best
just not to worry about that at all.  If userspace does foolish things,
let it deal with the result :)

> I think locking is necessary - if the hardware registers are accessed in the
> wrong order it goes into an unknown state and must be reset (and will
> probably cause a kernel panic).

How will the kernel crash because of this?

>     Or, if you think it should use the fpga or uio interface instead, maybe
>     it's just easier to redo it based on that?
> 
>     thanks,
> 
>     greg k-h
> 
> 
> I have made some slight modifications today. Let me know if there are
> any other things you think I should change. I can integrate those and
> then resubmit a v2 for you to bring into the tree.
> 
> In terms of TODO it's all OK as far as I'm concerned, unless someone
> wants to look into the performance. (Actually, see below - I think
> this could explain why the performance of my system is lower - Xilinx
> is using some sort of DMA driver instead of io_remap.)
> https://forums.xilinx.com/xlnx/attachments/xlnx/ELINUX/13011/2/Linux-DMA-In-Device-Drivers.pdf

mmap might be the best fit for what you want; try uio out and see if it
helps your speed...

thanks,

greg k-h
_______________________________________________
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel