Hi,

Noralf Trønnes <nor...@tronnes.org> writes:
>> I missed this completely until now.
>>
>> Noralf Trønnes <nor...@tronnes.org> writes:
>>> This adds the gadget side support for the Generic USB Display. It presents
>>> a DRM display device as a USB Display configured through configfs.
>>>
>>> The display is implemented as a vendor type USB interface with one bulk
>>> out endpoint. The protocol is implemented using control requests.
>>> lz4 compressed framebuffer data/pixels are sent over the bulk endpoint.
>>>
>>> The DRM part of the gadget is placed in the DRM subsystem since it reaches
>>> into the DRM internals.
>>
>> First and foremost, could this be done in userspace using the raw gadget
>> or f_fs?
>
> An uncompressed 1920x1080-RGB565 frame is ~4MB. All frames can be
> compressed (lz4) even if just a little, so this is decompressed into the
> framebuffer of the attached DRM device. AFAIU I would need to be able to
> mmap the received bulk buffer if I were to do this from userspace
> without killing performance. So it doesn't look like I can use raw
> gadget or f_fs.
oh, yeah we couldn't map that much. I was thinking that maybe we could
transfer several small buffers instead of a single large one, but
perhaps that would complicate decompression?

>>> diff --git a/drivers/usb/gadget/function/f_gud_drm.c b/drivers/usb/gadget/function/f_gud_drm.c
>>> new file mode 100644
>>> index 000000000000..9a2d6bb9739f
>>> --- /dev/null
>>> +++ b/drivers/usb/gadget/function/f_gud_drm.c
>>> @@ -0,0 +1,678 @@
>>> +struct f_gud_drm {
>>> +	struct usb_function func;
>>> +	struct work_struct worker;
>>
>> why do you need a worker?
>
> The gadget runs in interrupt context and I need to call into the DRM
> subsystem which can sleep.

At some point someone wanted to provide a patch to run the endpoint
giveback routine in process context, much like the USB host stack does
if requested. That's currently not implemented, but should be doable by
modifying usb_gadget_giveback_request().

>>> +	size_t max_buffer_size;
>>> +	void *ctrl_req_buf;
>>> +
>>> +	u8 interface_id;
>>> +	struct usb_request *ctrl_req;
>>> +
>>> +	struct usb_ep *bulk_ep;
>>> +	struct usb_request *bulk_req;
>>
>> single request? Don't you want to amortize the latency of
>> queue->complete by using a series of requests?
>
> I use only one request per update or partial update.
> I kmalloc the biggest buffer I can get (default 4MB) and tell the host
> about this size. If a frame doesn't fit, the host splits it up into
> partial updates. I already support partial updates so this is built in.
> Userspace can tell the graphics driver which portion of the framebuffer
> it has touched to avoid sending the full frame each time.
> Having one continuous buffer simplifies decompression.

got it

> There's a control request preceding the bulk request which tells the
> area the update is for and whether it is compressed or not.
> I did some testing to see if I should avoid the control request overhead
> for split updates, but it turns out that the bottleneck is the
> decompression.
> The control request is just 400-500us, while the total time from
> control request until the buffer is decompressed is 50-100ms
> (depending on how well the frame compresses).

yeah, that makes sense.

>>> +	struct gud_drm_gadget *gdg;
>>> +
>>> +	spinlock_t lock; /* Protects the following members: */
>>> +	bool ctrl_pending;
>>> +	bool status_pending;
>>> +	bool bulk_pending;
>>> +	bool disable_pending;
>>
>> could this be a single u32 with #define'd flags? That would be atomic,
>> right?
>
> I have never grasped all the concurrency issues, but wouldn't I need
> memory barriers as well?

As far as I understand, {test_and_,}{set,clear,change}_bit() handle all
the required steps to guarantee proper atomic behavior. I haven't looked
at the implementation myself, though.

>>> +	u8 errno;
>>
>> a global per-function error? Why?
>
> This is the result of the previously requested operation. The host will
> request this value to see how it went. I might switch to using a bulk
> endpoint for status following a discussion with Alan Stern in the host
> driver thread in this patch series. If so I might not need this.

got it.

>>> +	u16 request;
>>> +	u16 value;
>>
>> also why? Looks odd
>
> These values contain the operation (in addition to the payload) that
> the worker shall perform following control-OUT requests.
>
> control-IN requests can run in interrupt context so in that case the
> payload is queued up immediately.

cool

>>> +static void f_gud_drm_bulk_complete(struct usb_ep *ep, struct usb_request *req)
>>> +{
>>> +	struct f_gud_drm *fgd = req->context;
>>> +	unsigned long flags;
>>> +
>>> +	if (req->status || req->actual != req->length)
>>> +		return;
>>
>> so, if we complete with an erroneous status or a short packet, you'll
>> ignore it?
>
> Hmm yeah. When I wrote this I thought that the bottleneck was the USB
> transfers, so I didn't want the host to slow down performance by
> requesting this status.
> Now I know it's the decompression that takes time, so I could actually
> do a status check and retry the frame if the device detects an error.

sounds good :-)

-- 
balbi
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel