Hi,

Noralf Trønnes <nor...@tronnes.org> writes:
>> I missed this completely until now.
>> Noralf Trønnes <nor...@tronnes.org> writes:
>>> This adds the gadget side support for the Generic USB Display. It presents
>>> a DRM display device as a USB Display configured through configfs.
>>>
>>> The display is implemented as a vendor type USB interface with one bulk
>>> out endpoint. The protocol is implemented using control requests.
>>> lz4 compressed framebuffer data/pixels are sent over the bulk endpoint.
>>>
>>> The DRM part of the gadget is placed in the DRM subsystem since it reaches
>>> into the DRM internals.
>> 
>> First and foremost, could this be done in userspace using the raw gadget
>> or f_fs?
>> 
>
> An uncompressed 1920x1080-RGB565 frame is ~4MB. All frames can be
> compressed (lz4) even if just a little, so this is decompressed into the
> framebuffer of the attached DRM device. AFAIU I would need to be able to
> mmap the received bulk buffer if I were to do this from userspace
> without killing performance. So it doesn't look like I can use raw
> gadget or f_fs.

oh, yeah we couldn't map that much. I was thinking that maybe we could
transfer several small buffers instead of a single large one, but
perhaps that would complicate decompression?

>>> diff --git a/drivers/usb/gadget/function/f_gud_drm.c 
>>> b/drivers/usb/gadget/function/f_gud_drm.c
>>> new file mode 100644
>>> index 000000000000..9a2d6bb9739f
>>> --- /dev/null
>>> +++ b/drivers/usb/gadget/function/f_gud_drm.c
>>> @@ -0,0 +1,678 @@
>>> +struct f_gud_drm {
>>> +   struct usb_function func;
>>> +   struct work_struct worker;
>> 
>> why do you need a worker?
>> 
>
> The gadget runs in interrupt context and I need to call into the DRM
> subsystem which can sleep.

At some point someone wanted to provide a patch to run endpoint giveback
routine in process context, much like usb host stack does if
requested. That's currently not implemented, but should be doable by
modifying usb_gadget_giveback_request().

>>> +   size_t max_buffer_size;
>>> +   void *ctrl_req_buf;
>>> +
>>> +   u8 interface_id;
>>> +   struct usb_request *ctrl_req;
>>> +
>>> +   struct usb_ep *bulk_ep;
>>> +   struct usb_request *bulk_req;
>> 
>> single request? Don't you want to amortize the latency of
>> queue->complete by using a series of requests?
>> 
>
> I use only one request per update or partial update.
> I kmalloc the biggest buffer I can get (default 4MB) and tell the host
> about this size. If a frame doesn't fit, the host splits it up into
> partial updates. I already support partial updates so this is built in.
> Userspace can tell the graphics driver which portion of the framebuffer
> it has touched to avoid sending the full frame each time.
> Having one continous buffer simplifies decompression.

got it

> There's a control request preceding the bulk request which tells the
> area the update is for and whether it is compressed or not.
> I did some testing to see if I should avoid the control request overhead
> for split updates, but it turns out that the bottleneck is the
> decompression. The control request is just 400-500us, while the total
> time from control request to buffer is decompressed is 50-100ms
> (depending on how well the frame compresses).

yeah, that makes sense.

>>> +   struct gud_drm_gadget *gdg;
>>> +
>>> +   spinlock_t lock; /* Protects the following members: */
>>> +   bool ctrl_pending;
>>> +   bool status_pending;
>>> +   bool bulk_pending;
>>> +   bool disable_pending;
>> 
>> could this be a single u32 with #define'd flags? That would be atomic,
>> right?
>> 
>
> I have never grasped all the concurrency issues, but wouldn't I need
> memory barriers as well?

As far as I understand, {test_and_,}{set,clear,change}_bit() handle all
the required steps to guarantee proper atomic behavior. I haven't looked
at the implementation myself, though.

>>> +   u8 errno;
>> 
>> a global per-function error? Why?
>> 
>
> This is the result of the previously request operation. The host will
> request this value to see how it went. I might switch to using a bulk
> endpoint for status following a discussion with Alan Stern in the host
> driver thread in this patch series. If so I might not need this.

got it.

>>> +   u16 request;
>>> +   u16 value;
>> 
>> also why? Looks odd
>> 
>
> These values contains the operation (in addition to the payload) that
> the worker shall perform following the control-OUT requests.
>
> control-IN requests can run in interrupt context so in that case the
> payload is queued up immediately.

cool

>>> +static void f_gud_drm_bulk_complete(struct usb_ep *ep, struct usb_request 
>>> *req)
>>> +{
>>> +   struct f_gud_drm *fgd = req->context;
>>> +   unsigned long flags;
>>> +
>>> +   if (req->status || req->actual != req->length)
>>> +           return;
>> 
>> so, if we complete with an erroneous status or a short packet, you'll
>> ignore it?
>> 
>
> Hmm yeah. When I wrote this I thought that the bottleneck was the USB
> transfers, so I didn't want the host to slow down performance by
> requesting this status. Now I know it's the decompression that takes
> time, so I could actually do a status check and retry the frame if the
> device detects an error.

sounds good :-)

-- 
balbi

Attachment: signature.asc
Description: PGP signature

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to