Re: [PATCH 9/9 v2] habanalabs: allow multiple processes to open FD

Oded Gabbay Sun, 28 Jul 2019 09:08:51 -0700

On Sun, Jul 28, 2019 at 4:14 PM Greg KH <[email protected]> wrote:
>
> On Sun, Jul 28, 2019 at 03:48:12PM +0300, Oded Gabbay wrote:
> > This patch removes the limitation of a single process that can open the
> > device.
> >
> > Now, there is no limitation on the number of processes that can open the
> > device and have a valid FD.
> >
> > However, only a single process can perform compute operations. This is
> > enforced by allowing only a single process to have a compute context.
> >
> > Signed-off-by: Oded Gabbay <[email protected]>
> > ---
> > Changes in v2:
> > - Replace WARN with dev_crit
>
> Looks good, thanks.  The other patches in the series looked fine at
> first glance as well
>
> greg k-h


Hi Greg,

Actually after sending this to you, I think I have a problem with this
whole solution of allowing multiple processes to open my device.
I would like to consult with you about this. I have an idea how to
solve this problem but I want to hear what you think and perhaps you
have a better idea.

Basically, I have a multiple-readers, single-writer situation.
I want to allow unlimited number of processes to perform one of the
IOCTLs I have, the INFO IOCTL, which only retrieves certain
information from my device/driver. Those are the readers.
At the same time, I want to allow a single writer to perform ALL the
IOCTLs I have. i.e. submit work to the device, map memory, etc. This
is the writer.

The way I solved it above is not good. By doing "lazy creation" of the
context in the IOCTL call, I created an awkward situation where a
process that opens the device must call an IOCTL to execute something
in order for the compute context to be created.

Now, assume we have a box with 8 devices. The user doesn't care which
device it gets. So he opens the first device and that works for him.
Now another user comes and also wants a device. He opens the first
device and that also works for him. However, when they will try to
execute something on the device, only one of them will succeed and the
other one will fail. But the fail happens in the data path! not in the
open stage. This seems to me like a very bad design and in addition,
this also breaks existing userspace.

I thought of doing something similar to what the drm sub-system is
doing and that is to create an additional device file per device,
which will be called something like "hl-ctrlX" (in addition to hlX).
In hlX I will create the compute context in the open device. That
means the first user that opens the device has full access to it (the
writer). The next opens will fail (because a compute context already
exists). Applications (system management) that want to perform just
the INFO IOCTL call can do it through the hl-ctrlX device. I say it is
similar to drm because drm creates 3 device files per GPU.

What do you think ? Will you accept something like this ? Am I
breaking any rule with this suggestion ? Do you have another Idea ?

btw, I also thought maybe to use the read/write flags of the open
syscall to distinguish between reader and writer. What do you think
about that ?

Any help/advice/flame is appreciated.

Thanks,
Oded

Re: [PATCH 9/9 v2] habanalabs: allow multiple processes to open FD

Reply via email to