[Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Chris Wilson
dma_fence_release() objects to a fence being freed before it is
signaled, so instead of playing fancy tricks to avoid handling dying
requests, let's keep the syncpt alive until signaled. This neatly
removes the issue with having to decouple the syncpt from the timeline
upon fence release.
-Chris


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Bas Nieuwenhuizen
Hi Chris,

My concern with going in this direction was that we potentially allow
an application to allocate a lot of kernel memory but not a lot of fds
by creating lots of fences and then closing the fds but never
signaling them. Is that not an issue?

- Bas

On Wed, Jul 15, 2020 at 12:04 PM Chris Wilson  wrote:
>
> dma_fence_release() objects to a fence being freed before it is
> signaled, so instead of playing fancy tricks to avoid handling dying
> requests, let's keep the syncpt alive until signaled. This neatly
> removes the issue with having to decouple the syncpt from the timeline
> upon fence release.
> -Chris
>
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Daniel Stone
Hi,

On Wed, 15 Jul 2020 at 11:23, Bas Nieuwenhuizen  
wrote:
> My concern with going in this direction was that we potentially allow
> an application to allocate a lot of kernel memory but not a lot of fds
> by creating lots of fences and then closing the fds but never
> signaling them. Is that not an issue?

sw_sync is a userspace DoS mechanism by design - if someone wants to
enable and use it, they have bigger problems than unbounded memory
allocations.

Cheers,
Daniel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Chris Wilson
Quoting Bas Nieuwenhuizen (2020-07-15 11:23:35)
> Hi Chris,
> 
> My concern with going in this direction was that we potentially allow
> an application to allocate a lot of kernel memory but not a lot of fds
> by creating lots of fences and then closing the fds but never
> signaling them. Is that not an issue?

I did look to see if there was a quick way we could couple into the
sync_file release itself to remove the syncpt from the timeline, but
decided that for a debug feature, it wasn't a pressing concern.

Maybe now is the time to ask: are you using sw_sync outside of
validation?
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Bas Nieuwenhuizen
On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson  wrote:
>
> Quoting Bas Nieuwenhuizen (2020-07-15 11:23:35)
> > Hi Chris,
> >
> > My concern with going in this direction was that we potentially allow
> > an application to allocate a lot of kernel memory but not a lot of fds
> > by creating lots of fences and then closing the fds but never
> > signaling them. Is that not an issue?
>
> I did look to see if there was a quick way we could couple into the
> sync_file release itself to remove the syncpt from the timeline, but
> decided that for a debug feature, it wasn't a pressing concern.
>
> Maybe now is the time to ask: are you using sw_sync outside of
> validation?

Yes, this is used as part of the Android stack on Chrome OS (need to
see if ChromeOS specific, but
https://source.android.com/devices/graphics/sync#sync_timeline
suggests not)

> -Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Daniel Stone
Hi,

On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen  
wrote:
> On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson  
> wrote:
> > Maybe now is the time to ask: are you using sw_sync outside of
> > validation?
>
> Yes, this is used as part of the Android stack on Chrome OS (need to
> see if ChromeOS specific, but
> https://source.android.com/devices/graphics/sync#sync_timeline
> suggests not)

Android used to mandate it for their earlier iteration of release
fences, which was an empty/future fence having no guarantee of
eventual forward progress until someone committed work later on. For
example, when you committed a buffer to SF, it would give you an empty
'release fence' for that buffer which would only be tied to work to
signal it when you committed your _next_ buffer, which might never
happen. They removed that because a) future fences were a bad idea,
and b) it was only ever useful if you assumed strictly
FIFO/round-robin return order which wasn't always true.

So now it's been watered down to 'use this if you don't have a
hardware timeline', but why don't we work with Android people to get
that removed entirely?

Cheers,
Daniel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-15 Thread Daniel Vetter
On Wed, Jul 15, 2020 at 1:47 PM Daniel Stone  wrote:
>
> Hi,
>
> On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen  
> wrote:
> > On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson  
> > wrote:
> > > Maybe now is the time to ask: are you using sw_sync outside of
> > > validation?
> >
> > Yes, this is used as part of the Android stack on Chrome OS (need to
> > see if ChromeOS specific, but
> > https://source.android.com/devices/graphics/sync#sync_timeline
> > suggests not)
>
> Android used to mandate it for their earlier iteration of release
> fences, which was an empty/future fence having no guarantee of
> eventual forward progress until someone committed work later on. For
> example, when you committed a buffer to SF, it would give you an empty
> 'release fence' for that buffer which would only be tied to work to
> signal it when you committed your _next_ buffer, which might never
> happen. They removed that because a) future fences were a bad idea,
> and b) it was only ever useful if you assumed strictly
> FIFO/round-robin return order which wasn't always true.
>
> So now it's been watered down to 'use this if you don't have a
> hardware timeline', but why don't we work with Android people to get
> that removed entirely?

I think there's some testcases still using these, but most real fence
testcases use vgem nowadays. So from an upstream pov there's indeed
not much if anything holding us back from just deleting this all. And
would probably be a good idea.

Adding Rob and John for more of the android pov.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] sw_sync deadlock avoidance, take 3

2020-07-16 Thread Daniel Stone
Hi all,

On Wed, 15 Jul 2020 at 12:57, Daniel Vetter  wrote:
> On Wed, Jul 15, 2020 at 1:47 PM Daniel Stone  wrote:
> > On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen  
> > wrote:
> > > Yes, this is used as part of the Android stack on Chrome OS (need to
> > > see if ChromeOS specific, but
> > > https://source.android.com/devices/graphics/sync#sync_timeline
> > > suggests not)
> >
> > Android used to mandate it for their earlier iteration of release
> > fences, which was an empty/future fence having no guarantee of
> > eventual forward progress until someone committed work later on. For
> > example, when you committed a buffer to SF, it would give you an empty
> > 'release fence' for that buffer which would only be tied to work to
> > signal it when you committed your _next_ buffer, which might never
> > happen. They removed that because a) future fences were a bad idea,
> > and b) it was only ever useful if you assumed strictly
> > FIFO/round-robin return order which wasn't always true.
> >
> > So now it's been watered down to 'use this if you don't have a
> > hardware timeline', but why don't we work with Android people to get
> > that removed entirely?
>
> I think there's some testcases still using these, but most real fence
> testcases use vgem nowadays. So from an upstream pov there's indeed
> not much if anything holding us back from just deleting this all. And
> would probably be a good idea.

It looks like this is just a docs hangover which can be fixed; sw_sync
is no longer part of the unified Android kernel image, so it can no
longer be relied on post-Treble. So let's just continue on the
assumption that sw_sync is not anything anyone can rely on.

Cheers,
Daniel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx