[PATCH v2 0/7] xfree86: Handle drm race condition

2013-03-20 Thread Chris Wilson
On Wed, Mar 20, 2013 at 09:40:04AM +0100, Maarten Lankhorst wrote:
> Is the drmSetInterfaceVersion call really needed here? If I look at
> DRM_IOCTL_GET_UNIQUE, I don't see any requirement of drm master or anything,
> so it looks to me like for this specific race the drmSetInterfaceVersion
> call can be skipped entirely without any side effects.
> This would end up with cleaner code here, and drop the master requirement
> entirely.

Indeed, it does look like drmSetVersion() at that point is overkill.
Instead we will hit the race later in the drivers. For the purposes of
clearer code, we could happily lose that drmSetVersion().

> Of course there's still a race that needs to be investigated, and is 
> currently not completely understood, I think.

We are all in agreement. Ultimately we want to root-cause the race; in
the meantime we need a fallback to make sure that no desktop is left
behind!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH v2 0/7] xfree86: Handle drm race condition

2013-03-20 Thread Maarten Lankhorst
On 20-03-13 09:40, Maarten Lankhorst wrote:
> Hey,
>
> On 19-03-13 22:13, Chris Wilson wrote:
>> On Tue, Mar 19, 2013 at 11:50:47AM +0100, Maarten Lankhorst wrote:
>>> The drmSetMaster call is needed, but the spinning is really just waiting
>>> for the workqueue to run.
>>>
>>> Bryce's patch never worked; it just caused it to try drmSetInterfaceVersion
>>> for a few seconds before timing out. That call was failing because his
>>> patch series never tried to obtain drm master.
>> You missed that the series Bryce posted did contain the drmSetMaster()
>> call inside the loop to retry drmSetVersion(). :)
>>
>>
> Oh I must have missed that.
>
> Is the drmSetInterfaceVersion call really needed here? If I look at
> DRM_IOCTL_GET_UNIQUE, I don't see any requirement of drm master or anything,
> so it looks to me like for this specific race the drmSetInterfaceVersion
> call can be skipped entirely without any side effects.
> This would end up with cleaner code here, and drop the master requirement
> entirely.
>
> Of course there's still a race that needs to be investigated, and is 
> currently not completely understood, I think.
>
Or worse, is that drmGetBusid call there even useful? From digging through the
kernel, it seems to be a per-master value. So if a device is hotplugged, it
wouldn't be set yet. If someone else holds master, it wouldn't be set either.
In fact it would only ever be set from DRIOpenDRMMaster, but that call only
happens a lot later, if it happens at all.

It seems to me like opening the fd there should be removed entirely, and the 
bus id should be retrieved from the udev event instead.
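
As a rough sketch of that idea (deriving the bus id from the device path that
comes with a udev event, without opening a DRM fd at all), something like the
following could work. busid_from_syspath is a hypothetical helper name, not an
existing xserver or libdrm function, and the sysfs path layout it expects is
an assumption. The output is assumed to follow the usual
"pci:domain:bus:dev.func" form of DRM bus ids.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of xserver or libdrm): build a DRM-style
 * bus id such as "pci:0000:00:02.0" from a sysfs device path.
 * The last path component that parses as a PCI address wins, so devices
 * behind a bridge resolve to the device itself and not to the bridge.
 * Returns 0 on success, -1 if no PCI address was found or out is too small.
 */
static int busid_from_syspath(const char *syspath, char *out, size_t outlen)
{
    unsigned int domain = 0, bus = 0, dev = 0, func = 0;
    int found = 0;
    int n;
    const char *p = syspath;

    assert(syspath != NULL && out != NULL);

    while (*p != '\0') {
        const char *end;
        size_t len;

        if (*p == '/') {
            p++;
            continue;
        }
        end = strchr(p, '/');
        len = end ? (size_t)(end - p) : strlen(p);

        /* A PCI address "dddd:bb:dd.f" is exactly 12 characters long. */
        if (len == 12) {
            char comp[13];
            unsigned int d, b, s, f;
            int used = 0;

            memcpy(comp, p, len);
            comp[len] = '\0';
            if (sscanf(comp, "%4x:%2x:%2x.%1x%n", &d, &b, &s, &f, &used) == 4 &&
                used == 12) {
                domain = d;
                bus = b;
                dev = s;
                func = f;
                found = 1;
            }
        }
        p += len;
    }

    if (!found)
        return -1;

    n = snprintf(out, outlen, "pci:%04x:%02x:%02x.%u", domain, bus, dev, func);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

With an assumed path like /sys/devices/pci0000:00/0000:00:02.0/drm/card0 this
would produce "pci:0000:00:02.0". Non-PCI devices are simply reported as "not
found" here, so a real implementation would need its own handling for them.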

I'll try to get something working for this.

~Maarten


[PATCH v2 0/7] xfree86: Handle drm race condition

2013-03-20 Thread Maarten Lankhorst
Hey,

On 19-03-13 22:13, Chris Wilson wrote:
> On Tue, Mar 19, 2013 at 11:50:47AM +0100, Maarten Lankhorst wrote:
>> The drmSetMaster call is needed, but the spinning is really just waiting for
>> the workqueue to run.
>>
>> Bryce's patch never worked; it just caused it to try drmSetInterfaceVersion
>> for a few seconds before timing out. That call was failing because his patch
>> series never tried to obtain drm master.
> You missed that the series Bryce posted did contain the drmSetMaster()
> call inside the loop to retry drmSetVersion(). :)
>
>
Oh I must have missed that.

Is the drmSetInterfaceVersion call really needed here? If I look at
DRM_IOCTL_GET_UNIQUE, I don't see any requirement of drm master or anything,
so it looks to me like for this specific race the drmSetInterfaceVersion call
can be skipped entirely without any side effects.
This would end up with cleaner code here, and drop the master requirement
entirely.

Of course there's still a race that needs to be investigated, and is currently 
not completely understood, I think.

~Maarten



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v2 0/7] xfree86: Handle drm race condition

2013-03-19 Thread Chris Wilson
On Tue, Mar 19, 2013 at 11:50:47AM +0100, Maarten Lankhorst wrote:
> The drmSetMaster call is needed, but the spinning is really just waiting for
> the workqueue to run.
>
> Bryce's patch never worked; it just caused it to try drmSetInterfaceVersion
> for a few seconds before timing out. That call was failing because his patch
> series never tried to obtain drm master.

You missed that the series Bryce posted did contain the drmSetMaster()
call inside the loop to retry drmSetVersion(). :)

Your explanation as to why the delay is required is certainly
intriguing. Thanks,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH v2 0/7] xfree86: Handle drm race condition

2013-03-19 Thread Dave Airlie
>
> Because of the delayed fput in recent kernels, it is possible for plymouth to
> exit and not drop master right away.
> It's put onto a workqueue to be freed slightly later. Xorg-server starts in
> the meantime, opens a fd, but because the fd hasn't been closed by plymouth
> yet, it didn't get implicitly authenticated and it didn't get drm master
> either.
>

I thought plymouth explicitly dropped master, and closed later. I know
we "ab"use that fact on Fedora so X can grab the bo from plymouth
before it exits.

Dave.



[PATCH v2 0/7] xfree86: Handle drm race condition

2013-03-19 Thread Maarten Lankhorst
On 19-03-13 12:10, Dave Airlie wrote:
>> Because of the delayed fput in recent kernels, it is possible for plymouth
>> to exit and not drop master right away.
>> It's put onto a workqueue to be freed slightly later. Xorg-server starts in
>> the meantime, opens a fd, but because the fd hasn't been closed by plymouth
>> yet, it didn't get implicitly authenticated and it didn't get drm master
>> either.
>>
> I thought plymouth explicitly dropped master, and closed later. I know
> we "ab"use that fact on Fedora so X can grab the bo from plymouth
> before it exits.
>
> Dave.
> ___
> xorg-devel at lists.x.org: X.Org development
> Archives: http://lists.x.org/archives/xorg-devel
> Info: http://lists.x.org/mailman/listinfo/xorg-devel
>
Well, from trying the dropmaster kernel patch, it looks like there are simply
too many places that could be affected by this assumption.

Let's just try something ugly in the flush callback instead, which is called
before the final fput; that should fix all our problems!
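
To illustrate why dropping master at flush time sidesteps the race, here is a
deliberately tiny userspace model. It is not kernel code: struct toy_dev and
the toy_* functions are made up for this sketch, and the "workqueue" is just a
function the caller invokes by hand. It only models who holds master and when.

```c
#include <assert.h>

/* Toy model, NOT kernel code: one device, clients are small positive ints. */
struct toy_dev {
    int master;          /* client id holding master, 0 = nobody */
    int pending_release; /* client whose final release is still queued, 0 = none */
};

/* Opening always succeeds; the opener gets master only if nobody holds it. */
static int toy_open(struct toy_dev *d, int client)
{
    assert(client > 0);
    if (d->master == 0) {
        d->master = client;
        return 1; /* got master implicitly */
    }
    return 0;     /* opened, but without master */
}

/*
 * Closing only queues the release (this models the delayed fput).
 * With drop_in_flush set, master is given up synchronously first,
 * which is the effect the flush hook is meant to have.
 */
static void toy_close(struct toy_dev *d, int client, int drop_in_flush)
{
    if (drop_in_flush && d->master == client)
        d->master = 0;
    d->pending_release = client;
}

/* The deferred work: release-time cleanup that is still outstanding. */
static void toy_run_workqueue(struct toy_dev *d)
{
    if (d->pending_release != 0 && d->master == d->pending_release)
        d->master = 0;
    d->pending_release = 0;
}
```

In this model, a client that opens the device between toy_close(d, 1, 0) and
toy_run_workqueue(d) does not get master, while after toy_close(d, 1, 1) it
does, no matter when the deferred work runs.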

XXX: the big if is duplicated from drm_release, and it should probably be split
into a separate function.
However, if you're hit by the plymouth race, this might be a good thing to try.

The fix for drivers other than radeon/i915 is left as an exercise for the
reader.

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index f369429..ecf8689 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -177,6 +177,50 @@ err:
 }
 EXPORT_SYMBOL(drm_open);
 
+int drm_flush(struct file *filp, fl_owner_t id)
+{
+	struct drm_file *file_priv = filp->private_data;
+	struct drm_device *dev = file_priv->minor->dev;
+
+	if (atomic_long_read(&filp->f_count) != 1 || !file_priv->is_master)
+		return 0;
+
+	mutex_lock(&dev->struct_mutex);
+
+	if (file_priv->is_master) {
+		struct drm_master *master = file_priv->master;
+		struct drm_file *temp;
+		list_for_each_entry(temp, &dev->filelist, lhead) {
+			if ((temp->master == file_priv->master) &&
+			    (temp != file_priv))
+				temp->authenticated = 0;
+		}
+
+		/**
+		 * Since the master is disappearing, so is the
+		 * possibility to lock.
+		 */
+
+		if (master->lock.hw_lock) {
+			if (dev->sigdata.lock == master->lock.hw_lock)
+				dev->sigdata.lock = NULL;
+			master->lock.hw_lock = NULL;
+			master->lock.file_priv = NULL;
+			wake_up_interruptible_all(&master->lock.lock_queue);
+		}
+
+		if (file_priv->minor->master == file_priv->master) {
+			/* drop the reference held by the minor */
+			if (dev->driver->master_drop)
+				dev->driver->master_drop(dev, file_priv, true);
+			drm_master_put(&file_priv->minor->master);
+		}
+	}
+	mutex_unlock(&dev->struct_mutex);
+	return 0;
+}
+EXPORT_SYMBOL(drm_flush);
+
 /**
  * File \c open operation.
  *
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 62aaf8d..6dcfec3 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1018,6 +1018,7 @@ static const struct vm_operations_struct i915_gem_vm_ops = {
 static const struct file_operations i915_driver_fops = {
 	.owner = THIS_MODULE,
 	.open = drm_open,
+	.flush = drm_flush,
 	.release = drm_release,
 	.unlocked_ioctl = drm_ioctl,
 	.mmap = drm_gem_mmap,
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 5cdd684..2c439f9 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -361,6 +361,7 @@ radeon_pci_resume(struct pci_dev *pdev)
 static const struct file_operations radeon_driver_kms_fops = {
 	.owner = THIS_MODULE,
 	.open = drm_open,
+	.flush = drm_flush,
 	.release = drm_release,
 	.unlocked_ioctl = drm_ioctl,
 	.mmap = radeon_mmap,
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 6cd30db..2a4f97d 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1320,6 +1320,8 @@ extern int drm_stub_open(struct inode *inode, struct file *filp);
 extern int drm_fasync(int fd, struct file *filp, int on);
 extern ssize_t drm_read(struct file *filp, char __user *buffer,
 			size_t count, loff_t *offset);
+
+extern int drm_flush(struct file *filp, fl_owner_t id);
 extern int drm_release(struct inode *inode, struct file *filp);
 
 /* Mapping support (drm_vm.h) */



[PATCH v2 0/7] xfree86: Handle drm race condition

2013-03-19 Thread Maarten Lankhorst
Hey,

On 19-03-13 11:27, Chris Wilson wrote:
> On Tue, Mar 19, 2013 at 11:02:14AM +0100, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 19-03-13 10:21, Chris Wilson wrote:
>>> On Mon, Mar 18, 2013 at 01:51:44PM -0700, Bryce Harrington wrote:
>>>> Update:  Squashes a couple commits to avoid potential hang if
>>>> git bisecting.  No other changes from v1.
>>> I'd probably drop the last EAGAIN patch as that is part of the libdrm
>>> API, but other than that it looks to be a reasonably self-contained w/a
>>> for this perplexing problem.
>>>
>>> Reviewed-by: Chris Wilson 
>>> -Chris
>>>
>> And completely wrong, version I pushed to ubuntu's xorg-server for 
>> comparison:
>>
>> Nacked-by: Maarten Lankhorst 
> So you pushed the busy-spin into drmSetMaster(), which is just a tighter
> variant of the above.
>
> Anything which adds the minimal delay, warns about that delay, and
> works around the issue is fine by me.
> -Chris

Here's what I think is happening, based on the information I have.

Because of the delayed fput in recent kernels, it is possible for plymouth to
exit and not drop master right away.
It's put onto a workqueue to be freed slightly later. Xorg-server starts in
the meantime, opens a fd, but because the fd hasn't been closed by plymouth
yet, it didn't get implicitly authenticated and it didn't get drm master
either.

The drmSetMaster call is needed, but the spinning is really just waiting for 
the workqueue to run.
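
A minimal sketch of such a bounded wait, with the delay kept small and a
warning printed whenever it was actually needed. retry_with_delay and the
try_fn callback type are hypothetical names for this example; in real code the
callback would wrap the actual drmSetMaster call, while here a fake stands in
for it so the sketch stays self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Callback type (hypothetical): returns 0 on success, nonzero to retry. */
typedef int (*try_fn)(void *ctx);

/*
 * Call attempt() up to max_tries times, sleeping delay_ms between tries.
 * Returns the number of retries that were needed (0 = first try worked),
 * or -1 if every attempt failed.
 */
static int retry_with_delay(try_fn attempt, void *ctx, int max_tries, long delay_ms)
{
    struct timespec ts;
    int i;

    assert(attempt != NULL && max_tries > 0 && delay_ms >= 0);
    ts.tv_sec = delay_ms / 1000;
    ts.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (i = 0; i < max_tries; i++) {
        if (attempt(ctx) == 0) {
            if (i > 0)
                fprintf(stderr, "warning: succeeded only after %d retries (~%ld ms)\n",
                        i, i * delay_ms);
            return i;
        }
        if (i + 1 < max_tries)
            nanosleep(&ts, NULL); /* give the deferred release time to run */
    }
    return -1;
}

/* Stand-in for the real call: fails for the first busy_for attempts. */
struct fake_master {
    int busy_for;
    int calls;
};

static int fake_set_master(void *ctx)
{
    struct fake_master *m = ctx;

    m->calls++;
    return (m->calls > m->busy_for) ? 0 : -1;
}
```

The hard cap on attempts matters: if master is held by a process that is not
going away, the loop has to give up instead of spinning forever.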

Bryce's patch never worked; it just caused it to try drmSetInterfaceVersion
for a few seconds before timing out. That call was failing because his patch
series never tried to obtain drm master.

The get_drm_info call also makes it more likely to run into the same problem.
It opens the fd and immediately closes it again, which re-triggers the race.

For testing I did a small patch in the drm core that drops drm master when
opening the device.
The patch is attached inline below.

The radeon and intel drivers both fail to load with it. Intel doesn't return
an error, and falls back silently to modesetting.
radeon, however, complains with something similar to this:

[42.876] (==) RADEON(G0): Depth 24, (--) framebuffer bpp 32
[42.876] (II) RADEON(G0): Pixel depth = 24 bits stored in 4 bytes (32 bpp pixmaps)
[42.876] (==) RADEON(G0): Default visual is TrueColor
[42.876] (==) RADEON(G0): RGB weight 888
[42.876] (II) RADEON(G0): Using 8 bits per RGB (8 bit DAC)
[42.876] (--) RADEON(G0): Chipset: "TURKS" (ChipID = 0x6741)
[42.961] (EE) RADEON(G0): [drm] failed to set drm interface version.
[42.961] (EE) RADEON(G0): Kernel modesetting setup failed

I've seen this error before in one of the races, so it's not just a
theoretical issue; it's another possible failure mode.

I think all drivers have to be fixed to handle this case correctly, and they
should probably all do the same spinning as well.

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index f369429..1d3099f 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -339,6 +339,7 @@ static int drm_open_helper(struct inode *inode, struct file *filp,
 			}
 		}
 		mutex_unlock(&dev->struct_mutex);
+		drm_dropmaster_ioctl(dev, NULL, priv);
 	} else {
 		/* get a reference to the master */
 		priv->master = drm_master_get(priv->minor->master);


