Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-05-08 Thread Jakob Bornecrantz
2009/5/8 Kristian Høgsberg :
> From: Kristian Høgsberg 
>
> This patch adds a vblank synced pageflip ioctl for to the modesetting
> family of ioctls.  The ioctl takes a crtc and an fb and schedules a
> pageflip to the new fb at the next coming vertical blank event.  This
> feature lets userspace implement tear-free updating of the screen contents
> with hw-guaranteed low latency page flipping.
>
> The ioctl is asynchronous in that it returns immediately and then later
> notifies the client by making an event available for reading on the drm fd.
> This lets applications add the drm fd to their main loop and handle other
> tasks while waiting for the flip to happen.  The event includes the time
> of the flip, the frame counter and a 64 bit opaque token provided by
> user space in the ioctl.

Great idea with the poll/read stuff.

I think you have the userspace <-> kernel part pretty much all figured out now.

[SNIP]

> +
> +/**
> + * drm_mode_page_flip - page flip ioctl
> + * @dev: DRM device
> + * @data: ioctl args
> + * @file_priv: file private data
> + *
> + * The page flip ioctl replaces the current front buffer with a new one,
> + * using the CRTC's mode_set_base function, which should just update
> + * the front buffer base pointer.  It's up to mode_set_base to make
> + * sure the update doesn't result in tearing (on some hardware the base
> + * register is double buffered, so this is easy).
> + *
> + * Note that this covers just the simple case of flipping the front
> + * buffer immediately.  Interval handling and interlaced modes have to
> + * be handled by userspace, or with new ioctls.
> + */
> +int drm_mode_page_flip(struct drm_device *dev, void *data,
> +                      struct drm_file *file_priv)

This function should have _ioctl appended to it. It should also go
into the drm_crtc.c file. And should not use drm_crtc_helper functions
and structs since those are not apart of the core kms implementation.

Also I rather see as much as possible of the code below be put in the
generic part and not just wrap this functions in a small function.

[SNIP]

> +       /*
> +        * The mode_set_base call will change the domain on the new
> +        * fb, which will force the rendering to finish and block the
> +        * ioctl.  We need to do this last part from a work queue, to
> +        * avoid blocking userspace here.
> +        */
> +       crtc->fb = obj_to_fb(fb_obj);
> +       ret = crtc_funcs->mode_set_base(crtc, 0, 0, NULL);

We should not block on rendering in the ioctl but do the call from a
worker thread. And you probably need one thread per crtc.

Also holding the struct_mutex while waiting for rendering might blow
up in your face big time if the rendering command takes a long while
to complete.

[SNIP]

> +
> +       /*
> +        * Unpin the old fb after setting a mode.  Must be called
> +        * after the old framebuffer is no longer visible, ie, after
> +        * the next vblank, typically.
> +        */
> +       void (*mode_unpin_fb)(struct drm_crtc *crtc, struct drm_framebuffer 
> *fb);

I was going to say that should probably go on the framebuffer
functions but the framebuffer doesn't have any helper functions.

Cheers Jakob.

--
The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
production scanning environment may not be a perfect world - but thanks to
Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700
Series Scanner you'll get full speed at 300 dpi even with all image 
processing features enabled. http://p.sf.net/sfu/kodak-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Thomas Hellström
Kristian Høgsberg wrote:
> This patch adds a vblank synced pageflip ioctl for to the modesetting
> family of ioctls.  The ioctl takes a crtc and an fb and schedules a
> pageflip to the new fb at the next coming vertical blank event.  This
> feature lets userspace implement tear-free updating of the screen contents
> with hw-guaranteed low latency page flipping.
>
> The ioctl is asynchronous in that it returns immediately and then later
> notifies the client by making an event available for reading on the drm fd.
> This lets applications add the drm fd to their main loop and handle other
> tasks while waiting for the flip to happen.  The event includes the time
> of the flip, the frame counter and a 64 bit opaque token provided by
> user space in the ioctl.
>
> Based on work and suggestions from
>   Jesse Barnes ,
>   Jakob Bornecrantz ,
>   Chris Wilson 
>
> Signed-off-by: Kristian Høgsberg 
> Signed-off-by: Jesse Barnes 
> ---
>
> Ok, another version of this patch.  This one has fixes to work with radeon
> kms plus a missing list head init that would cause an oops in the case
> where we schedule a flip before the previous one has been queued.
>
> I'm now ready to propose this patch for the 2.6.32 merge window.
>
>   
Hi!
First a general question: There is some hardware (for example Unichromes 
and Chrome9) that can schedule
page-flips in the command stream after optional vblank barriers. For 
this kind of hardware the pageflip would be a matter of scheduling the 
flip and fence the old and new buffers.  No need for delayed unpins and 
explicit user-space notifications, although the workqueue could be handy 
to avoid command processing stalls in the vblank barriers. How would 
such a scheme fit into the framework below?

> Kristian
>
>  drivers/gpu/drm/drm_crtc.c  |  170 
> ++-
>  drivers/gpu/drm/drm_crtc_helper.c   |   12 ++
>  drivers/gpu/drm/drm_drv.c   |1 +
>  drivers/gpu/drm/drm_fops.c  |   68 -
>  drivers/gpu/drm/drm_irq.c   |   43 
>  drivers/gpu/drm/i915/i915_drv.c |1 +
>  drivers/gpu/drm/i915/intel_display.c|   24 +++--
>  drivers/gpu/drm/radeon/radeon_display.c |3 +-
>  include/drm/drm.h   |   25 +
>  include/drm/drmP.h  |   32 ++
>  include/drm/drm_crtc.h  |   27 +
>  include/drm/drm_crtc_helper.h   |4 +
>  include/drm/drm_mode.h  |   16 +++
>  13 files changed, 414 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 8fab789..0906cb3 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -34,6 +34,8 @@
>  #include "drmP.h"
>  #include "drm_crtc.h"
>  
> +#undef set_base
> +
>  struct drm_prop_enum_list {
>   int type;
>   char *name;
> @@ -342,6 +344,34 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb)
>  EXPORT_SYMBOL(drm_framebuffer_cleanup);
>  
>  /**
> + * drm_crtc_async_flip - do a set_base call from a work queue
> + * @work: work struct
> + *
> + * Called when a set_base call is queued by the page flip code.  This
> + * allows the flip ioctl itself to return immediately and allow userspace
> + * to continue working.
> + */
> +static void drm_crtc_async_flip(struct work_struct *work)
> +{
> + struct drm_crtc *crtc = container_of(work, struct drm_crtc, async_flip);
> + struct drm_device *dev = crtc->dev;
> + struct drm_pending_flip *pending;
> +
> + BUG_ON(crtc->pending_flip == NULL);
> +
> + mutex_lock(&dev->struct_mutex);
>   

Here the struct mutex is taken before a call to a function that doesn't 
even have dev as an argument.
Exactly what in the generic modesetting code is protected by the struct 
mutex here?
If the driver needs the struct mutex to protect internal data 
structures, let the driver take care of the locking. If the struct mutex 
protects the dev->flip_list, then we should take it around the 
manipulation of that list only. I'd hate to see the not-too-careful use 
of struct_mutex in some drivers leak out to generic code.

> + crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL);
> +
> + pending = crtc->pending_flip;
> + crtc->pending_flip = NULL;
> +
> + pending->frame = drm_vblank_count(dev, crtc->pipe);
> + list_add_tail(&pending->link, &dev->flip_list);
> +
> + mutex_unlock(&dev->struct_mutex);
> +}
> +
> +/**
>   * drm_crtc_init - Initialise a new CRTC object
>   * @dev: DRM device
>   * @crtc: CRTC object to init
> @@ -352,17 +382,19 @@ EXPORT_SYMBOL(drm_framebuffer_cleanup);
>   *
>   * Inits a new object created as base part of an driver crtc object.
>   */
> -void drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc,
> +void drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, int pipe,
>  const struct drm_crtc_funcs *funcs)
>  {
>   crtc->dev = dev;
> 

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Jesse Barnes
On Mon, 17 Aug 2009 22:22:34 +0200
Thomas Hellström  wrote:

> Kristian Høgsberg wrote:
> > This patch adds a vblank synced pageflip ioctl for to the
> > modesetting family of ioctls.  The ioctl takes a crtc and an fb and
> > schedules a pageflip to the new fb at the next coming vertical
> > blank event.  This feature lets userspace implement tear-free
> > updating of the screen contents with hw-guaranteed low latency page
> > flipping.
> >
> > The ioctl is asynchronous in that it returns immediately and then
> > later notifies the client by making an event available for reading
> > on the drm fd. This lets applications add the drm fd to their main
> > loop and handle other tasks while waiting for the flip to happen.
> > The event includes the time of the flip, the frame counter and a 64
> > bit opaque token provided by user space in the ioctl.
> >
> > Based on work and suggestions from
> > Jesse Barnes ,
> > Jakob Bornecrantz ,
> > Chris Wilson 
> >
> > Signed-off-by: Kristian Høgsberg 
> > Signed-off-by: Jesse Barnes 
> > ---
> >
> > Ok, another version of this patch.  This one has fixes to work with
> > radeon kms plus a missing list head init that would cause an oops
> > in the case where we schedule a flip before the previous one has
> > been queued.
> >
> > I'm now ready to propose this patch for the 2.6.32 merge window.
> >
> >   
> Hi!
> First a general question: There is some hardware (for example
> Unichromes and Chrome9) that can schedule
> page-flips in the command stream after optional vblank barriers. For 
> this kind of hardware the pageflip would be a matter of scheduling
> the flip and fence the old and new buffers.  No need for delayed
> unpins and explicit user-space notifications, although the workqueue
> could be handy to avoid command processing stalls in the vblank
> barriers. How would such a scheme fit into the framework below?

If you fence the flip, doesn't that mean you only unpin after it's
completed?  The initial version of this patch used a command stream
flip, but I still had to worry about pinning...

If you don't need the userspace notifications we could probably factor
that out; do you see any issues with the flip ioctl itself?  If not, we
can change things as needed if/when more drivers support it...

> A couple of years ago, any attempt to return anything else than 0
> from drm poll resulted in an X server error.
> http://freedesktop.org/bugzilla/show_bug.cgi?id=1505. The fix
> mentioned in the bug was actually to return 0 from drm poll, and a
> comment about this is still present in drm.git. The above breaks drm
> for old X servers and all drivers, which I think is against drm
> policy?

Ouch... I remember Kristian had some issues with this part, but I don't
think this was one of them.

-- 
Jesse Barnes, Intel Open Source Technology Center

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Kristian Høgsberg
2009/8/17 Thomas Hellström :
> Kristian Høgsberg wrote:
>> This patch adds a vblank synced pageflip ioctl for to the modesetting
>> family of ioctls.  The ioctl takes a crtc and an fb and schedules a
>> pageflip to the new fb at the next coming vertical blank event.  This
>> feature lets userspace implement tear-free updating of the screen contents
>> with hw-guaranteed low latency page flipping.
>>
>> The ioctl is asynchronous in that it returns immediately and then later
>> notifies the client by making an event available for reading on the drm fd.
>> This lets applications add the drm fd to their main loop and handle other
>> tasks while waiting for the flip to happen.  The event includes the time
>> of the flip, the frame counter and a 64 bit opaque token provided by
>> user space in the ioctl.
>>
>> Based on work and suggestions from
>>       Jesse Barnes ,
>>       Jakob Bornecrantz ,
>>       Chris Wilson 
>>
>> Signed-off-by: Kristian Høgsberg 
>> Signed-off-by: Jesse Barnes 
>> ---
>>
>> Ok, another version of this patch.  This one has fixes to work with radeon
>> kms plus a missing list head init that would cause an oops in the case
>> where we schedule a flip before the previous one has been queued.
>>
>> I'm now ready to propose this patch for the 2.6.32 merge window.
>>
>>
> Hi!
> First a general question: There is some hardware (for example Unichromes
> and Chrome9) that can schedule
> page-flips in the command stream after optional vblank barriers. For
> this kind of hardware the pageflip would be a matter of scheduling the
> flip and fence the old and new buffers.  No need for delayed unpins and
> explicit user-space notifications, although the workqueue could be handy
> to avoid command processing stalls in the vblank barriers. How would
> such a scheme fit into the framework below?

By sending an event back on the file descriptor we allow users of the
API to wait on the flip to finish in a standard poll or select
mainloop, where it can handle input from other sources while waiting.
If you rely on fences, the application will block when it tries to
access the buffers protected by the fence, and is unable to handle
input from other sources while it's blocking.

Even with the vsync barrier in the command stream (which intel hw also
supports) you do need to keep the current front buffer pinned for the
remainder of the frame, and then you need notification when the swap
has happened so that you can unpin the old buffer.  I'm sure Unichrome
has a command to send an interrupt, which can be queued after the
vsync barrier to run the workqueue and trigger this cleanup as well as
sending the userspace notification.

>> Kristian
>>
>>  drivers/gpu/drm/drm_crtc.c              |  170 
>> ++-
>>  drivers/gpu/drm/drm_crtc_helper.c       |   12 ++
>>  drivers/gpu/drm/drm_drv.c               |    1 +
>>  drivers/gpu/drm/drm_fops.c              |   68 -
>>  drivers/gpu/drm/drm_irq.c               |   43 
>>  drivers/gpu/drm/i915/i915_drv.c         |    1 +
>>  drivers/gpu/drm/i915/intel_display.c    |   24 +++--
>>  drivers/gpu/drm/radeon/radeon_display.c |    3 +-
>>  include/drm/drm.h                       |   25 +
>>  include/drm/drmP.h                      |   32 ++
>>  include/drm/drm_crtc.h                  |   27 +
>>  include/drm/drm_crtc_helper.h           |    4 +
>>  include/drm/drm_mode.h                  |   16 +++
>>  13 files changed, 414 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
>> index 8fab789..0906cb3 100644
>> --- a/drivers/gpu/drm/drm_crtc.c
>> +++ b/drivers/gpu/drm/drm_crtc.c
>> @@ -34,6 +34,8 @@
>>  #include "drmP.h"
>>  #include "drm_crtc.h"
>>
>> +#undef set_base
>> +
>>  struct drm_prop_enum_list {
>>       int type;
>>       char *name;
>> @@ -342,6 +344,34 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb)
>>  EXPORT_SYMBOL(drm_framebuffer_cleanup);
>>
>>  /**
>> + * drm_crtc_async_flip - do a set_base call from a work queue
>> + * @work: work struct
>> + *
>> + * Called when a set_base call is queued by the page flip code.  This
>> + * allows the flip ioctl itself to return immediately and allow userspace
>> + * to continue working.
>> + */
>> +static void drm_crtc_async_flip(struct work_struct *work)
>> +{
>> +     struct drm_crtc *crtc = container_of(work, struct drm_crtc, 
>> async_flip);
>> +     struct drm_device *dev = crtc->dev;
>> +     struct drm_pending_flip *pending;
>> +
>> +     BUG_ON(crtc->pending_flip == NULL);
>> +
>> +     mutex_lock(&dev->struct_mutex);
>>
>
> Here the struct mutex is taken before a call to a function that doesn't
> even have dev as an argument.

It's a work queue callback, there's no way we can change that to
include a dev argument :)

> Exactly what in the generic modesetting code is protected by the struct
> mutex here?

I'm a little fuzzy on the details, this part of the patch was carried
over from Jesse's or

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Kristian Høgsberg
On Mon, Aug 17, 2009 at 4:36 PM, Jesse Barnes wrote:
> On Mon, 17 Aug 2009 22:22:34 +0200
> Thomas Hellström  wrote:
>
>> Kristian Høgsberg wrote:
>> > This patch adds a vblank synced pageflip ioctl for to the
>> > modesetting family of ioctls.  The ioctl takes a crtc and an fb and
>> > schedules a pageflip to the new fb at the next coming vertical
>> > blank event.  This feature lets userspace implement tear-free
>> > updating of the screen contents with hw-guaranteed low latency page
>> > flipping.
>> >
>> > The ioctl is asynchronous in that it returns immediately and then
>> > later notifies the client by making an event available for reading
>> > on the drm fd. This lets applications add the drm fd to their main
>> > loop and handle other tasks while waiting for the flip to happen.
>> > The event includes the time of the flip, the frame counter and a 64
>> > bit opaque token provided by user space in the ioctl.
>> >
>> > Based on work and suggestions from
>> >     Jesse Barnes ,
>> >     Jakob Bornecrantz ,
>> >     Chris Wilson 
>> >
>> > Signed-off-by: Kristian Høgsberg 
>> > Signed-off-by: Jesse Barnes 
>> > ---
>> >
>> > Ok, another version of this patch.  This one has fixes to work with
>> > radeon kms plus a missing list head init that would cause an oops
>> > in the case where we schedule a flip before the previous one has
>> > been queued.
>> >
>> > I'm now ready to propose this patch for the 2.6.32 merge window.
>> >
>> >
>> Hi!
>> First a general question: There is some hardware (for example
>> Unichromes and Chrome9) that can schedule
>> page-flips in the command stream after optional vblank barriers. For
>> this kind of hardware the pageflip would be a matter of scheduling
>> the flip and fence the old and new buffers.  No need for delayed
>> unpins and explicit user-space notifications, although the workqueue
>> could be handy to avoid command processing stalls in the vblank
>> barriers. How would such a scheme fit into the framework below?
>
> If you fence the flip, doesn't that mean you only unpin after it's
> completed?  The initial version of this patch used a command stream
> flip, but I still had to worry about pinning...
>
> If you don't need the userspace notifications we could probably factor
> that out; do you see any issues with the flip ioctl itself?  If not, we
> can change things as needed if/when more drivers support it...

We need the async notiifcation for AIGLX.  We can't just block the X
server on the flip finishing, we need to be able to suspend and resume
the AIGLX client.

>> A couple of years ago, any attempt to return anything else than 0
>> from drm poll resulted in an X server error.
>> http://freedesktop.org/bugzilla/show_bug.cgi?id=1505. The fix
>> mentioned in the bug was actually to return 0 from drm poll, and a
>> comment about this is still present in drm.git. The above breaks drm
>> for old X servers and all drivers, which I think is against drm
>> policy?
>
> Ouch... I remember Kristian had some issues with this part, but I don't
> think this was one of them.

The trouble I ran into here is that the WakeupHandler call chain is
called even if select returns with an error, specifically ERESTART
when the input SIGIO fires.  When that happens the select read fd_set
hasn't been changed, which means the drm fd looks readable.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Dave Airlie
>>
>> A couple of years ago, any attempt to return anything else than 0 from
>> drm poll resulted in an X server error.
>> http://freedesktop.org/bugzilla/show_bug.cgi?id=1505. The fix mentioned
>> in the bug was actually to return 0 from drm poll, and a comment about
>> this is still present in drm.git. The above breaks drm for old X servers
>> and all drivers, which I think is against drm policy?
>
> That can't be the real problem.  The X server polls on a ton of file
> descriptors already; sockets from clients, dbus, input devices.  They
> all have poll implementations that don't return 0... I mean, otherwise
> they wouldn't work.  Look at evdev_poll() in drivers/input/evdev.c for
> the evdev poll implementation, for example.

Its a very real problem otherwise we wouldn't have a bug open and an
bodge around in the kernel to workaround the broken X servers.

Since this stuff is reliant on kms perhaps just use an alternate poll
for kms drivers that does it right.

Dave.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Kristian Høgsberg
On Mon, Aug 17, 2009 at 7:14 PM, Dave Airlie wrote:
>>>
>>> A couple of years ago, any attempt to return anything else than 0 from
>>> drm poll resulted in an X server error.
>>> http://freedesktop.org/bugzilla/show_bug.cgi?id=1505. The fix mentioned
>>> in the bug was actually to return 0 from drm poll, and a comment about
>>> this is still present in drm.git. The above breaks drm for old X servers
>>> and all drivers, which I think is against drm policy?
>>
>> That can't be the real problem.  The X server polls on a ton of file
>> descriptors already; sockets from clients, dbus, input devices.  They
>> all have poll implementations that don't return 0... I mean, otherwise
>> they wouldn't work.  Look at evdev_poll() in drivers/input/evdev.c for
>> the evdev poll implementation, for example.
>
> Its a very real problem otherwise we wouldn't have a bug open and an
> bodge around in the kernel to workaround the broken X servers.

I didn't say that it wasn't a real problem.  I just very seriously
doubt that it's because the drm fd now has a poll implementation.
There is no difference between, say the evdev fops poll implementation
and what I've done in the patch.  Also, note that the return value
from the poll fops function is not what gets returned from the
select(2) syscall.  There is no way the poll implementation above can
make select(2) return EINVAL that wasn't already possible.

> Since this stuff is reliant on kms perhaps just use an alternate poll
> for kms drivers that does it right.

I don't know what it is you feel that the implementation in the patch
doesn't do right.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Dave Airlie
On Tue, Aug 18, 2009 at 9:54 AM, Kristian Høgsberg wrote:
> On Mon, Aug 17, 2009 at 7:14 PM, Dave Airlie wrote:

 A couple of years ago, any attempt to return anything else than 0 from
 drm poll resulted in an X server error.
 http://freedesktop.org/bugzilla/show_bug.cgi?id=1505. The fix mentioned
 in the bug was actually to return 0 from drm poll, and a comment about
 this is still present in drm.git. The above breaks drm for old X servers
 and all drivers, which I think is against drm policy?
>>>
>>> That can't be the real problem.  The X server polls on a ton of file
>>> descriptors already; sockets from clients, dbus, input devices.  They
>>> all have poll implementations that don't return 0... I mean, otherwise
>>> they wouldn't work.  Look at evdev_poll() in drivers/input/evdev.c for
>>> the evdev poll implementation, for example.
>>
>> Its a very real problem otherwise we wouldn't have a bug open and an
>> bodge around in the kernel to workaround the broken X servers.
>
> I didn't say that it wasn't a real problem.  I just very seriously
> doubt that it's because the drm fd now has a poll implementation.
> There is no difference between, say the evdev fops poll implementation
> and what I've done in the patch.  Also, note that the return value
> from the poll fops function is not what gets returned from the
> select(2) syscall.  There is no way the poll implementation above can
> make select(2) return EINVAL that wasn't already possible.
>
>> Since this stuff is reliant on kms perhaps just use an alternate poll
>> for kms drivers that does it right.
>
> I don't know what it is you feel that the implementation in the patch
> doesn't do right.

Yeah on review it seems like it should be fine, I think the issue
before was with something calling read after poll returned a non-zero
value.

if the default it still to return 0, then it shouldn't affect anyone using it,
and since old server can't load kms or dri2 driver it should be safe.

Dave.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Thomas Hellström
Kristian Høgsberg wrote:
> 2009/8/17 Thomas Hellström :
>   
>> Kristian Høgsberg wrote:
>> 
>>> This patch adds a vblank synced pageflip ioctl for to the modesetting
>>> family of ioctls.  The ioctl takes a crtc and an fb and schedules a
>>> pageflip to the new fb at the next coming vertical blank event.  This
>>> feature lets userspace implement tear-free updating of the screen contents
>>> with hw-guaranteed low latency page flipping.
>>>
>>> The ioctl is asynchronous in that it returns immediately and then later
>>> notifies the client by making an event available for reading on the drm fd.
>>> This lets applications add the drm fd to their main loop and handle other
>>> tasks while waiting for the flip to happen.  The event includes the time
>>> of the flip, the frame counter and a 64 bit opaque token provided by
>>> user space in the ioctl.
>>>
>>> Based on work and suggestions from
>>>   Jesse Barnes ,
>>>   Jakob Bornecrantz ,
>>>   Chris Wilson 
>>>
>>> Signed-off-by: Kristian Høgsberg 
>>> Signed-off-by: Jesse Barnes 
>>> ---
>>>
>>> Ok, another version of this patch.  This one has fixes to work with radeon
>>> kms plus a missing list head init that would cause an oops in the case
>>> where we schedule a flip before the previous one has been queued.
>>>
>>> I'm now ready to propose this patch for the 2.6.32 merge window.
>>>
>>>
>>>   
>> Hi!
>> First a general question: There is some hardware (for example Unichromes
>> and Chrome9) that can schedule
>> page-flips in the command stream after optional vblank barriers. For
>> this kind of hardware the pageflip would be a matter of scheduling the
>> flip and fence the old and new buffers.  No need for delayed unpins and
>> explicit user-space notifications, although the workqueue could be handy
>> to avoid command processing stalls in the vblank barriers. How would
>> such a scheme fit into the framework below?
>> 
>
> By sending an event back on the file descriptor we allow users of the
> API to wait on the flip to finish in a standard poll or select
> mainloop, where it can handle input from other sources while waiting.
> If you rely on fences, the application will block when it tries to
> access the buffers protected by the fence, and is unable to handle
> input from other sources while it's blocking.
>
> Even with the vsync barrier in the command stream (which intel hw also
> supports) you do need to keep the current front buffer pinned for the
> remainder of the frame, and then you need notification when the swap
> has happened so that you can unpin the old buffer.  I'm sure Unichrome
> has a command to send an interrupt, which can be queued after the
> vsync barrier to run the workqueue and trigger this cleanup as well as
> sending the userspace notification.
>
>   
>>> Kristian
>>>
>>>  drivers/gpu/drm/drm_crtc.c  |  170 
>>> ++-
>>>  drivers/gpu/drm/drm_crtc_helper.c   |   12 ++
>>>  drivers/gpu/drm/drm_drv.c   |1 +
>>>  drivers/gpu/drm/drm_fops.c  |   68 -
>>>  drivers/gpu/drm/drm_irq.c   |   43 
>>>  drivers/gpu/drm/i915/i915_drv.c |1 +
>>>  drivers/gpu/drm/i915/intel_display.c|   24 +++--
>>>  drivers/gpu/drm/radeon/radeon_display.c |3 +-
>>>  include/drm/drm.h   |   25 +
>>>  include/drm/drmP.h  |   32 ++
>>>  include/drm/drm_crtc.h  |   27 +
>>>  include/drm/drm_crtc_helper.h   |4 +
>>>  include/drm/drm_mode.h  |   16 +++
>>>  13 files changed, 414 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
>>> index 8fab789..0906cb3 100644
>>> --- a/drivers/gpu/drm/drm_crtc.c
>>> +++ b/drivers/gpu/drm/drm_crtc.c
>>> @@ -34,6 +34,8 @@
>>>  #include "drmP.h"
>>>  #include "drm_crtc.h"
>>>
>>> +#undef set_base
>>> +
>>>  struct drm_prop_enum_list {
>>>   int type;
>>>   char *name;
>>> @@ -342,6 +344,34 @@ void drm_framebuffer_cleanup(struct drm_framebuffer 
>>> *fb)
>>>  EXPORT_SYMBOL(drm_framebuffer_cleanup);
>>>
>>>  /**
>>> + * drm_crtc_async_flip - do a set_base call from a work queue
>>> + * @work: work struct
>>> + *
>>> + * Called when a set_base call is queued by the page flip code.  This
>>> + * allows the flip ioctl itself to return immediately and allow userspace
>>> + * to continue working.
>>> + */
>>> +static void drm_crtc_async_flip(struct work_struct *work)
>>> +{
>>> + struct drm_crtc *crtc = container_of(work, struct drm_crtc, 
>>> async_flip);
>>> + struct drm_device *dev = crtc->dev;
>>> + struct drm_pending_flip *pending;
>>> +
>>> + BUG_ON(crtc->pending_flip == NULL);
>>> +
>>> + mutex_lock(&dev->struct_mutex);
>>>
>>>   
>> Here the struct mutex is taken before a call to a function that doesn't
>> even have dev as an argument.
>> 
>
> It's a work queue callback, there's no way we can change th

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-17 Thread Dave Airlie
> +#undef set_base
> +
>  struct drm_prop_enum_list {
>        int type;
>        char *name;
> @@ -342,6 +344,34 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb)
>  EXPORT_SYMBOL(drm_framebuffer_cleanup);
>
>  /**
> + * drm_crtc_async_flip - do a set_base call from a work queue
> + * @work: work struct
> + *
> + * Called when a set_base call is queued by the page flip code.  This
> + * allows the flip ioctl itself to return immediately and allow userspace
> + * to continue working.
> + */
> +static void drm_crtc_async_flip(struct work_struct *work)
> +{
> +       struct drm_crtc *crtc = container_of(work, struct drm_crtc, 
> async_flip);
> +       struct drm_device *dev = crtc->dev;
> +       struct drm_pending_flip *pending;
> +
> +       BUG_ON(crtc->pending_flip == NULL);
> +
> +       mutex_lock(&dev->struct_mutex);

I'm not sure locking mode objects here will help though.

I assume drm_crtc can't go away while this is scheduled.


> +       crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL);
> +
> +       pending = crtc->pending_flip;
> +       crtc->pending_flip = NULL;
> +
> +       pending->frame = drm_vblank_count(dev, crtc->pipe);
> +       list_add_tail(&pending->link, &dev->flip_list);
> +
> +       mutex_unlock(&dev->struct_mutex);
> +}


> + */
> +int drm_mode_page_flip_ioctl(struct drm_device *dev, void *data,
> +                            struct drm_file *file_priv)
> +{
> +       struct drm_pending_flip *pending;
> +       struct drm_mode_page_flip *flip_data = data;
> +       struct drm_mode_object *drm_obj, *fb_obj;
> +       struct drm_crtc *crtc;
> +       int ret = 0;
> +
> +       if (!(drm_core_check_feature(dev, DRIVER_MODESET)))
> +               return -ENODEV;
> +
> +       /*
> +        * Reject unknown flags so future userspace knows what we (don't)
> +        * support
> +        */
> +       if (flip_data->flags & (~DRM_MODE_PAGE_FLIP_FLAGS_MASK)) {
> +               DRM_DEBUG("bad page flip flags\n");
> +               return -EINVAL;
> +       }
> +
> +       pending = kzalloc(sizeof *pending, GFP_KERNEL);
> +       if (pending == NULL)
> +               return -ENOMEM;
> +
> +       mutex_lock(&dev->struct_mutex);

I suspect this should be locking dev->mode_config.mutex
which is what is need to protect mode objects, not struct_mutex.

Dave.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Thomas Hellström
Jesse Barnes wrote:
> On Mon, 17 Aug 2009 22:22:34 +0200
> Thomas Hellström  wrote:
>
>   
>> Kristian Høgsberg wrote:
>> 
>>> This patch adds a vblank synced pageflip ioctl for to the
>>> modesetting family of ioctls.  The ioctl takes a crtc and an fb and
>>> schedules a pageflip to the new fb at the next coming vertical
>>> blank event.  This feature lets userspace implement tear-free
>>> updating of the screen contents with hw-guaranteed low latency page
>>> flipping.
>>>
>>> The ioctl is asynchronous in that it returns immediately and then
>>> later notifies the client by making an event available for reading
>>> on the drm fd. This lets applications add the drm fd to their main
>>> loop and handle other tasks while waiting for the flip to happen.
>>> The event includes the time of the flip, the frame counter and a 64
>>> bit opaque token provided by user space in the ioctl.
>>>
>>> Based on work and suggestions from
>>> Jesse Barnes ,
>>> Jakob Bornecrantz ,
>>> Chris Wilson 
>>>
>>> Signed-off-by: Kristian Høgsberg 
>>> Signed-off-by: Jesse Barnes 
>>> ---
>>>
>>> Ok, another version of this patch.  This one has fixes to work with
>>> radeon kms plus a missing list head init that would cause an oops
>>> in the case where we schedule a flip before the previous one has
>>> been queued.
>>>
>>> I'm now ready to propose this patch for the 2.6.32 merge window.
>>>
>>>   
>>>   
>> Hi!
>> First a general question: There is some hardware (for example
>> Unichromes and Chrome9) that can schedule
>> page-flips in the command stream after optional vblank barriers. For 
>> this kind of hardware the pageflip would be a matter of scheduling
>> the flip and fence the old and new buffers.  No need for delayed
>> unpins and explicit user-space notifications, although the workqueue
>> could be handy to avoid command processing stalls in the vblank
>> barriers. How would such a scheme fit into the framework below?
>> 
>
> If you fence the flip, doesn't that mean you only unpin after it's
> completed?  The initial version of this patch used a command stream
> flip, but I still had to worry about pinning...
>   
With a fenced flip (and some TTM terminology) it would amount to

reserve(old_scanout);  // Reserve means "claim buffer for validation".
reserve(new_scanout);   
submit_flip();
unpin(old_scanout);
pin(new_scanout);
fence(both_scanouts);
unreserve(old_scanout);
unreserve(new_scanout);

So the pin / unpin would happen at command submission time and not at 
flip execution time,
and the fence would make sure any old scanouts remain in GPU memory 
until they are no longer referenced by the VDC.

But the real benefit of this scheme is pageflipping without vblank 
waits. I'm not sure the vblank barrier stalls are acceptable in reality, 
since they block all command execution.

> If you don't need the userspace notifications we could probably factor
> that out; do you see any issues with the flip ioctl itself?  If not, we
> can change things as needed if/when more drivers support it...
>
>   
No, I don't have anything against it at all.
I'm just trying to get a basic understanding how this would be 
implemented with
different hardware, and as you say, we could change things if needed if 
the need for it arises.

/Thomas



--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Thomas Hellström
Kristian Høgsberg wrote:
> 2009/8/17 Thomas Hellström :
>   
>> Kristian Høgsberg wrote:
>> 
>>> This patch adds a vblank synced pageflip ioctl for to the modesetting
>>> family of ioctls.  The ioctl takes a crtc and an fb and schedules a
>>> pageflip to the new fb at the next coming vertical blank event.  This
>>> feature lets userspace implement tear-free updating of the screen contents
>>> with hw-guaranteed low latency page flipping.
>>>
>>> The ioctl is asynchronous in that it returns immediately and then later
>>> notifies the client by making an event available for reading on the drm fd.
>>> This lets applications add the drm fd to their main loop and handle other
>>> tasks while waiting for the flip to happen.  The event includes the time
>>> of the flip, the frame counter and a 64 bit opaque token provided by
>>> user space in the ioctl.
>>>
>>> Based on work and suggestions from
>>>   Jesse Barnes ,
>>>   Jakob Bornecrantz ,
>>>   Chris Wilson 
>>>
>>> Signed-off-by: Kristian Høgsberg 
>>> Signed-off-by: Jesse Barnes 
>>> ---
>>>
>>> Ok, another version of this patch.  This one has fixes to work with radeon
>>> kms plus a missing list head init that would cause an oops in the case
>>> where we schedule a flip before the previous one has been queued.
>>>
>>> I'm now ready to propose this patch for the 2.6.32 merge window.
>>>
>>>
>>>   
>> Hi!
>> First a general question: There is some hardware (for example Unichromes
>> and Chrome9) that can schedule
>> page-flips in the command stream after optional vblank barriers. For
>> this kind of hardware the pageflip would be a matter of scheduling the
>> flip and fence the old and new buffers.  No need for delayed unpins and
>> explicit user-space notifications, although the workqueue could be handy
>> to avoid command processing stalls in the vblank barriers. How would
>> such a scheme fit into the framework below?
>> 
>
> By sending an event back on the file descriptor we allow users of the
> API to wait on the flip to finish in a standard poll or select
> mainloop, where it can handle input from other sources while waiting.
> If you rely on fences, the application will block when it tries to
> access the buffers protected by the fence, and is unable to handle
> input from other sources while it's blocking.
>   

Yes. you're right. How is the flip_finished event typically used by user 
space?
Is it mostly for convenience or is it a must have?

> Even with the vsync barrier in the command stream (which intel hw also
> supports) you do need to keep the current front buffer pinned for the
> remainder of the frame, and then you need notification when the swap
> has happened so that you can unpin the old buffer.
That would be taken care of by fences (see previous mail), but that's 
merely a terminology thing. The big difference is that the driver would 
handle that synchronization as part of command submission.

> I'm sure Unichrome
> has a command to send an interrupt, which can be queued after the
> vsync barrier to run the workqueue and trigger this cleanup as well as
> sending the userspace notification.
>   
Actually it hasn't (stupid hardware), so a medium frequency polling scheme
needs to be used for most GPU waiting.
So the user-space notification would require some additional thoughts 
but nothing
that can't be implemented when the need for it arises.

/Thomas




--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Kristian Høgsberg
>> I'm a little fuzzy on the details, this part of the patch was carried
>> over from Jesse's original work.  I believe set_base is supposed to be
>> called with the struct mutex held.
>>
>>
>
> We shouldn't be this sloppy about locking, and in particular we shouldn't
> protect a callback with a lock without knowing why. A lock should protect
> data, not serialize calls to functions, and it should ideally be documented
> somewhere. Why does set_base need the _caller_ to take the struct_mutex
> lock? Are there more driver callbacks that need the struct_mutex held?
>
> I mean if I implement a modesetting driver that takes care to use device
> private locks not visible to the caller to avoid polluting the struct_mutex
> with possible contention and lockdep errors as a result, I'd still find the
> generic code taking the struct_mutex for me becase it assumes that I need
> it? This can't be right and should go away.

Good point, I'll take a closer look and figure out what the locking
should be like.

>> Perhaps TTM bo read/write could use ioctls like the gem pwrite/pread
>> ioctls?  The only way we can get asynchronous notifications from the
>> drm to userspace is through readable events on the drm fd.  How were
>> you planning to implement read and write from multiple bo's through
>> one fd anyway?  Reusing the fake offset we use for mmap?  I think the
>> ioctl approach is cleaner since you can just pass in the object handle
>> and the offset into the object.  Maybe even add src and dst stride
>> arguments.
>
> It's a common misconception that the TTM bo offsets are fake offsets from
> outer space.
> The offsets aren't fake offsets but real offsets into the device address
> space, so a logical consequence of mapping a part of that address space is
> to be able to read and write to the same address space, That is read / write
> implemented using syscalls as intended.
>
> That said, rectangular bo blit ioctls is probably a good idea as well, and
> some drivers, like via, already have them.

I don't know that we need a separate char device or that read/write
was intended for accessing buffer data.  We already have lots of
different, unrelated ioclts on the fd - I see it more as a IPC
mechanism between kernel and userspace,

>>> A couple of years ago, any attempt to return anything else than 0 from
>>> drm poll resulted in an X server error.
>>> http://freedesktop.org/bugzilla/show_bug.cgi?id=1505. The fix mentioned
>>> in the bug was actually to return 0 from drm poll, and a comment about
>>> this is still present in drm.git. The above breaks drm for old X servers
>>> and all drivers, which I think is against drm policy?
>>>
>>
>> That can't be the real problem.  The X server polls on a ton of file
>> descriptors already; sockets from clients, dbus, input devices.  They
>> all have poll implementations that don't return 0... I mean, otherwise
>> they wouldn't work.  Look at evdev_poll() in drivers/input/evdev.c for
>> the evdev poll implementation, for example.
>>
>
> You're probably right, but we should probably find out what went wrong and
> make sure it doesn't happen again with non-modesetting drivers + dri1 before
> pushing this.

I really don't think that's necessary.  As I wrote in my reply to
Dave, there's nothing in this patch that can cause select(2) to return
EINVAL that isn't already present in other poll fops implementations.
Like the evdev one, which we already select on - please compare that
function with the poll implementation in my patch and tell me why the
drm poll is cause for concern.  I need a better, more specific reason
why this is such a risk and why I should spend more time tracking this
stuff down.  And if select(2), for whatever reason, returns EINVAL
because of the drm_poll() fops implementation, that's a bug in the
kernel that needs to be fixed.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Kristian Høgsberg
On Tue, Aug 18, 2009 at 3:52 AM, Thomas Hellström wrote:
> Kristian Høgsberg wrote:
>> By sending an event back on the file descriptor we allow users of the
>> API to wait on the flip to finish in a standard poll or select
>> mainloop, where it can handle input from other sources while waiting.
>> If you rely on fences, the application will block when it tries to
>> access the buffers protected by the fence, and is unable to handle
>> input from other sources while it's blocking.
>>
>
> Yes. you're right. How is the flip_finished event typically used by user
> space?
> Is it mostly for convenience or is it a must have?

In the server we use the event to wake up the client who calls
glXSwapBuffer().  We block the client (SuspendClient) if it tries to
draw before the swap is complete and wake it up again when we get the
event from drm.  See the dri2-swapbuffers branch in xorg/xserver git
for details.

>> Even with the vsync barrier in the command stream (which intel hw also
>> supports) you do need to keep the current front buffer pinned for the
>> remainder of the frame, and then you need notification when the swap
>> has happened so that you can unpin the old buffer.
>
> That would be taken care of by fences (see previous mail), but that's merely
> a terminology thing. The big difference is that the driver would handle that
> synchronization as part of command submission.

Right, but you'll still need an interrupt inside the drm driver to let
you know that the vblank barrier has been processed so you can clear
the fence.

>> I'm sure Unichrome
>> has a command to send an interrupt, which can be queued after the
>> vsync barrier to run the workqueue and trigger this cleanup as well as
>> sending the userspace notification.
>>
>
> Actually it hasn't (stupid hardware), so a medium frequency polling scheme
> needs to be used for most GPU waiting.
> So the user-space notification would require some additional thoughts but
> nothing
> that can't be implemented when the need for it arises.

Yikes. Is there a way to at least get a timestamp from the hw at the
time the swap happens?

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Keith Whitwell
I think the bug in question was because somebody (Jon Smirl??) removed the 
empty & apparently unused poll implementation from the drm fd, only to discover 
that the X server was actually polling the fd.

If this code adds to, extends or at least doesn't remove the ability to poll 
the drm fd, it should be fine.

Keith

From: Kristian Høgsberg [...@bitplanet.net]
Sent: Tuesday, August 18, 2009 8:31 AM
To: Thomas Hellström
Cc: Kristian Høgsberg; Jesse Barnes; dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm   
event


>> That can't be the real problem.  The X server polls on a ton of file
>> descriptors already; sockets from clients, dbus, input devices.  They
>> all have poll implementations that don't return 0... I mean, otherwise
>> they wouldn't work.  Look at evdev_poll() in drivers/input/evdev.c for
>> the evdev poll implementation, for example.
>>
>
> You're probably right, but we should probably find out what went wrong and
> make sure it doesn't happen again with non-modesetting drivers + dri1 before
> pushing this.

I really don't think that's necessary.  As I wrote in my reply to
Dave, there's nothing in this patch that can cause select(2) to return
EINVAL that isn't already present in other poll fops implementations.
Like the evdev one, which we already select on - please compare that
function with the poll implementation in my patch and tell me why the
drm poll is cause for concern.  I need a better, more specific reason
why this is such a risk and why I should spend more time tracking this
stuff down.  And if select(2), for whatever reason, returns EINVAL
because of the drm_poll() fops implementation, that's a bug in the
kernel that needs to be fixed.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Jesse Barnes
On Tue, 18 Aug 2009 09:32:50 +0200
Thomas Hellström  wrote:

> Jesse Barnes wrote:
> > On Mon, 17 Aug 2009 22:22:34 +0200
> > Thomas Hellström  wrote:
> >
> >   
> >> Kristian Høgsberg wrote:
> >> 
> >>> This patch adds a vblank synced pageflip ioctl for to the
> >>> modesetting family of ioctls.  The ioctl takes a crtc and an fb
> >>> and schedules a pageflip to the new fb at the next coming vertical
> >>> blank event.  This feature lets userspace implement tear-free
> >>> updating of the screen contents with hw-guaranteed low latency
> >>> page flipping.
> >>>
> >>> The ioctl is asynchronous in that it returns immediately and then
> >>> later notifies the client by making an event available for reading
> >>> on the drm fd. This lets applications add the drm fd to their main
> >>> loop and handle other tasks while waiting for the flip to happen.
> >>> The event includes the time of the flip, the frame counter and a
> >>> 64 bit opaque token provided by user space in the ioctl.
> >>>
> >>> Based on work and suggestions from
> >>>   Jesse Barnes ,
> >>>   Jakob Bornecrantz ,
> >>>   Chris Wilson 
> >>>
> >>> Signed-off-by: Kristian Høgsberg 
> >>> Signed-off-by: Jesse Barnes 
> >>> ---
> >>>
> >>> Ok, another version of this patch.  This one has fixes to work
> >>> with radeon kms plus a missing list head init that would cause an
> >>> oops in the case where we schedule a flip before the previous one
> >>> has been queued.
> >>>
> >>> I'm now ready to propose this patch for the 2.6.32 merge window.
> >>>
> >>>   
> >>>   
> >> Hi!
> >> First a general question: There is some hardware (for example
> >> Unichromes and Chrome9) that can schedule
> >> page-flips in the command stream after optional vblank barriers.
> >> For this kind of hardware the pageflip would be a matter of
> >> scheduling the flip and fence the old and new buffers.  No need
> >> for delayed unpins and explicit user-space notifications, although
> >> the workqueue could be handy to avoid command processing stalls in
> >> the vblank barriers. How would such a scheme fit into the
> >> framework below? 
> >
> > If you fence the flip, doesn't that mean you only unpin after it's
> > completed?  The initial version of this patch used a command stream
> > flip, but I still had to worry about pinning...
> >   
> With a fenced flip (and some TTM terminology) it would amount to
> 
> reserve(old_scanout);  // Reserve means "claim buffer for
> validation". reserve(new_scanout);   
> submit_flip();
> unpin(old_scanout);
> pin(new_scanout);
> fence(both_scanouts);
> unreserve(old_scanout);
> unreserve(new_scanout);
> 
> So the pin / unpin would happen at command submission time and not at 
> flip execution time,
> and the fence would make sure any old scanouts remain in GPU memory 
> until they are no longer referenced by the VDC.
> 
> But the real benefit of this scheme is pageflipping without vblank 
> waits. I'm not sure the vblank barrier stalls are acceptable in
> reality, since they block all command execution.

Ah right, the fence semantics are a nice fit here.  In this code
we should only be blocking on rendering to the still active back
buffer though; any other rendering will proceed normally (at least that
was our intent).

> > If you don't need the userspace notifications we could probably
> > factor that out; do you see any issues with the flip ioctl itself?
> > If not, we can change things as needed if/when more drivers support
> > it...
> >
> >   
> No, I don't have anything against it at all.
> I'm just trying to get a basic understanding how this would be 
> implemented with
> different hardware, and as you say, we could change things if needed
> if the need for it arises.

Great, thanks a lot for looking at it.

-- 
Jesse Barnes, Intel Open Source Technology Center

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Jesse Barnes
On Tue, 18 Aug 2009 08:08:24 +0200
Thomas Hellström  wrote:

> Kristian Høgsberg wrote:
> > 2009/8/17 Thomas Hellström :
> >   
> >> Kristian Høgsberg wrote:
> >> 
> >>> This patch adds a vblank synced pageflip ioctl for to the
> >>> modesetting family of ioctls.  The ioctl takes a crtc and an fb
> >>> and schedules a pageflip to the new fb at the next coming
> >>> vertical blank event.  This feature lets userspace implement
> >>> tear-free updating of the screen contents with hw-guaranteed low
> >>> latency page flipping.
> >>>
> >>> The ioctl is asynchronous in that it returns immediately and then
> >>> later notifies the client by making an event available for
> >>> reading on the drm fd. This lets applications add the drm fd to
> >>> their main loop and handle other tasks while waiting for the flip
> >>> to happen.  The event includes the time of the flip, the frame
> >>> counter and a 64 bit opaque token provided by user space in the
> >>> ioctl.
> >>>
> >>> Based on work and suggestions from
> >>>   Jesse Barnes ,
> >>>   Jakob Bornecrantz ,
> >>>   Chris Wilson 
> >>>
> >>> Signed-off-by: Kristian Høgsberg 
> >>> Signed-off-by: Jesse Barnes 
> >>> ---
> >>>
> >>> Ok, another version of this patch.  This one has fixes to work
> >>> with radeon kms plus a missing list head init that would cause an
> >>> oops in the case where we schedule a flip before the previous one
> >>> has been queued.
> >>>
> >>> I'm now ready to propose this patch for the 2.6.32 merge window.
> >>>
> >>>
> >>>   
> >> Hi!
> >> First a general question: There is some hardware (for example
> >> Unichromes and Chrome9) that can schedule
> >> page-flips in the command stream after optional vblank barriers.
> >> For this kind of hardware the pageflip would be a matter of
> >> scheduling the flip and fence the old and new buffers.  No need
> >> for delayed unpins and explicit user-space notifications, although
> >> the workqueue could be handy to avoid command processing stalls in
> >> the vblank barriers. How would such a scheme fit into the
> >> framework below? 
> >
> > By sending an event back on the file descriptor we allow users of
> > the API to wait on the flip to finish in a standard poll or select
> > mainloop, where it can handle input from other sources while
> > waiting. If you rely on fences, the application will block when it
> > tries to access the buffers protected by the fence, and is unable
> > to handle input from other sources while it's blocking.
> >
> > Even with the vsync barrier in the command stream (which intel hw
> > also supports) you do need to keep the current front buffer pinned
> > for the remainder of the frame, and then you need notification when
> > the swap has happened so that you can unpin the old buffer.  I'm
> > sure Unichrome has a command to send an interrupt, which can be
> > queued after the vsync barrier to run the workqueue and trigger
> > this cleanup as well as sending the userspace notification.
> >
> >   
> >>> Kristian
> >>>
> >>>  drivers/gpu/drm/drm_crtc.c  |  170
> >>> ++-
> >>> drivers/gpu/drm/drm_crtc_helper.c   |   12 ++
> >>> drivers/gpu/drm/drm_drv.c   |1 +
> >>> drivers/gpu/drm/drm_fops.c  |   68 -
> >>> drivers/gpu/drm/drm_irq.c   |   43 
> >>> drivers/gpu/drm/i915/i915_drv.c |1 +
> >>> drivers/gpu/drm/i915/intel_display.c|   24 +++--
> >>> drivers/gpu/drm/radeon/radeon_display.c |3 +-
> >>> include/drm/drm.h   |   25 +
> >>> include/drm/drmP.h  |   32 ++
> >>> include/drm/drm_crtc.h  |   27 +
> >>> include/drm/drm_crtc_helper.h   |4 +
> >>> include/drm/drm_mode.h  |   16 +++ 13 files
> >>> changed, 414 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_crtc.c
> >>> b/drivers/gpu/drm/drm_crtc.c index 8fab789..0906cb3 100644
> >>> --- a/drivers/gpu/drm/drm_crtc.c
> >>> +++ b/drivers/gpu/drm/drm_crtc.c
> >>> @@ -34,6 +34,8 @@
> >>>  #include "drmP.h"
> >>>  #include "drm_crtc.h"
> >>>
> >>> +#undef set_base
> >>> +
> >>>  struct drm_prop_enum_list {
> >>>   int type;
> >>>   char *name;
> >>> @@ -342,6 +344,34 @@ void drm_framebuffer_cleanup(struct
> >>> drm_framebuffer *fb) EXPORT_SYMBOL(drm_framebuffer_cleanup);
> >>>
> >>>  /**
> >>> + * drm_crtc_async_flip - do a set_base call from a work queue
> >>> + * @work: work struct
> >>> + *
> >>> + * Called when a set_base call is queued by the page flip code.
> >>> This
> >>> + * allows the flip ioctl itself to return immediately and allow
> >>> userspace
> >>> + * to continue working.
> >>> + */
> >>> +static void drm_crtc_async_flip(struct work_struct *work)
> >>> +{
> >>> + struct drm_crtc *crtc = container_of(work, struct drm_crtc,
> >>> async_flip);
> >>> + struct drm_device *dev = crtc->dev;
> >>> + struct drm_pending_flip *pending;
> >>> +
>

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Jesse Barnes
On Tue, 18 Aug 2009 16:24:14 +1000
Dave Airlie  wrote:

> > +#undef set_base
> > +
> >  struct drm_prop_enum_list {
> >        int type;
> >        char *name;
> > @@ -342,6 +344,34 @@ void drm_framebuffer_cleanup(struct
> > drm_framebuffer *fb) EXPORT_SYMBOL(drm_framebuffer_cleanup);
> >
> >  /**
> > + * drm_crtc_async_flip - do a set_base call from a work queue
> > + * @work: work struct
> > + *
> > + * Called when a set_base call is queued by the page flip code.
> >  This
> > + * allows the flip ioctl itself to return immediately and allow
> > userspace
> > + * to continue working.
> > + */
> > +static void drm_crtc_async_flip(struct work_struct *work)
> > +{
> > +       struct drm_crtc *crtc = container_of(work, struct drm_crtc,
> > async_flip);
> > +       struct drm_device *dev = crtc->dev;
> > +       struct drm_pending_flip *pending;
> > +
> > +       BUG_ON(crtc->pending_flip == NULL);
> > +
> > +       mutex_lock(&dev->struct_mutex);
> 
> I'm not sure locking mode objects here will help though.
> 
> I assume drm_crtc can't go away while this is scheduled.
> 
> 
> > +       crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL);
> > +
> > +       pending = crtc->pending_flip;
> > +       crtc->pending_flip = NULL;
> > +
> > +       pending->frame = drm_vblank_count(dev, crtc->pipe);
> > +       list_add_tail(&pending->link, &dev->flip_list);
> > +
> > +       mutex_unlock(&dev->struct_mutex);
> > +}
> 
> 
> > + */
> > +int drm_mode_page_flip_ioctl(struct drm_device *dev, void *data,
> > +                            struct drm_file *file_priv)
> > +{
> > +       struct drm_pending_flip *pending;
> > +       struct drm_mode_page_flip *flip_data = data;
> > +       struct drm_mode_object *drm_obj, *fb_obj;
> > +       struct drm_crtc *crtc;
> > +       int ret = 0;
> > +
> > +       if (!(drm_core_check_feature(dev, DRIVER_MODESET)))
> > +               return -ENODEV;
> > +
> > +       /*
> > +        * Reject unknown flags so future userspace knows what we
> > (don't)
> > +        * support
> > +        */
> > +       if (flip_data->flags & (~DRM_MODE_PAGE_FLIP_FLAGS_MASK)) {
> > +               DRM_DEBUG("bad page flip flags\n");
> > +               return -EINVAL;
> > +       }
> > +
> > +       pending = kzalloc(sizeof *pending, GFP_KERNEL);
> > +       if (pending == NULL)
> > +               return -ENOMEM;
> > +
> > +       mutex_lock(&dev->struct_mutex);
> 
> I suspect this should be locking dev->mode_config.mutex
> which is what is need to protect mode objects, not struct_mutex.

Arg I think you're right...  Maybe this is what Thomas was concerned
about.

-- 
Jesse Barnes, Intel Open Source Technology Center

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Kristian Høgsberg
On Tue, Aug 18, 2009 at 2:08 AM, Thomas Hellström wrote:
> Kristian Høgsberg wrote:
>> Perhaps TTM bo read/write could use ioctls like the gem pwrite/pread
>> ioctls?  The only way we can get asynchronous notifications from the
>> drm to userspace is through readable events on the drm fd.  How were
>> you planning to implement read and write from multiple bo's through
>> one fd anyway?  Reusing the fake offset we use for mmap?  I think the
>> ioctl approach is cleaner since you can just pass in the object handle
>> and the offset into the object.  Maybe even add src and dst stride
>> arguments.
>
> It's a common misconception that the TTM bo offsets are fake offsets from
> outer space.
> The offsets aren't fake offsets but real offsets into the device address
> space, so a logical consequence of mapping a part of that address space is
> to be able to read and write to the same address space, That is read / write
> implemented using syscalls as intended.
>
> That said, rectangular bo blit ioctls is probably a good idea as well, and
> some drivers, like via, already have them.
>
> But I do think that if we claim device reads and writes for modesetting, we
> should move bo IO to another device to be consistent.

Oh, to this last paragraph I wanted to point out that the drm fd event
isn't modesetting specific.  It's a generic event mechanism that we
can use for other cases.  For example, an asynchronous
glReadPixels/DownloadFromScreen ioctls, or a "breadcrumb" ioctl that
sends an event back once the command processor has caught up.  The
event structure has a header like this:

struct drm_event {
__u32 type;
__u32 length;
};

which lets us introduce new types of events and lets an older
userspace skip unknown events.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Jesse Barnes
On Tue, 18 Aug 2009 09:17:23 -0700
Jesse Barnes  wrote:
> > >> Exactly what in the generic modesetting code is protected by the
> > >> struct mutex here?
> > >> 
> > >
> > > I'm a little fuzzy on the details, this part of the patch was
> > > carried over from Jesse's original work.  I believe set_base is
> > > supposed to be called with the struct mutex held.
> > >
> > >   
> > 
> > We shouldn't be this sloppy about locking, and in particular we 
> > shouldn't protect a callback with a lock without knowing why. A
> > lock should protect data, not serialize calls to functions, and it
> > should ideally be documented somewhere. Why does set_base need the
> > _caller_ to take the struct_mutex lock? Are there more driver
> > callbacks that need the struct_mutex held?
> 
> Yes; set_base is called from several places.  The struct_mutex protect
> all the mode setting related fields & lists, so if we're going to
> change the fb (or any other setting) we need to hold it.  Pushing the
> locking for it out to callers was necessary because in the flip ioctl,
> we already hold the lock.  And we don't want to drop it and re-acquire
> since that opens up a big window where the config could change out
> from under us.

As Dave noted though I had the locks mixed up.  We need the mode config
mutex here...  Looks like I "fixed" a non-existent deadlock by pushing
the lock out!

-- 
Jesse Barnes, Intel Open Source Technology Center

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Thomas Hellström
Kristian Høgsberg wrote:
> On Tue, Aug 18, 2009 at 3:52 AM, Thomas Hellström wrote:
>   
>> Kristian Høgsberg wrote:
>> 
>>> By sending an event back on the file descriptor we allow users of the
>>> API to wait on the flip to finish in a standard poll or select
>>> mainloop, where it can handle input from other sources while waiting.
>>> If you rely on fences, the application will block when it tries to
>>> access the buffers protected by the fence, and is unable to handle
>>> input from other sources while it's blocking.
>>>
>>>   
>> Yes. you're right. How is the flip_finished event typically used by user
>> space?
>> Is it mostly for convenience or is it a must have?
>> 
>
> In the server we use the event to wake up the client who calls
> glXSwapBuffer().  We block the client (SuspendClient) if it tries to
> draw before the swap is complete and wake it up again when we get the
> event from drm.  See the dri2-swapbuffers branch in xorg/xserver git
> for details.
>   
So in the case where you have the command submission inserting relevant 
barriers (like hardware barriers in the Unichrome case, or fence class 
barriers in the general TTM fence-aware driver case, this notification 
is not really necessary since the flip is guaranteed to happen before 
any other rendering commands that touch the buffers in question.

So modulo the flip timing notifications, the flip_finished event can be 
sent already when the flip is scheduled. Are the flip timing 
notifications required to be accurate? If so we have a slight problem, 
because we don't unnecessarily want to put clients to sleep, and we  
don't know when the flip has happened until it has actually happened.

/Thomas




--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Kristian Høgsberg
On Tue, Aug 18, 2009 at 1:36 PM, Thomas Hellström wrote:
> Kristian Høgsberg wrote:
>>
>> On Tue, Aug 18, 2009 at 3:52 AM, Thomas Hellström
>> wrote:
>>
>>>
>>> Kristian Høgsberg wrote:
>>>

 By sending an event back on the file descriptor we allow users of the
 API to wait on the flip to finish in a standard poll or select
 mainloop, where it can handle input from other sources while waiting.
 If you rely on fences, the application will block when it tries to
 access the buffers protected by the fence, and is unable to handle
 input from other sources while it's blocking.


>>>
>>> Yes. you're right. How is the flip_finished event typically used by user
>>> space?
>>> Is it mostly for convenience or is it a must have?
>>>
>>
>> In the server we use the event to wake up the client who calls
>> glXSwapBuffer().  We block the client (SuspendClient) if it tries to
>> draw before the swap is complete and wake it up again when we get the
>> event from drm.  See the dri2-swapbuffers branch in xorg/xserver git
>> for details.
>>
>
> So in the case where you have the command submission inserting relevant
> barriers (like hardware barriers in the Unichrome case, or fence class
> barriers in the general TTM fence-aware driver case, this notification is
> not really necessary since the flip is guaranteed to happen before any other
> rendering commands that touch the buffers in question.
>
> So modulo the flip timing notifications, the flip_finished event can be sent
> already when the flip is scheduled. Are the flip timing notifications
> required to be accurate? If so we have a slight problem, because we don't
> unnecessarily want to put clients to sleep, and we  don't know when the flip
> has happened until it has actually happened.

We don't put clients to sleep until they try to render to the new
backbuffer.  For direct rendering this happens when the client calls
DRI2GetBuffers() after having called DRI2SwapBuffers().  If the flip
is not yet finished at that time, we restart the X request and suspend
the client.  When the drm event fires it is read by the ddx driver,
which then calls DRI2SwapComplete() which will wake the client up
again.  For AIGLX, we suspend the client in __glXForceCurrent(), but
the wakeup happens the same way.

Either way, for DRI2 the notification doesn't need to be accurate, but
the later we send it, the later we wake up the client, the less time
the client will have to render the next frame.  Of course, if you're
willing to block the command queue on the flip, you can send the event
as soon as the vsync barrier and swap commands have been queued and
everything should be fine.  But with the drm event, we have the option
of not blocking the hw queue on the flip for hw that lets us do
vsynced page flips outside the command queue (on intel hw it's just a
register write, and the hw latches the new address and loads it on the
next vsync).  Does your hw not have a vsync interrupt either?  We
don't fire an interrupt from the command queue, we just program the
flip and then send the event from the vsync interrup.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Keith Whitwell
This seems wrong to me -- the client doesn't need to sleep - all it's going to 
do is build a command buffer targeting the new backbuffer.  There's no problem 
with that, it should be the preserve of the GPU scheduler (TTM or GEM) to 
ensure those commands (once submitted) don't get executed until the buffer is 
available - otherwise you're potentially pausing your application for no good 
reason.  The app should be throttled if it gets more than a couple of frames 
ahead, but there should be 100% overlap with hardware otherwise.

If you need  a solution that doesn't rely on the buffer manager, perhaps resort 
to triple-buffering, or else create a new buffer and return that in 
DRI2GetBuffers (and let the scanout one be freed once the flip is done). 

It seems like arbitrating command execution against on-hardware buffers should 
be the preserve of the kernel memory manager & other actors shouldn't be 
double-guessing that.

Keith

From: Kristian Høgsberg [...@bitplanet.net]
Sent: Tuesday, August 18, 2009 11:46 AM
To: Thomas Hellström
Cc: Kristian Høgsberg; Jesse Barnes; dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm   
event

We don't put clients to sleep until they try to render to the new
backbuffer.  For direct rendering this happens when the client calls
DRI2GetBuffers() after having called DRI2SwapBuffers().  If the flip
is not yet finished at that time, we restart the X request and suspend
the client.  When the drm event fires it is read by the ddx driver,
which then calls DRI2SwapComplete() which will wake the client up
again.  For AIGLX, we suspend the client in __glXForceCurrent(), but
the wakeup happens the same way.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Thomas Hellström
Keith Whitwell wrote:
> This seems wrong to me -- the client doesn't need to sleep - all it's going 
> to do is build a command buffer targeting the new backbuffer.  There's no 
> problem with that, it should be the preserve of the GPU scheduler (TTM or 
> GEM) to ensure those commands (once submitted) don't get executed until the 
> buffer is available - otherwise you're potentially pausing your application 
> for no good reason.  The app should be throttled if it gets more than a 
> couple of frames ahead, but there should be 100% overlap with hardware 
> otherwise.
>
> If you need  a solution that doesn't rely on the buffer manager, perhaps 
> resort to triple-buffering, or else create a new buffer and return that in 
> DRI2GetBuffers (and let the scanout one be freed once the flip is done). 
>
> It seems like arbitrating command execution against on-hardware buffers 
> should be the preserve of the kernel memory manager & other actors shouldn't 
> be double-guessing that.
>   
I agree.

It would be fairly trivial to use TTMs synchronization mechanisms to put 
special fences on buffers with outstanding flips. These fences signal 
when a "software flipper" engine has programmed a flip. Any command 
submission involving these buffers while fenced will effectively put the 
rendering client to sleep. Rendering to other buffers, including a 
potential third buffer would continue undisturbed. Pinning and unpinning 
of scanouts would take place at command submission time as detailed before.
Of course, in the case of AIGLX, the X server might temporarily be put 
to sleep, which might not be fully optimal.

A "software flipper" engine would be a simple vblank irq handler that 
flips and signals the flipper fence when the previous fences of the 
buffers (if present) have signaled (to finish any outstanding rendering 
before flipping).

I guess GEM's buffer object busy lists be used to accomodate something 
similar?

/Thomas


> Keith
> 
> From: Kristian Høgsberg [...@bitplanet.net]
> Sent: Tuesday, August 18, 2009 11:46 AM
> To: Thomas Hellström
> Cc: Kristian Høgsberg; Jesse Barnes; dri-de...@lists.sf.net
> Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm 
>   event
>
> We don't put clients to sleep until they try to render to the new
> backbuffer.  For direct rendering this happens when the client calls
> DRI2GetBuffers() after having called DRI2SwapBuffers().  If the flip
> is not yet finished at that time, we restart the X request and suspend
> the client.  When the drm event fires it is read by the ddx driver,
> which then calls DRI2SwapComplete() which will wake the client up
> again.  For AIGLX, we suspend the client in __glXForceCurrent(), but
> the wakeup happens the same way.
>   




--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Thomas Hellström
Thomas Hellström wrote:
> Keith Whitwell wrote:
>> This seems wrong to me -- the client doesn't need to sleep - all it's 
>> going to do is build a command buffer targeting the new backbuffer.  
>> There's no problem with that, it should be the preserve of the GPU 
>> scheduler (TTM or GEM) to ensure those commands (once submitted) 
>> don't get executed until the buffer is available - otherwise you're 
>> potentially pausing your application for no good reason.  The app 
>> should be throttled if it gets more than a couple of frames ahead, 
>> but there should be 100% overlap with hardware otherwise.
>>
>> If you need  a solution that doesn't rely on the buffer manager, 
>> perhaps resort to triple-buffering, or else create a new buffer and 
>> return that in DRI2GetBuffers (and let the scanout one be freed once 
>> the flip is done).
>> It seems like arbitrating command execution against on-hardware 
>> buffers should be the preserve of the kernel memory manager & other 
>> actors shouldn't be double-guessing that.
>>   
> I agree.
>
> It would be fairly trivial to use TTMs synchronization mechanisms to 
> put special fences on buffers with outstanding flips. These fences 
> signal when a "software flipper" engine has programmed a flip. Any 
> command submission involving these buffers while fenced will 
> effectively put the rendering client to sleep. Rendering to other 
> buffers, including a potential third buffer would continue 
> undisturbed. Pinning and unpinning of scanouts would take place at 
> command submission time as detailed before.
> Of course, in the case of AIGLX, the X server might temporarily be put 
> to sleep, which might not be fully optimal.
>
> A "software flipper" engine would be a simple vblank irq handler that 
> flips and signals the flipper fence when the previous fences of the 
> buffers (if present) have signaled (to finish any outstanding 
> rendering before flipping).
>
> I guess GEM's buffer object busy lists be used to accomodate something 
> similar?
>
> /Thomas
>
Oh, and a fully-fledged GPU scheduler might of course temporarily buffer 
the hw commands without putting the client to sleep, just as Keith's 
suggests.

In any case there's a full range of hw and scheduler capabilities that 
won't require the X server to do scheduling and put clients to sleep. 
Why not leave this to the kernel?

/Thomas

>
>> Keith
>> ________________
>> From: Kristian Høgsberg [...@bitplanet.net]
>> Sent: Tuesday, August 18, 2009 11:46 AM
>> To: Thomas Hellström
>> Cc: Kristian Høgsberg; Jesse Barnes; dri-de...@lists.sf.net
>> Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding 
>> drm   event
>>
>> We don't put clients to sleep until they try to render to the new
>> backbuffer.  For direct rendering this happens when the client calls
>> DRI2GetBuffers() after having called DRI2SwapBuffers().  If the flip
>> is not yet finished at that time, we restart the X request and suspend
>> the client.  When the drm event fires it is read by the ddx driver,
>> which then calls DRI2SwapComplete() which will wake the client up
>> again.  For AIGLX, we suspend the client in __glXForceCurrent(), but
>> the wakeup happens the same way.
>>   
>
>




--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Jesse Barnes
On Tue, 18 Aug 2009 12:10:34 -0700
Keith Whitwell  wrote:

> This seems wrong to me -- the client doesn't need to sleep - all it's
> going to do is build a command buffer targeting the new backbuffer.
> There's no problem with that, it should be the preserve of the GPU
> scheduler (TTM or GEM) to ensure those commands (once submitted)
> don't get executed until the buffer is available - otherwise you're
> potentially pausing your application for no good reason.  The app
> should be throttled if it gets more than a couple of frames ahead,
> but there should be 100% overlap with hardware otherwise.

I think this is possible with the current scheme if you return a "flip
complete" event immediately, or structure your DDX driver to indicate
completion early.  That would mean tracking busy (i.e. flip pending)
buffers in your scheduler (I think I had it hooked into GEM at one
point, but with the generic code it's not needed), but it's definitely
possible as Thomas described.  And I agree you want to keep the GPU fed
with commands (that it can actually execute!) as much as possible;
that's especially important for low end chips.

> If you need  a solution that doesn't rely on the buffer manager,
> perhaps resort to triple-buffering, or else create a new buffer and
> return that in DRI2GetBuffers (and let the scanout one be freed once
> the flip is done). 

Yeah, that's another good option; should be fairly trivial with this
codebase (the next DRI2GetBuffers call would just come back immediately
with new buffers).

> It seems like arbitrating command execution against on-hardware
> buffers should be the preserve of the kernel memory manager & other
> actors shouldn't be double-guessing that.

Right, but the main thing here is to avoid here is putting the whole
server to sleep.

Anyway, to me this discussion is more of a "future directions" one than
a blocker for this particular patchset.  AFAICT the only thing that
needs fixing with this patch is my lock confusion (struct_mutex vs
mode_config).  Or would you like something substantial changed with
these bits before they land?

-- 
Jesse Barnes, Intel Open Source Technology Center

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Keith Whitwell
No, I'm fine.  I don't have the patch in front of me but it doesn't sound like 
it precludes these types of changes in the future.

Keith

From: Jesse Barnes [jbar...@virtuousgeek.org]
Sent: Tuesday, August 18, 2009 1:23 PM
To: Keith Whitwell
Cc: Kristian Høgsberg; Thomas Hellström; Kristian Høgsberg; 
dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event


Anyway, to me this discussion is more of a "future directions" one than
a blocker for this particular patchset.  AFAICT the only thing that
needs fixing with this patch is my lock confusion (struct_mutex vs
mode_config).  Or would you like something substantial changed with
these bits before they land?

--
Jesse Barnes, Intel Open Source Technology Center

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Dave Airlie
On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell wrote:
> I think the bug in question was because somebody (Jon Smirl??) removed the 
> empty & apparently unused poll implementation from the drm fd, only to 
> discover that the X server was actually polling the fd.
>
> If this code adds to, extends or at least doesn't remove the ability to poll 
> the drm fd, it should be fine.

You have a better memory than me, but thats exactly what happened alright.

Dave.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Kristian Høgsberg
On Tue, Aug 18, 2009 at 5:03 PM, Dave Airlie wrote:
> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell wrote:
>> I think the bug in question was because somebody (Jon Smirl??) removed the 
>> empty & apparently unused poll implementation from the drm fd, only to 
>> discover that the X server was actually polling the fd.
>>
>> If this code adds to, extends or at least doesn't remove the ability to poll 
>> the drm fd, it should be fine.
>
> You have a better memory than me, but thats exactly what happened alright.

And in the meantime I did dig up the story of the poll implementation
in drm.  I'm guessing the trouble started when Jon Smirl removed the
drm_poll implementation that returned 0 in
3aef3841d0c8099a97a56a285f0a21d9147405bd.  The default behaviour in
the case of an fd with missing poll implementation is to assume always
readable and writable (DEFAULT_POLLMASK).  So by removing the drm_poll
function the drm fd went from never selecting readable or writable to
always selecting readable and writable.  To "fix" this
5e8838fd115879174567c4c2db8ad25331619994 in drm.git and looks like this:

+/** No-op. */
+/* This is to deal with older X servers that believe 0 means data is
+ * available which is not the correct return for a poll function.
+ * By alternating returns both interfaces are happy. This is fixed
+ * in newer X servers.
+ */
+unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait)
+{
+   static int flip;
+   if ((flip = !flip))
+   return (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM);
+   return 0;
+}
+EXPORT_SYMBOL(drm_poll);

which wins the WTF of the day.  It was the fixed in
b974e2cd683fa798970cd1bdc5e20acfb7a34a9c where it was changed to just
return 0 like before.

Again, the problem was not that select returns zero, the problem was
that select would error with EINVAL.  I don't see any error path in
do_select() in fs/select.c that could cause EINVAL based on the return
value from one of the poll fops implementations.  The only two cases I
can think of is that either the core select implementation was
different back then, for example return EINVAL if a poll fops returns
0 without registering a poll waitqueue or not having a read fops or
something.  However, that should still be valid and if it errors it
shouldn't throw EINVAL since it's not a user error.  The other theory
is that since the default behaviour in case of no poll fops
implementation is to return immediately (or in case of the wtf poll,
to return immediately on every other select() call) always with the
drm fd readable and writable, it could have triggered an overflow edge
case in the poll timeout value computation.  Aside from negative nfds,
an invalid timeout value is the only condition that can trigger EINVAL
according to the man page.

That's it, I rest my case ;)
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Thomas Hellström
Kristian Høgsberg wrote:
> On Tue, Aug 18, 2009 at 5:03 PM, Dave Airlie wrote:
>   
>> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell wrote:
>> 
>>> I think the bug in question was because somebody (Jon Smirl??) removed the 
>>> empty & apparently unused poll implementation from the drm fd, only to 
>>> discover that the X server was actually polling the fd.
>>>
>>> If this code adds to, extends or at least doesn't remove the ability to 
>>> poll the drm fd, it should be fine.
>>>   
>> You have a better memory than me, but thats exactly what happened alright.
>> 
>
> And in the meantime I did dig up the story of the poll implementation
> in drm.  I'm guessing the trouble started when Jon Smirl removed the
> drm_poll implementation that returned 0 in
> 3aef3841d0c8099a97a56a285f0a21d9147405bd.  The default behaviour in
> the case of an fd with missing poll implementation is to assume always
> readable and writable (DEFAULT_POLLMASK).  So by removing the drm_poll
> function the drm fd went from never selecting readable or writable to
> always selecting readable and writable.  To "fix" this
> 5e8838fd115879174567c4c2db8ad25331619994 in drm.git and looks like this:
>
> +/** No-op. */
> +/* This is to deal with older X servers that believe 0 means data is
> + * available which is not the correct return for a poll function.
> + * By alternating returns both interfaces are happy. This is fixed
> + * in newer X servers.
> + */
> +unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait)
> +{
> +   static int flip;
> +   if ((flip = !flip))
> +   return (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM);
> +   return 0;
> +}
> +EXPORT_SYMBOL(drm_poll);
>
> which wins the WTF of the day.  It was the fixed in
> b974e2cd683fa798970cd1bdc5e20acfb7a34a9c where it was changed to just
> return 0 like before.
>
> Again, the problem was not that select returns zero, the problem was
> that select would error with EINVAL.  I don't see any error path in
> do_select() in fs/select.c that could cause EINVAL based on the return
> value from one of the poll fops implementations.  The only two cases I
> can think of is that either the core select implementation was
> different back then, for example return EINVAL if a poll fops returns
> 0 without registering a poll waitqueue or not having a read fops or
> something.  However, that should still be valid and if it errors it
> shouldn't throw EINVAL since it's not a user error.  The other theory
> is that since the default behaviour in case of no poll fops
> implementation is to return immediately (or in case of the wtf poll,
> to return immediately on every other select() call) always with the
> drm fd readable and writable, it could have triggered an overflow edge
> case in the poll timeout value computation.  Aside from negative nfds,
> an invalid timeout value is the only condition that can trigger EINVAL
> according to the man page.
>
> That's it, I rest my case ;)
> Kristian
>   
Actually, looking at the dri1 code, it installs a SIGIO handler to do 
user-space context switches, and it's waiting for data on the drm file 
descriptor.

Now, when the X server receives a SIGIO no matter from what client, it 
calls select to determine from which  sigio'd file descriptor. If the 
drm file descriptor indicates reading would not block, it goes ahead 
reading data to process a context switch, which will of course be bogus 
data.

So correct me if I'm wrong, but from my understanding DRI1 would break 
with the event notification mechanism.

/Thomas





--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Luc Verhaegen
On Wed, Aug 19, 2009 at 07:03:41AM +1000, Dave Airlie wrote:
> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell wrote:
> > I think the bug in question was because somebody (Jon Smirl??) 
> > removed the empty & apparently unused poll implementation from the 
> > drm fd, only to discover that the X server was actually polling the 
> > fd.
> 
> You have a better memory than me, but thats exactly what happened alright.
> 
> Dave.

You never did any of the sort, right?

Try pouncing issues created by those who are currently still active, 
preferably shortly after they made their mistakes, even if they belong 
to your current political faction.

Luc Verhaegen.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Thomas Hellström
Jesse Barnes wrote:
>
> Anyway, to me this discussion is more of a "future directions" one than
> a blocker for this particular patchset.  AFAICT the only thing that
> needs fixing with this patch is my lock confusion (struct_mutex vs
> mode_config).  Or would you like something substantial changed with
> these bits before they land?
>
>   
I think as long as there are efficient ways to short-circuit this 
implementation if the command submission mechanism does a better job, I 
think it's OK. I had a quick look in the new dri2 code and it looks like 
the driver can trick dri2 into thinking there are no outstanding swaps, 
thus avoiding the event reads? What's important is that the IOCTL 
interface gets right.

What remains to be fixed are the mutex issue and possibly the poll issue.

I also think without looking to closely that the drm_read() function 
doesn't do return values properly (see ldd3 and man 2 read for return 
values for the various blocking modes). In particular, 
wait_event_interruptible() may return -ERESTARTSYS which should never be 
returned to user-space.

It also looks like the new ioctl argument lengths are not 64 bit aligned?

/Thomas




--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Kristian Høgsberg
On Tue, Aug 18, 2009 at 6:12 PM, Thomas Hellström wrote:
> Kristian Høgsberg wrote:
>>
>> On Tue, Aug 18, 2009 at 5:03 PM, Dave Airlie wrote:
>>
>>>
>>> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell wrote:
>>>

 I think the bug in question was because somebody (Jon Smirl??) removed
 the empty & apparently unused poll implementation from the drm fd, only to
 discover that the X server was actually polling the fd.

 If this code adds to, extends or at least doesn't remove the ability to
 poll the drm fd, it should be fine.

>>>
>>> You have a better memory than me, but thats exactly what happened
>>> alright.
>>>
>>
>> And in the meantime I did dig up the story of the poll implementation
>> in drm.  I'm guessing the trouble started when Jon Smirl removed the
>> drm_poll implementation that returned 0 in
>> 3aef3841d0c8099a97a56a285f0a21d9147405bd.  The default behaviour in
>> the case of an fd with missing poll implementation is to assume always
>> readable and writable (DEFAULT_POLLMASK).  So by removing the drm_poll
>> function the drm fd went from never selecting readable or writable to
>> always selecting readable and writable.  To "fix" this
>> 5e8838fd115879174567c4c2db8ad25331619994 in drm.git and looks like this:
>>
>> +/** No-op. */
>> +/* This is to deal with older X servers that believe 0 means data is
>> + * available which is not the correct return for a poll function.
>> + * By alternating returns both interfaces are happy. This is fixed
>> + * in newer X servers.
>> + */
>> +unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait)
>> +{
>> +       static int flip;
>> +       if ((flip = !flip))
>> +               return (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM);
>> +       return 0;
>> +}
>> +EXPORT_SYMBOL(drm_poll);
>>
>> which wins the WTF of the day.  It was the fixed in
>> b974e2cd683fa798970cd1bdc5e20acfb7a34a9c where it was changed to just
>> return 0 like before.
>>
>> Again, the problem was not that select returns zero, the problem was
>> that select would error with EINVAL.  I don't see any error path in
>> do_select() in fs/select.c that could cause EINVAL based on the return
>> value from one of the poll fops implementations.  The only two cases I
>> can think of is that either the core select implementation was
>> different back then, for example return EINVAL if a poll fops returns
>> 0 without registering a poll waitqueue or not having a read fops or
>> something.  However, that should still be valid and if it errors it
>> shouldn't throw EINVAL since it's not a user error.  The other theory
>> is that since the default behaviour in case of no poll fops
>> implementation is to return immediately (or in case of the wtf poll,
>> to return immediately on every other select() call) always with the
>> drm fd readable and writable, it could have triggered an overflow edge
>> case in the poll timeout value computation.  Aside from negative nfds,
>> an invalid timeout value is the only condition that can trigger EINVAL
>> according to the man page.
>>
>> That's it, I rest my case ;)
>> Kristian
>>
>
> Actually, looking at the dri1 code, it installs a SIGIO handler to do
> user-space context switches, and it's waiting for data on the drm file
> descriptor.
>
> Now, when the X server receives a SIGIO no matter from what client, it calls
> select to determine from which  sigio'd file descriptor. If the drm file
> descriptor indicates reading would not block, it goes ahead reading data to
> process a context switch, which will of course be bogus data.

Yes, so if we use DRI1 and DRI2 page flipping at the same time we
might have a problem.  If we just used DRI1, the drm fd will never
select readable, if we just use DRI2, we never install the SIGIO
handler.

Looking way back in the drm.git history, I see that there actually
used to be a drm_read() and a drm_poll().  They weren't used for
reading out the contents of the buffer maps, they were for sending
events to userspace.  Only one event was ever supported, in the form
of lines like "C 5 8", meaning "switch context" where 5 and 8 are
handles of the old and new context.  So if anything, there's
precedence for using the drm fd for event delivery, but of course we
need to make sure we dont mix things up between DRI1 and DRI2.

So as long as no DDX driver tries to support DRI1 and DRI2 at the same
time I dont see a problem here.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Dave Airlie
On Wed, Aug 19, 2009 at 8:03 AM, Luc Verhaegen wrote:
> On Wed, Aug 19, 2009 at 07:03:41AM +1000, Dave Airlie wrote:
>> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell wrote:
>> > I think the bug in question was because somebody (Jon Smirl??)
>> > removed the empty & apparently unused poll implementation from the
>> > drm fd, only to discover that the X server was actually polling the
>> > fd.
>>
>> You have a better memory than me, but thats exactly what happened alright.
>>
>> Dave.
>
> You never did any of the sort, right?
>
> Try pouncing issues created by those who are currently still active,
> preferably shortly after they made their mistakes, even if they belong
> to your current political faction.
>

Thanks Luc, yet again your technical contribution to the topic at hand
is outstandingly useful, hopefully any future employers understand your
ability to involve yourself in technical discourse.

Dave.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Luc Verhaegen
On Wed, Aug 19, 2009 at 09:07:55AM +1000, Dave Airlie wrote:
> On Wed, Aug 19, 2009 at 8:03 AM, Luc Verhaegen wrote:
> > On Wed, Aug 19, 2009 at 07:03:41AM +1000, Dave Airlie wrote:
> >> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell wrote:
> >> > I think the bug in question was because somebody (Jon Smirl??)
> >> > removed the empty & apparently unused poll implementation from the
> >> > drm fd, only to discover that the X server was actually polling the
> >> > fd.
> >>
> >> You have a better memory than me, but thats exactly what happened alright.
> >>
> >> Dave.
> >
> > You never did any of the sort, right?
> >
> > Try pouncing issues created by those who are currently still active,
> > preferably shortly after they made their mistakes, even if they belong
> > to your current political faction.
> >
> 
> Thanks Luc, yet again your technical contribution to the topic at hand
> is outstandingly useful, hopefully any future employers understand your
> ability to involve yourself in technical discourse.
> 
> Dave.

And on the flip side of this, what you do is purely technical, always.

#dri-devel 00:05 <+airlied> krh: you've been smirled

Luc Verhaegen.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Kristian Høgsberg
On Tue, Aug 18, 2009 at 7:12 PM, Luc Verhaegen wrote:
> And on the flip side of this, what you do is purely technical, always.
>
> #dri-devel 00:05 <+airlied> krh: you've been smirled

Yeah, I was wondering about that, I thought the word was smirl-rolled :D

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Luc Verhaegen
On Wed, Aug 19, 2009 at 09:22:10AM +1000, Dave Airlie wrote:
> On Wed, Aug 19, 2009 at 9:12 AM, Luc Verhaegen wrote:
> > On Wed, Aug 19, 2009 at 09:07:55AM +1000, Dave Airlie wrote:
> >> On Wed, Aug 19, 2009 at 8:03 AM, Luc Verhaegen wrote:
> >> > On Wed, Aug 19, 2009 at 07:03:41AM +1000, Dave Airlie wrote:
> >> >> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell 
> >> >> wrote:
> >> >> > I think the bug in question was because somebody (Jon Smirl??)
> >> >> > removed the empty & apparently unused poll implementation from the
> >> >> > drm fd, only to discover that the X server was actually polling the
> >> >> > fd.
> >> >>
> >> >> You have a better memory than me, but thats exactly what happened 
> >> >> alright.
> >> >>
> >> >> Dave.
> >> >
> >> > You never did any of the sort, right?
> >> >
> >> > Try pouncing issues created by those who are currently still active,
> >> > preferably shortly after they made their mistakes, even if they belong
> >> > to your current political faction.
> >> >
> >>
> >> Thanks Luc, yet again your technical contribution to the topic at hand
> >> is outstandingly useful, hopefully any future employers understand your
> >> ability to involve yourself in technical discourse.
> >>
> >> Dave.
> >
> > And on the flip side of this, what you do is purely technical, always.
> >
> > #dri-devel 00:05 <+airlied> krh: you've been smirled
> >
> 
> Thanks Luc, yet again your technical contribution to the topic at hand
> is outstandingly useful, hopefully any future employers understand your
> ability to involve yourself in technical discourse.
> 
> Dave.

Dave,

Ask yourself whether your statement, the one i replied to, was a
technical contribution, or something else?

Luc Verhaegen.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Dave Airlie
On Wed, Aug 19, 2009 at 9:31 AM, Luc Verhaegen wrote:
> On Wed, Aug 19, 2009 at 09:22:10AM +1000, Dave Airlie wrote:
>> On Wed, Aug 19, 2009 at 9:12 AM, Luc Verhaegen wrote:
>> > On Wed, Aug 19, 2009 at 09:07:55AM +1000, Dave Airlie wrote:
>> >> On Wed, Aug 19, 2009 at 8:03 AM, Luc Verhaegen wrote:
>> >> > On Wed, Aug 19, 2009 at 07:03:41AM +1000, Dave Airlie wrote:
>> >> >> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell 
>> >> >> wrote:
>> >> >> > I think the bug in question was because somebody (Jon Smirl??)
>> >> >> > removed the empty & apparently unused poll implementation from the
>> >> >> > drm fd, only to discover that the X server was actually polling the
>> >> >> > fd.
>> >> >>
>> >> >> You have a better memory than me, but thats exactly what happened 
>> >> >> alright.
>> >> >>
>> >> >> Dave.
>> >> >
>> >> > You never did any of the sort, right?
>> >> >
>> >> > Try pouncing issues created by those who are currently still active,
>> >> > preferably shortly after they made their mistakes, even if they belong
>> >> > to your current political faction.
>> >> >
>> >>
>> >> Thanks Luc, yet again your technical contribution to the topic at hand
>> >> is outstandingly useful, hopefully any future employers understand your
>> >> ability to involve yourself in technical discourse.
>> >>
>> >> Dave.
>> >
>> > And on the flip side of this, what you do is purely technical, always.
>> >
>> > #dri-devel 00:05 <+airlied> krh: you've been smirled
>> >
>>
>> Thanks Luc, yet again your technical contribution to the topic at hand
>> is outstandingly useful, hopefully any future employers understand your
>> ability to involve yourself in technical discourse.
>>
>> Dave.
>
> Dave,
>
> Ask yourself whether your statement, the one i replied to, was a
> technical contribution, or something else?
>

this email has a subject line, you are replying. so

Thanks Luc, yet again your technical contribution to the topic at hand
is outstandingly useful, hopefully any future employers understand your
ability to involve yourself in technical discourse.

Dave.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Daniel Stone
On Wed, Aug 19, 2009 at 01:31:06AM +0200, Luc Verhaegen wrote:
> Dave,
> 
> Ask yourself whether your statement, the one i replied to, was a
> technical contribution, or something else?

Luc,
Just ... stop.  This is embarassing.

Daniel


pgp6apJk0Gjf4.pgp
Description: PGP signature
--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-18 Thread Kristian Høgsberg
On Tue, Aug 18, 2009 at 6:31 PM, Thomas Hellström wrote:
> Jesse Barnes wrote:
>>
>> Anyway, to me this discussion is more of a "future directions" one than
>> a blocker for this particular patchset.  AFAICT the only thing that
>> needs fixing with this patch is my lock confusion (struct_mutex vs
>> mode_config).  Or would you like something substantial changed with
>> these bits before they land?
>>
>>
>
> I think as long as there are efficient ways to short-circuit this
> implementation if the command submission mechanism does a better job, I
> think it's OK. I had a quick look in the new dri2 code and it looks like the
> driver can trick dri2 into thinking there are no outstanding swaps, thus
> avoiding the event reads? What's important is that the IOCTL interface gets
> right.

Let me follow up on this tomorrow, I've used up my out-going mail
quota for today.

> What remains to be fixed are the mutex issue and possibly the poll issue.

I'll resend the patch with the mutex issue fixed.  The poll issue is
not a problem as long as we don't enable DRI1 and DRI2 at the same
time.

> I also think without looking to closely that the drm_read() function doesn't
> do return values properly (see ldd3 and man 2 read for return values for the
> various blocking modes). In particular, wait_event_interruptible() may
> return -ERESTARTSYS which should never be returned to user-space.

Returning -ERESTARTSYS from read is the expected behaviour in case you
get a signal as far as I know and is handled by glibc.  It's used all
over the kernel.

> It also looks like the new ioctl argument lengths are not 64 bit aligned?

Not sure what you mean... that the size of the ioctl struct isn't a
multiple of 64 bits?  That's not a requirement.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-19 Thread Thomas Hellström
Kristian Høgsberg wrote:
> On Tue, Aug 18, 2009 at 6:12 PM, Thomas Hellström wrote:
>   
>> Kristian Høgsberg wrote:
>> 
>>> On Tue, Aug 18, 2009 at 5:03 PM, Dave Airlie wrote:
>>>
>>>   
 On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell wrote:

 
> I think the bug in question was because somebody (Jon Smirl??) removed
> the empty & apparently unused poll implementation from the drm fd, only to
> discover that the X server was actually polling the fd.
>
> If this code adds to, extends or at least doesn't remove the ability to
> poll the drm fd, it should be fine.
>
>   
 You have a better memory than me, but thats exactly what happened
 alright.

 
>>> And in the meantime I did dig up the story of the poll implementation
>>> in drm.  I'm guessing the trouble started when Jon Smirl removed the
>>> drm_poll implementation that returned 0 in
>>> 3aef3841d0c8099a97a56a285f0a21d9147405bd.  The default behaviour in
>>> the case of an fd with missing poll implementation is to assume always
>>> readable and writable (DEFAULT_POLLMASK).  So by removing the drm_poll
>>> function the drm fd went from never selecting readable or writable to
>>> always selecting readable and writable.  To "fix" this
>>> 5e8838fd115879174567c4c2db8ad25331619994 in drm.git and looks like this:
>>>
>>> +/** No-op. */
>>> +/* This is to deal with older X servers that believe 0 means data is
>>> + * available which is not the correct return for a poll function.
>>> + * By alternating returns both interfaces are happy. This is fixed
>>> + * in newer X servers.
>>> + */
>>> +unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait)
>>> +{
>>> +   static int flip;
>>> +   if ((flip = !flip))
>>> +   return (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM);
>>> +   return 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_poll);
>>>
>>> which wins the WTF of the day.  It was the fixed in
>>> b974e2cd683fa798970cd1bdc5e20acfb7a34a9c where it was changed to just
>>> return 0 like before.
>>>
>>> Again, the problem was not that select returns zero, the problem was
>>> that select would error with EINVAL.  I don't see any error path in
>>> do_select() in fs/select.c that could cause EINVAL based on the return
>>> value from one of the poll fops implementations.  The only two cases I
>>> can think of is that either the core select implementation was
>>> different back then, for example return EINVAL if a poll fops returns
>>> 0 without registering a poll waitqueue or not having a read fops or
>>> something.  However, that should still be valid and if it errors it
>>> shouldn't throw EINVAL since it's not a user error.  The other theory
>>> is that since the default behaviour in case of no poll fops
>>> implementation is to return immediately (or in case of the wtf poll,
>>> to return immediately on every other select() call) always with the
>>> drm fd readable and writable, it could have triggered an overflow edge
>>> case in the poll timeout value computation.  Aside from negative nfds,
>>> an invalid timeout value is the only condition that can trigger EINVAL
>>> according to the man page.
>>>
>>> That's it, I rest my case ;)
>>> Kristian
>>>
>>>   
>> Actually, looking at the dri1 code, it installs a SIGIO handler to do
>> user-space context switches, and it's waiting for data on the drm file
>> descriptor.
>>
>> Now, when the X server receives a SIGIO no matter from what client, it calls
>> select to determine from which  sigio'd file descriptor. If the drm file
>> descriptor indicates reading would not block, it goes ahead reading data to
>> process a context switch, which will of course be bogus data.
>> 
>
> Yes, so if we use DRI1 and DRI2 page flipping at the same time we
> might have a problem.  If we just used DRI1, the drm fd will never
> select readable, if we just use DRI2, we never install the SIGIO
> handler.
>
> Looking way back in the drm.git history, I see that there actually
> used to be a drm_read() and a drm_poll().  They weren't used for
> reading out the contents of the buffer maps, they were for sending
> events to userspace.
Yes, I've already given up on that usage :) so never mind that comment. 
I'll resort to using IOCTLS or a ttm char device.

Still I must say I'm quite doubtful of the event mechanism and how it's 
used in this context.

1) It will not work with dri1. If somebody tries to have dri1 and dri2 
enabled at the same time (XvMC, OpenGL) it will break.
2) It puts a requirement on user-space for correct functionality of a 
drm IOCTL. This means that any future pageflip ioctl user, be it EGL or 
whatever needs to implement that event parsing mechanism to be sure to 
work with any driver.
3) Drivers with a decent command submission mechanism, triple buffering 
or capable hardware will probably do their best to work around it.

Given this, shouldn't we try to avoid this? Now?

/Thomas







---

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-19 Thread Thomas Hellström
Kristian Høgsberg wrote:
>> I also think without looking to closely that the drm_read() function doesn't
>> do return values properly (see ldd3 and man 2 read for return values for the
>> various blocking modes). In particular, wait_event_interruptible() may
>> return -ERESTARTSYS which should never be returned to user-space.
>> 
>
> Returning -ERESTARTSYS from read is the expected behaviour in case you
> get a signal as far as I know and is handled by glibc.  It's used all
> over the kernel.
>   
Ah , yes. I was confusing syscalls and ioctls. Sorry.
>   
>> It also looks like the new ioctl argument lengths are not 64 bit aligned?
>> 
>
> Not sure what you mean... that the size of the ioctl struct isn't a
> multiple of 64 bits?  That's not a requirement.
>   
You'll get into 64 / 32 bit incompatibilities. DRM used to check that 
the user-space size and the kernel-space size of core ioctl arguments 
were the same, and that they had the same read / write mode. This has 
been changed so that the kernel always uses the kernel space size when 
copying. (I'm not sure why, since it was actually a good way to catch 
errors and should ideally be implemented by drivers as well). Anyway, If 
your 32-bit argument is 12 bytes and the 64-bit kernel tries to copy 16, 
you might end up with an EFAULT in the ioctl.

Of course one could change the drm implementation to always use the 
user-space size as it does for driver-specific ioctls.
Better to use all structs padded to 64-bits. Particularly if there are 
arguments which have structs embedded within other structs.

/Thomas
> cheers,
> Kristian
>   




--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-24 Thread Michel Dänzer
On Tue, 2009-08-18 at 14:46 -0400, Kristian Høgsberg wrote: 
> On Tue, Aug 18, 2009 at 1:36 PM, Thomas Hellström wrote:
> > Kristian Høgsberg wrote:
> >>
> >> On Tue, Aug 18, 2009 at 3:52 AM, Thomas Hellström
> >> wrote:
> >>
> >>>
> >>> Kristian Høgsberg wrote:
> >>>
> 
>  By sending an event back on the file descriptor we allow users of the
>  API to wait on the flip to finish in a standard poll or select
>  mainloop, where it can handle input from other sources while waiting.
>  If you rely on fences, the application will block when it tries to
>  access the buffers protected by the fence, and is unable to handle
>  input from other sources while it's blocking.
> 
> 
> >>>
> >>> Yes. you're right. How is the flip_finished event typically used by user
> >>> space?
> >>> Is it mostly for convenience or is it a must have?
> >>>
> >>
> >> In the server we use the event to wake up the client who calls
> >> glXSwapBuffer().  We block the client (SuspendClient) if it tries to
> >> draw before the swap is complete and wake it up again when we get the
> >> event from drm.  See the dri2-swapbuffers branch in xorg/xserver git
> >> for details.
> >>
> >
> > So in the case where you have the command submission inserting relevant
> > barriers (like hardware barriers in the Unichrome case, or fence class
> > barriers in the general TTM fence-aware driver case, this notification is
> > not really necessary since the flip is guaranteed to happen before any other
> > rendering commands that touch the buffers in question.
> >
> > So modulo the flip timing notifications, the flip_finished event can be sent
> > already when the flip is scheduled. Are the flip timing notifications
> > required to be accurate? If so we have a slight problem, because we don't
> > unnecessarily want to put clients to sleep, and we  don't know when the flip
> > has happened until it has actually happened.
> 
> We don't put clients to sleep until they try to render to the new
> backbuffer.  For direct rendering this happens when the client calls
> DRI2GetBuffers() after having called DRI2SwapBuffers().  If the flip
> is not yet finished at that time, we restart the X request and suspend
> the client.  When the drm event fires it is read by the ddx driver,
> which then calls DRI2SwapComplete() which will wake the client up
> again.  For AIGLX, we suspend the client in __glXForceCurrent(), but
> the wakeup happens the same way.

DRI2GetBuffers() is the wrong place to do this though, it'll probably
cause a lot of (most?) apps to block on the previous SwapBuffers before
getting a chance of batching up the rendering commands for the next
frame, draining the GPU pipeline.

If it turns out that blocking on command stream submission / BO mapping
is a problem for AIGLX - IME so far with doing something like this for
DRI2CopyRegion throttling, it shouldn't be - maybe the DRM event
mechanism can be used there.


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-27 Thread Thomas Hellström
Kristian Høgsberg skrev:
> ---
>
> Okay, here's the patch again with the locking fixed.  Everything else is
> the same.  As for the potential problem with polling on the drm fd with
> both dri1 and dri2 enabled, I suggest that we just disable the dri1
> sigio handler in the xserver.  We'll do this in a commit that precedes the
> merge of the page flipping code, so that no server will be able to poll
> on the drm fd in both dri1 and dri2 code.
>
> cheers,
> Kristian
>   


1) The locking problems are still not fixed. See below. I just did a 
quick look, so the comment list is not complete.
2) The 64/32-bit EFAULT problems mentioned are ignored. See below.



Even if that is fixed, I believe this patch adds a framework that 
doesn't belong in a modern day graphical subsystem for the following 
reasons, many of them already mentioned, but let me reiterate. Although 
the dri1 compatibility issue will be fixed with your suggestion.

a) It puts requirements on user-space for correct functionality of DRM 
ioctls. I'm talking about the master needing to parse the event queue.
b) It requires the master to act as a scheduler, and circumvents the DRM 
command submission mechanism through the delayed unpin callback. If this 
is to workaround any inability of GEM to serve a command submission 
while a previous command submission is blocked in the kernel, then IMHO 
that should be fixed and not worked around.
c) The implementation will mostly be worked around with capable 
schedulers and hardware.


A couple of questions:
Why are you guys so reluctant to use kernel scheduling for this instead 
of a mix of kernel / user-space scheduling?

If the plan is to eliminate DRI2GetBuffers() once per frame, what will 
then be used to block clients rendering to the old back buffer?




>  drivers/gpu/drm/drm_crtc.c  |  171 
> ++-
>  drivers/gpu/drm/drm_crtc_helper.c   |   10 ++
>  drivers/gpu/drm/drm_drv.c   |1 +
>  drivers/gpu/drm/drm_fops.c  |   68 -
>  drivers/gpu/drm/drm_irq.c   |   42 
>  drivers/gpu/drm/i915/i915_drv.c |1 +
>  drivers/gpu/drm/i915/intel_display.c|   16 +++-
>  drivers/gpu/drm/radeon/radeon_display.c |3 +-
>  include/drm/drm.h   |   25 +
>  include/drm/drmP.h  |   32 ++
>  include/drm/drm_crtc.h  |   27 +
>  include/drm/drm_crtc_helper.h   |4 +
>  include/drm/drm_mode.h  |   16 +++
>  13 files changed, 409 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 8fab789..d877c21 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -34,6 +34,8 @@
>  #include "drmP.h"
>  #include "drm_crtc.h"
>  
> +#undef set_base
> +
>  struct drm_prop_enum_list {
>   int type;
>   char *name;
> @@ -342,6 +344,35 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb)
>  EXPORT_SYMBOL(drm_framebuffer_cleanup);
>  
>  /**
> + * drm_crtc_async_flip - do a set_base call from a work queue
> + * @work: work struct
> + *
> + * Called when a set_base call is queued by the page flip code.  This
> + * allows the flip ioctl itself to return immediately and allow userspace
> + * to continue working.
> + */
> +static void drm_crtc_async_flip(struct work_struct *work)
> +{
> + struct drm_crtc *crtc = container_of(work, struct drm_crtc, async_flip);
> + struct drm_device *dev = crtc->dev;
> + struct drm_pending_flip *pending;
> +
> + mutex_lock(&dev->mode_config.mutex);
> +
> + BUG_ON(crtc->pending_flip == NULL);
> +
> + crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL);
> +
> + pending = crtc->pending_flip;
> + crtc->pending_flip = NULL;
> +
> + pending->frame = drm_vblank_count(dev, crtc->pipe);
>   

What's protecting dev->flip_list here?

> + list_add_tail(&pending->link, &dev->flip_list);
> +
> + mutex_unlock(&dev->mode_config.mutex);
> +}
> +
>
> +

>  
>  #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls )
> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> index 251bc0e..dcd9c66 100644
> --- a/drivers/gpu/drm/drm_fops.c
> +++ b/drivers/gpu/drm/drm_fops.c
> @@ -257,6 +257,8 @@ static int drm_open_helper(struct inode *inode, struct 
> file *filp,
>  
>   INIT_LIST_HEAD(&priv->lhead);
>   INIT_LIST_HEAD(&priv->fbs);
> + INIT_LIST_HEAD(&priv->event_list);
> + init_waitqueue_head(&priv->event_wait);
>  
>   if (dev->driver->driver_features & DRIVER_GEM)
>   drm_gem_open(dev, priv);
> @@ -429,6 +431,9 @@ int drm_release(struct inode *inode, struct file *filp)
>  {
>   struct drm_file *file_priv = filp->private_data;
>   struct drm_device *dev = file_priv->minor->dev;
> + struct drm_pending_flip *f, *ft;
> + struct drm_pending_event *e, *et;
> +
>   int retcode = 0;
>  
>   lock_kernel();
> @@ -451,6 +456,1

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-27 Thread Kristian Høgsberg
2009/8/27 Thomas Hellström :
> Kristian Høgsberg skrev:
>> ---
>>
>> Okay, here's the patch again with the locking fixed.  Everything else is
>> the same.  As for the potential problem with polling on the drm fd with
>> both dri1 and dri2 enabled, I suggest that we just disable the dri1
>> sigio handler in the xserver.  We'll do this in a commit that precedes the
>> merge of the page flipping code, so that no server will be able to poll
>> on the drm fd in both dri1 and dri2 code.
>>
>> cheers,
>> Kristian
>>
> 
>
> 1) The locking problems are still not fixed. See below. I just did a
> quick look, so the comment list is not complete.

It's not as bad as you think.  There is one glitch where I changed
struct_mutex to mode_config.mutex and left dev->flip_list unprotected.
 The other issues you comment on should be fine, I've followed up in
detail below.

> 2) The 64/32-bit EFAULT problems mentioned are ignored. See below.

Not ignored, they were good comments, I just forgot about them in the
heat of the discussion.  I'll pad the structs.

> 
>
> Even if that is fixed, I believe this patch adds a framework that
> doesn't belong in a modern day graphical subsystem for the following
> reasons, many of them already mentioned, but let me reiterate. Although
> the dri1 compatibility issue will be fixed with your suggestion.
>
> a) It puts requirements on user-space for correct functionality of DRM
> ioctls. I'm talking about the master needing to parse the event queue.

Don't we already have a lot of requirements on userspace to use the
DRM?  It's not exactly trivial to prepare a execbuffer ioctl, or to
find a connector, crtc and mode and set that, for example.  Part of
this work is a new libdrm entry point to read an parse the events from
the fd.  See the dri2-swapbuffer branch in drm.git.

> b) It requires the master to act as a scheduler, and circumvents the DRM
> command submission mechanism through the delayed unpin callback. If this
> is to workaround any inability of GEM to serve a command submission
> while a previous command submission is blocked in the kernel, then IMHO
> that should be fixed and not worked around.

It's not about workarounds.  Your suggestion *blocks the hw* while
waiting for vsync.  My patch doesn't do that, it lets other clients
submit rendering to pixmap or backbuffer BOs that aren't involved in
the pending page flip.  If you're willing to block the command stream,
GEM can keep the buffer pinned for you until the swap happens just
fine.  Just like it does for any other command - it's not about that.
In fact, I think using the scheduler to keep buffers pinned for
scanout is conflating things.  The scheduler pulls buffers in and out
of the aperture so that they are there for the GPU when it needs to
access them.  Pinning and unpinning buffers for scanout is a different
matter.

> c) The implementation will mostly be worked around with capable
> schedulers and hardware.

This is not about the capability of the hw or the sw.  The design is deliberate.

> A couple of questions:
> Why are you guys so reluctant to use kernel scheduling for this instead
> of a mix of kernel / user-space scheduling?

I'm not opposed to kernel scheduling as well, but userspace needs this
event.  I've made the case for AIGLX in the X server already, and
direct rendering clients (for example, compositors) that want to
handle input for as long as possible and avoid blocking has the same
needs.

> If the plan is to eliminate DRI2GetBuffers() once per frame, what will
> then be used to block clients rendering to the old back buffer?

There'll be an event that's sent back after each DRI2SwapBuffer and
the clients will block on receiving that event.  We still need to send
a request to the xserver and receive confirmation that the xserver has
received it before we can render again.  DRI2GetBuffers is a request
that expects a reply and will block the client on the xserver when we
call it.  DRI2SwapBuffers is an async request, ie there's no reply and
calling it wont block necessarily the client.  We still have to wait
for the new event before we can go on rendering, but doing it this way
makes the client and server less tightly coupled.  We may end up doing
the roundtrip between client and server at a point where the client
was going to block anyway (like disk i/o or something) saving a
context switch.

>
>
>
>>  drivers/gpu/drm/drm_crtc.c              |  171 
>> ++-
>>  drivers/gpu/drm/drm_crtc_helper.c       |   10 ++
>>  drivers/gpu/drm/drm_drv.c               |    1 +
>>  drivers/gpu/drm/drm_fops.c              |   68 -
>>  drivers/gpu/drm/drm_irq.c               |   42 
>>  drivers/gpu/drm/i915/i915_drv.c         |    1 +
>>  drivers/gpu/drm/i915/intel_display.c    |   16 +++-
>>  drivers/gpu/drm/radeon/radeon_display.c |    3 +-
>>  include/drm/drm.h                       |   25 +
>>  include/drm/drmP.h                      |   32 ++
>>  include/drm/drm_crtc.

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Thomas Hellström
Kristian Høgsberg skrev:
> 2009/8/27 Thomas Hellström :
>   
>> Kristian Høgsberg skrev:
>> 
>>> ---
>>>
>>> Okay, here's the patch again with the locking fixed.  Everything else is
>>> the same.  As for the potential problem with polling on the drm fd with
>>> both dri1 and dri2 enabled, I suggest that we just disable the dri1
>>> sigio handler in the xserver.  We'll do this in a commit that precedes the
>>> merge of the page flipping code, so that no server will be able to poll
>>> on the drm fd in both dri1 and dri2 code.
>>>
>>> cheers,
>>> Kristian
>>>
>>>   
>> 
>>
>> 1) The locking problems are still not fixed. See below. I just did a
>> quick look, so the comment list is not complete.
>> 
>
> It's not as bad as you think.  There is one glitch where I changed
> struct_mutex to mode_config.mutex and left dev->flip_list unprotected.
>  The other issues you comment on should be fine, I've followed up in
> detail below.
>
>   

I actually think it is. See below.

>
>   
>> 
>>
>> Even if that is fixed, I believe this patch adds a framework that
>> doesn't belong in a modern day graphical subsystem for the following
>> reasons, many of them already mentioned, but let me reiterate. Although
>> the dri1 compatibility issue will be fixed with your suggestion.
>>
>> a) It puts requirements on user-space for correct functionality of DRM
>> ioctls. I'm talking about the master needing to parse the event queue.
>> 
>
> Don't we already have a lot of requirements on userspace to use the
> DRM?  It's not exactly trivial to prepare a execbuffer ioctl, or to
> find a connector, crtc and mode and set that, for example.  Part of
> this work is a new libdrm entry point to read an parse the events from
> the fd.  See the dri2-swapbuffer branch in drm.git.
>
>   


>> b) It requires the master to act as a scheduler, and circumvents the DRM
>> command submission mechanism through the delayed unpin callback. If this
>> is to workaround any inability of GEM to serve a command submission
>> while a previous command submission is blocked in the kernel, then IMHO
>> that should be fixed and not worked around.
>> 
>
> It's not about workarounds.  Your suggestion *blocks the hw* while
> waiting for vsync.
No it doesn't. It blocks dri clients when they try to render to old 
fronts. Other dri clients would continue rendering. It provides a 
natural migration path to triple buffering where automagically nothing 
is blocked, and also to advanced software schedulers that can buffer 
command submissions instead of blocking.

Now the concern about GEM was that if the kernel takes a global mutex 
_before_ blocking a client and doesn't release that mutex when the 
client is blocked, all rendering will naturally be blocked as a consequence.

What my suggestion *does* is to block the X server if AIGLX tries to 
render to the old front, and the command scheduler is simple. Now I'm 
not completely sure how serious that is, given that AIGLX will, as 
stated before, frequently block the X server anyway since it's not 
running in a separate thread.

>> c) The implementation will mostly be worked around with capable
>> schedulers and hardware.
>> 
>
> This is not about the capability of the hw or the sw.  The design is 
> deliberate.
>   

What I mean is that if I don't want the kernel code to do delayed 
unpins, because my cs already handles that, and I don't want the X 
server to block clients because my cs or hardware will handle that, I 
would do my very best to work around this code.

>> A couple of questions:
>> Why are you guys so reluctant to use kernel scheduling for this instead
>> of a mix of kernel / user-space scheduling?
>> 
>
> I'm not opposed to kernel scheduling as well, but userspace needs this
> event.  I've made the case for AIGLX in the X server already, and
> direct rendering clients (for example, compositors) that want to
> handle input for as long as possible and avoid blocking has the same
> needs.
>   

I've still failed to understand this. Let's _assume_ for a while that 
the kernel handles scheduling perfectly in a non-blocking fashion, or 
let's assume we have triple-buffering, or let's assume we have a 
multi-FIFO card that can do pageflipping and vblank barriers on all FIFOs.

Why then exactly are events needed? and why are we required to track the 
progress of the command fifo with events like jbarnes suggests, and 
finally why is this mechanism not needed in the non-pageflipping case? 
If you can give a typical use-case that would probably help a lot and 
avoid confusion.

>   
>> If the plan is to eliminate DRI2GetBuffers() once per frame, what will
>> then be used to block clients rendering to the old back buffer?
>> 
>
> There'll be an event that's sent back after each DRI2SwapBuffer and
> the clients will block on receiving that event.  We still need to send
> a request to the xserver and receive confirmation that the xserver has
> received it before we can render again. 

The

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Michel Dänzer
On Thu, 2009-08-27 at 17:33 -0400, Kristian Høgsberg wrote: 
> 2009/8/27 Thomas Hellström :
> >
> > b) It requires the master to act as a scheduler, and circumvents the DRM
> > command submission mechanism through the delayed unpin callback. If this
> > is to workaround any inability of GEM to serve a command submission
> > while a previous command submission is blocked in the kernel, then IMHO
> > that should be fixed and not worked around.
> 
> It's not about workarounds.  Your suggestion *blocks the hw* while
> waiting for vsync.  My patch doesn't do that, it lets other clients
> submit rendering to pixmap or backbuffer BOs that aren't involved in
> the pending page flip.  If you're willing to block the command stream,
> GEM can keep the buffer pinned for you until the swap happens just
> fine.  Just like it does for any other command - it's not about that.
> In fact, I think using the scheduler to keep buffers pinned for
> scanout is conflating things.  The scheduler pulls buffers in and out
> of the aperture so that they are there for the GPU when it needs to
> access them.  Pinning and unpinning buffers for scanout is a different
> matter.

How is scanout not GPU access? :) I'd look at it the other way around:
pinning a scanout buffer is a workaround which is only needed while
there are no outstanding fences on it.


> > If the plan is to eliminate DRI2GetBuffers() once per frame, what will
> > then be used to block clients rendering to the old back buffer?
> 
> There'll be an event that's sent back after each DRI2SwapBuffer and
> the clients will block on receiving that event.

Are you referring to a DRI2 protocol event or a DRM event?

> We still need to send a request to the xserver and receive confirmation
> that the xserver has received it before we can render again.  DRI2GetBuffers 
> is a request
> that expects a reply and will block the client on the xserver when we
> call it.  DRI2SwapBuffers is an async request, ie there's no reply and
> calling it wont block necessarily the client.  We still have to wait
> for the new event before we can go on rendering, but doing it this way
> makes the client and server less tightly coupled.  We may end up doing
> the roundtrip between client and server at a point where the client
> was going to block anyway (like disk i/o or something) saving a
> context switch.

It's still a bit fuzzy how this is all supposed to work. To me it seems
like (ab)using DRI2GetBuffers for this will only allow clients to avoid
synchronous rendering with triple buffering, which would be a shame. If
you disagree, can you explain why that isn't the case?


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Thomas Hellström
Thomas Hellström skrev:
>
   
 
>>> What's protecting file_priv->event_list here?
>>> 
>>>   
>> You can test for list emptiness without taking the lock.
>>   
>> 
>
> Are you suggesting accessing a member of a mutex protected struct 
> without taking the mutex?
> Do you have a pointer to where what you say above is documented?
>
> That would be highly architecture dependent and in this particular case 
> require the processor being capable of atomic pointer reads and writes, 
> no processor r/w reordering and the compiler assuming the list head 
> being declared volatile.
>
> Even if that's the case let's assume list_empty() returns true, and then 
>   

That should of course be "... returns false, ..."
 
> another thread sneaks in and empties the queue before you take the 
> mutex. The next thing you do is to access and try to remove "event", 
> which doesn't exist because the list is empty. If you're lucky you'll 
> see an oops. If not, you end up with strange memory corruption.
>
>   
>>   
>> 
 + while (!list_empty(&file_priv->event_list)) {
 + mutex_lock(&dev->struct_mutex);
 + event = list_first_entry(&file_priv->event_list,
 +  struct drm_pending_event, link);
 

/Thomas


--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Thomas Hellström
Michel Dänzer skrev:
> On Thu, 2009-08-27 at 17:33 -0400, Kristian Høgsberg wrote: 
>   
>> 2009/8/27 Thomas Hellström :
>> 
>>> b) It requires the master to act as a scheduler, and circumvents the DRM
>>> command submission mechanism through the delayed unpin callback. If this
>>> is to workaround any inability of GEM to serve a command submission
>>> while a previous command submission is blocked in the kernel, then IMHO
>>> that should be fixed and not worked around.
>>>   
>> It's not about workarounds.  Your suggestion *blocks the hw* while
>> waiting for vsync.  My patch doesn't do that, it lets other clients
>> submit rendering to pixmap or backbuffer BOs that aren't involved in
>> the pending page flip.  If you're willing to block the command stream,
>> GEM can keep the buffer pinned for you until the swap happens just
>> fine.  Just like it does for any other command - it's not about that.
>> In fact, I think using the scheduler to keep buffers pinned for
>> scanout is conflating things.  The scheduler pulls buffers in and out
>> of the aperture so that they are there for the GPU when it needs to
>> access them.  Pinning and unpinning buffers for scanout is a different
>> matter.
>> 
>
> How is scanout not GPU access? :) I'd look at it the other way around:
> pinning a scanout buffer is a workaround which is only needed while
> there are no outstanding fences on it.
>
>   
Hmm, I missed this part. Yeah using the scheduler synchronization to 
protect buffers which have pending commands on them in the fifo is quite 
natural and also falls out quite neatly as detailed before. The 
workaround needed for page-flipping is that the buffers need to be 
pinned / unpinned just after validation.

/Thomas


--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Kristian Høgsberg
On Fri, Aug 28, 2009 at 5:19 AM, Michel Dänzer wrote:
> On Thu, 2009-08-27 at 17:33 -0400, Kristian Høgsberg wrote:
>> 2009/8/27 Thomas Hellström :
>> >
>> > b) It requires the master to act as a scheduler, and circumvents the DRM
>> > command submission mechanism through the delayed unpin callback. If this
>> > is to workaround any inability of GEM to serve a command submission
>> > while a previous command submission is blocked in the kernel, then IMHO
>> > that should be fixed and not worked around.
>>
>> It's not about workarounds.  Your suggestion *blocks the hw* while
>> waiting for vsync.  My patch doesn't do that, it lets other clients
>> submit rendering to pixmap or backbuffer BOs that aren't involved in
>> the pending page flip.  If you're willing to block the command stream,
>> GEM can keep the buffer pinned for you until the swap happens just
>> fine.  Just like it does for any other command - it's not about that.
>> In fact, I think using the scheduler to keep buffers pinned for
>> scanout is conflating things.  The scheduler pulls buffers in and out
>> of the aperture so that they are there for the GPU when it needs to
>> access them.  Pinning and unpinning buffers for scanout is a different
>> matter.
>
> How is scanout not GPU access? :) I'd look at it the other way around:
> pinning a scanout buffer is a workaround which is only needed while
> there are no outstanding fences on it.

I guess I don't see the scanout engine as part of the GPU.  If you
think pinning a scanout buffer is a workaround, how do you propose we
keep it in the aperture?  Using a special fence class that never
expire?  Isn't that what pinning is?

>> > If the plan is to eliminate DRI2GetBuffers() once per frame, what will
>> > then be used to block clients rendering to the old back buffer?
>>
>> There'll be an event that's sent back after each DRI2SwapBuffer and
>> the clients will block on receiving that event.
>
> Are you referring to a DRI2 protocol event or a DRM event?

Sorry, a DRI2 event.

>> We still need to send a request to the xserver and receive confirmation
>> that the xserver has received it before we can render again.  DRI2GetBuffers 
>> is a request
>> that expects a reply and will block the client on the xserver when we
>> call it.  DRI2SwapBuffers is an async request, ie there's no reply and
>> calling it wont block necessarily the client.  We still have to wait
>> for the new event before we can go on rendering, but doing it this way
>> makes the client and server less tightly coupled.  We may end up doing
>> the roundtrip between client and server at a point where the client
>> was going to block anyway (like disk i/o or something) saving a
>> context switch.
>
> It's still a bit fuzzy how this is all supposed to work. To me it seems
> like (ab)using DRI2GetBuffers for this will only allow clients to avoid
> synchronous rendering with triple buffering, which would be a shame. If
> you disagree, can you explain why that isn't the case?

The way it works is that glXSwapBuffers() will send the DRI2SwapBuffer
request and return immediately.  There is not response to the
DRI2SwapBuffer request, so it won't block on the X server there.  The
application is now free to do other processing, loading a new frame of
disk and decode it, for example.  However, the glXSwapBuffers() call
also invalidates the DRI drivers buffers, so once the application
wants to render again, it has to call DRI2GetBuffers, which is a round
trip, so here we'll block on the X server.  This ensures that the X
server has seen the DRI2SwapBuffer request before we start rendering
again, and provides the application with the buffers it needs for the
next frame.  This last bit is important, as the server may or may not
be able to do a page flip, so the client can't generally predict what
the buffer to use for the next frame will be.

What I'm proposing over this scheme with the DRI2 event is that
instead of requiring the client to call DRI2GetBuffers, we can send
out a DRI2 event once the flip happens.  This breaks the required
roundtrip into a one-way request and an event.  Worst case, this still
requires the clent to block on receiving the event.  However, it also
allows the X server to process the DRI2SwapBuffer request and send the
event to the client before the client gets around to needing, in which
case the round trip disappears.

cheers,
Kristian

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Michel Dänzer
On Fri, 2009-08-28 at 08:57 -0400, Kristian Høgsberg wrote: 
> On Fri, Aug 28, 2009 at 5:19 AM, Michel Dänzer wrote:
> > On Thu, 2009-08-27 at 17:33 -0400, Kristian Høgsberg wrote:
> >> 2009/8/27 Thomas Hellström :
> >> >
> >> > b) It requires the master to act as a scheduler, and circumvents the DRM
> >> > command submission mechanism through the delayed unpin callback. If this
> >> > is to workaround any inability of GEM to serve a command submission
> >> > while a previous command submission is blocked in the kernel, then IMHO
> >> > that should be fixed and not worked around.
> >>
> >> It's not about workarounds.  Your suggestion *blocks the hw* while
> >> waiting for vsync.  My patch doesn't do that, it lets other clients
> >> submit rendering to pixmap or backbuffer BOs that aren't involved in
> >> the pending page flip.  If you're willing to block the command stream,
> >> GEM can keep the buffer pinned for you until the swap happens just
> >> fine.  Just like it does for any other command - it's not about that.
> >> In fact, I think using the scheduler to keep buffers pinned for
> >> scanout is conflating things.  The scheduler pulls buffers in and out
> >> of the aperture so that they are there for the GPU when it needs to
> >> access them.  Pinning and unpinning buffers for scanout is a different
> >> matter.
> >
> > How is scanout not GPU access? :) I'd look at it the other way around:
> > pinning a scanout buffer is a workaround which is only needed while
> > there are no outstanding fences on it.
> 
> I guess I don't see the scanout engine as part of the GPU.  If you
> think pinning a scanout buffer is a workaround, how do you propose we
> keep it in the aperture?  Using a special fence class that never
> expire?  Isn't that what pinning is?

That's kind of the point I guess - which is the real thing and which is
a workaround depends on the point of view.


> >> > If the plan is to eliminate DRI2GetBuffers() once per frame, what will
> >> > then be used to block clients rendering to the old back buffer?
> >>
> >> There'll be an event that's sent back after each DRI2SwapBuffer and
> >> the clients will block on receiving that event.
> >
> > Are you referring to a DRI2 protocol event or a DRM event?
> 
> Sorry, a DRI2 event.
> 
> >> We still need to send a request to the xserver and receive confirmation
> >> that the xserver has received it before we can render again.  
> >> DRI2GetBuffers is a request
> >> that expects a reply and will block the client on the xserver when we
> >> call it.  DRI2SwapBuffers is an async request, ie there's no reply and
> >> calling it wont block necessarily the client.  We still have to wait
> >> for the new event before we can go on rendering, but doing it this way
> >> makes the client and server less tightly coupled.  We may end up doing
> >> the roundtrip between client and server at a point where the client
> >> was going to block anyway (like disk i/o or something) saving a
> >> context switch.
> >
> > It's still a bit fuzzy how this is all supposed to work. To me it seems
> > like (ab)using DRI2GetBuffers for this will only allow clients to avoid
> > synchronous rendering with triple buffering, which would be a shame. If
> > you disagree, can you explain why that isn't the case?
> 
> The way it works is that glXSwapBuffers() will send the DRI2SwapBuffer
> request and return immediately.  There is not response to the
> DRI2SwapBuffer request, so it won't block on the X server there.  The
> application is now free to do other processing, loading a new frame of
> disk and decode it, for example.  However, the glXSwapBuffers() call
> also invalidates the DRI drivers buffers, so once the application
> wants to render again, it has to call DRI2GetBuffers, which is a round
> trip, so here we'll block on the X server.  This ensures that the X
> server has seen the DRI2SwapBuffer request before we start rendering
> again, and provides the application with the buffers it needs for the
> next frame.  This last bit is important, as the server may or may not
> be able to do a page flip, so the client can't generally predict what
> the buffer to use for the next frame will be.
> 
> What I'm proposing over this scheme with the DRI2 event is that
> instead of requiring the client to call DRI2GetBuffers, we can send
> out a DRI2 event once the flip happens.  This breaks the required
> roundtrip into a one-way request and an event.  Worst case, this still
> requires the clent to block on receiving the event.  However, it also
> allows the X server to process the DRI2SwapBuffer request and send the
> event to the client before the client gets around to needing, in which
> case the round trip disappears.

Thanks. Unfortunately, I wasn't missing anything AFAICT. The problem is
that this doesn't allow the client to even start generating hardware
commands for the next frame before the flip occurs, so with a single
client the GPU pipeline is pretty much guaranteed to drain, without

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Jesse Barnes
On Fri, 28 Aug 2009 15:14:47 +0200
Michel Dänzer  wrote:
> One possible solution for this would be for the DRI2SwapBuffer request
> to return the suggested set of buffers for the next frame, and then
> for the client to block (which could involve a DRM event or whatever
> if just plain blocking turns out to be a real problem in practice -
> doubtful IME) on command submission if those buffers still have
> pending flips.

Yeah, that's what the initial implementation did (though with more
blocking than was necessary iirc, but that was just an implementation
detail).  I don't think it precludes things like triple buffering
either; the server is still free to allocate new buffers and return
them to the client at swapbuffers time.

I think we're all agreed that we want to minimize blocking of any kind;
we should keep all the clients busy submitting commands as much as
possible.  Any kind of GPU drain or wait on event is a killer.

-- 
Jesse Barnes, Intel Open Source Technology Center

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Jesse Barnes
On Fri, 28 Aug 2009 10:08:10 +0200
Thomas Hellström  wrote:

> Why then exactly are events needed? and why are we required to track
> the progress of the command fifo with events like jbarnes suggests,
> and finally why is this mechanism not needed in the non-pageflipping
> case? If you can give a typical use-case that would probably help a
> lot and avoid confusion.

After you dropped off yesterday we chatted some more and I understand
how this can be avoided now.  If your driver has entry points to track
completion of various events, you should be able to get an accurate
swap buffer count (the main thing I was concerned about here) w/o the
event mechanism.

-- 
Jesse Barnes, Intel Open Source Technology Center

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Kristian Høgsberg
This mail is getting out of control... too many sub-threads going
on...  maybe we shold break it out and talk about events vs kernel
scheduling and wait with the patch review until we've figured
something out.

2009/8/28 Thomas Hellström :
> Kristian Høgsberg skrev:
>>
>> 2009/8/27 Thomas Hellström :
>>
>>>
>>> Kristian Høgsberg skrev:
>>>

 ---

 Okay, here's the patch again with the locking fixed.  Everything else is
 the same.  As for the potential problem with polling on the drm fd with
 both dri1 and dri2 enabled, I suggest that we just disable the dri1
 sigio handler in the xserver.  We'll do this in a commit that precedes
 the
 merge of the page flipping code, so that no server will be able to poll
 on the drm fd in both dri1 and dri2 code.

 cheers,
 Kristian


>>>
>>> 
>>>
>>> 1) The locking problems are still not fixed. See below. I just did a
>>> quick look, so the comment list is not complete.
>>>
>>
>> It's not as bad as you think.  There is one glitch where I changed
>> struct_mutex to mode_config.mutex and left dev->flip_list unprotected.
>>  The other issues you comment on should be fine, I've followed up in
>> detail below.
>>
>>
>
> I actually think it is. See below.
>
>>
>>
>>>
>>> 
>>>
>>> Even if that is fixed, I believe this patch adds a framework that
>>> doesn't belong in a modern day graphical subsystem for the following
>>> reasons, many of them already mentioned, but let me reiterate. Although
>>> the dri1 compatibility issue will be fixed with your suggestion.
>>>
>>> a) It puts requirements on user-space for correct functionality of DRM
>>> ioctls. I'm talking about the master needing to parse the event queue.
>>>
>>
>> Don't we already have a lot of requirements on userspace to use the
>> DRM?  It's not exactly trivial to prepare a execbuffer ioctl, or to
>> find a connector, crtc and mode and set that, for example.  Part of
>> this work is a new libdrm entry point to read an parse the events from
>> the fd.  See the dri2-swapbuffer branch in drm.git.
>>
>>
>
>
>>> b) It requires the master to act as a scheduler, and circumvents the DRM
>>> command submission mechanism through the delayed unpin callback. If this
>>> is to workaround any inability of GEM to serve a command submission
>>> while a previous command submission is blocked in the kernel, then IMHO
>>> that should be fixed and not worked around.
>>>
>>
>> It's not about workarounds.  Your suggestion *blocks the hw* while
>> waiting for vsync.
>
> No it doesn't. It blocks dri clients when they try to render to old fronts.
> Other dri clients would continue rendering. It provides a natural migration
> path to triple buffering where automagically nothing is blocked, and also to
> advanced software schedulers that can buffer command submissions instead of
> blocking.

You advocated blocking the command queue using a wait-for-vblank
command at some point.

> Now the concern about GEM was that if the kernel takes a global mutex
> _before_ blocking a client and doesn't release that mutex when the client is
> blocked, all rendering will naturally be blocked as a consequence.

That's not how the patch works.  We hold no locks while the client is blocked.

> What my suggestion *does* is to block the X server if AIGLX tries to render
> to the old front, and the command scheduler is simple. Now I'm not
> completely sure how serious that is, given that AIGLX will, as stated
> before, frequently block the X server anyway since it's not running in a
> separate thread.

Are you talking about an AIGLX client submitting an expensive shader
locking up the X server for a long time?  I'm not aware of any way
AIGLX can block the server for up to 20ms at a time right now, but
that is what it'll do if it blocks on vsync.

>>> c) The implementation will mostly be worked around with capable
>>> schedulers and hardware.
>>>
>>
>> This is not about the capability of the hw or the sw.  The design is
>> deliberate.
>>
>
> What I mean is that if I don't want the kernel code to do delayed unpins,
> because my cs already handles that, and I don't want the X server to block
> clients because my cs or hardware will handle that, I would do my very best
> to work around this code.

I don't see the difference between a delayed unpin and a fence.  We're
doing the same thing, why is it ok if we call it a fence?  You were
saying that we should make the vsync interrupt look like a sw command
queue and use that to fence the scan-out buffer right?  Is that really
better?  I feel a bit like we're getting dragged back into the GEM vs
TTM discussion here.  I have no stake in that battle, I'm just trying
to work with what's there. I don't think there's a fundamental problem
nor benefit with either, and what you can do with a TTM fence you can
do by waiting on the right GEM BO, as far as I understand.

>>> A couple of questions:
>>> Why are you guys so reluctant to use kernel scheduling for this instead
>>>

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-28 Thread Thomas Hellström
Kristian Høgsberg skrev:
> This mail is getting out of control... too many sub-threads going
> on...  maybe we shold break it out and talk about events vs kernel
> scheduling and wait with the patch review until we've figured
> something out.
>   
Sure.


 b) It requires the master to act as a scheduler, and circumvents the DRM
 command submission mechanism through the delayed unpin callback. If this
 is to workaround any inability of GEM to serve a command submission
 while a previous command submission is blocked in the kernel, then IMHO
 that should be fixed and not worked around.

 
>>> It's not about workarounds.  Your suggestion *blocks the hw* while
>>> waiting for vsync.
>>>   
>> No it doesn't. It blocks dri clients when they try to render to old fronts.
>> Other dri clients would continue rendering. It provides a natural migration
>> path to triple buffering where automagically nothing is blocked, and also to
>> advanced software schedulers that can buffer command submissions instead of
>> blocking.
>> 
>
> You advocated blocking the command queue using a wait-for-vblank
> command at some point.
>   

I mentioned an example with how unichromes could handle pageflipping 
with vblank barriers yes, but also experessed concerns about the stall. 
That's not what I'm proposing. I sent a mail titled "Pageflipping 
scheduling" to avoid such misunderstandings a day or so ago. That  email 
details what I'm suggesting.

>   
>> Now the concern about GEM was that if the kernel takes a global mutex
>> _before_ blocking a client and doesn't release that mutex when the client is
>> blocked, all rendering will naturally be blocked as a consequence.
>> 
>
> That's not how the patch works.  We hold no locks while the client is blocked.
>   
I was not referring to your patch but to _kernel_ blocking with GEM. My 
concern was that GEM might block all clients instead of just the client 
trying to render to the old front. My initial question was whether the 
DRI2 blocking implementation was a workaround for that.

>> What my suggestion *does* is to block the X server if AIGLX tries to render
>> to the old front, and the command scheduler is simple. Now I'm not
>> completely sure how serious that is, given that AIGLX will, as stated
>> before, frequently block the X server anyway since it's not running in a
>> separate thread.
>> 
>
> Are you talking about an AIGLX client submitting an expensive shader
> locking up the X server for a long time?  I'm not aware of any way
> AIGLX can block the server for up to 20ms at a time right now, but
> that is what it'll do if it blocks on vsync.
>
>   
No, I was mostly referring to drivers doing swapbuffer throttling. A 
gpu-bound app at a low frame rate will certainly block for longer than that.
Anyway, regardless of this AIGLX is special in that there actually is a 
benefit of a drm event as an optimization hint if AIGLX

 c) The implementation will mostly be worked around with capable
 schedulers and hardware.

 
>>> This is not about the capability of the hw or the sw.  The design is
>>> deliberate.
>>>
>>>   
>> What I mean is that if I don't want the kernel code to do delayed unpins,
>> because my cs already handles that, and I don't want the X server to block
>> clients because my cs or hardware will handle that, I would do my very best
>> to work around this code.
>> 
>
> I don't see the difference between a delayed unpin and a fence.  We're
> doing the same thing, why is it ok if we call it a fence?  You were
> saying that we should make the vsync interrupt look like a sw command
> queue and use that to fence the scan-out buffer right?  Is that really
> better?  I feel a bit like we're getting dragged back into the GEM vs
> TTM discussion here.  I have no stake in that battle, I'm just trying
> to work with what's there. I don't think there's a fundamental problem
> nor benefit with either, and what you can do with a TTM fence you can
> do by waiting on the right GEM BO, as far as I understand.
>   

Please, just forget about fences. They are one of the various ways to 
implement selective kernel blocking, and a good example. There are 
hopefully ways GEM can do this nicely as well. This has nothing to do 
with GEM vs TTM.

What I'm saying is If people have a way to handle this in hardware or in 
the kernel more or less for free. Then people will try to work around 
the code you are proposing. There is hardware that can send pageflips 
down the command FIFO and if there are multiple FIFOS that can accept 
pageflip commands makes software blocking completely unnecessary.

 A couple of questions:
 Why are you guys so reluctant to use kernel scheduling for this instead
 of a mix of kernel / user-space scheduling?

 
>>> I'm not opposed to kernel scheduling as well, but userspace needs this
>>> event.  I've made the case for AIGLX in the X server already, and
>>> di

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-30 Thread Thomas Hellström
Thomas Hellström skrev:
>
>> I described this in more detail and hopefully more coherently in my
>> email to Michel.  If that's still not clear, follow up there.
>>
>>   
>> 
> I've read the mail and understand the proposal, thanks.
>
> /Thomas
>
>
>   
So, I've been doing some thinking over the weekend and here's a 
constructive proposal that hopefully can be the base of an agreement:

1) We require the semantics of the pageflip ioctl to be such that it is 
safe to schedule render commands that reference the buffers involved 
immediately after the ioctl return, or in other words, the pageflip has 
entered the graphics pipeline and any render operations to the 
referenced  buffers will be guaranteed to be executed after the 
pageflip. How this is implemented is up to the driver, and thus the code 
will need driver-specific hooks. There is a multitude of ways to 
implement this, ranging from full hardware support to a real naive and 
stupid software implementation that blocks all command submission while 
there are pending flips.
A simple sufficient implementation is to scan the buffers to be 
validated at cs time to see if there are pending pageflips on any of 
them. In that case, release all held cs locks and block. When the 
pageflip happens, continue.
But again, that will be up to the driver.

2) We rip the blocking code out of the DRI2 protocol, since there is no 
longer any need for it.

3) The DRM event mechanism stays as proposed, The ioctl caller needs to 
explicitly request an event to get one. Events will initially be used by 
clients to wake _themselves_ from voluntary select() blocking.

The motivation for 1-2  is as follows:
a) The solution fits all kinds of smart hardware and cs mechanism, 
whereas the DRI2 blocking solution assumes a simple hardware and a naive 
kernel cs mechanism. One can argue that smart kernel schedulers or 
advanced hardware can work around the DRI2 blocking solution by sending 
out the event immediately, but there we are again working around the 
DRI2 blocking solution. We shouldn't need to do that.

b) There's no requirement on masters to do scheduling with this 
proposal. Otherwise we'd have to live with that forever and implement it 
in all masters utilizing the pageflip ioctl.

c) latency - performance. Consider the sequence of events between the 
vsync and the first set of rendering commands on the next frame being 
submit to the hardware:

c1) DRI2 blocking: (Please correct me if I misunderstood something here)
* vsync irq
* schedule a wq thread that adds an event and wakes the X server.
* X server issues a syscall to read the drm event
* X server returns an event to the client with the new buffers. (write 
to socket ?)
* Client reads the event (read from socket ?)
* Client prepares the first command buffer (this is usually quite time 
consuming and in effect not only synchronizes GPU and command buffer 
building, but in effect serializes them).
* Client builds and issues a cs ioctl.
* Kernel submits commands to hardware.

c2) Kernel scheduling (this proposal).
* vsync irq
* schedule the client thread that immediately submits commands to hardware.

IMHO, c1 is far from optimal and should not be considered. One can argue 
once again, that this doesn't add much latency in practice, but we can't 
keep arguing like that for every such item we add per frame, and in this 
case the serialized command buffer building *does* add too much latency. 
We should seek and implement the optimal solution if it doesn't imply 
too much work or have side-effects.

Some added functionality we should also perhaps consider adding to the 
ioctl interface:

1) A flag whether to vsync or not. Ideally a driconf option so that 
should perhaps be communicated as part of dri2 swapbuffers as well. I 
guess on Intel hardware you can only flip on vsync(?) but on some other 
hardware you can just send a new scanout startaddress down the FIFO. 
You'll definitely see tearing, though.

2) Driver private data. An example: Drivers with multiple hardware FIFOs 
that can do pageflipping and barriers on each FIFO might want to 
indicate to the kernel on which FIFO or HW context to schedule the 
pageflip. I guess this private data might also need to be passed along 
with dri2 swapbuffers.

Thanks,
/Thomas


--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-31 Thread Kristian Høgsberg
Thomas,

What you describe sounds reasonable and I'll look into updating the
patch.  I'm not too keen on the driver specific flag you suggest,
since it makes it hard to use the ioctl in generic code.  Maybe
drivers that can do pageflip from one of several fifos can expose a
driver specific ioctl to configure which one should be used.  I don't
imagine it's something that will change much?

cheers,
Kristiain

2009/8/30 Thomas Hellström :
> Thomas Hellström skrev:
>>
>>> I described this in more detail and hopefully more coherently in my
>>> email to Michel.  If that's still not clear, follow up there.
>>>
>>>
>>
>> I've read the mail and understand the proposal, thanks.
>>
>> /Thomas
>>
>>
>>
>
> So, I've been doing some thinking over the weekend and here's a constructive
> proposal that hopefully can be the base of an agreement:
>
> 1) We require the semantics of the pageflip ioctl to be such that it is safe
> to schedule render commands that reference the buffers involved immediately
> after the ioctl return, or in other words, the pageflip has entered the
> graphics pipeline and any render operations to the referenced  buffers will
> be guaranteed to be executed after the pageflip. How this is implemented is
> up to the driver, and thus the code will need driver-specific hooks. There
> is a multitude of ways to implement this, ranging from full hardware support
> to a real naive and stupid software implementation that blocks all command
> submission while there are pending flips.
> A simple sufficient implementation is to scan the buffers to be validated at
> cs time to see if there are pending pageflips on any of them. In that case,
> release all held cs locks and block. When the pageflip happens, continue.
> But again, that will be up to the driver.
>
> 2) We rip the blocking code out of the DRI2 protocol, since there is no
> longer any need for it.
>
> 3) The DRM event mechanism stays as proposed, The ioctl caller needs to
> explicitly request an event to get one. Events will initially be used by
> clients to wake _themselves_ from voluntary select() blocking.
>
> The motivation for 1-2  is as follows:
> a) The solution fits all kinds of smart hardware and cs mechanism, whereas
> the DRI2 blocking solution assumes a simple hardware and a naive kernel cs
> mechanism. One can argue that smart kernel schedulers or advanced hardware
> can work around the DRI2 blocking solution by sending out the event
> immediately, but there we are again working around the DRI2 blocking
> solution. We shouldn't need to do that.
>
> b) There's no requirement on masters to do scheduling with this proposal.
> Otherwise we'd have to live with that forever and implement it in all
> masters utilizing the pageflip ioctl.
>
> c) latency - performance. Consider the sequence of events between the vsync
> and the first set of rendering commands on the next frame being submit to
> the hardware:
>
> c1) DRI2 blocking: (Please correct me if I misunderstood something here)
> * vsync irq
> * schedule a wq thread that adds an event and wakes the X server.
> * X server issues a syscall to read the drm event
> * X server returns an event to the client with the new buffers. (write to
> socket ?)
> * Client reads the event (read from socket ?)
> * Client prepares the first command buffer (this is usually quite time
> consuming and in effect not only synchronizes GPU and command buffer
> building, but in effect serializes them).
> * Client builds and issues a cs ioctl.
> * Kernel submits commands to hardware.
>
> c2) Kernel scheduling (this proposal).
> * vsync irq
> * schedule the client thread that immediately submits commands to hardware.
>
> IMHO, c1 is far from optimal and should not be considered. One can argue
> once again, that this doesn't add much latency in practice, but we can't
> keep arguing like that for every such item we add per frame, and in this
> case the serialized command buffer building *does* add too much latency. We
> should seek and implement the optimal solution if it doesn't imply too much
> work or have side-effects.
>
> Some added functionality we should also perhaps consider adding to the ioctl
> interface:
>
> 1) A flag whether to vsync or not. Ideally a driconf option so that should
> perhaps be communicated as part of dri2 swapbuffers as well. I guess on
> Intel hardware you can only flip on vsync(?) but on some other hardware you
> can just send a new scanout startaddress down the FIFO. You'll definitely
> see tearing, though.
>
> 2) Driver private data. An example: Drivers with multiple hardware FIFOs
> that can do pageflipping and barriers on each FIFO might want to indicate to
> the kernel on which FIFO or HW context to schedule the pageflip. I guess
> this private data might also need to be passed along with dri2 swapbuffers.
>
> Thanks,
> /Thomas
>
>

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 

Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

2009-08-31 Thread Thomas Hellström
Kristian Høgsberg wrote:
> Thomas,
>
> What you describe sounds reasonable and I'll look into updating the
> patch.  I'm not too keen on the driver specific flag you suggest,
> since it makes it hard to use the ioctl in generic code.  Maybe
> drivers that can do pageflip from one of several fifos can expose a
> driver specific ioctl to configure which one should be used.  I don't
> imagine it's something that will change much?
>
>   
Kristian,
Yes, that sounds OK with me.
That last driver specific part was not very well thought through, I 
admit. There's probably
other ways to do that as you suggest.

/Thomas


> cheers,
> Kristiain
>
> 2009/8/30 Thomas Hellström :
>   
>> Thomas Hellström skrev:
>> 
 I described this in more detail and hopefully more coherently in my
 email to Michel.  If that's still not clear, follow up there.


 
>>> I've read the mail and understand the proposal, thanks.
>>>
>>> /Thomas
>>>
>>>
>>>
>>>   
>> So, I've been doing some thinking over the weekend and here's a constructive
>> proposal that hopefully can be the base of an agreement:
>>
>> 1) We require the semantics of the pageflip ioctl to be such that it is safe
>> to schedule render commands that reference the buffers involved immediately
>> after the ioctl return, or in other words, the pageflip has entered the
>> graphics pipeline and any render operations to the referenced  buffers will
>> be guaranteed to be executed after the pageflip. How this is implemented is
>> up to the driver, and thus the code will need driver-specific hooks. There
>> is a multitude of ways to implement this, ranging from full hardware support
>> to a real naive and stupid software implementation that blocks all command
>> submission while there are pending flips.
>> A simple sufficient implementation is to scan the buffers to be validated at
>> cs time to see if there are pending pageflips on any of them. In that case,
>> release all held cs locks and block. When the pageflip happens, continue.
>> But again, that will be up to the driver.
>>
>> 2) We rip the blocking code out of the DRI2 protocol, since there is no
>> longer any need for it.
>>
>> 3) The DRM event mechanism stays as proposed, The ioctl caller needs to
>> explicitly request an event to get one. Events will initially be used by
>> clients to wake _themselves_ from voluntary select() blocking.
>>
>> The motivation for 1-2  is as follows:
>> a) The solution fits all kinds of smart hardware and cs mechanism, whereas
>> the DRI2 blocking solution assumes a simple hardware and a naive kernel cs
>> mechanism. One can argue that smart kernel schedulers or advanced hardware
>> can work around the DRI2 blocking solution by sending out the event
>> immediately, but there we are again working around the DRI2 blocking
>> solution. We shouldn't need to do that.
>>
>> b) There's no requirement on masters to do scheduling with this proposal.
>> Otherwise we'd have to live with that forever and implement it in all
>> masters utilizing the pageflip ioctl.
>>
>> c) latency - performance. Consider the sequence of events between the vsync
>> and the first set of rendering commands on the next frame being submit to
>> the hardware:
>>
>> c1) DRI2 blocking: (Please correct me if I misunderstood something here)
>> * vsync irq
>> * schedule a wq thread that adds an event and wakes the X server.
>> * X server issues a syscall to read the drm event
>> * X server returns an event to the client with the new buffers. (write to
>> socket ?)
>> * Client reads the event (read from socket ?)
>> * Client prepares the first command buffer (this is usually quite time
>> consuming and in effect not only synchronizes GPU and command buffer
>> building, but in effect serializes them).
>> * Client builds and issues a cs ioctl.
>> * Kernel submits commands to hardware.
>>
>> c2) Kernel scheduling (this proposal).
>> * vsync irq
>> * schedule the client thread that immediately submits commands to hardware.
>>
>> IMHO, c1 is far from optimal and should not be considered. One can argue
>> once again, that this doesn't add much latency in practice, but we can't
>> keep arguing like that for every such item we add per frame, and in this
>> case the serialized command buffer building *does* add too much latency. We
>> should seek and implement the optimal solution if it doesn't imply too much
>> work or have side-effects.
>>
>> Some added functionality we should also perhaps consider adding to the ioctl
>> interface:
>>
>> 1) A flag whether to vsync or not. Ideally a driconf option so that should
>> perhaps be communicated as part of dri2 swapbuffers as well. I guess on
>> Intel hardware you can only flip on vsync(?) but on some other hardware you
>> can just send a new scanout startaddress down the FIFO. You'll definitely
>> see tearing, though.
>>
>> 2) Driver private data. An example: Drivers with multiple hardware FIFOs
>> that can do pageflipping and barriers on each FIFO might want