On Tue, 2008-10-28 at 14:37 -0700, Jesse Barnes wrote:
> On Monday, October 27, 2008 11:27 am Jesse Barnes wrote:
> > On Friday, October 24, 2008 2:57 pm Jesse Barnes wrote:
> > > Ok this one doesn't crash and doesn't leave the flushing list full at
> > > leavevt time, so I think it's ready for some actual review.
> > >
> > > I'm using the patch I posted to intel-gfx@ to do tiled EXA pixmaps, but I
> > > think my approach of faulting in fence registers may not be the best one
> > > (though I haven't tried making the fence register allocator use LRU yet);
> > > it seems like we may want a big contiguous chunk of GTT space where pixmaps
> > > sit so we can re-use a single fence register to cover the needs of most
> > > pixmaps. Suggestions appreciated.
> > >
> > > This patch should be pretty safe to push upstream I think since the new
> > > code won't be used unless applications actually call the GTT mapping
> > > ioctl.
> >
> > And with untested 915/830 fence register support.
> 
> Merged up to drm-next from this morning.
> 
> Jesse

A comment here noting what this function is for (setting up GTT mmaps)
would be nice; naming-wise it sounds a lot like the GEM mmap ioctl,
which doesn't go through this path.
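Something like this, just as a sketch (wording is mine):

/**
 * drm_gem_mmap - memory map routine for GEM objects
 * @filp: DRM file pointer
 * @vma: VMA for the area to be mapped
 *
 * Unlike the mapping set up by the GEM mmap ioctl (which goes through
 * the shmem backing store), this maps the object through the GTT:
 * offsets that match a GEM mmap offset get dispatched to the driver's
 * gem_vm_ops fault handler, anything else falls back to drm_mmap().
 */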

> +int
> +drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> +     struct drm_file *priv = filp->private_data;
> +     struct drm_device *dev = priv->minor->dev;
> +     struct drm_gem_mm *mm = dev->mm_private;
> +     struct drm_map *map = NULL;
> +     struct drm_gem_object *obj;
> +     struct drm_hash_item *hash;
> +     unsigned long prot;
> +     int ret = 0;
> +
> +     mutex_lock(&dev->struct_mutex);
> +
> +     if (drm_ht_find_item(&mm->offset_hash, vma->vm_pgoff, &hash)) {
> +             mutex_unlock(&dev->struct_mutex);
> +             return drm_mmap(filp, vma);
> +     }
> +
> +     map = drm_hash_entry(hash, struct drm_map_list, hash)->map;
> +     if (!map ||
> +         ((map->flags & _DRM_RESTRICTED) && !capable(CAP_SYS_ADMIN))) {
> +             ret =  -EPERM;
> +             goto out_unlock;
> +     }
> +
> +     /* Check for valid size. */
> +     if (map->size < vma->vm_end - vma->vm_start) {
> +             ret = -EINVAL;
> +             goto out_unlock;
> +     }
> +
> +     obj = map->handle;
> +     if (!obj->dev->driver->gem_vm_ops) {
> +             ret = -EINVAL;
> +             goto out_unlock;
> +     }
> +
> +     vma->vm_flags |= VM_RESERVED | VM_IO | VM_PFNMAP | VM_DONTEXPAND;
> +     vma->vm_ops = obj->dev->driver->gem_vm_ops;
> +     vma->vm_private_data = map->handle;
> +     /* FIXME: use pgprot_writecombine when available */
> +     prot = pgprot_val(vma->vm_page_prot);
> +     prot |= _PAGE_CACHE_WC;
> +     vma->vm_page_prot = __pgprot(prot);
> +
> +     vma->vm_file = filp;    /* Needed for drm_vm_open() */
> +     drm_vm_open_locked(vma);
> +
> +out_unlock:
> +     mutex_unlock(&dev->struct_mutex);
> +
> +     return ret;
> +}
> +EXPORT_SYMBOL(drm_gem_mmap);
> diff --git a/drivers/gpu/drm/drm_hashtab.c b/drivers/gpu/drm/drm_hashtab.c
> index 3316067..af539f7 100644
> --- a/drivers/gpu/drm/drm_hashtab.c
> +++ b/drivers/gpu/drm/drm_hashtab.c

> +     /**
> +      * Required alignment for the object
> +      */
> +     uint32_t gtt_alignment;

We've got another GTT alignment value that comes in through
i915_gem_pin_ioctl.  We probably want to calculate this one (which is
about getting things aligned for tiling fence regs) in the kernel
instead of taking it as an argument to the GTT mmap ioctl, and make
sure bind_to_gtt takes this value when it's larger than what pin passed
us.
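Sketch of what I mean, with i915_gem_get_gtt_alignment as a made-up
helper (and I'm assuming the 1MB minimum fence size here; 830-class
parts would want 512KB):

static uint32_t
i915_gem_get_gtt_alignment(struct drm_gem_object *obj)
{
	struct drm_device *dev = obj->dev;
	struct drm_i915_gem_object *obj_priv = obj->driver_private;
	uint32_t size;

	/* Untiled objects, and everything on 965, only need page
	 * alignment since 965 fence regs work at 4k granularity.
	 */
	if (IS_I965G(dev) || obj_priv->tiling_mode == I915_TILING_NONE)
		return 4096;

	/* Pre-965 fence regions are power-of-two sized and naturally
	 * aligned, so round up until we cover the object.
	 */
	for (size = 1024 * 1024; size < obj->size; size <<= 1)
		;
	return size;
}

and then in bind_to_gtt:

	alignment = max(alignment, i915_gem_get_gtt_alignment(obj));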

> +/**
> + * i915_gem_fault - fault a page into the GTT
> + * vma: VMA in question
> + * vmf: fault info
> + *
> + * The fault handler is set up by drm_gem_mmap() when an object is GTT mapped
> + * from userspace.  The fault handler takes care of binding the object to
> + * the GTT (if needed), allocating and programming a fence register (again,
> + * only if needed based on whether the old reg is still valid or the object
> + * is tiled) and inserting a new PTE into the faulting process.
> + *
> + * Note that the faulting process may involve evicting existing objects
> + * from the GTT and/or fence registers to make room.  So performance may
> + * suffer if the GTT working set is large or there are few fence registers
> + * left.
> + */
> +int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> +     struct drm_gem_object *obj = vma->vm_private_data;
> +     struct drm_device *dev = obj->dev;
> +     struct drm_i915_private *dev_priv = dev->dev_private;
> +     struct drm_i915_gem_object *obj_priv = obj->driver_private;
> +     pgoff_t page_offset;
> +     unsigned long pfn;
> +     int ret = 0;
> +
> +     /* We don't use vmf->pgoff since that has the fake offset */
> +     page_offset = ((unsigned long)vmf->virtual_address - vma->vm_start) >>
> +             PAGE_SHIFT;
> +
> +     /* Now bind it into the GTT if needed */
> +     mutex_lock(&dev->struct_mutex);
> +     if (!obj_priv->gtt_space) {
> +             ret = i915_gem_object_bind_to_gtt(obj, obj_priv->gtt_alignment);
> +             if (ret) {
> +                     mutex_unlock(&dev->struct_mutex);
> +                     return VM_FAULT_SIGBUS;
> +             }
> +             list_add(&obj_priv->list, &dev_priv->mm.inactive_list);
> +     }
> +
> +     /* Need a new fence register? */
> +     if (obj_priv->fence_reg == I915_FENCE_REG_NONE &&
> +         obj_priv->tiling_mode != I915_TILING_NONE)
> +             i915_gem_object_get_fence_reg(obj);

I like that this is here as opposed to always at object bind time.
However, we need to make sure we get a fence register covering X-tiled
stuff at exec time on pre-965, as it's relied on for 2D rendering (since
they didn't give us bits in the 2D instructions to specify it.  sigh.)
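i.e. something along these lines in the execbuffer validation path
(exact placement is a guess on my part):

	/* pre-965 2D commands have no tiling bits, so the fence has to
	 * be live before the blitter touches an X-tiled object */
	if (!IS_I965G(dev) &&
	    obj_priv->tiling_mode == I915_TILING_X &&
	    obj_priv->fence_reg == I915_FENCE_REG_NONE)
		i915_gem_object_get_fence_reg(obj);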


> +int
> +i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv)
> +{
> +     struct drm_i915_gem_mmap_gtt *args = data;
> +     struct drm_i915_private *dev_priv = dev->dev_private;
> +     struct drm_gem_object *obj;
> +     struct drm_i915_gem_object *obj_priv;
> +     int ret;
> +
> +     if (!(dev->driver->driver_features & DRIVER_GEM))
> +             return -ENODEV;
> +
> +     mutex_lock(&dev->struct_mutex);
> +     obj = drm_gem_object_lookup(dev, file_priv, args->handle);
> +     if (obj == NULL) {
> +             drm_gem_object_unreference(obj);
> +             mutex_unlock(&dev->struct_mutex);
> +             return -EBADF;
> +     }

You can call object_lookup without struct_mutex held; the object table
is protected by its own table_lock spinlock.  Also, that error path
calls unreference on an obj that's NULL.
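Something like this avoids both problems:

	obj = drm_gem_object_lookup(dev, file_priv, args->handle);
	if (obj == NULL)
		return -EBADF;

	mutex_lock(&dev->struct_mutex);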


> @@ -656,10 +787,6 @@ i915_gem_retire_request(struct drm_device *dev,
>                */
>               if (obj_priv->last_rendering_seqno != request->seqno)
>                       return;
> -#if WATCH_LRU
> -             DRM_INFO("%s: retire %d moves to inactive list %p\n",
> -                      __func__, request->seqno, obj);
> -#endif
>  
>               if (obj->write_domain != 0) {
>                       list_move_tail(&obj_priv->list,
> @@ -788,7 +915,7 @@ i915_wait_request(struct drm_device *dev, uint32_t seqno)
>        * buffer to have made it to the inactive list, and we would need
>        * a separate wait queue to handle that.
>        */
> -     if (ret == 0)
> +     if (ret == 0 || ret == -ERESTARTSYS)
>               i915_gem_retire_requests(dev);
>  
>       return ret;

Not sure what these two hunks are doing in this patch.

> @@ -920,7 +1047,9 @@ static int
>  i915_gem_object_unbind(struct drm_gem_object *obj)
>  {
>       struct drm_device *dev = obj->dev;
> +     struct drm_i915_private *dev_priv = dev->dev_private;
>       struct drm_i915_gem_object *obj_priv = obj->driver_private;
> +     loff_t offset;
>       int ret = 0;
>  
>  #if WATCH_BUF
> @@ -964,6 +1093,22 @@ i915_gem_object_unbind(struct drm_gem_object *obj)
>  
>       BUG_ON(obj_priv->active);
>  
> +     /* blow away mappings if mapped through GTT */
> +     offset = ((loff_t) obj->map_list.hash.key) << PAGE_SHIFT;
> +     unmap_mapping_range(dev->dev_mapping, offset, obj->size, 1);
> +
> +     if (obj_priv->fence_reg != I915_FENCE_REG_NONE) {
> +             if (IS_I965G(dev)) {
> +                     I915_WRITE64(FENCE_REG_965_0 +
> +                                  (obj_priv->fence_reg * 8), 0);
> +             } else {
> +                     I915_WRITE(FENCE_REG_830_0 +
> +                                (obj_priv->fence_reg * 4), 0);
> +             }
> +             dev_priv->fence_regs[obj_priv->fence_reg].obj = NULL;
> +             obj_priv->fence_reg = I915_FENCE_REG_NONE;
> +     }
> +
>       i915_gem_object_free_page_list(obj);
>  
>       if (obj_priv->gtt_space) {

Is unmap_mapping_range costly even when nobody's got it mapped?  I'm
concerned, since we've already got significant overhead in unmap/map
thrashing that's hurting us on relevant apps.
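If it does cost something even in the unmapped case, a flag might be
enough to skip it; sketch, with fault_mappable being a made-up field
the fault handler would set when it inserts a PTE:

	/* only zap PTEs if the fault handler ever created any */
	if (obj_priv->fault_mappable) {
		offset = ((loff_t) obj->map_list.hash.key) << PAGE_SHIFT;
		unmap_mapping_range(dev->dev_mapping, offset, obj->size, 1);
		obj_priv->fault_mappable = 0;
	}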

(I didn't review any of the actual fence register mapping code that
follows, though I like that it's in pretty little functions now and not
the atrocity I committed in i830_memory.c.)


> +static void
> +i915_gem_object_get_fence_reg(struct drm_gem_object *obj)
> +{
> +     struct drm_device *dev = obj->dev;
> +     drm_i915_private_t *dev_priv = dev->dev_private;
> +     struct drm_i915_gem_object *obj_priv = obj->driver_private;
> +     struct drm_i915_fence_reg *reg = NULL;
> +     int i, ret;
> +
> +     switch (obj_priv->tiling_mode) {
> +     case I915_TILING_NONE:
> +             WARN(1, "allocating a fence for non-tiled object?\n");
> +             break;
> +     case I915_TILING_X:
> +             WARN(obj_priv->stride & (512 - 1),
> +                  "object is X tiled but has non-512B pitch\n");
> +             break;
> +     case I915_TILING_Y:
> +             WARN(obj_priv->stride & (128 - 1),
> +                  "object is Y tiled but has non-128B pitch\n");
> +             break;
> +     }
> +
> +     /* First try to find a free reg */
> +     for (i = dev_priv->fence_reg_start; i < dev_priv->num_fence_regs; i++) {
> +             reg = &dev_priv->fence_regs[i];
> +             if (!reg->obj)
> +                     break;
> +     }
> +
> +
> +     /* None available, try to steal one or wait for a user to finish */
> +     if (i == dev_priv->num_fence_regs) {
> +             struct drm_i915_gem_object *old_obj_priv = NULL;
> +             loff_t offset;
> +
> +try_again:
> +             /* Could try to use LRU here instead... */
> +             for (i = dev_priv->fence_reg_start;
> +                  i < dev_priv->num_fence_regs; i++) {
> +                     reg = &dev_priv->fence_regs[i];
> +                     old_obj_priv = reg->obj->driver_private;
> +                     if (!old_obj_priv->pin_count)
> +                             break;
> +             }

We'll definitely need to get this fixed, particularly on 915, where
we'll be hitting fence register setup from the accelerator (though we
may want to think about queuing flushes and register writes from the CP
to avoid lockstepping with the hardware too much).
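For the LRU version I'd picture the fault handler moving a reg to the
tail of a list and the steal path taking from the head; rough sketch,
with mm.fence_list and the fence_list member both made up:

	/* steal the least recently used unpinned fence reg */
	list_for_each_entry(old_obj_priv, &dev_priv->mm.fence_list,
			    fence_list) {
		if (!old_obj_priv->pin_count) {
			i = old_obj_priv->fence_reg;
			reg = &dev_priv->fence_regs[i];
			break;
		}
	}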

> +             /*
> +              * Now things get ugly... we have to wait for one of the
> +              * objects to finish before trying again.
> +              */
> +             if (i == dev_priv->num_fence_regs) {
> +                     ret = i915_gem_object_wait_rendering(reg->obj);
> +                     if (ret) {
> +                             WARN(ret, "wait_rendering failed: %d\n", ret);
> +                             return;
> +                     }
> +                     goto try_again;
> +             }
> +
> +             /*
> +              * Zap this virtual mapping so we can set up a fence again
> +              * for this object next time we need it.
> +              */
> +             offset = ((loff_t) reg->obj->map_list.hash.key) << PAGE_SHIFT;
> +             unmap_mapping_range(dev->dev_mapping, offset,
> +                                 reg->obj->size, 1);
> +             old_obj_priv->fence_reg = I915_FENCE_REG_NONE;

Move removal of a fence register from an object to its own function?
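e.g. pull the clearing code out of unbind into something like this and
call it from both places:

static void
i915_gem_clear_fence_reg(struct drm_gem_object *obj)
{
	struct drm_device *dev = obj->dev;
	drm_i915_private_t *dev_priv = dev->dev_private;
	struct drm_i915_gem_object *obj_priv = obj->driver_private;

	if (obj_priv->fence_reg == I915_FENCE_REG_NONE)
		return;

	if (IS_I965G(dev))
		I915_WRITE64(FENCE_REG_965_0 + (obj_priv->fence_reg * 8), 0);
	else
		I915_WRITE(FENCE_REG_830_0 + (obj_priv->fence_reg * 4), 0);

	dev_priv->fence_regs[obj_priv->fence_reg].obj = NULL;
	obj_priv->fence_reg = I915_FENCE_REG_NONE;
}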

> @@ -2146,7 +2465,12 @@ i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
>  
>  int i915_gem_init_object(struct drm_gem_object *obj)
>  {
> +     struct drm_device *dev = obj->dev;
> +     struct drm_gem_mm *mm = dev->mm_private;
>       struct drm_i915_gem_object *obj_priv;
> +     struct drm_map_list *list;
> +     struct drm_map *map;
> +     int ret = 0;
>  
>       obj_priv = drm_calloc(1, sizeof(*obj_priv), DRM_MEM_DRIVER);
>       if (obj_priv == NULL)
> @@ -2165,12 +2489,63 @@ int i915_gem_init_object(struct drm_gem_object *obj)
>  
>       obj->driver_private = obj_priv;
>       obj_priv->obj = obj;
> +     obj_priv->fence_reg = I915_FENCE_REG_NONE;
>       INIT_LIST_HEAD(&obj_priv->list);
> +
> +     /* Set the object up for mmap'ing */
> +     list = &obj->map_list;
> +     list->map = drm_calloc(1, sizeof(struct drm_map_list),
> +                            DRM_MEM_DRIVER);
> +     if (!list->map)
> +             return -ENOMEM;
> +
> +     map = list->map;
> +     map->type = _DRM_GEM;
> +     map->size = obj->size;
> +     map->handle = obj;
> +
> +     /* Get a DRM GEM mmap offset allocated... */
> +     list->file_offset_node = drm_mm_search_free(&mm->offset_manager,
> +                                                 obj->size / PAGE_SIZE, 0, 0);
> +     if (!list->file_offset_node) {
> +             DRM_ERROR("failed to allocate offset for bo %d\n", obj->name);
> +             ret = -ENOMEM;
> +             goto out_free_list;
> +     }
> +
> +     list->file_offset_node = drm_mm_get_block(list->file_offset_node,
> +                                               obj->size / PAGE_SIZE, 0);
> +     if (!list->file_offset_node) {
> +             ret = -ENOMEM;
> +             goto out_free_list;
> +     }
> +
> +     list->hash.key = list->file_offset_node->start;
> +     if (drm_ht_insert_item(&mm->offset_hash, &list->hash)) {
> +             DRM_ERROR("failed to add to map hash\n");
> +             goto out_free_mm;
> +     }
> +
> +     /* By now we should be all set, any drm_mmap request on the offset
> +      * below will get to our mmap & fault handler */
> +     obj_priv->mmap_offset = ((uint64_t) list->hash.key) << PAGE_SHIFT;
> +
>       return 0;
> +
> +out_free_mm:
> +     drm_mm_put_block(list->file_offset_node);
> +out_free_list:
> +     drm_free(list->map, sizeof(struct drm_map_list), DRM_MEM_DRIVER);
> +
> +     return ret;
>  }
 
Given that the number of GTT mmaps will be much lower than the number of
objects, this setup should probably be part of gtt_mmap_ioctl.
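i.e. factor the allocation above into a helper and do it lazily from
the ioctl; sketch, with i915_gem_create_mmap_offset being that
factored-out helper:

	/* in i915_gem_mmap_gtt_ioctl, after the object lookup */
	if (!obj_priv->mmap_offset) {
		ret = i915_gem_create_mmap_offset(obj);
		if (ret) {
			drm_gem_object_unreference(obj);
			mutex_unlock(&dev->struct_mutex);
			return ret;
		}
	}
	args->offset = obj_priv->mmap_offset;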

> @@ -2269,8 +2658,7 @@ i915_gem_idle(struct drm_device *dev)
>        */
>       i915_gem_flush(dev, ~(I915_GEM_DOMAIN_CPU|I915_GEM_DOMAIN_GTT),
>                      ~(I915_GEM_DOMAIN_CPU|I915_GEM_DOMAIN_GTT));
> -     seqno = i915_add_request(dev, ~(I915_GEM_DOMAIN_CPU |
> -                                     I915_GEM_DOMAIN_GTT));
> +     seqno = i915_add_request(dev, ~I915_GEM_DOMAIN_CPU);
>  
>       if (seqno == 0) {
>               mutex_unlock(&dev->struct_mutex);

I don't think I've thought about this enough yet.

> @@ -2303,7 +2691,15 @@ i915_gem_idle(struct drm_device *dev)
>        * waited for a sequence higher than any pending execbuffer
>        */
>       BUG_ON(!list_empty(&dev_priv->mm.active_list));
> -     BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
> +     if (!list_empty(&dev_priv->mm.flushing_list)) {
> +             struct drm_i915_gem_object *obj_priv;
> +             DRM_ERROR("flushing list not empty, still has:\n");
> +             list_for_each_entry(obj_priv, &dev_priv->mm.flushing_list,
> +                                 list) {
> +                     DRM_ERROR("    %p: %d\n", obj_priv,
> +                               obj_priv->last_rendering_seqno);
> +             }
> +     }
>  
>       /* Request should now be empty as we've also waited
>        * for the last request in the list

With keithp's fix this shouldn't be necessary, and probably shouldn't be
part of this patch either way.

> @@ -2566,5 +2961,13 @@ i915_gem_load(struct drm_device *dev)
>                         i915_gem_retire_work_handler);
>       dev_priv->mm.next_gem_seqno = 1;
>  
> +     /* Old X drivers will take 0-2 for front, back, depth buffers */
> +     dev_priv->fence_reg_start = 3;
> +
> +     if (IS_I965G(dev))
> +             dev_priv->num_fence_regs = 16;
> +     else
> +             dev_priv->num_fence_regs = 8;
> +
>       i915_gem_detect_bit_6_swizzle(dev);

I like this.  Now to remember at some point to have the driver tell us
when it's not using 0-2.
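Maybe a setparam for it eventually; sketch, with
I915_SETPARAM_NUM_USED_FENCES made up here:

	/* new i915_setparam case: the DDX reports how many of the low
	 * fence regs it really uses, so GEM can take the rest */
	case I915_SETPARAM_NUM_USED_FENCES:
		if (param->value < 0 ||
		    param->value > dev_priv->num_fence_regs)
			return -EINVAL;
		dev_priv->fence_reg_start = param->value;
		break;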

> diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
> index e8b85ac..1101aed 100644
> --- a/drivers/gpu/drm/i915/i915_gem_tiling.c
> +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
> @@ -207,6 +207,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
>               }
>       }
>       obj_priv->tiling_mode = args->tiling_mode;
> +     obj_priv->stride = args->stride;
>  
>       mutex_unlock(&dev->struct_mutex);
>  

I was going to yell at you for changing the ABI by introducing a new
argument to an existing ioctl, and apparently one of you had already
convinced me to sneak that argument in there even though we didn't use
it.  You were right, whoever you were.

-- 
Eric Anholt