On Thu, Apr 10, 2014 at 10:24:53AM +0800, Zhao Yakui wrote:
> On Wed, 2014-04-09 at 08:34 -0600, Daniel Vetter wrote:
> > On Wed, Apr 09, 2014 at 09:59:56AM +0800, Zhao Yakui wrote:
> > > The BDW GT3 has two independent BSD rings, which can be used to
> > > process video commands. To keep things simple, this is transparent to
> > > the user-space driver/middleware; instead, the kernel driver decides
> > > which ring dispatches each BSD video command.
> > > 
> > > As each BSD ring is powerful enough on its own, it is sufficient to
> > > dispatch BSD video commands per drm fd. In that case, different BSD
> > > rings are used for different video playback and encoding sessions. At
> > > the same time, this coarse dispatch mechanism helps to avoid object
> > > synchronization between the BSD rings.
> > > 
> > > Signed-off-by: Zhao Yakui <yakui.z...@intel.com>
> > 
> > This looks way too complicated. First things first please get rid of the
> > atomic_t usage. If you don't have _massive_ comments explaining the memory
> > barriers you're most likely using linux kernel atomic_t wrong. They are
> > fully unordered.
> 
> Thanks for the review.
> 
> For the atomic_t usage: I will remove it in the next version, as the
> counter is already protected by the lock.
> 
> > 
> > With that out of the way this still looks a bit complicated really. Can't
> > we just use a very simple static rule in gen8_dispatch_bsd_ring which
> > hashes the pointer address of the file_priv? Just to get things going;
> > once we have a clear need we can try to make things more intelligent. But
> > when in doubt I really prefer that we start with the dumbest possible
> > approach and add complexity, instead of starting with something really
> > complex and simplifying it.
> 
> Do you mean that the file_priv pointer is hashed and then mapped to
> BSD ring 0 or BSD ring 1?

Yeah, that's the idea. Get the basic support in first, and make it fancy
as you describe below second. This has a few upsides:
- We can concentrate on validating basic support in the first round
  instead of potentially fighting a bug in the load balancer.
- Discussions and performance testing for the load balancer won't hold up
  the entire feature.
- Like I've said, this might not be required. Before we add more
  complexity than just hashing the file_priv, I want to see some
  benchmarks of expected workloads that show that load balancing is
  indeed a good idea. For a transcode server I guess there will be
  enough in-flight operations that it won't really matter. Or at least
  I hope so.

So maybe split this patch up: a first patch with the basic file_priv
hash mapping and a 2nd patch that adds the improved algorithm?
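For reference, here is a minimal userspace sketch of the static rule suggested above. It hashes the file_priv pointer and uses one bit of the hash to pick BSD ring 0 or 1. The names (enum bsd_ring, pick_bsd_ring) are hypothetical, and so is the choice of multiplicative hash. An in-kernel version would more likely call hash_ptr(file_priv, 1) from <linux/hash.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring indices; in i915 these would map to VCS and VCS2. */
enum bsd_ring { BSD_RING_0 = 0, BSD_RING_1 = 1 };

/*
 * Map a per-fd private pointer to one of the two BSD rings.
 * The low bits of a heap pointer are mostly alignment zeros, so use
 * multiplicative (Fibonacci) hashing and keep the top bit of the
 * product, which is influenced by every bit of the pointer.
 */
static enum bsd_ring pick_bsd_ring(const void *file_priv)
{
	uint64_t v = (uint64_t)(uintptr_t)file_priv;

	v *= 0x61C8864680B583EBull; /* 64-bit golden-ratio constant */
	return (v >> 63) ? BSD_RING_1 : BSD_RING_0;
}
```

The mapping depends only on the pointer value. Each fd therefore stays on one ring for its whole lifetime, and no counters or locking are needed.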

Cheers, Daniel

> The GT3 machine has two independent BSD rings, so it would be better if
> the kernel driver balanced the video workload between them. When the
> hashed file_priv is used to select the BSD ring, the balance depends on
> the hash design. In some scenarios one ring may be very busy while the
> other is almost idle, which hurts video playback/encoding performance.
> In addition, the hash is only used to select a BSD ring once, when the
> drm fd is opened, so it does not rebalance the video workload after some
> of the work has finished.
> 
> The following is the basic idea in my patch. (A counter variable is
> added for each ring. The bigger the counter, the higher the workload.)
>    a. When a new fd needs to dispatch a BSD video command, it selects
> the ring with the lowest workload (lowest counter), and the counter of
> that ring is incremented.
>    b. When the drm fd is closed (the workload is finished), the counter
> of the ring used by the file_priv is decremented.
>    c. When the drm fd has already selected a BSD ring for a previously
> submitted command, it checks whether it is still using the ring with
> the lowest workload (lowest counter). If not, it can be switched. The
> purpose is to keep the workload balanced between the two BSD rings.
> For example, the user wants to play back four video clips: BSD ring 0
> is selected to play back the two long clips and BSD ring 1 to play back
> the two short clips. After the two short clips finish, one of the long
> clips can be switched to BSD ring 1, so the load stays balanced.
> 
> What do you think?
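The scheme described above can be modelled with a minimal sketch like the one below. It is a userspace model, not the patch itself, and the struct and function names are hypothetical. It uses plain int counters instead of atomic_t, which assumes every access happens with dev_priv->bsd_lock held. Like the patch, it only moves an fd when its current ring has at least two more users than the least-loaded ring, and it prefers ring 0 on a tie.

```c
#include <assert.h>

#define NUM_BSD_RINGS 2

/* Hypothetical per-device state: plain ints, protected by bsd_lock. */
struct bsd_balancer {
	int counter[NUM_BSD_RINGS];	/* number of fds bound to each ring */
};

/* Hypothetical per-fd state: ring == -1 means no ring selected yet. */
struct bsd_client {
	int ring;
};

/* Index of the ring with the lowest counter; ring 0 wins ties. */
static int least_loaded(const struct bsd_balancer *b)
{
	return b->counter[1] < b->counter[0] ? 1 : 0;
}

/* Steps a and c, called on each BSD execbuffer with the lock held. */
static int bsd_select_ring(struct bsd_balancer *b, struct bsd_client *c)
{
	int target = least_loaded(b);

	if (c->ring < 0) {
		/* a. first BSD submission: bind to the least-loaded ring */
		b->counter[target]++;
		c->ring = target;
		return target;
	}

	/* c. switch only if the current ring is clearly busier */
	if (target != c->ring && b->counter[c->ring] - b->counter[target] > 1) {
		b->counter[c->ring]--;
		b->counter[target]++;
		c->ring = target;
	}
	return c->ring;
}

/* Step b, called when the drm fd is closed, with the lock held. */
static void bsd_release(struct bsd_balancer *b, struct bsd_client *c)
{
	if (c->ring < 0)
		return;
	/* With all updates under one lock the count cannot go negative. */
	assert(b->counter[c->ring] > 0);
	b->counter[c->ring]--;
	c->ring = -1;
}
```

Consider the four-clip example. Suppose the two long-clip fds end up on ring 0 and the two short-clip fds on ring 1. Closing the short-clip fds leaves the counters at 2 and 0. The next submission from a long-clip fd then moves it to ring 1, leaving the counters at 1 and 1.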
> 
> > -Daniel
> > 
> > > ---
> > >  drivers/gpu/drm/i915/i915_dma.c            |   14 ++++++
> > >  drivers/gpu/drm/i915/i915_drv.h            |    3 ++
> > >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   73 +++++++++++++++++++++++++++-
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
> > >  drivers/gpu/drm/i915/intel_ringbuffer.h    |    2 +
> > >  5 files changed, 93 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > > index 0b38f88..8260463 100644
> > > --- a/drivers/gpu/drm/i915/i915_dma.c
> > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > @@ -1572,6 +1572,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
> > >   spin_lock_init(&dev_priv->backlight_lock);
> > >   spin_lock_init(&dev_priv->uncore.lock);
> > >   spin_lock_init(&dev_priv->mm.object_stat_lock);
> > > + spin_lock_init(&dev_priv->bsd_lock);
> > >   mutex_init(&dev_priv->dpio_lock);
> > >   mutex_init(&dev_priv->modeset_restore_lock);
> > >  
> > > @@ -1928,7 +1929,20 @@ void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
> > >  void i915_driver_postclose(struct drm_device *dev, struct drm_file *file)
> > >  {
> > >   struct drm_i915_file_private *file_priv = file->driver_priv;
> > > + struct intel_ring_buffer *bsd_ring;
> > > + struct drm_i915_private *dev_priv = dev->dev_private;
> > >  
> > > + if (file_priv && file_priv->bsd_ring) {
> > > +         int cmd_counter;
> > > +         bsd_ring = file_priv->bsd_ring;
> > > +         file_priv->bsd_ring = NULL;
> > > +         spin_lock(&dev_priv->bsd_lock);
> > > +         cmd_counter = atomic_sub_return(1, &bsd_ring->bsd_cmd_counter);
> > > +         if (cmd_counter < 0) {
> > > +                 atomic_set(&bsd_ring->bsd_cmd_counter, 0);
> > > +         }
> > > +         spin_unlock(&dev_priv->bsd_lock);
> > > + }
> > >   kfree(file_priv);
> > >  }
> > >  
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > index d77f4e0..128639c 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -1457,6 +1457,8 @@ struct drm_i915_private {
> > >   struct i915_dri1_state dri1;
> > >   /* Old ums support infrastructure, same warning applies. */
> > >   struct i915_ums_state ums;
> > > + /* the lock for dispatch video commands on two BSD rings */
> > > + spinlock_t bsd_lock;
> > >  };
> > >  
> > >  static inline struct drm_i915_private *to_i915(const struct drm_device *dev)
> > > @@ -1664,6 +1666,7 @@ struct drm_i915_file_private {
> > >  
> > >   struct i915_hw_context *private_default_ctx;
> > >   atomic_t rps_wait_boost;
> > > + struct  intel_ring_buffer *bsd_ring;
> > >  };
> > >  
> > >  /*
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > index 3491402..75d8cc0 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > @@ -999,6 +999,70 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
> > >   return 0;
> > >  }
> > >  
> > > +/**
> > > + * Find one BSD ring to dispatch the corresponding BSD command.
> > > + * The Ring ID is returned.
> > > + */
> > > +static int gen8_dispatch_bsd_ring(struct drm_device *dev,
> > > +                           struct drm_file *file)
> > > +{
> > > + struct drm_i915_private *dev_priv = dev->dev_private;
> > > + struct drm_i915_file_private *file_priv = file->driver_priv;
> > > + struct intel_ring_buffer *temp_ring, *bsd_ring;
> > > + int bsd_counter, temp_counter;
> > > +
> > > + if (file_priv->bsd_ring) {
> > > +         /* Check whether the load balance is required.*/
> > > +         spin_lock(&dev_priv->bsd_lock);
> > > +         bsd_counter = atomic_read(&(file_priv->bsd_ring->bsd_cmd_counter));
> > > +         temp_ring = &dev_priv->ring[VCS];
> > > +         temp_counter = atomic_read(&temp_ring->bsd_cmd_counter);
> > > +         bsd_ring = &dev_priv->ring[VCS];
> > > +
> > > +         temp_ring = &dev_priv->ring[VCS2];
> > > +         if (atomic_read(&temp_ring->bsd_cmd_counter) < temp_counter) {
> > > +                 temp_counter = atomic_read(&temp_ring->bsd_cmd_counter);
> > > +                 bsd_ring = temp_ring;
> > > +         }
> > > +         /*
> > > +          * If it is already the ring with the minimum load, it is
> > > +          * unnecessary to switch it.
> > > +          */
> > > +         if (bsd_ring == file_priv->bsd_ring) {
> > > +                 spin_unlock(&dev_priv->bsd_lock);
> > > +                 return bsd_ring->id;
> > > +         }
> > > +         /*
> > > +          * If the load delta between current ring and target ring is
> > > +          * small, it is unnecessary to switch it.
> > > +          */
> > > +         if ((bsd_counter - temp_counter) <= 1) {
> > > +                 spin_unlock(&dev_priv->bsd_lock);
> > > +                 return file_priv->bsd_ring->id;
> > > +         }
> > > +         /* balance the load between current ring and target ring */
> > > +         atomic_dec(&file_priv->bsd_ring->bsd_cmd_counter);
> > > +         atomic_inc(&bsd_ring->bsd_cmd_counter);
> > > +         spin_unlock(&dev_priv->bsd_lock);
> > > +         file_priv->bsd_ring = bsd_ring;
> > > +         return bsd_ring->id;
> > > + } else {
> > > +         spin_lock(&dev_priv->bsd_lock);
> > > +         bsd_ring = &dev_priv->ring[VCS];
> > > +         bsd_counter = atomic_read(&bsd_ring->bsd_cmd_counter);
> > > +         temp_ring = &dev_priv->ring[VCS2];
> > > +         temp_counter = atomic_read(&temp_ring->bsd_cmd_counter);
> > > +         if (temp_counter < bsd_counter) {
> > > +                 bsd_ring = temp_ring;
> > > +                 bsd_counter = temp_counter;
> > > +         }
> > > +         atomic_inc(&bsd_ring->bsd_cmd_counter);
> > > +         file_priv->bsd_ring = bsd_ring;
> > > +         spin_unlock(&dev_priv->bsd_lock);
> > > +         return bsd_ring->id;
> > > + }
> > > +}
> > > +
> > >  static int
> > >  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> > >                  struct drm_file *file,
> > > @@ -1043,7 +1107,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> > >  
> > >   if ((args->flags & I915_EXEC_RING_MASK) == I915_EXEC_DEFAULT)
> > >           ring = &dev_priv->ring[RCS];
> > > - else
> > > + else if ((args->flags & I915_EXEC_RING_MASK) == I915_EXEC_BSD) {
> > > +         if (HAS_BSD2(dev)) {
> > > +                 int ring_id;
> > > +                 ring_id = gen8_dispatch_bsd_ring(dev, file);
> > > +                 ring = &dev_priv->ring[ring_id];
> > > +         } else
> > > +                 ring = &dev_priv->ring[VCS];
> > > + } else
> > >           ring = &dev_priv->ring[(args->flags & I915_EXEC_RING_MASK) - 1];
> > >  
> > >   if (!intel_ring_initialized(ring)) {
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 43e0227..852fc29 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -2123,6 +2123,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
> > >           ring->dispatch_execbuffer = i965_dispatch_execbuffer;
> > >   }
> > >   ring->init = init_ring_common;
> > > + atomic_set(&ring->bsd_cmd_counter, 0);
> > >  
> > >   return intel_init_ring_buffer(dev, ring);
> > >  }
> > > @@ -2170,6 +2171,7 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
> > >  
> > >   ring->init = init_ring_common;
> > >  
> > > + atomic_set(&ring->bsd_cmd_counter, 0);
> > >   return intel_init_ring_buffer(dev, ring);
> > >  }
> > >  
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > index 8ca4285..4f87b08 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > @@ -196,6 +196,8 @@ struct  intel_ring_buffer {
> > >    * to encode the command length in the header).
> > >    */
> > >   u32 (*get_cmd_length_mask)(u32 cmd_header);
> > > +
> > > + atomic_t bsd_cmd_counter;
> > >  };
> > >  
> > >  static inline bool
> > > -- 
> > > 1.7.10.1
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
> 
> 
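The ring selection that the quoted patch adds to i915_gem_do_execbuffer() can be modelled the same way. This sketch mirrors the I915_EXEC_* ring flags and an assumed enum intel_ring_id ordering (RCS, VCS, BCS, VECS, VCS2) locally, for illustration only. select_ring() and always_vcs2() are hypothetical, and the dispatch callback stands in for gen8_dispatch_bsd_ring(). Returning -1 for an unknown ring flag is an assumption of the sketch.

```c
#include <assert.h>

/* Ring flags mirrored from the i915 uapi header, for illustration only. */
#define I915_EXEC_RING_MASK (7 << 0)
#define I915_EXEC_DEFAULT   (0 << 0)
#define I915_EXEC_RENDER    (1 << 0)
#define I915_EXEC_BSD       (2 << 0)
#define I915_EXEC_BLT       (3 << 0)
#define I915_EXEC_VEBOX     (4 << 0)

/* Assumed ring id order, with the second BSD ring (VCS2) last. */
enum intel_ring_id { RCS = 0, VCS, BCS, VECS, VCS2 };

/* Stand-in for gen8_dispatch_bsd_ring(): must return VCS or VCS2. */
typedef enum intel_ring_id (*bsd_dispatch_fn)(void *file_priv);

/* Hypothetical dispatcher for testing: always picks the second BSD ring. */
static enum intel_ring_id always_vcs2(void *file_priv)
{
	(void)file_priv;
	return VCS2;
}

/* Returns the ring id for an execbuffer call, or -1 for an unknown flag. */
static int select_ring(unsigned int flags, int has_bsd2,
		       bsd_dispatch_fn dispatch, void *file_priv)
{
	unsigned int ring_flag = flags & I915_EXEC_RING_MASK;

	if (ring_flag == I915_EXEC_DEFAULT)
		return RCS;
	if (ring_flag == I915_EXEC_BSD)
		return has_bsd2 ? (int)dispatch(file_priv) : VCS;
	if (ring_flag > I915_EXEC_VEBOX)
		return -1;
	/* RENDER -> RCS, BLT -> BCS, VEBOX -> VECS: flag value minus one */
	return (int)(ring_flag - 1);
}
```

Only the I915_EXEC_BSD case consults the dispatcher. Swapping the counter-based balancer for a simple pointer hash therefore changes nothing else in this path.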

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch