[Libva] [PATCH 1/2] batch: factor out MI_FLUSH_DW batch buffer construction.

2014-05-14 Thread Gwenole Beauchesne
Simplify the construction of the MI_FLUSH_DW command stream. Use ring
buffer generic variants of BEGIN, OUT, ADVANCE batch functions.

Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
---
 src/intel_batchbuffer.c | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c
index 9dc496d..bc564d5 100644
--- a/src/intel_batchbuffer.c
+++ b/src/intel_batchbuffer.c
@@ -247,29 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct intel_batchbuffer 
*batch)
 }
 
 } else {
-if (batch->flag == I915_EXEC_BLT) {
-BEGIN_BLT_BATCH(batch, 4);
-OUT_BLT_BATCH(batch, MI_FLUSH_DW);
-OUT_BLT_BATCH(batch, 0);
-OUT_BLT_BATCH(batch, 0);
-OUT_BLT_BATCH(batch, 0);
-ADVANCE_BLT_BATCH(batch);
-}else if (batch->flag == I915_EXEC_VEBOX) {
-BEGIN_VEB_BATCH(batch, 4);
-OUT_VEB_BATCH(batch, MI_FLUSH_DW);
-OUT_VEB_BATCH(batch, 0);
-OUT_VEB_BATCH(batch, 0);
-OUT_VEB_BATCH(batch, 0);
-ADVANCE_VEB_BATCH(batch);
-} else {
-assert(batch->flag == I915_EXEC_BSD);
-BEGIN_BCS_BATCH(batch, 4);
-OUT_BCS_BATCH(batch, MI_FLUSH_DW | 
MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE);
-OUT_BCS_BATCH(batch, 0);
-OUT_BCS_BATCH(batch, 0);
-OUT_BCS_BATCH(batch, 0);
-ADVANCE_BCS_BATCH(batch);
-}
+uint32_t cmd = MI_FLUSH_DW;
+if (batch->flag == I915_EXEC_BSD)
+cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE;
+
+__BEGIN_BATCH(batch, 4, batch->flag);
+__OUT_BATCH(batch, cmd);
+__OUT_BATCH(batch, 0);
+__OUT_BATCH(batch, 0);
+__OUT_BATCH(batch, 0);
+__ADVANCE_BATCH(batch);
 }
 } else {
 if (batch->flag == I915_EXEC_RENDER) {
-- 
1.8.3.2

___
Libva mailing list
Libva@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libva
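The factored-out emission above can be sketched as a small self-contained function. The flag and opcode constants below are illustrative stand-ins (the real values live in the i965 driver's private headers), and `emit_mi_flush_dw` is a hypothetical helper mirroring the `__BEGIN_BATCH`/`__OUT_BATCH`/`__ADVANCE_BATCH` sequence, not the driver's actual macros:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative stand-ins for the ring flags and the MI_FLUSH_DW bits;
 * the driver's real values come from its own headers. */
#define XEXEC_BSD   1
#define XEXEC_BLT   2
#define XEXEC_VEBOX 3
#define XMI_FLUSH_DW                        (0x26u << 23)
#define XMI_FLUSH_DW_VIDEO_CACHE_INVALIDATE (1u << 7)

/* Hypothetical helper mirroring the factored-out logic: one generic
 * 4-dword MI_FLUSH_DW for every ring, with the video-pipeline cache
 * invalidate bit set only for the BSD ring. Returns the dword count. */
static size_t emit_mi_flush_dw(uint32_t *out, int ring_flag)
{
    uint32_t cmd = XMI_FLUSH_DW;

    if (ring_flag == XEXEC_BSD)
        cmd |= XMI_FLUSH_DW_VIDEO_CACHE_INVALIDATE;

    out[0] = cmd; /* command dword */
    out[1] = 0;   /* address */
    out[2] = 0;   /* immediate data low */
    out[3] = 0;   /* immediate data high */
    return 4;
}
```

The point of the refactor is visible here: the per-ring difference collapses to one conditional on the command dword, so the three near-identical BEGIN/OUT/ADVANCE blocks become one.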


[Libva] [PATCH 0/2] Simplify and fix MI_FLUSH_DW for Gen6+

2014-05-14 Thread Gwenole Beauchesne
Hi,

This patch series simplifies intel_batchbuffer_emit_mi_flush() for the
BSD ring, while also providing a minor fix for Broadwell.

Note: there are additional things that could be simplified. Please
have a look at it afterwards.

Thanks,
Gwenole Beauchesne (2):
  batch: factor out MI_FLUSH_DW batch buffer construction.
  batch: fix MI_FLUSH_DW for Broadwell.

 src/intel_batchbuffer.c | 36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

-- 
1.8.3.2



Re: [Libva] (H264) Reference Picture selection

2014-05-14 Thread Zhao, Yakui
On Sun, 2014-05-11 at 23:34 -0600, Gwenole Beauchesne wrote:
 2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
  On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote:
  Hi,
 
  2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com:
   On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote:
   Hi,
  
I have a scheme where the decoder ack's received frames - With this
information I could resync without an IDR by carefully selecting the
reference frame(s).
   
Will you please describe your idea in detail?
   
long term references would typically be employed as well, but it
seems ref-pic-marking was removed may'13?
   
Why do you hope to use the long-term reference? As you know, the
current encoding is based on hardware-acceleration. When more the
reference frames can be selected, the driver will be more complex and
then the encoding speed is also affected.
   
   It's for implementing Cisco's clearpath technology
   (http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa
   re/clearpath/clearpath_whitepaper.pdf)
   Basically one would  periodically mark a P frame as long term 
   reference, in
   case of packet loss one can thus refer to this one instead of 
   restarting the
   stream with an idr.
   
   
   I'm not too concerned about the encoding speed as long as it's above
   realtime (60fps 1920x1080)
  
   Maybe it's more convenient If libva can have a switch for slice_header 
   made between driver and app. Then some specially cases(ref-reorder, 
   long-term-ref) may let app to manage slice-header.
  
   Thanks for your suggestion. If the slice-header info is also generated
   by the app and then be passed to the driver, it can be easy to do the
   operation of re-order and long-term-ref
  
   But this will depend on the policy how the slice header info is
   generated. (by app or the driver). Now this is exported by querying the
   capability of the driver. If we move this into the app, the
   infrastructure will be changed a lot.
 
  Infrastructure of what? The application? No, it doesn't really change
  anything but generating the extra packed slice header.
 
  Currently the app will query the capability of encoding and determine
  which kind of header should be passed by the upper application(For
  example: PPS/SPS).
  If the Slice_header info is exposed, the driver will assume that it is
  the responsibility of generating the slice header info. In such case it
  is difficult to mix the two working modes without a lot of changes.(One
  is that the driver generates the slice header info. Another is that the
  app generates the slice header info).
 
 Then the driver needs to be fixed to not assume anything. The API and
 operation points look clear enough, aren't they?

Sorry that something is not described very clearly in my previous email
because of one typo error.

   If the Slice_header info is exposed, app should be responsible of
generating the slice header info and then passed it into the driver. 
   If the slice_header info is not exposed, the driver will be
responsible of generating the slice header info. 

And it is difficult to mix the above two working modes without a lot of
changes.   

 
 The changes to the codec layer are minimal, and this was done already,
 in multiple instances of codec layers, and with another driver.
 
   Another problem is that it is possible that one frame has multiple
   slices. In such case the user should pass the multiple slice header
   info.
 
  This is not an issue. The API mandates that the packed headers are
  provided in order.
 
 
 
 
  Regards,
  Gwenole.
 
 




Re: [Libva] (H264) Reference Picture selection

2014-05-14 Thread Xiang, Haihao
On Mon, 2014-05-12 at 07:34 +0200, Gwenole Beauchesne wrote: 
 2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
  On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote:
  Hi,
 
  2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com:
   On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote:
   Hi,
  
I have a scheme where the decoder ack's received frames - With this
information I could resync without an IDR by carefully selecting the
reference frame(s).
   
Will you please describe your idea in detail?
   
long term references would typically be employed as well, but it
seems ref-pic-marking was removed may'13?
   
Why do you hope to use the long-term reference? As you know, the
current encoding is based on hardware-acceleration. When more the
reference frames can be selected, the driver will be more complex and
then the encoding speed is also affected.
   
   It's for implementing Cisco's clearpath technology
   (http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa
   re/clearpath/clearpath_whitepaper.pdf)
   Basically one would  periodically mark a P frame as long term 
   reference, in
   case of packet loss one can thus refer to this one instead of 
   restarting the
   stream with an idr.
   
   
   I'm not too concerned about the encoding speed as long as it's above
   realtime (60fps 1920x1080)
  
   Maybe it's more convenient If libva can have a switch for slice_header 
   made between driver and app. Then some specially cases(ref-reorder, 
   long-term-ref) may let app to manage slice-header.
  
   Thanks for your suggestion. If the slice-header info is also generated
   by the app and then be passed to the driver, it can be easy to do the
   operation of re-order and long-term-ref
  
   But this will depend on the policy how the slice header info is
   generated. (by app or the driver). Now this is exported by querying the
   capability of the driver. If we move this into the app, the
   infrastructure will be changed a lot.
 
  Infrastructure of what? The application? No, it doesn't really change
  anything but generating the extra packed slice header.
 
  Currently the app will query the capability of encoding and determine
  which kind of header should be passed by the upper application(For
  example: PPS/SPS).
  If the Slice_header info is exposed, the driver will assume that it is
  the responsibility of generating the slice header info. In such case it
  is difficult to mix the two working modes without a lot of changes.(One
  is that the driver generates the slice header info. Another is that the
  app generates the slice header info).
 
 Then the driver needs to be fixed to not assume anything. The API and
 operation points look clear enough, aren't they?
 
 The changes to the codec layer are minimal, and this was done already,
 in multiple instances of codec layers, and with another driver.

Where is your code for codec layer with packed slice header/packed raw
data, I want to know the point where is packed slice header / packed raw
data inserted into the bitstream in the codec layer.

 
 
   Another problem is that it is possible that one frame has multiple
   slices. In such case the user should pass the multiple slice header
   info.
 
  This is not an issue. The API mandates that the packed headers are
  provided in order.
 
 
 
 
  Regards,
  Gwenole.
 
 





Re: [Libva] [PATCH 1/2] batch: factor out MI_FLUSH_DW batch buffer construction.

2014-05-14 Thread Zhao, Yakui
On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote:
 Simplify the construction of the MI_FLUSH_DW command stream. Use ring
 buffer generic variants of BEGIN, OUT, ADVANCE batch functions.

Hi, Gwenole

Thanks for your patch.
But I don't think that we need to factor out the MI_FLUSH_DW batch
buffer construction.
   1. Currently it is very lucky that BSD/BLT/VEBOX are using the
similar MI_FLUSH_DW. But the command definition is different on these
rings. we are not sure whether we will always use the similar command.
When some bits are enabled on the corresponding rings, we will have to
split it again.
   2. we are considering the redefinition of batch->flag especially for
the dual VDBOX for BDW. 

Thanks.
Yakui

 
 Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
 ---
  src/intel_batchbuffer.c | 33 ++---
  1 file changed, 10 insertions(+), 23 deletions(-)
 
 diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c
 index 9dc496d..bc564d5 100644
 --- a/src/intel_batchbuffer.c
 +++ b/src/intel_batchbuffer.c
 @@ -247,29 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct 
 intel_batchbuffer *batch)
  }
  
  } else {
 -if (batch->flag == I915_EXEC_BLT) {
 -BEGIN_BLT_BATCH(batch, 4);
 -OUT_BLT_BATCH(batch, MI_FLUSH_DW);
 -OUT_BLT_BATCH(batch, 0);
 -OUT_BLT_BATCH(batch, 0);
 -OUT_BLT_BATCH(batch, 0);
 -ADVANCE_BLT_BATCH(batch);
 -}else if (batch->flag == I915_EXEC_VEBOX) {
 -BEGIN_VEB_BATCH(batch, 4);
 -OUT_VEB_BATCH(batch, MI_FLUSH_DW);
 -OUT_VEB_BATCH(batch, 0);
 -OUT_VEB_BATCH(batch, 0);
 -OUT_VEB_BATCH(batch, 0);
 -ADVANCE_VEB_BATCH(batch);
 -} else {
 -assert(batch->flag == I915_EXEC_BSD);
 -BEGIN_BCS_BATCH(batch, 4);
 -OUT_BCS_BATCH(batch, MI_FLUSH_DW | 
 MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE);
 -OUT_BCS_BATCH(batch, 0);
 -OUT_BCS_BATCH(batch, 0);
 -OUT_BCS_BATCH(batch, 0);
 -ADVANCE_BCS_BATCH(batch);
 -}
 +uint32_t cmd = MI_FLUSH_DW;
 +if (batch->flag == I915_EXEC_BSD)
 +cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE;
 +
 +__BEGIN_BATCH(batch, 4, batch->flag);
 +__OUT_BATCH(batch, cmd);
 +__OUT_BATCH(batch, 0);
 +__OUT_BATCH(batch, 0);
 +__OUT_BATCH(batch, 0);
 +__ADVANCE_BATCH(batch);
  }
  } else {
  if (batch->flag == I915_EXEC_RENDER) {




Re: [Libva] (H264) Reference Picture selection

2014-05-14 Thread Xiang, Haihao
On Wed, 2014-05-14 at 15:49 +0800, Zhao, Yakui wrote: 
 On Sun, 2014-05-11 at 23:34 -0600, Gwenole Beauchesne wrote:
  2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
   On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote:
   Hi,
  
   2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com:
On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote:
Hi,
   
 I have a scheme where the decoder ack's received frames - With 
 this
 information I could resync without an IDR by carefully selecting 
 the
 reference frame(s).

 Will you please describe your idea in detail?

 long term references would typically be employed as well, but it
 seems ref-pic-marking was removed may'13?

 Why do you hope to use the long-term reference? As you know, the
 current encoding is based on hardware-acceleration. When more the
 reference frames can be selected, the driver will be more complex 
 and
 then the encoding speed is also affected.

It's for implementing Cisco's clearpath technology
(http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa
re/clearpath/clearpath_whitepaper.pdf)
Basically one would  periodically mark a P frame as long term 
reference, in
case of packet loss one can thus refer to this one instead of 
restarting the
stream with an idr.


I'm not too concerned about the encoding speed as long as it's above
realtime (60fps 1920x1080)
   
Maybe it's more convenient If libva can have a switch for 
slice_header made between driver and app. Then some specially 
cases(ref-reorder, long-term-ref) may let app to manage slice-header.
   
Thanks for your suggestion. If the slice-header info is also generated
by the app and then be passed to the driver, it can be easy to do the
operation of re-order and long-term-ref
   
But this will depend on the policy how the slice header info is
generated. (by app or the driver). Now this is exported by querying the
capability of the driver. If we move this into the app, the
infrastructure will be changed a lot.
  
   Infrastructure of what? The application? No, it doesn't really change
   anything but generating the extra packed slice header.
  
   Currently the app will query the capability of encoding and determine
   which kind of header should be passed by the upper application(For
   example: PPS/SPS).
   If the Slice_header info is exposed, the driver will assume that it is
   the responsibility of generating the slice header info. In such case it
   is difficult to mix the two working modes without a lot of changes.(One
   is that the driver generates the slice header info. Another is that the
   app generates the slice header info).
  
  Then the driver needs to be fixed to not assume anything. The API and
  operation points look clear enough, aren't they?
 
 Sorry that something is not described very clearly in my previous email
 because of one typo error.
 
If the Slice_header info is exposed, app should be responsible of
 generating the slice header info and then passed it into the driver. 
If the slice_header info is not exposed, the driver will be
 responsible of generating the slice header info. 
 
 And it is difficult to mix the above two working modes without a lot of
 changes.   

Another thing is for CBR, the QP might be changed per frame in the
driver and app/middleware doesn't know the final QP. How does app
get the right slice_qp_delta to build the right packed slice header ?
does app always use a fixed slice_qp_delta ?

 
  
  The changes to the codec layer are minimal, and this was done already,
  in multiple instances of codec layers, and with another driver.
  
Another problem is that it is possible that one frame has multiple
slices. In such case the user should pass the multiple slice header
info.
  
   This is not an issue. The API mandates that the packed headers are
   provided in order.
  
  
  
  
   Regards,
   Gwenole.
  
  
 
 




Re: [Libva] [PATCH 2/2] batch: fix MI_FLUSH_DW for Broadwell.

2014-05-14 Thread Zhao, Yakui
On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote:
 The MI_FLUSH_DW command contains 5 dwords on Broadwell, i.e. one extra
 dword for the high order bits of the Address field.

Thanks for your patch.

What is wrong if this is applied?
The 4 dwords are still ok to Broadwell.  And the address/data field are
not used in the command of MI_FLUSH_DW.
So the current command is correct for Broadwell.

 
 Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
 ---
  src/intel_batchbuffer.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c
 index bc564d5..01e04d5 100644
 --- a/src/intel_batchbuffer.c
 +++ b/src/intel_batchbuffer.c
 @@ -247,13 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct 
 intel_batchbuffer *batch)
  }
  
  } else {
 +const uint32_t extra_dword = !!IS_GEN8(intel->device_id);
  uint32_t cmd = MI_FLUSH_DW;
  if (batch->flag == I915_EXEC_BSD)
  cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE;
 
 -__BEGIN_BATCH(batch, 4, batch->flag);
 -__OUT_BATCH(batch, cmd);
 +__BEGIN_BATCH(batch, 4|extra_dword, batch->flag);
 +__OUT_BATCH(batch, cmd|extra_dword);
  __OUT_BATCH(batch, 0);
 +if (extra_dword)
 +__OUT_BATCH(batch, 0);
  __OUT_BATCH(batch, 0);
  __OUT_BATCH(batch, 0);
  __ADVANCE_BATCH(batch);


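The dword-count trick in this Broadwell patch relies on two facts: `4 | 1 == 5`, and the low bits of dword 0 carry the command's DWord Length field, so `cmd | extra_dword` bumps the advertised length in step with the actual count. A minimal sketch, assuming an illustrative opcode value and a hypothetical helper name (neither is the driver's real definition):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative opcode value; the driver's real MI_FLUSH_DW definition
 * lives in its private headers. The low bits of dword 0 are the
 * command's "DWord Length" field, which is why OR-ing in 1 is enough
 * to announce the fifth dword. */
#define XMI_FLUSH_DW (0x26u << 23)

/* Hypothetical helper mirroring the patch: Gen8 adds one dword for the
 * high half of the 48-bit address. Returns 4 | extra_dword dwords. */
static size_t emit_mi_flush_dw_gen(uint32_t *out, int is_gen8)
{
    const uint32_t extra_dword = is_gen8 ? 1u : 0u;
    size_t n = 0;

    out[n++] = XMI_FLUSH_DW | extra_dword; /* length field bumped on Gen8 */
    out[n++] = 0;                          /* address low */
    if (extra_dword)
        out[n++] = 0;                      /* address high (Gen8 only) */
    out[n++] = 0;                          /* immediate data low */
    out[n++] = 0;                          /* immediate data high */
    return n;
}
```

As the thread below discusses, the extra dword only matters once post-sync operations are used, but emitting it keeps the command consistent with Gen8's 64-bit address layout.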


Re: [Libva] [PATCH 2/2] batch: fix MI_FLUSH_DW for Broadwell.

2014-05-14 Thread Gwenole Beauchesne
Hi,

2014-05-14 10:10 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
 On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote:
 The MI_FLUSH_DW command contains 5 dwords on Broadwell, i.e. one extra
 dword for the high order bits of the Address field.

 Thanks for your patch.

 What is wrong if this is applied?
 The 4 dwords are still ok to Broadwell.  And the address/data field are
 not used in the command of MI_FLUSH_DW.
 So the current command is correct for Broadwell.

Yes, it's still correct... until we use post-sync ops. But anyway, if
you want to keep at the minimum similar semantics to what used to be
used, then it's preferred to have the required provisions for handling
that. Before, we used the defaults (QWORD immediate, if any). Now, the
default semantics changed.

(Un)fortunately, a command stream verifier won't notice that, but if
you want to strictly implement the specs for consistency, an Address
is always 64-bits on Broadwell. If you don't want to follow the specs,
that's your call, just remember that in the future and update the code
accordingly.

Regards,
Gwenole.

 ---
  src/intel_batchbuffer.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

 diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c
 index bc564d5..01e04d5 100644
 --- a/src/intel_batchbuffer.c
 +++ b/src/intel_batchbuffer.c
 @@ -247,13 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct 
 intel_batchbuffer *batch)
  }

  } else {
 +const uint32_t extra_dword = !!IS_GEN8(intel->device_id);
  uint32_t cmd = MI_FLUSH_DW;
  if (batch->flag == I915_EXEC_BSD)
  cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE;

 -__BEGIN_BATCH(batch, 4, batch->flag);
 -__OUT_BATCH(batch, cmd);
 +__BEGIN_BATCH(batch, 4|extra_dword, batch->flag);
 +__OUT_BATCH(batch, cmd|extra_dword);
  __OUT_BATCH(batch, 0);
 +if (extra_dword)
 +__OUT_BATCH(batch, 0);
  __OUT_BATCH(batch, 0);
  __OUT_BATCH(batch, 0);
  __ADVANCE_BATCH(batch);




Re: [Libva] [PATCH 1/2] batch: factor out MI_FLUSH_DW batch buffer construction.

2014-05-14 Thread Gwenole Beauchesne
Hi,

2014-05-14 10:05 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
 On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote:
 Simplify the construction of the MI_FLUSH_DW command stream. Use ring
 buffer generic variants of BEGIN, OUT, ADVANCE batch functions.

 Hi, Gwenole

 Thanks for your patch.
 But I don't think that we need to factor out the MI_FLUSH_DW batch
 buffer construction.
1. Currently it is very lucky that BSD/BLT/VEBOX are using the
 similar MI_FLUSH_DW. But the command definition is different on these
 rings. we are not sure whether we will always use the similar command.
 When some bits are enabled on the corresponding rings, we will have to
 split it again.
 2. we are considering the redefinition of batch->flag especially for
 the dual VDBOX for BDW.

By clean-up, I meant something that would not cause you to have a
bunch of if()s either. You can use hooks while initializing the batch
buffer for a particular ring. Besides, there are higher order
optimizations possible in the whole intel_batchbuffer.[ch], which
forked off ancient versions that were used in other projects before.

Either way, I don't care but the second patch for behavioural consistency. :)

Thanks,
Gwenole.

 diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c
 index 9dc496d..bc564d5 100644
 --- a/src/intel_batchbuffer.c
 +++ b/src/intel_batchbuffer.c
 @@ -247,29 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct 
 intel_batchbuffer *batch)
  }

  } else {
 -if (batch->flag == I915_EXEC_BLT) {
 -BEGIN_BLT_BATCH(batch, 4);
 -OUT_BLT_BATCH(batch, MI_FLUSH_DW);
 -OUT_BLT_BATCH(batch, 0);
 -OUT_BLT_BATCH(batch, 0);
 -OUT_BLT_BATCH(batch, 0);
 -ADVANCE_BLT_BATCH(batch);
 -}else if (batch->flag == I915_EXEC_VEBOX) {
 -BEGIN_VEB_BATCH(batch, 4);
 -OUT_VEB_BATCH(batch, MI_FLUSH_DW);
 -OUT_VEB_BATCH(batch, 0);
 -OUT_VEB_BATCH(batch, 0);
 -OUT_VEB_BATCH(batch, 0);
 -ADVANCE_VEB_BATCH(batch);
 -} else {
 -assert(batch->flag == I915_EXEC_BSD);
 -BEGIN_BCS_BATCH(batch, 4);
 -OUT_BCS_BATCH(batch, MI_FLUSH_DW | 
 MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE);
 -OUT_BCS_BATCH(batch, 0);
 -OUT_BCS_BATCH(batch, 0);
 -OUT_BCS_BATCH(batch, 0);
 -ADVANCE_BCS_BATCH(batch);
 -}
 +uint32_t cmd = MI_FLUSH_DW;
 +if (batch->flag == I915_EXEC_BSD)
 +cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE;
 +
 +__BEGIN_BATCH(batch, 4, batch->flag);
 +__OUT_BATCH(batch, cmd);
 +__OUT_BATCH(batch, 0);
 +__OUT_BATCH(batch, 0);
 +__OUT_BATCH(batch, 0);
 +__ADVANCE_BATCH(batch);
  }
  } else {
  if (batch->flag == I915_EXEC_RENDER) {




Re: [Libva] [PATCH 2/2] batch: fix MI_FLUSH_DW for Broadwell.

2014-05-14 Thread Zhao, Yakui
On Wed, 2014-05-14 at 10:27 +0200, Gwenole Beauchesne wrote:
 Hi,
 
 2014-05-14 10:10 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
  On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote:
  The MI_FLUSH_DW command contains 5 dwords on Broadwell, i.e. one extra
  dword for the high order bits of the Address field.
 
  Thanks for your patch.
 
  What is wrong if this is applied?
  The 4 dwords are still ok to Broadwell.  And the address/data field are
  not used in the command of MI_FLUSH_DW.
  So the current command is correct for Broadwell.
 
 Yes, it's still correct... until we use post-sync ops. But anyway, if
 you want to keep at the minimum similar semantics to what used to be
 used, then it's preferred to have the required provisions for handling
 that. Before, we used the defaults (QWORD immediate, if any). Now, the
 default semantics changed.
 
 (Un)fortunately, a command stream verifier won't notice that, but if
 you want to strictly implement the specs for consistency, an Address
 is always 64-bits on Broadwell. If you don't want to follow the specs,
 that's your call, just remember that in the future and update the code
 accordingly.

Thanks for your reminder.
As Haihao mentioned, currently we don't use the post-sync ops. In such
case the 4 or 5 dwords follow the spec for BDW. In fact there is no
post-sync operation, QW/DW doesn't matter. If the post-sync is
essential/required, we will take care of updating the code in future.

Thanks.
Yakui
 
 Regards,
 Gwenole.
 
  ---
   src/intel_batchbuffer.c | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)
 
  diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c
  index bc564d5..01e04d5 100644
  --- a/src/intel_batchbuffer.c
  +++ b/src/intel_batchbuffer.c
  @@ -247,13 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct 
  intel_batchbuffer *batch)
   }
 
   } else {
  +const uint32_t extra_dword = !!IS_GEN8(intel->device_id);
   uint32_t cmd = MI_FLUSH_DW;
   if (batch->flag == I915_EXEC_BSD)
   cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE;
  
  -__BEGIN_BATCH(batch, 4, batch->flag);
  -__OUT_BATCH(batch, cmd);
  +__BEGIN_BATCH(batch, 4|extra_dword, batch->flag);
  +__OUT_BATCH(batch, cmd|extra_dword);
   __OUT_BATCH(batch, 0);
  +if (extra_dword)
  +__OUT_BATCH(batch, 0);
   __OUT_BATCH(batch, 0);
   __OUT_BATCH(batch, 0);
   __ADVANCE_BATCH(batch);
 
 




Re: [Libva] [PATCH 2/3] Convert encoding misc param type to index

2014-05-14 Thread Gwenole Beauchesne
Hi,

This is close. When I said to do the conversion once while
parsing/accumulating, the intent was to further optimize the
translation/usage process. In this patch, you still have to perform
conversions afterwards, which is suboptimal.

Here is the exact approach I had in mind:

enum {
  /* This strictly follows the base VAEncMiscParameterTypeXXX */
  VAIntelEncMiscParameterTypeFrameRate,
  VAIntelEncMiscParameterTypeRateControl,
  ...

  /* This strictly follows the Intel specific extensions, in order */
  VAIntelEncMiscParameterTypeExtBase,
  VAIntelEncMiscParameterTypeFEIFrameControl =
VAIntelEncMiscParameterTypeExtBase,

  VAIntelEncMiscParameterTypeCount
};

Your conversion function:

switch (type) {
case VAEncMiscParameterTypeFrameRate:
case VAEncMiscParameterTypeRateControl:
...
  intel_type = VAIntelEncMiscParameterTypeFrameRate + (type -
VAEncMiscParameterTypeFrameRate);
  break;
case VAEncMiscParameterTypeFEIFrameControlIntel:
...
  intel_type = VAIntelEncMiscParameterTypeFEIFrameControl + (type -
VAEncMiscParameterTypeFEIFrameControlIntel);
  break;
default:
  /* error out | fix */
  /* XXX: or generate a garbage index that will hold them. Since we
don't need them, it's fine that unused/unsupported buffers replace
older while parsing */
  break;
}

So, once you filled up the internal buffer store in context, you can
easily access them as misc_buffer[VAIntelEncMiscParameterTypeXXX]; No
more conversion. And the initial conversion function will likely be
compiler optimized too with 2 to 3 branches and a couple of subs.

Thanks,
Gwenole.
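The mapping proposed above can be sketched as follows. The type values and the extension constant here are illustrative only (the real `VAEncMiscParameterType` values come from the libva headers, and the FEI extension value is made up for the sketch):

```c
/* Illustrative type values; base types occupy a dense range starting
 * at 0, while vendor extensions live at unrelated values and must be
 * remapped into the tail of a small fixed-size store. */
enum {
    TYPE_FRAME_RATE   = 0,  /* first base type */
    TYPE_RATE_CONTROL = 1,
    TYPE_CIR          = 13, /* last base type */
    TYPE_FEI_FRAME_CONTROL_EXT = 1001, /* hypothetical extension value */
};

#define IDX_FEI_FRAME_CONTROL 14 /* compact tail slot in a 16-entry store */

/* Base types map to themselves (identity, no table lookup); extension
 * types are packed into the tail so misc_param[] stays small. Done once
 * while parsing, later accesses index the store directly. */
static int misc_param_type_to_idx(int type)
{
    if (type >= TYPE_FRAME_RATE && type <= TYPE_CIR)
        return type;
    if (type == TYPE_FEI_FRAME_CONTROL_EXT)
        return IDX_FEI_FRAME_CONTROL;
    return -1; /* unknown: caller errors out */
}
```

Because the base range maps by identity and each extension range is a single subtraction, the compiler can reduce this to the "2 to 3 branches and a couple of subs" mentioned above.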

2014-05-13 11:20 GMT+02:00 Zhong Li zhong...@intel.com:
 Add function va_enc_misc_param_type_to_idx(),
 it is helpful to reduce misc_param of encode_state buffer size.

 Signed-off-by: Zhong Li zhong...@intel.com
 ---
  src/gen6_mfc_common.c |3 ++-
  src/i965_drv_video.c  |   45 ++---
  src/i965_drv_video.h  |3 +++
  3 files changed, 47 insertions(+), 4 deletions(-)

 diff --git a/src/gen6_mfc_common.c b/src/gen6_mfc_common.c
 index 33b9d55..faad699 100644
 --- a/src/gen6_mfc_common.c
 +++ b/src/gen6_mfc_common.c
 @@ -134,7 +134,8 @@ static void intel_mfc_brc_init(struct encode_state 
 *encode_state,
  {
   struct gen6_mfc_context *mfc_context = encoder_context->mfc_context;
   VAEncSequenceParameterBufferH264 *pSequenceParameter = 
  (VAEncSequenceParameterBufferH264 *)encode_state->seq_param_ext->buffer;
  -VAEncMiscParameterBuffer* pMiscParamHRD = 
  (VAEncMiscParameterBuffer*)encode_state->misc_param[VAEncMiscParameterTypeHRD]->buffer;
  +int idx = va_enc_misc_param_type_to_idx(VAEncMiscParameterTypeHRD);
  +VAEncMiscParameterBuffer* pMiscParamHRD = 
  (VAEncMiscParameterBuffer*)encode_state->misc_param[idx]->buffer;
   VAEncMiscParameterHRD* pParameterHRD = 
  (VAEncMiscParameterHRD*)pMiscParamHRD->data;
   double bitrate = pSequenceParameter->bits_per_second;
   double framerate = (double)pSequenceParameter->time_scale /(2 * 
  (double)pSequenceParameter->num_units_in_tick);
 diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
 index e505e4a..808c0d5 100755
 --- a/src/i965_drv_video.c
 +++ b/src/i965_drv_video.c
 @@ -261,6 +261,43 @@ va_enc_packed_type_to_idx(int packed_type)
  return idx;
  }

 +#define I965_ENC_MISC_PARAM_TYPE_FEI_FRAME_CONTROL_INDEX 14
 +
 +int
 +va_enc_misc_param_type_to_idx(int misc_param_type)
 +{
 +int idx = 0;
 +
 +switch (misc_param_type) {
 +case VAEncMiscParameterTypeFrameRate:
 +case VAEncMiscParameterTypeRateControl:
 +case VAEncMiscParameterTypeMaxSliceSize:
 +case VAEncMiscParameterTypeAIR:
 +case VAEncMiscParameterTypeMaxFrameSize:
 +case VAEncMiscParameterTypeHRD:
 +case VAEncMiscParameterTypeQualityLevel:
 +case VAEncMiscParameterTypeRIR:
 +case VAEncMiscParameterTypeQuantization:
 +case VAEncMiscParameterTypeSkipFrame:
 +case VAEncMiscParameterTypeROI:
 +case VAEncMiscParameterTypeCIR:
 +idx = misc_param_type;
 +break;
 +
 +case VAEncMiscParameterTypeFEIFrameControlIntel:
 +idx = I965_ENC_MISC_PARAM_TYPE_FEI_FRAME_CONTROL_INDEX;
 +break;
 +
 +default:
 +/* Should not get here */
 +ASSERT_RET(0, 0);
 +break;
 +}
 +
 +ASSERT_RET(idx < 16, 0);
 +return idx;
 +}
 +
  VAStatus
  i965_QueryConfigProfiles(VADriverContextP ctx,
   VAProfile *profile_list,   /* out */
 @@ -2171,17 +2208,19 @@ 
 i965_encoder_render_misc_parameter_buffer(VADriverContextP ctx,
  {
   struct encode_state *encode = obj_context->codec_state.encode;
   VAEncMiscParameterBuffer *param = NULL;
  +int type_index;
 
   ASSERT_RET(obj_buffer->buffer_store->bo == NULL, 
  VA_STATUS_ERROR_INVALID_BUFFER);
   ASSERT_RET(obj_buffer->buffer_store->buffer, 
  VA_STATUS_ERROR_INVALID_BUFFER);
 
   param = (VAEncMiscParameterBuffer *)obj_buffer->buffer_store->buffer;
  +type_index = va_enc_misc_param_type_to_idx(param->type);

 -if 

Re: [Libva] [PATCH intel-driver 1/4] decoder: h264: don't deallocate surface storage of older frames.

2014-05-14 Thread Gwenole Beauchesne
Hi,

2014-05-12 8:17 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
 On Sun, 2014-05-11 at 23:31 -0600, Gwenole Beauchesne wrote:
 Hi,

 2014-05-12 7:21 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
  On Sun, 2014-05-11 at 22:35 -0600, Gwenole Beauchesne wrote:
  Hi,
 
  2014-05-12 2:55 GMT+02:00 Zhao Yakui yakui.z...@intel.com:
   On Fri, 2014-05-09 at 08:34 -0600, Gwenole Beauchesne wrote:
   Drop the optimization whereby surfaces that are no longer marked as
   reference and that were already displayed are to be destroyed. This
   is wrong mainly for two reasons:
  
   1. The surface was displayed... once but it may still be needed for
  subsequent operations like displaying it again, using it for a
  transcode pipeline (encode) for instance, etc.
  
   2. The new set of ReferenceFrames[] correspond to the active set of
  reference frames used for decoding the current slice. In presence
  of Multiview Coding (MVC), that could correspond to the current
  view, in view order index, but the surface may still be needed
  for decoding the next view with the same view_id, while also
  decoding other views with another set of reference frames for them.
  
   Hi, Gwenole
  
   Thanks for working on the h264 reference frame list. And I have
   some concerns about the patch set:
   1. When will the unused reference surface be destroyed?
 
  As it ought to be: at vaDestroySurfaces() time.
 
   2. How to find one free slot in decode_state-reference_surfaces
   array to store the corresponding reference frame?
 
  It's the same algorithm. :)
   At this time, the reference_surfaces[] contents exactly match
   ReferenceFrames[], hole for hole, if there are any. The former holds
   object_surface pointers, the latter the associated VAPictureH264
   entries, i.e. the object_surface pointer at index i is really
   reference_surface[i], matching the VAPictureH264 at
   ReferenceFrames[i]. Previously, there could be a mismatch between the two.
 
  As you know, the current policy in the driver is based on the following
  flowchart:
  1. the driver maintains one array for the hardware-requirement.
   2. When one new frame is decoded, it will check whether the array
   element maintained by the driver is still used by the DPB. If not, the
   internal data structure will be freed and then reused for a later
   selection.
  3. Find the free slot in the array to store the reference frames in
  DPB.
 
  But I don't see how this patch set stores/updates the reference frames
  of DPB into the driver array and how to decide which slot can be
  replaced by the reference frames in new DPB.

  I don't understand your concern. It works the same as before, but
  replaces ReferenceFrames[] with reference_surface[], which now holds
  the same contents, in order. The free_slots[] array is filled in order
  too, so as to avoid iterating again to look for the same free slot id.

 My point is that you can describe how you find the free slot for the
 later update about the reference frame in DPB.

OK, will do, but there is really zero change in the algorithm beyond
the natural, higher-order optimization.

  It will be better that you can describe the keypoint in the patch
  especially when it updates the content required by the hardware.

 What contents do you think we need to maintain for the hardware? The
 hardware is designed to be stateless, so it normally doesn't keep
 anything. This is necessary for context switching performance.

 As I said in the previous email, the array of reference surfaces is one
 piece of content required by the hardware.

I think there was a misunderstanding. The reference_surface[] is not
changed (yet), only the reference_objects[]. That's totally different.
The former is the frame store to maintain for the HW (and internal
info we have in the DMV buffers, actually); the latter should be a 1:1
mapping from VA surface ids in ReferenceFrames[] to the associated
object_surface objects (stored in reference_surface).

I will provide another patch set with a couple more patches:
1. Definitely fix MVC decode use cases ;
2. Optimize the process for Haswell and newer.

(3. if this really interests anyone, I may submit doc fixes and
clarifications if I find time to, but the code will be used as
reference).

Regards,
Gwenole.

src/i965_decoder_utils.c | 13 -
1 file changed, 13 deletions(-)
  
   diff --git a/src/i965_decoder_utils.c b/src/i965_decoder_utils.c
   index 2533381..9d537ff 100644
   --- a/src/i965_decoder_utils.c
   +++ b/src/i965_decoder_utils.c
    @@ -382,19 +382,6 @@ intel_update_avc_frame_store_index(VADriverContextP ctx,
   
 /* remove it from the internal DPB */
 if (!found) {
    -struct object_surface *obj_surface = frame_store[i].obj_surface;
    -
    -obj_surface->flags &= ~SURFACE_REFERENCED;
    -
    -if ((obj_surface->flags & SURFACE_ALL_MASK) == SURFACE_DISPLAYED) {
    -   

Re: [Libva] (H264) Reference Picture selection

2014-05-14 Thread Sreerenj


On 14.05.2014 10:51, Xiang, Haihao wrote:

On Mon, 2014-05-12 at 07:34 +0200, Gwenole Beauchesne wrote:

2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:

On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote:

Hi,

2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com:

On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote:

Hi,


I have a scheme where the decoder acks received frames. With this
information I could resync without an IDR by carefully selecting the
reference frame(s).

Will you please describe your idea in detail?

Long-term references would typically be employed as well, but it
seems ref-pic-marking was removed in May '13?

Why do you hope to use long-term references? As you know, the current
encoding is hardware-accelerated. When more reference frames can be
selected, the driver becomes more complex and the encoding speed is
also affected.

It's for implementing Cisco's ClearPath technology
(http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/software/clearpath/clearpath_whitepaper.pdf).
Basically one would periodically mark a P frame as a long-term reference; in
case of packet loss one can thus refer to this one instead of restarting the
stream with an IDR.
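
As an illustration of the recovery idea (all names here are hypothetical;
this is not libva API), the encoder-side decision could be sketched as:

```c
/* Hypothetical channel state fed back from the decoder. */
typedef struct {
    int last_acked_ltr;  /* newest long-term reference the decoder ack'd,
                            or -1 if none */
    int loss_reported;   /* non-zero if the decoder reported packet loss */
} channel_state_t;

/* Pick the frame to reference for the next P frame: on reported loss,
 * fall back to the acknowledged long-term reference instead of forcing
 * an IDR; otherwise keep referencing the previous frame. */
static int pick_reference(const channel_state_t *ch, int prev_frame)
{
    if (ch->loss_reported && ch->last_acked_ltr >= 0)
        return ch->last_acked_ltr;
    return prev_frame;
}
```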


I'm not too concerned about the encoding speed as long as it's above
realtime (60fps 1920x1080)

Maybe it would be more convenient if libva had a switch for the slice
header, negotiated between driver and app. Then, for some special cases
(reference reordering, long-term references), the app could manage the
slice header itself.

Thanks for your suggestion. If the slice-header info is also generated
by the app and then passed to the driver, it becomes easy to handle
reordering and long-term references.

But this will depend on the policy for how the slice header info is
generated (by the app or by the driver). Currently this is exposed by
querying the capability of the driver. If we move this into the app, the
infrastructure will change a lot.

Infrastructure of what? The application? No, it doesn't really change
anything but generating the extra packed slice header.

Currently the app queries the encoding capability and determines
which kind of header should be passed by the upper application (for
example, PPS/SPS).
If the slice-header info is exposed, the driver will assume that the app
is responsible for generating it. In that case it is difficult to mix the
two working modes without a lot of changes (one where the driver generates
the slice header info, and another where the app generates it).

Then the driver needs to be fixed to not assume anything. The API and
operation points look clear enough, don't they?

The changes to the codec layer are minimal, and this was done already,
in multiple instances of codec layers, and with another driver.

Where is your code for the codec layer with packed slice header / packed
raw data? I want to know the point where the packed slice header / packed
raw data is inserted into the bitstream in the codec layer.
I have these patches for the codec layer (gstreamer-vaapi), but they are
not yet integrated into gstreamer-vaapi master.
Submitting all packed headers in the following order, just after the
submission of VAEncSequenceParameterBuffer to vaRenderPicture():
packed_sps, packed_pps, packed_slice.
And then submitting the misc, picture and slice parameters. Is that not
enough?
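
For clarity, the ordering above can be written out as a checklist; the
labels are descriptive strings only, not actual buffer IDs, standing in
for the buffers one would hand to vaRenderPicture() for a single frame:

```c
/* Submission order for one encoded frame, as described above. */
static const char *submission_order[] = {
    "VAEncSequenceParameterBuffer",
    "packed SPS (param + data)",
    "packed PPS (param + data)",
    "packed slice header (param + data)",
    "VAEncMiscParameterBuffer",
    "VAEncPictureParameterBuffer",
    "VAEncSliceParameterBuffer",
};

static const int submission_count =
    (int)(sizeof(submission_order) / sizeof(submission_order[0]));
```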


Anyway, the intel driver seems to be using fixed-size arrays for
referencing packed headers (packed_header_param[4] and
packed_header_data[4]), which need to be changed if there are
multiple slices per frame, since we have to submit packed headers for
each slice. Isn't that so?

Another problem is that one frame may have multiple slices. In such a
case the user should pass slice header info for each slice.

This is not an issue. The API mandates that the packed headers are
provided in order.




Regards,
Gwenole.



___
Libva mailing list
Libva@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libva





--
Thanks
Sree




[Libva] [PATCH v2 intel-driver 3/8] surface: fix vaDeriveImage() for grayscale.

2014-05-14 Thread Gwenole Beauchesne
Allow vaDeriveImage() to work with grayscale surfaces by only exposing
the luma component.

Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
---
 src/i965_drv_video.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
index 589c00c..495ad44 100755
--- a/src/i965_drv_video.c
+++ b/src/i965_drv_video.c
@@ -3222,6 +3222,12 @@ VAStatus i965_DeriveImage(VADriverContextP ctx,
 image->format.bits_per_pixel = 12;
 
 switch (image->format.fourcc) {
+case VA_FOURCC_Y800:
+image->num_planes = 1;
+image->pitches[0] = w_pitch; /* Y */
+image->offsets[0] = 0;
+break;
+
 case VA_FOURCC_YV12:
 image->num_planes = 3;
 image->pitches[0] = w_pitch; /* Y */
-- 
1.7.9.5



[Libva] [PATCH v2 intel-driver 2/8] surface: factor out release of surface buffer storage.

2014-05-14 Thread Gwenole Beauchesne
Introduce a new i965_destroy_surface_storage() helper function to
unreference the underlying GEM buffer object and free any associated
private data.

Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
---
 src/i965_decoder_utils.c |6 +-
 src/i965_drv_video.c |   14 +++---
 src/i965_drv_video.h |3 +++
 src/i965_output_dri.c|6 +-
 4 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/src/i965_decoder_utils.c b/src/i965_decoder_utils.c
index 617bc15..fac9628 100644
--- a/src/i965_decoder_utils.c
+++ b/src/i965_decoder_utils.c
@@ -387,14 +387,10 @@ intel_update_avc_frame_store_index(VADriverContextP ctx,
 obj_surface->flags &= ~SURFACE_REFERENCED;
 
 if ((obj_surface->flags & SURFACE_ALL_MASK) == SURFACE_DISPLAYED) {
-dri_bo_unreference(obj_surface->bo);
-obj_surface->bo = NULL;
 obj_surface->flags &= ~SURFACE_REF_DIS_MASK;
+i965_destroy_surface_storage(obj_surface);
 }
 
-if (obj_surface->free_private_data)
-obj_surface->free_private_data(obj_surface->private_data);
-
 frame_store[i].surface_id = VA_INVALID_ID;
 frame_store[i].frame_store_id = -1;
 frame_store[i].obj_surface = NULL;
diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
index 45a5639..589c00c 100755
--- a/src/i965_drv_video.c
+++ b/src/i965_drv_video.c
@@ -749,10 +749,11 @@ VAStatus i965_QueryConfigAttributes(VADriverContextP ctx,
 return vaStatus;
 }
 
-static void 
-i965_destroy_surface(struct object_heap *heap, struct object_base *obj)
+void
+i965_destroy_surface_storage(struct object_surface *obj_surface)
 {
-struct object_surface *obj_surface = (struct object_surface *)obj;
+if (!obj_surface)
+return;
 
 dri_bo_unreference(obj_surface->bo);
 obj_surface->bo = NULL;
@@ -761,7 +762,14 @@ i965_destroy_surface(struct object_heap *heap, struct object_base *obj)
 obj_surface->free_private_data(obj_surface->private_data);
 obj_surface->private_data = NULL;
 }
+}
+
+static void 
+i965_destroy_surface(struct object_heap *heap, struct object_base *obj)
+{
+struct object_surface *obj_surface = (struct object_surface *)obj;
 
+i965_destroy_surface_storage(obj_surface);
 object_heap_free(heap, obj);
 }
 
diff --git a/src/i965_drv_video.h b/src/i965_drv_video.h
index 535402f..8f2d288 100644
--- a/src/i965_drv_video.h
+++ b/src/i965_drv_video.h
@@ -423,4 +423,7 @@ extern VAStatus i965_DestroySurfaces(VADriverContextP ctx,
 #define I965_SURFACE_MEM_GEM_FLINK  1
 #define I965_SURFACE_MEM_DRM_PRIME  2
 
+void
+i965_destroy_surface_storage(struct object_surface *obj_surface);
+
 #endif /* _I965_DRV_VIDEO_H_ */
diff --git a/src/i965_output_dri.c b/src/i965_output_dri.c
index 717ee9a..fdd69ce 100644
--- a/src/i965_output_dri.c
+++ b/src/i965_output_dri.c
@@ -209,12 +209,8 @@ i965_put_surface_dri(
 obj_surface->flags |= SURFACE_DISPLAYED;
 
 if ((obj_surface->flags & SURFACE_ALL_MASK) == SURFACE_DISPLAYED) {
-dri_bo_unreference(obj_surface->bo);
-obj_surface->bo = NULL;
 obj_surface->flags &= ~SURFACE_REF_DIS_MASK;
-
-if (obj_surface->free_private_data)
-obj_surface->free_private_data(obj_surface->private_data);
+i965_destroy_surface_storage(obj_surface);
 }
 
 _i965UnlockMutex(i965-render_mutex);
-- 
1.7.9.5



[Libva] [PATCH v2 intel-driver 4/8] config: fix vaGetConfigAttributes() to validate profile/entrypoint.

2014-05-14 Thread Gwenole Beauchesne
Factor out code to validate profile/entrypoint per the underlying
hardware capabilities. Also fix vaGetConfigAttributes() to really
validate the profile/entrypoint pair.

Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
---
 src/i965_drv_video.c |  145 +++---
 1 file changed, 78 insertions(+), 67 deletions(-)

diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
index 495ad44..a7b7e9b 100755
--- a/src/i965_drv_video.c
+++ b/src/i965_drv_video.c
@@ -512,6 +512,78 @@ i965_QueryConfigEntrypoints(VADriverContextP ctx,
 return n > 0 ? VA_STATUS_SUCCESS : VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
 }
 
+static VAStatus
+i965_validate_config(VADriverContextP ctx, VAProfile profile,
+VAEntrypoint entrypoint)
+{
+struct i965_driver_data * const i965 = i965_driver_data(ctx);
+VAStatus va_status;
+
+/* Validate profile  entrypoint */
+switch (profile) {
+case VAProfileMPEG2Simple:
+case VAProfileMPEG2Main:
+if ((HAS_MPEG2_DECODING(i965) && entrypoint == VAEntrypointVLD) ||
+(HAS_MPEG2_ENCODING(i965) && entrypoint == VAEntrypointEncSlice)) {
+va_status = VA_STATUS_SUCCESS;
+} else {
+va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
+}
+break;
+
+case VAProfileH264ConstrainedBaseline:
+case VAProfileH264Main:
+case VAProfileH264High:
+if ((HAS_H264_DECODING(i965) && entrypoint == VAEntrypointVLD) ||
+(HAS_H264_ENCODING(i965) && entrypoint == VAEntrypointEncSlice)) {
+va_status = VA_STATUS_SUCCESS;
+} else {
+va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
+}
+break;
+
+case VAProfileVC1Simple:
+case VAProfileVC1Main:
+case VAProfileVC1Advanced:
+if (HAS_VC1_DECODING(i965) && entrypoint == VAEntrypointVLD) {
+va_status = VA_STATUS_SUCCESS;
+} else {
+va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
+}
+break;
+
+case VAProfileNone:
+if (HAS_VPP(i965) && VAEntrypointVideoProc == entrypoint) {
+va_status = VA_STATUS_SUCCESS;
+} else {
+va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
+}
+break;
+
+case VAProfileJPEGBaseline:
+if (HAS_JPEG_DECODING(i965) && entrypoint == VAEntrypointVLD) {
+va_status = VA_STATUS_SUCCESS;
+} else {
+va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
+}
+break;
+
+case VAProfileVP8Version0_3:
+if ((HAS_VP8_DECODING(i965) && entrypoint == VAEntrypointVLD) ||
+(HAS_VP8_ENCODING(i965) && entrypoint == VAEntrypointEncSlice)) {
+va_status = VA_STATUS_SUCCESS;
+} else {
+va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
+}
+break;
+
+default:
+va_status = VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+break;
+}
+return va_status;
+}
+
 VAStatus 
 i965_GetConfigAttributes(VADriverContextP ctx,
  VAProfile profile,
@@ -519,8 +591,13 @@ i965_GetConfigAttributes(VADriverContextP ctx,
  VAConfigAttrib *attrib_list,  /* in/out */
  int num_attribs)
 {
+VAStatus va_status;
 int i;
 
+va_status = i965_validate_config(ctx, profile, entrypoint);
+if (va_status != VA_STATUS_SUCCESS)
+return va_status;
+
 /* Other attributes don't seem to be defined */
 /* What to do if we don't know the attribute? */
 for (i = 0; i < num_attribs; i++) {
@@ -606,73 +683,7 @@ i965_CreateConfig(VADriverContextP ctx,
 int i;
 VAStatus vaStatus;
 
-/* Validate profile  entrypoint */
-switch (profile) {
-case VAProfileMPEG2Simple:
-case VAProfileMPEG2Main:
-if ((HAS_MPEG2_DECODING(i965) && VAEntrypointVLD == entrypoint) ||
-(HAS_MPEG2_ENCODING(i965) && VAEntrypointEncSlice == entrypoint)) {
-vaStatus = VA_STATUS_SUCCESS;
-} else {
-vaStatus = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
-}
-break;
-
-case VAProfileH264ConstrainedBaseline:
-case VAProfileH264Main:
-case VAProfileH264High:
-if ((HAS_H264_DECODING(i965) && VAEntrypointVLD == entrypoint) ||
-(HAS_H264_ENCODING(i965) && VAEntrypointEncSlice == entrypoint)) {
-vaStatus = VA_STATUS_SUCCESS;
-} else {
-vaStatus = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
-}
-
-break;
-
-case VAProfileVC1Simple:
-case VAProfileVC1Main:
-case VAProfileVC1Advanced:
-if (HAS_VC1_DECODING(i965) && VAEntrypointVLD == entrypoint) {
-vaStatus = VA_STATUS_SUCCESS;
-} else {
-vaStatus = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
-}
-
-break;
-
-case VAProfileNone:
-if (HAS_VPP(i965) && VAEntrypointVideoProc == entrypoint) {
-vaStatus = 

[Libva] [PATCH v2 intel-driver 5/8] config: fix vaCreateConfig() to not override user chroma format.

2014-05-14 Thread Gwenole Beauchesne
Only validate the user-defined chroma format (VAConfigAttribRTFormat)
attribute, if any. Don't override it. i.e. append a pre-defined value
only if it was not defined by the user beforehand.

Properly return VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT if the supplied
chroma format is not supported.

Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
---
 src/i965_drv_video.c |   90 --
 1 file changed, 66 insertions(+), 24 deletions(-)

diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
index a7b7e9b..e60760f 100755
--- a/src/i965_drv_video.c
+++ b/src/i965_drv_video.c
@@ -584,6 +584,20 @@ i965_validate_config(VADriverContextP ctx, VAProfile profile,
 return va_status;
 }
 
+static uint32_t
+i965_get_default_chroma_formats(VADriverContextP ctx, VAProfile profile,
+VAEntrypoint entrypoint)
+{
+struct i965_driver_data * const i965 = i965_driver_data(ctx);
+uint32_t chroma_formats = VA_RT_FORMAT_YUV420;
+
+switch (profile) {
+default:
+break;
+}
+return chroma_formats;
+}
+
 VAStatus 
 i965_GetConfigAttributes(VADriverContextP ctx,
  VAProfile profile,
@@ -603,7 +617,8 @@ i965_GetConfigAttributes(VADriverContextP ctx,
 for (i = 0; i < num_attribs; i++) {
 switch (attrib_list[i].type) {
 case VAConfigAttribRTFormat:
-attrib_list[i].value = VA_RT_FORMAT_YUV420;
+attrib_list[i].value = i965_get_default_chroma_formats(ctx,
+profile, entrypoint);
 break;
 
 case VAConfigAttribRateControl:
@@ -644,29 +659,49 @@ i965_destroy_config(struct object_heap *heap, struct object_base *obj)
 object_heap_free(heap, obj);
 }
 
-static VAStatus 
-i965_update_attribute(struct object_config *obj_config, VAConfigAttrib *attrib)
+static VAConfigAttrib *
+i965_lookup_config_attribute(struct object_config *obj_config,
+VAConfigAttribType type)
 {
 int i;
 
-/* Check existing attrbiutes */
 for (i = 0; i < obj_config->num_attribs; i++) {
-if (obj_config->attrib_list[i].type == attrib->type) {
-/* Update existing attribute */
-obj_config->attrib_list[i].value = attrib->value;
-return VA_STATUS_SUCCESS;
-}
+VAConfigAttrib * const attrib = &obj_config->attrib_list[i];
+if (attrib->type == type)
+return attrib;
 }
+return NULL;
+}
 
-if (obj_config->num_attribs < I965_MAX_CONFIG_ATTRIBUTES) {
-i = obj_config->num_attribs;
-obj_config->attrib_list[i].type = attrib->type;
-obj_config->attrib_list[i].value = attrib->value;
-obj_config->num_attribs++;
+static VAStatus
+i965_append_config_attribute(struct object_config *obj_config,
+const VAConfigAttrib *new_attrib)
+{
+VAConfigAttrib *attrib;
+
+if (obj_config->num_attribs >= I965_MAX_CONFIG_ATTRIBUTES)
+return VA_STATUS_ERROR_MAX_NUM_EXCEEDED;
+
+attrib = &obj_config->attrib_list[obj_config->num_attribs++];
+attrib->type = new_attrib->type;
+attrib->value = new_attrib->value;
+return VA_STATUS_SUCCESS;
+}
+
+static VAStatus
+i965_ensure_config_attribute(struct object_config *obj_config,
+const VAConfigAttrib *new_attrib)
+{
+VAConfigAttrib *attrib;
+
+/* Check for existing attributes */
+attrib = i965_lookup_config_attribute(obj_config, new_attrib->type);
+if (attrib) {
+/* Update existing attribute */
+attrib->value = new_attrib->value;
 return VA_STATUS_SUCCESS;
 }
-
-return VA_STATUS_ERROR_MAX_NUM_EXCEEDED;
+return i965_append_config_attribute(obj_config, new_attrib);
 }
 
 VAStatus 
@@ -698,16 +733,23 @@ i965_CreateConfig(VADriverContextP ctx,
 
 obj_config-profile = profile;
 obj_config-entrypoint = entrypoint;
-obj_config->attrib_list[0].type = VAConfigAttribRTFormat;
-obj_config->attrib_list[0].value = VA_RT_FORMAT_YUV420;
-obj_config->num_attribs = 1;
+obj_config->num_attribs = 0;
 
-for(i = 0; i < num_attribs; i++) {
-vaStatus = i965_update_attribute(obj_config, &(attrib_list[i]));
-
-if (VA_STATUS_SUCCESS != vaStatus) {
+for (i = 0; i < num_attribs; i++) {
+vaStatus = i965_ensure_config_attribute(obj_config, &attrib_list[i]);
+if (vaStatus != VA_STATUS_SUCCESS)
 break;
-}
+}
+
+if (vaStatus == VA_STATUS_SUCCESS) {
+VAConfigAttrib attrib, *attrib_found;
+attrib.type = VAConfigAttribRTFormat;
+attrib.value = i965_get_default_chroma_formats(ctx, profile, entrypoint);
+attrib_found = i965_lookup_config_attribute(obj_config, attrib.type);
+if (!attrib_found || !attrib_found->value)
+vaStatus = i965_append_config_attribute(obj_config, &attrib);
+else if (!(attrib_found->value & attrib.value))
+vaStatus = VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT;
 }
 
 /* Error recovery */
-- 
1.7.9.5
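
The intended vaCreateConfig() behavior in this patch can be sketched
independently of the driver types; attrib_t/config_t, the bound, and the
integer return codes below are assumptions for illustration only:

```c
#define MAX_CONFIG_ATTRIBS 32 /* assumed bound, mirrors I965_MAX_CONFIG_ATTRIBUTES */

typedef struct {
    int type;
    unsigned int value;
} attrib_t;

typedef struct {
    attrib_t list[MAX_CONFIG_ATTRIBS];
    int num;
} config_t;

/* Append a default RT-format value only when the user did not supply
 * one; if the user did, validate it against the supported set instead
 * of overriding it. Returns 0 on success, -1 on failure. */
static int ensure_rt_format(config_t *cfg, int rt_format_type,
                            unsigned int supported)
{
    int i;

    for (i = 0; i < cfg->num; i++) {
        /* user-provided attribute found: validate, don't override */
        if (cfg->list[i].type == rt_format_type)
            return (cfg->list[i].value & supported) ? 0 : -1;
    }
    if (cfg->num >= MAX_CONFIG_ATTRIBS)
        return -1;
    cfg->list[cfg->num].type = rt_format_type;
    cfg->list[cfg->num].value = supported;
    cfg->num++;
    return 0;
}
```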


[Libva] [PATCH v2 intel-driver 7/8] decoder: h264: factor out allocation of reconstructed surfaces.

2014-05-14 Thread Gwenole Beauchesne
Add a new avc_ensure_surface_bo() helper function to factor out the
allocation and initialization of the reconstructed VA surface buffer
stores.

Keep the preferred native format (NV12) and initialize chroma values
to neutral (0x80) when needed for fake grayscale (Y800) surfaces
implemented on top of existing NV12.

Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
---
 src/gen6_mfd.c   |   12 +--
 src/gen75_mfd.c  |   12 +--
 src/gen7_mfd.c   |   12 +--
 src/gen8_mfd.c   |   12 +--
 src/i965_decoder_utils.c |   54 +-
 src/i965_decoder_utils.h |8 +++
 6 files changed, 65 insertions(+), 45 deletions(-)

diff --git a/src/gen6_mfd.c b/src/gen6_mfd.c
index 4a22052..2092f69 100755
--- a/src/gen6_mfd.c
+++ b/src/gen6_mfd.c
@@ -840,18 +840,8 @@ gen6_mfd_avc_decode_init(VADriverContextP ctx,
 obj_surface = decode_state->render_object;
 obj_surface->flags &= ~SURFACE_REF_DIS_MASK;
 obj_surface->flags |= (pic_param->pic_fields.bits.reference_pic_flag ? SURFACE_REFERENCED : 0);
-i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420);
-
-/* initial uv component for YUV400 case */
-if (pic_param->seq_fields.bits.chroma_format_idc == 0) {
- unsigned int uv_offset = obj_surface->width * obj_surface->height;
- unsigned int uv_size   = obj_surface->width * obj_surface->height / 2;
-
- drm_intel_gem_bo_map_gtt(obj_surface->bo);
- memset(obj_surface->bo->virtual + uv_offset, 0x80, uv_size);
- drm_intel_gem_bo_unmap_gtt(obj_surface->bo);
-}
 
+avc_ensure_surface_bo(ctx, decode_state, obj_surface, pic_param);
 gen6_mfd_init_avc_surface(ctx, pic_param, obj_surface);
 
 dri_bo_unreference(gen6_mfd_context->post_deblocking_output.bo);
diff --git a/src/gen75_mfd.c b/src/gen75_mfd.c
index 2d4e236..5b023cf 100644
--- a/src/gen75_mfd.c
+++ b/src/gen75_mfd.c
@@ -1084,18 +1084,8 @@ gen75_mfd_avc_decode_init(VADriverContextP ctx,
 obj_surface = decode_state->render_object;
 obj_surface->flags &= ~SURFACE_REF_DIS_MASK;
 obj_surface->flags |= (pic_param->pic_fields.bits.reference_pic_flag ? SURFACE_REFERENCED : 0);
-i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420);
-
-/* initial uv component for YUV400 case */
-if (pic_param->seq_fields.bits.chroma_format_idc == 0) {
- unsigned int uv_offset = obj_surface->width * obj_surface->height;
- unsigned int uv_size   = obj_surface->width * obj_surface->height / 2;
-
- drm_intel_gem_bo_map_gtt(obj_surface->bo);
- memset(obj_surface->bo->virtual + uv_offset, 0x80, uv_size);
- drm_intel_gem_bo_unmap_gtt(obj_surface->bo);
-}
 
+avc_ensure_surface_bo(ctx, decode_state, obj_surface, pic_param);
 gen75_mfd_init_avc_surface(ctx, pic_param, obj_surface);
 
 dri_bo_unreference(gen7_mfd_context->post_deblocking_output.bo);
diff --git a/src/gen7_mfd.c b/src/gen7_mfd.c
index e91cfd3..70b1cec 100755
--- a/src/gen7_mfd.c
+++ b/src/gen7_mfd.c
@@ -758,18 +758,8 @@ gen7_mfd_avc_decode_init(VADriverContextP ctx,
 obj_surface = decode_state->render_object;
 obj_surface->flags &= ~SURFACE_REF_DIS_MASK;
 obj_surface->flags |= (pic_param->pic_fields.bits.reference_pic_flag ? SURFACE_REFERENCED : 0);
-i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420);
-
-/* initial uv component for YUV400 case */
-if (pic_param->seq_fields.bits.chroma_format_idc == 0) {
- unsigned int uv_offset = obj_surface->width * obj_surface->height;
- unsigned int uv_size   = obj_surface->width * obj_surface->height / 2;
-
- drm_intel_gem_bo_map_gtt(obj_surface->bo);
- memset(obj_surface->bo->virtual + uv_offset, 0x80, uv_size);
- drm_intel_gem_bo_unmap_gtt(obj_surface->bo);
-}
 
+avc_ensure_surface_bo(ctx, decode_state, obj_surface, pic_param);
 gen7_mfd_init_avc_surface(ctx, pic_param, obj_surface);
 
 dri_bo_unreference(gen7_mfd_context->post_deblocking_output.bo);
diff --git a/src/gen8_mfd.c b/src/gen8_mfd.c
index 1742bea..e3e71fb 100644
--- a/src/gen8_mfd.c
+++ b/src/gen8_mfd.c
@@ -845,18 +845,8 @@ gen8_mfd_avc_decode_init(VADriverContextP ctx,
 obj_surface = decode_state->render_object;
 obj_surface->flags &= ~SURFACE_REF_DIS_MASK;
 obj_surface->flags |= (pic_param->pic_fields.bits.reference_pic_flag ? SURFACE_REFERENCED : 0);
-i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420);
-
-/* initial uv component for YUV400 case */
-if (pic_param->seq_fields.bits.chroma_format_idc == 0) {
- unsigned int uv_offset = obj_surface->width * obj_surface->height;
- unsigned int uv_size   = obj_surface->width * obj_surface->height / 2;
-
- drm_intel_gem_bo_map_gtt(obj_surface->bo);
- memset(obj_surface->bo->virtual + uv_offset, 0x80, uv_size);
- 
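
The UV initialization that the removed hunks performed can be sketched
standalone; the NV12 offsets follow directly from the layout (a luma
plane of width*height bytes, then a half-size interleaved UV plane):

```c
#include <string.h>

/* For an NV12 surface, the interleaved UV plane starts right after
 * width*height luma bytes and is half that size. Writing 0x80, the
 * neutral chroma value, makes a luma-only (Y800) stream render as
 * pure grayscale. */
static void init_nv12_gray_chroma(unsigned char *surface,
                                  unsigned int width, unsigned int height)
{
    unsigned int uv_offset = width * height;
    unsigned int uv_size   = width * height / 2;

    memset(surface + uv_offset, 0x80, uv_size);
}
```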

[Libva] [PATCH v2 intel-driver 6/8] config: fix supported set of chroma formats for JPEG decode.

2014-05-14 Thread Gwenole Beauchesne
If the hardware supports JPEG decoding, then we have to expose the
right set of chroma formats for the output (decoded) VA surface. In
particular, we could support YUV 4:1:1, 4:2:2 and 4:4:4.

Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
---
 src/i965_drv_video.c |   15 +++
 src/i965_drv_video.h |2 ++
 2 files changed, 17 insertions(+)

diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
index e60760f..768469a 100755
--- a/src/i965_drv_video.c
+++ b/src/i965_drv_video.c
@@ -214,6 +214,10 @@ get_subpic_format(const VAImageFormat *va_format)
 return NULL;
 }
 
+/* Extra set of chroma formats supported for JPEG decoding (beyond YUV 4:2:0) */
+#define EXTRA_JPEG_DEC_CHROMA_FORMATS \
+(VA_RT_FORMAT_YUV411 | VA_RT_FORMAT_YUV422 | VA_RT_FORMAT_YUV444)
+
 extern struct hw_context *i965_proc_context_init(VADriverContextP, struct object_config *);
 extern struct hw_context *g4x_dec_hw_context_init(VADriverContextP, struct object_config *);
 static struct hw_codec_info g4x_hw_codec_info = {
@@ -278,6 +282,8 @@ static struct hw_codec_info gen7_hw_codec_info = {
 .max_width = 4096,
 .max_height = 4096,
 
+.jpeg_dec_chroma_formats = EXTRA_JPEG_DEC_CHROMA_FORMATS,
+
 .has_mpeg2_decoding = 1,
 .has_mpeg2_encoding = 1,
 .has_h264_decoding = 1,
@@ -305,6 +311,8 @@ static struct hw_codec_info gen75_hw_codec_info = {
 .max_width = 4096,
 .max_height = 4096,
 
+.jpeg_dec_chroma_formats = EXTRA_JPEG_DEC_CHROMA_FORMATS,
+
 .has_mpeg2_decoding = 1,
 .has_mpeg2_encoding = 1,
 .has_h264_decoding = 1,
@@ -336,6 +344,8 @@ static struct hw_codec_info gen8_hw_codec_info = {
 .max_width = 4096,
 .max_height = 4096,
 
+.jpeg_dec_chroma_formats = EXTRA_JPEG_DEC_CHROMA_FORMATS,
+
 .has_mpeg2_decoding = 1,
 .has_mpeg2_encoding = 1,
 .has_h264_decoding = 1,
@@ -592,6 +602,11 @@ i965_get_default_chroma_formats(VADriverContextP ctx, VAProfile profile,
 uint32_t chroma_formats = VA_RT_FORMAT_YUV420;
 
 switch (profile) {
+case VAProfileJPEGBaseline:
+if (HAS_JPEG_DECODING(i965) && entrypoint == VAEntrypointVLD)
+chroma_formats |= i965->codec_info->jpeg_dec_chroma_formats;
+break;
+
 default:
 break;
 }
diff --git a/src/i965_drv_video.h b/src/i965_drv_video.h
index 8f2d288..e70852b 100644
--- a/src/i965_drv_video.h
+++ b/src/i965_drv_video.h
@@ -289,6 +289,8 @@ struct hw_codec_info
 int max_width;
 int max_height;
 
+unsigned int jpeg_dec_chroma_formats;
+
 unsigned int has_mpeg2_decoding:1;
 unsigned int has_mpeg2_encoding:1;
 unsigned int has_h264_decoding:1;
-- 
1.7.9.5



[Libva] [PATCH v2 intel-driver 0/8] Fix support for grayscale

2014-05-14 Thread Gwenole Beauchesne
Hi,

This patch series fixes and optimizes support for grayscale streams
encoded in H.264 (patch 8). It is backwards compatible with solutions
that incorrectly request YUV 4:2:0 formats instead of the
appropriate Y800 one.

Tested with gstreamer-vaapi and FFmpeg/vaapi on Ivybridge. No
regression.

v2: split into smaller patches, while maintaining regression
testability, drop dead code in the Y800 rendering code path.

Regards,
Gwenole Beauchesne (8):
  surface: fix geometry (size, layout) of grayscale surfaces.
  surface: factor out release of surface buffer storage.
  surface: fix vaDeriveImage() for grayscale.
  config: fix vaGetConfigAttributes() to validate profile/entrypoint.
  config: fix vaCreateConfig() to not override user chroma format.
  config: fix supported set of chroma formats for JPEG decode.
  decoder: h264: factor out allocation of reconstructed surfaces.
  decoder: h264: optimize support for grayscale surfaces.

 src/gen6_mfd.c   |   20 ++-
 src/gen75_mfd.c  |   18 +--
 src/gen7_mfd.c   |   18 +--
 src/gen8_mfd.c   |   18 +--
 src/gen8_render.c|3 +
 src/i965_decoder_utils.c |   75 +++-
 src/i965_decoder_utils.h |8 ++
 src/i965_drv_video.c |  301 +++---
 src/i965_drv_video.h |   14 +++
 src/i965_output_dri.c|6 +-
 src/i965_render.c|3 +
 11 files changed, 326 insertions(+), 158 deletions(-)

-- 
1.7.9.5



Re: [Libva] [PATCH v2 intel-driver 3/8] surface: fix vaDeriveImage() for grayscale.

2014-05-14 Thread Zhao, Yakui
On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote:
 Allow vaDeriveImage() to work with grayscale surfaces by only exposing
 the luma component.
 

This is not necessary as the DeriveImage already supports the Y800
fourcc.

 Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
 ---
  src/i965_drv_video.c |6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
 index 589c00c..495ad44 100755
 --- a/src/i965_drv_video.c
 +++ b/src/i965_drv_video.c
 @@ -3222,6 +3222,12 @@ VAStatus i965_DeriveImage(VADriverContextP ctx,
  image->format.bits_per_pixel = 12;
  
  switch (image->format.fourcc) {
 +case VA_FOURCC_Y800:
 +image->num_planes = 1;
 +image->pitches[0] = w_pitch; /* Y */
 +image->offsets[0] = 0;
 +break;
 +
  case VA_FOURCC_YV12:
  image->num_planes = 3;
  image->pitches[0] = w_pitch; /* Y */




Re: [Libva] [PATCH v2 intel-driver 8/8] decoder: h264: optimize support for grayscale surfaces.

2014-05-14 Thread Zhao, Yakui
On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote:
 Optimize support for grayscale surfaces in two aspects: (i) space,
 by only allocating the luminance component; (ii) speed, by avoiding
 initialization of the (now nonexistent) chrominance planes.
 
 Keep backward compatibility with older codec layers that only
 supported YUV 4:2:0 and did not support grayscale formats properly.

As a whole, I am OK with this version of the patch, except for two concerns.

 
 Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
 ---
  src/gen6_mfd.c   |8 ++--
  src/gen75_mfd.c  |6 +-
  src/gen7_mfd.c   |6 +-
  src/gen8_mfd.c   |6 +-
  src/i965_decoder_utils.c |   23 +++
  src/i965_drv_video.c |   22 ++
  src/i965_drv_video.h |9 +
  7 files changed, 71 insertions(+), 9 deletions(-)
 
 diff --git a/src/gen6_mfd.c b/src/gen6_mfd.c
 index 2092f69..f925d98 100755
 --- a/src/gen6_mfd.c
 +++ b/src/gen6_mfd.c
 @@ -130,7 +130,11 @@ gen6_mfd_surface_state(VADriverContextP ctx,
  {
  struct intel_batchbuffer *batch = gen6_mfd_context->base.batch;
  struct object_surface *obj_surface = decode_state->render_object;
 -
 +unsigned int surface_format;
 +
 +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
 +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
 +

Can it work if it still sets the PLANAR_420_8 format for the Y800
surface?
If yes, I suggest leaving this unchanged, as the original code follows
the spec.

  BEGIN_BCS_BATCH(batch, 6);
  OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));
  OUT_BCS_BATCH(batch, 0);
 @@ -138,7 +142,7 @@ gen6_mfd_surface_state(VADriverContextP ctx,
   ((obj_surface->orig_height - 1) << 19) |
   ((obj_surface->orig_width - 1) << 6));
  OUT_BCS_BATCH(batch,
 -  (MFX_SURFACE_PLANAR_420_8 << 28) | /* 420 planar YUV surface */
 +  (surface_format << 28) | /* 420 planar YUV surface */
   (1 << 27) | /* must be 1 for interleave U/V, hardware requirement */
   (0 << 22) | /* surface object control state, FIXME??? */
   ((obj_surface->width - 1) << 3) | /* pitch */
 diff --git a/src/gen75_mfd.c b/src/gen75_mfd.c
 index 5b023cf..895b194 100644
 --- a/src/gen75_mfd.c
 +++ b/src/gen75_mfd.c
 @@ -137,12 +137,16 @@ gen75_mfd_surface_state(VADriverContextP ctx,
  struct object_surface *obj_surface = decode_state->render_object;
  unsigned int y_cb_offset;
  unsigned int y_cr_offset;
 +unsigned int surface_format;
 
  assert(obj_surface);
 
  y_cb_offset = obj_surface->y_cb_offset;
  y_cr_offset = obj_surface->y_cr_offset;
 
 +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
 +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
 +
  BEGIN_BCS_BATCH(batch, 6);
  OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));
  OUT_BCS_BATCH(batch, 0);
 @@ -150,7 +154,7 @@ gen75_mfd_surface_state(VADriverContextP ctx,
   ((obj_surface->orig_height - 1) << 18) |
   ((obj_surface->orig_width - 1) << 4));
  OUT_BCS_BATCH(batch,
 -  (MFX_SURFACE_PLANAR_420_8 << 28) | /* 420 planar YUV surface */
 +  (surface_format << 28) | /* 420 planar YUV surface */
   ((standard_select != MFX_FORMAT_JPEG) << 27) | /* interleave chroma, set to 0 for JPEG */
   (0 << 22) | /* surface object control state, ignored */
   ((obj_surface->width - 1) << 3) | /* pitch */
 diff --git a/src/gen7_mfd.c b/src/gen7_mfd.c
 index 70b1cec..2e0d653 100755
 --- a/src/gen7_mfd.c
 +++ b/src/gen7_mfd.c
 @@ -135,12 +135,16 @@ gen7_mfd_surface_state(VADriverContextP ctx,
  struct object_surface *obj_surface = decode_state->render_object;
  unsigned int y_cb_offset;
  unsigned int y_cr_offset;
 +unsigned int surface_format;
 
  assert(obj_surface);
 
  y_cb_offset = obj_surface->y_cb_offset;
  y_cr_offset = obj_surface->y_cr_offset;
 
 +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
 +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
 +
  BEGIN_BCS_BATCH(batch, 6);
  OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));
  OUT_BCS_BATCH(batch, 0);
 @@ -148,7 +152,7 @@ gen7_mfd_surface_state(VADriverContextP ctx,
   ((obj_surface->orig_height - 1) << 18) |
   ((obj_surface->orig_width - 1) << 4));
  OUT_BCS_BATCH(batch,
 -  (MFX_SURFACE_PLANAR_420_8 << 28) | /* 420 planar YUV surface */
 +  (surface_format << 28) | /* 420 planar YUV surface */
   ((standard_select != MFX_FORMAT_JPEG) << 27) | /* interleave chroma, set to 0 for JPEG */
   (0 << 22) | /* surface object control state, ignored */
   ((obj_surface->width - 1) << 3) | /* pitch */
 diff --git a/src/gen8_mfd.c 

Re: [Libva] (H264) Reference Picture selection

2014-05-14 Thread Zhao, Yakui
On Wed, 2014-05-14 at 03:50 -0600, Sreerenj wrote:
 On 14.05.2014 10:51, Xiang, Haihao wrote:
  On Mon, 2014-05-12 at 07:34 +0200, Gwenole Beauchesne wrote:
  2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
  On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote:
  Hi,
 
  2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com:
  On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote:
  Hi,
 
  I have a scheme where the decoder ack's received frames - With this
  information I could resync without an IDR by carefully selecting the
  reference frame(s).
  Will you please describe your idea in detail?
  long term references would typically be employed as well, but it
  seems ref-pic-marking was removed may'13?
  Why do you hope to use the long-term reference? As you know, the
  current encoding is based on hardware-acceleration. When more the
  reference frames can be selected, the driver will be more complex and
  then the encoding speed is also affected.
  It's for implementing Cisco's clearpath technology
  (http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa
  re/clearpath/clearpath_whitepaper.pdf)
  Basically one would  periodically mark a P frame as long term 
  reference, in
  case of packet loss one can thus refer to this one instead of 
  restarting the
  stream with an idr.
 
 
  I'm not too concerned about the encoding speed as long as it's above
  realtime (60fps 1920x1080)
  Maybe it's more convenient If libva can have a switch for slice_header 
  made between driver and app. Then some specially cases(ref-reorder, 
  long-term-ref) may let app to manage slice-header.
  Thanks for your suggestion. If the slice-header info is also generated
  by the app and then be passed to the driver, it can be easy to do the
  operation of re-order and long-term-ref
 
  But this will depend on the policy how the slice header info is
  generated. (by app or the driver). Now this is exported by querying the
  capability of the driver. If we move this into the app, the
  infrastructure will be changed a lot.
  Infrastructure of what? The application? No, it doesn't really change
  anything but generating the extra packed slice header.
  Currently the app will query the capability of encoding and determine
  which kind of header should be passed by the upper application(For
  example: PPS/SPS).
  If the Slice_header info is exposed, the driver will assume that it is
  the responsibility of generating the slice header info. In such case it
  is difficult to mix the two working modes without a lot of changes.(One
  is that the driver generates the slice header info. Another is that the
  app generates the slice header info).
  Then the driver needs to be fixed to not assume anything. The API and
  operation points look clear enough, aren't they?
 
  The changes to the codec layer are minimal, and this was done already,
  in multiple instances of codec layers, and with another driver.
  Where is your code for codec layer with packed slice header/packed raw
  data, I want to know the point where is packed slice header / packed raw
  data inserted into the bitstream in the codec layer.
 I have these patches for the codec layer (gstreamer-vaapi) but not yet 
 integrated into gstreamer-vaapi master.
 Submitting all packed headers in the following order, just after the 
 submission of VAEncSequenceParameterBuffer to vaRenderPicture(): 
 packed_sps, packed_pps, packed_slice.
 And then submitting MiscParam, PictureParam and SliceParam. Is that not 
 enough?

It is enough in the single-slice case, but it still changes where the
slice header is generated.

The current design is based on the following points:
   1. The driver exposes the SPS/PPS/MISC and the user-space app is
responsible for generating the corresponding packed data.
   2. The driver is responsible for generating the slice header. 

If the generation of slice header is moved from the driver to the
user-space app, the driver needs a lot of changes and some working cases
will be affected. Another issue is the complexity of handling slice
header generation in the multi-slice case.

 
 Anyway the intel driver seems to be using fixed size arrays for 
 referencing packed headers (packed_header_param[4] and 
 packed_header_data[4]) which needs to be changed if there are
 multiple slices per frame since we have to submit packed headers for 
 each slice. isn't it?
 

Yes. If more packed data is passed, the driver needs to be changed. 

 
 
 
  Another problem is that it is possible that one frame has multiple
  slices. In such case the user should pass the multiple slice header
  info.
  This is not an an issue. The API mandates that the packed headers are
  provided in order.
 
 
  Regards,
  Gwenole.
 

Re: [Libva] [PATCH libva-intel-driver master 2/2] Return error when trying to decoding an interlaced VC-1 video

2014-05-14 Thread Xiang, Haihao
On Fri, 2014-05-09 at 12:20 +0200, Gwenole Beauchesne wrote: 
 Hi,
 
 2014-05-05 7:00 GMT+02:00 Xiang, Haihao haihao.xi...@intel.com:
  From: Xiang, Haihao haihao.xi...@intel.com
 
  https://bugs.freedesktop.org/show_bug.cgi?id=77386
 
  Signed-off-by: Xiang, Haihao haihao.xi...@intel.com
  ---
   src/i965_decoder_utils.c | 7 ++-
   1 file changed, 6 insertions(+), 1 deletion(-)
 
  diff --git a/src/i965_decoder_utils.c b/src/i965_decoder_utils.c
  index 2533381..617bc15 100644
  --- a/src/i965_decoder_utils.c
  +++ b/src/i965_decoder_utils.c
  @@ -654,7 +654,12 @@ intel_decoder_check_vc1_parameter(VADriverContextP ctx,
  VAPictureParameterBufferVC1 *pic_param = (VAPictureParameterBufferVC1 *)decode_state->pic_param->buffer;
   struct object_surface *obj_surface;
   int i = 0;
  -
  +
   +if (pic_param->sequence_fields.bits.interlace == 1 &&
   +pic_param->picture_fields.bits.frame_coding_mode != 0) { /* frame-interlace or field-interlace */
  +return VA_STATUS_ERROR_DECODING_ERROR;
  +}
  +
   if (pic_param->picture_fields.bits.picture_type == 0 ||
   pic_param->picture_fields.bits.picture_type == 3) {
   } else if (pic_param->picture_fields.bits.picture_type == 1 ||
  --
 
 In general, returning VA_STATUS_ERROR_DECODING_ERROR means returning a
 valid error for vaQuerySurfaceError() as a next step. If you want to
 return this type of error, you have to support this interface as well.

It is my fault, I didn't go through va.h for
VA_STATUS_ERROR_DECODING_ERROR semantics when I tried to find an
appropriate error code.  

 
 Maybe FLAG_NOT_SUPPORTED would be a more appropriate error, or at
 least the catchall Unknown one?

I also considered VA_STATUS_ERROR_FLAG_NOT_SUPPORTED or
VA_STATUS_ERROR_UNKNOWN. I think FLAG_NOT_SUPPORTED means the driver
doesn't support a pre-defined flag in VA-API, which is inappropriate for
this error. I will replace the returned error with
VA_STATUS_ERROR_UNKNOWN. Another option is to define some new error
codes for those unsupported features in {pic, slice} parameters.

Thanks
Haihao


 i.e. return something else, unless you
 implement vaQuerySurfaceError() interface, which also means extending
 the API to notify that some of the supplied {pic, slice} param fields
 are not supported.


 
 Thanks,
 Gwenole.




Re: [Libva] [PATCH v2 intel-driver 3/8] surface: fix vaDeriveImage() for grayscale.

2014-05-14 Thread Gwenole Beauchesne
2014-05-15 2:35 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
 On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote:
 Allow vaDeriveImage() to work with grayscale surfaces by only exposing
 the luma component.


 This is not necessary as the DeriveImage already supports the Y800
 fourcc.

What? Where? Returning an error is not what I would call "already supports".


 Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
 ---
  src/i965_drv_video.c |6 ++
  1 file changed, 6 insertions(+)

 diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
 index 589c00c..495ad44 100755
 --- a/src/i965_drv_video.c
 +++ b/src/i965_drv_video.c
 @@ -3222,6 +3222,12 @@ VAStatus i965_DeriveImage(VADriverContextP ctx,
  image->format.bits_per_pixel = 12;
 
  switch (image->format.fourcc) {
 +case VA_FOURCC_Y800:
 +image->num_planes = 1;
 +image->pitches[0] = w_pitch; /* Y */
 +image->offsets[0] = 0;
 +break;
 +
  case VA_FOURCC_YV12:
  image->num_planes = 3;
  image->pitches[0] = w_pitch; /* Y */




Re: [Libva] [PATCH v2 intel-driver 8/8] decoder: h264: optimize support for grayscale surfaces.

2014-05-14 Thread Gwenole Beauchesne
Hi,

2014-05-15 3:34 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
 On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote:
 Optimize support for grayscale surfaces in two aspects: (i) space
 by only allocating the luminance component ; (ii) speed by avoiding
 initialization of the (now inexistent) chrominance planes.

 Keep backward compatibility with older codec layers that only
 supported YUV 4:2:0 and not grayscale formats properly.

 As a whole, I am OK to this version patch except two concerns.


 Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
 ---
  src/gen6_mfd.c   |8 ++--
  src/gen75_mfd.c  |6 +-
  src/gen7_mfd.c   |6 +-
  src/gen8_mfd.c   |6 +-
  src/i965_decoder_utils.c |   23 +++
  src/i965_drv_video.c |   22 ++
  src/i965_drv_video.h |9 +
  7 files changed, 71 insertions(+), 9 deletions(-)

 diff --git a/src/gen6_mfd.c b/src/gen6_mfd.c
 index 2092f69..f925d98 100755
 --- a/src/gen6_mfd.c
 +++ b/src/gen6_mfd.c
 @@ -130,7 +130,11 @@ gen6_mfd_surface_state(VADriverContextP ctx,
  {
  struct intel_batchbuffer *batch = gen6_mfd_context->base.batch;
  struct object_surface *obj_surface = decode_state->render_object;
 -
 +unsigned int surface_format;
 +
 +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
 +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
 +

 Can it work if it still set the PLANAR_420_8 format for the Y800
 surface?

No, MONO is the only specified and supported format to tell the MFX
engine to disregard the chroma components. It does not seem to care
about the supplied chroma_format_idc field in AVC_IMG_STATE.

  BEGIN_BCS_BATCH(batch, 6);
  OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));
  OUT_BCS_BATCH(batch, 0);
 @@ -138,7 +142,7 @@ gen6_mfd_surface_state(VADriverContextP ctx,
   ((obj_surface->orig_height - 1) << 19) |
   ((obj_surface->orig_width - 1) << 6));
   OUT_BCS_BATCH(batch,
 -  (MFX_SURFACE_PLANAR_420_8 << 28) | /* 420 planar YUV surface */
 +  (surface_format << 28) | /* 420 planar YUV surface */
   (1 << 27) | /* must be 1 for interleave U/V, hardware requirement */
   (0 << 22) | /* surface object control state, FIXME??? */
   ((obj_surface->width - 1) << 3) | /* pitch */
 diff --git a/src/gen75_mfd.c b/src/gen75_mfd.c
 index 5b023cf..895b194 100644
 --- a/src/gen75_mfd.c
 +++ b/src/gen75_mfd.c
 @@ -137,12 +137,16 @@ gen75_mfd_surface_state(VADriverContextP ctx,
  struct object_surface *obj_surface = decode_state->render_object;
  unsigned int y_cb_offset;
  unsigned int y_cr_offset;
 +unsigned int surface_format;
 
  assert(obj_surface);
 
  y_cb_offset = obj_surface->y_cb_offset;
  y_cr_offset = obj_surface->y_cr_offset;
 
 +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
 +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
 +
  BEGIN_BCS_BATCH(batch, 6);
  OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));
  OUT_BCS_BATCH(batch, 0);
 @@ -150,7 +154,7 @@ gen75_mfd_surface_state(VADriverContextP ctx,
   ((obj_surface->orig_height - 1) << 18) |
   ((obj_surface->orig_width - 1) << 4));
   OUT_BCS_BATCH(batch,
 -  (MFX_SURFACE_PLANAR_420_8 << 28) | /* 420 planar YUV surface */
 +  (surface_format << 28) | /* 420 planar YUV surface */
   ((standard_select != MFX_FORMAT_JPEG) << 27) | /* interleave chroma, set to 0 for JPEG */
   (0 << 22) | /* surface object control state, ignored */
   ((obj_surface->width - 1) << 3) | /* pitch */
 diff --git a/src/gen7_mfd.c b/src/gen7_mfd.c
 index 70b1cec..2e0d653 100755
 --- a/src/gen7_mfd.c
 +++ b/src/gen7_mfd.c
 @@ -135,12 +135,16 @@ gen7_mfd_surface_state(VADriverContextP ctx,
  struct object_surface *obj_surface = decode_state->render_object;
  unsigned int y_cb_offset;
  unsigned int y_cr_offset;
 +unsigned int surface_format;
 
  assert(obj_surface);
 
  y_cb_offset = obj_surface->y_cb_offset;
  y_cr_offset = obj_surface->y_cr_offset;
 
 +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
 +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
 +
  BEGIN_BCS_BATCH(batch, 6);
  OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));
  OUT_BCS_BATCH(batch, 0);
 @@ -148,7 +152,7 @@ gen7_mfd_surface_state(VADriverContextP ctx,
   ((obj_surface->orig_height - 1) << 18) |
   ((obj_surface->orig_width - 1) << 4));
   OUT_BCS_BATCH(batch,
 -  (MFX_SURFACE_PLANAR_420_8 << 28) | /* 420 planar YUV surface */
 +  (surface_format << 28) | /* 420 planar YUV surface */
   ((standard_select != MFX_FORMAT_JPEG) << 27) | /* interleave chroma, set to 0 for JPEG */
 

Re: [Libva] [PATCH v2 intel-driver 3/8] surface: fix vaDeriveImage() for grayscale.

2014-05-14 Thread Zhao, Yakui
On Wed, 2014-05-14 at 22:14 -0600, Gwenole Beauchesne wrote:
 2014-05-15 2:35 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
  On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote:
  Allow vaDeriveImage() to work with grayscale surfaces by only exposing
  the luma component.
 
 
  This is not necessary as the DeriveImage already supports the Y800
  fourcc.
 
 What? Where? I don't call already supports as returning an error.
 

This is added in the following commit:

commit 9f9c505ed5212ae0704f71f45532b9716ac0bd51
Author: Zhong Li zhong...@intel.com
Date:   Mon Apr 14 02:17:42 2014 -0600

i965_DeriveImage() support JPEG color formats

Signed-off-by: Zhong Li zhong...@intel.com

Thanks.
Yakui

 
  Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
  ---
   src/i965_drv_video.c |6 ++
   1 file changed, 6 insertions(+)
 
  diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c
  index 589c00c..495ad44 100755
  --- a/src/i965_drv_video.c
  +++ b/src/i965_drv_video.c
  @@ -3222,6 +3222,12 @@ VAStatus i965_DeriveImage(VADriverContextP ctx,
   image->format.bits_per_pixel = 12;
  
   switch (image->format.fourcc) {
  +case VA_FOURCC_Y800:
  +image->num_planes = 1;
  +image->pitches[0] = w_pitch; /* Y */
  +image->offsets[0] = 0;
  +break;
  +
   case VA_FOURCC_YV12:
   image->num_planes = 3;
   image->pitches[0] = w_pitch; /* Y */
 
 




Re: [Libva] [PATCH v2 intel-driver 8/8] decoder: h264: optimize support for grayscale surfaces.

2014-05-14 Thread Zhao, Yakui
On Wed, 2014-05-14 at 22:28 -0600, Gwenole Beauchesne wrote:
 Hi,
 
 2014-05-15 3:34 GMT+02:00 Zhao, Yakui yakui.z...@intel.com:
  On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote:
  Optimize support for grayscale surfaces in two aspects: (i) space
  by only allocating the luminance component ; (ii) speed by avoiding
  initialization of the (now inexistent) chrominance planes.
 
  Keep backward compatibility with older codec layers that only
  supported YUV 4:2:0 and not grayscale formats properly.
 
  As a whole, I am OK to this version patch except two concerns.
 
 
  Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com
  ---
   src/gen6_mfd.c   |8 ++--
   src/gen75_mfd.c  |6 +-
   src/gen7_mfd.c   |6 +-
   src/gen8_mfd.c   |6 +-
   src/i965_decoder_utils.c |   23 +++
   src/i965_drv_video.c |   22 ++
   src/i965_drv_video.h |9 +
   7 files changed, 71 insertions(+), 9 deletions(-)
 
  diff --git a/src/gen6_mfd.c b/src/gen6_mfd.c
  index 2092f69..f925d98 100755
  --- a/src/gen6_mfd.c
  +++ b/src/gen6_mfd.c
  @@ -130,7 +130,11 @@ gen6_mfd_surface_state(VADriverContextP ctx,
   {
   struct intel_batchbuffer *batch = gen6_mfd_context->base.batch;
   struct object_surface *obj_surface = decode_state->render_object;
  -
  +unsigned int surface_format;
  +
  +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
  +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
  +
 
  Can it work if it still set the PLANAR_420_8 format for the Y800
  surface?
 
 No, MONO is the only specified and supported format to tell the MFX
 engine to disregard the chroma components. It does not seem to care of
 the supplied chroma_format_idc field to AVC_IMG_STATE.

The following is what I got from the spec.
  For the video codecs, it should be set to 4 always.

In such a case the chroma_format_idc field will be configured in
AVC_IMAGE_STATE to assure that the chroma components of grayscale are
not touched in decoding. So the 420_8 format is still appropriate for
the grayscale surface.

 
   BEGIN_BCS_BATCH(batch, 6);
   OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));
   OUT_BCS_BATCH(batch, 0);
  @@ -138,7 +142,7 @@ gen6_mfd_surface_state(VADriverContextP ctx,
 ((obj_surface->orig_height - 1) << 19) |
 ((obj_surface->orig_width - 1) << 6));
   OUT_BCS_BATCH(batch,
  -  (MFX_SURFACE_PLANAR_420_8 << 28) | /* 420 planar YUV surface */
  +  (surface_format << 28) | /* 420 planar YUV surface */
 (1 << 27) | /* must be 1 for interleave U/V, hardware requirement */
 (0 << 22) | /* surface object control state, FIXME??? */
 ((obj_surface->width - 1) << 3) | /* pitch */
  diff --git a/src/gen75_mfd.c b/src/gen75_mfd.c
  index 5b023cf..895b194 100644
  --- a/src/gen75_mfd.c
  +++ b/src/gen75_mfd.c
  @@ -137,12 +137,16 @@ gen75_mfd_surface_state(VADriverContextP ctx,
   struct object_surface *obj_surface = decode_state->render_object;
   unsigned int y_cb_offset;
   unsigned int y_cr_offset;
  +unsigned int surface_format;
 
   assert(obj_surface);
 
   y_cb_offset = obj_surface->y_cb_offset;
   y_cr_offset = obj_surface->y_cr_offset;
 
  +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
  +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
  +
   BEGIN_BCS_BATCH(batch, 6);
   OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));
   OUT_BCS_BATCH(batch, 0);
  @@ -150,7 +154,7 @@ gen75_mfd_surface_state(VADriverContextP ctx,
 ((obj_surface->orig_height - 1) << 18) |
 ((obj_surface->orig_width - 1) << 4));
   OUT_BCS_BATCH(batch,
  -  (MFX_SURFACE_PLANAR_420_8 << 28) | /* 420 planar YUV surface */
  +  (surface_format << 28) | /* 420 planar YUV surface */
 ((standard_select != MFX_FORMAT_JPEG) << 27) | /* interleave chroma, set to 0 for JPEG */
 (0 << 22) | /* surface object control state, ignored */
 ((obj_surface->width - 1) << 3) | /* pitch */
  diff --git a/src/gen7_mfd.c b/src/gen7_mfd.c
  index 70b1cec..2e0d653 100755
  --- a/src/gen7_mfd.c
  +++ b/src/gen7_mfd.c
  @@ -135,12 +135,16 @@ gen7_mfd_surface_state(VADriverContextP ctx,
   struct object_surface *obj_surface = decode_state->render_object;
   unsigned int y_cb_offset;
   unsigned int y_cr_offset;
  +unsigned int surface_format;
 
   assert(obj_surface);
 
   y_cb_offset = obj_surface->y_cb_offset;
   y_cr_offset = obj_surface->y_cr_offset;
 
  +surface_format = obj_surface->fourcc == VA_FOURCC_Y800 ?
  +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8;
  +
   BEGIN_BCS_BATCH(batch, 6);
   OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));