[Libva] [PATCH 1/2] batch: factor out MI_FLUSH_DW batch buffer construction.
Simplify the construction of the MI_FLUSH_DW command stream. Use ring buffer generic variants of BEGIN, OUT, ADVANCE batch functions. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/intel_batchbuffer.c | 33 ++--- 1 file changed, 10 insertions(+), 23 deletions(-) diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c index 9dc496d..bc564d5 100644 --- a/src/intel_batchbuffer.c +++ b/src/intel_batchbuffer.c @@ -247,29 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct intel_batchbuffer *batch) } } else { -if (batch-flag == I915_EXEC_BLT) { -BEGIN_BLT_BATCH(batch, 4); -OUT_BLT_BATCH(batch, MI_FLUSH_DW); -OUT_BLT_BATCH(batch, 0); -OUT_BLT_BATCH(batch, 0); -OUT_BLT_BATCH(batch, 0); -ADVANCE_BLT_BATCH(batch); -}else if (batch-flag == I915_EXEC_VEBOX) { -BEGIN_VEB_BATCH(batch, 4); -OUT_VEB_BATCH(batch, MI_FLUSH_DW); -OUT_VEB_BATCH(batch, 0); -OUT_VEB_BATCH(batch, 0); -OUT_VEB_BATCH(batch, 0); -ADVANCE_VEB_BATCH(batch); -} else { -assert(batch-flag == I915_EXEC_BSD); -BEGIN_BCS_BATCH(batch, 4); -OUT_BCS_BATCH(batch, MI_FLUSH_DW | MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE); -OUT_BCS_BATCH(batch, 0); -OUT_BCS_BATCH(batch, 0); -OUT_BCS_BATCH(batch, 0); -ADVANCE_BCS_BATCH(batch); -} +uint32_t cmd = MI_FLUSH_DW; +if (batch-flag == I915_EXEC_BSD) +cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE; + +__BEGIN_BATCH(batch, 4, batch-flag); +__OUT_BATCH(batch, cmd); +__OUT_BATCH(batch, 0); +__OUT_BATCH(batch, 0); +__OUT_BATCH(batch, 0); +__ADVANCE_BATCH(batch); } } else { if (batch-flag == I915_EXEC_RENDER) { -- 1.8.3.2 ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
[Libva] [PATCH 0/2] Simplify and fix MI_FLUSH_DW for Gen6+
Hi, This patch series simplifies intel_batchbuffer_emit_mi_flush() for the BSD ring, while also providing a minor fix for Broadwell. Note: there are additional things that could be simplified. Please have a look at it afterwards. Thanks, Gwenole Beauchesne (2): batch: factor out MI_FLUSH_DW batch buffer construction. batch: fix MI_FLUSH_DW for Broadwell. src/intel_batchbuffer.c | 36 +--- 1 file changed, 13 insertions(+), 23 deletions(-) -- 1.8.3.2 ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] (H264) Reference Picture selection
On Sun, 2014-05-11 at 23:34 -0600, Gwenole Beauchesne wrote: 2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote: Hi, 2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com: On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote: Hi, I have a scheme where the decoder ack's received frames - With this information I could resync without an IDR by carefully selecting the reference frame(s). Will you please describe your idea in detail? long term references would typically be employed as well, but it seems ref-pic-marking was removed may'13? Why do you hope to use the long-term reference? As you know, the current encoding is based on hardware-acceleration. When more the reference frames can be selected, the driver will be more complex and then the encoding speed is also affected. It's for implementing Cisco's clearpath technology (http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa re/clearpath/clearpath_whitepaper.pdf) Basically one would periodically mark a P frame as long term reference, in case of packet loss one can thus refer to this one instead of restarting the stream with an idr. I'm not too concerned about the encoding speed as long as it's above realtime (60fps 1920x1080) Maybe it's more convenient If libva can have a switch for slice_header made between driver and app. Then some specially cases(ref-reorder, long-term-ref) may let app to manage slice-header. Thanks for your suggestion. If the slice-header info is also generated by the app and then be passed to the driver, it can be easy to do the operation of re-order and long-term-ref But this will depend on the policy how the slice header info is generated. (by app or the driver). Now this is exported by querying the capability of the driver. If we move this into the app, the infrastructure will be changed a lot. Infrastructure of what? The application? No, it doesn't really change anything but generating the extra packed slice header. Currently the app will query the capability of encoding and determine which kind of header should be passed by the upper application(For example: PPS/SPS). If the Slice_header info is exposed, the driver will assume that it is the responsibility of generating the slice header info. In such case it is difficult to mix the two working modes without a lot of changes.(One is that the driver generates the slice header info. Another is that the app generates the slice header info). Then the driver needs to be fixed to not assume anything. The API and operation points look clear enough, aren't they? Sorry that something is not described very clearly in my previous email because of one typo error. If the Slice_header info is exposed, app should be responsible of generating the slice header info and then passed it into the driver. If the slice_header info is not exposed, the driver will be responsible of generating the slice header info. And it is difficult to mix the above two working modes without a lot of changes. The changes to the codec layer are minimal, and this was done already, in multiple instances of codec layers, and with another driver. Another problem is that it is possible that one frame has multiple slices. In such case the user should pass the multiple slice header info. This is not an an issue. The API mandates that the packed headers are provided in order. Regards, Gwenole. ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] (H264) Reference Picture selection
On Mon, 2014-05-12 at 07:34 +0200, Gwenole Beauchesne wrote: 2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote: Hi, 2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com: On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote: Hi, I have a scheme where the decoder ack's received frames - With this information I could resync without an IDR by carefully selecting the reference frame(s). Will you please describe your idea in detail? long term references would typically be employed as well, but it seems ref-pic-marking was removed may'13? Why do you hope to use the long-term reference? As you know, the current encoding is based on hardware-acceleration. When more the reference frames can be selected, the driver will be more complex and then the encoding speed is also affected. It's for implementing Cisco's clearpath technology (http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa re/clearpath/clearpath_whitepaper.pdf) Basically one would periodically mark a P frame as long term reference, in case of packet loss one can thus refer to this one instead of restarting the stream with an idr. I'm not too concerned about the encoding speed as long as it's above realtime (60fps 1920x1080) Maybe it's more convenient If libva can have a switch for slice_header made between driver and app. Then some specially cases(ref-reorder, long-term-ref) may let app to manage slice-header. Thanks for your suggestion. If the slice-header info is also generated by the app and then be passed to the driver, it can be easy to do the operation of re-order and long-term-ref But this will depend on the policy how the slice header info is generated. (by app or the driver). Now this is exported by querying the capability of the driver. If we move this into the app, the infrastructure will be changed a lot. Infrastructure of what? The application? No, it doesn't really change anything but generating the extra packed slice header. Currently the app will query the capability of encoding and determine which kind of header should be passed by the upper application(For example: PPS/SPS). If the Slice_header info is exposed, the driver will assume that it is the responsibility of generating the slice header info. In such case it is difficult to mix the two working modes without a lot of changes.(One is that the driver generates the slice header info. Another is that the app generates the slice header info). Then the driver needs to be fixed to not assume anything. The API and operation points look clear enough, aren't they? The changes to the codec layer are minimal, and this was done already, in multiple instances of codec layers, and with another driver. Where is your code for codec layer with packed slice header/packed raw data, I want to know the point where is packed slice header / packed raw data inserted into the bitstream in the codec layer. Another problem is that it is possible that one frame has multiple slices. In such case the user should pass the multiple slice header info. This is not an an issue. The API mandates that the packed headers are provided in order. Regards, Gwenole. ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH 1/2] batch: factor out MI_FLUSH_DW batch buffer construction.
On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote: Simplify the construction of the MI_FLUSH_DW command stream. Use ring buffer generic variants of BEGIN, OUT, ADVANCE batch functions. Hi, Gwenole Thanks for your patch. But I don't think that we need factor out of MI_FLUSH_DW batch buffer construction. 1. Currently it is very lucky that BSD/BLT/VEBOX are using the similar MI_FLUSH_DW. But the command definition is different on these rings. we are not sure whether we will always use the similar command. When some bits are enabled on the corresponding rings, we will have to split it again. 2. we are considering the redefinition of batch-flag especially for the dual VDBOX for BDW. Thanks. Yakui Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/intel_batchbuffer.c | 33 ++--- 1 file changed, 10 insertions(+), 23 deletions(-) diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c index 9dc496d..bc564d5 100644 --- a/src/intel_batchbuffer.c +++ b/src/intel_batchbuffer.c @@ -247,29 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct intel_batchbuffer *batch) } } else { -if (batch-flag == I915_EXEC_BLT) { -BEGIN_BLT_BATCH(batch, 4); -OUT_BLT_BATCH(batch, MI_FLUSH_DW); -OUT_BLT_BATCH(batch, 0); -OUT_BLT_BATCH(batch, 0); -OUT_BLT_BATCH(batch, 0); -ADVANCE_BLT_BATCH(batch); -}else if (batch-flag == I915_EXEC_VEBOX) { -BEGIN_VEB_BATCH(batch, 4); -OUT_VEB_BATCH(batch, MI_FLUSH_DW); -OUT_VEB_BATCH(batch, 0); -OUT_VEB_BATCH(batch, 0); -OUT_VEB_BATCH(batch, 0); -ADVANCE_VEB_BATCH(batch); -} else { -assert(batch-flag == I915_EXEC_BSD); -BEGIN_BCS_BATCH(batch, 4); -OUT_BCS_BATCH(batch, MI_FLUSH_DW | MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE); -OUT_BCS_BATCH(batch, 0); -OUT_BCS_BATCH(batch, 0); -OUT_BCS_BATCH(batch, 0); -ADVANCE_BCS_BATCH(batch); -} +uint32_t cmd = MI_FLUSH_DW; +if (batch-flag == I915_EXEC_BSD) +cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE; + +__BEGIN_BATCH(batch, 4, batch-flag); +__OUT_BATCH(batch, cmd); +__OUT_BATCH(batch, 0); +__OUT_BATCH(batch, 0); +__OUT_BATCH(batch, 0); +__ADVANCE_BATCH(batch); } } else { if (batch-flag == I915_EXEC_RENDER) { ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] (H264) Reference Picture selection
On Wed, 2014-05-14 at 15:49 +0800, Zhao, Yakui wrote: On Sun, 2014-05-11 at 23:34 -0600, Gwenole Beauchesne wrote: 2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote: Hi, 2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com: On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote: Hi, I have a scheme where the decoder ack's received frames - With this information I could resync without an IDR by carefully selecting the reference frame(s). Will you please describe your idea in detail? long term references would typically be employed as well, but it seems ref-pic-marking was removed may'13? Why do you hope to use the long-term reference? As you know, the current encoding is based on hardware-acceleration. When more the reference frames can be selected, the driver will be more complex and then the encoding speed is also affected. It's for implementing Cisco's clearpath technology (http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa re/clearpath/clearpath_whitepaper.pdf) Basically one would periodically mark a P frame as long term reference, in case of packet loss one can thus refer to this one instead of restarting the stream with an idr. I'm not too concerned about the encoding speed as long as it's above realtime (60fps 1920x1080) Maybe it's more convenient If libva can have a switch for slice_header made between driver and app. Then some specially cases(ref-reorder, long-term-ref) may let app to manage slice-header. Thanks for your suggestion. If the slice-header info is also generated by the app and then be passed to the driver, it can be easy to do the operation of re-order and long-term-ref But this will depend on the policy how the slice header info is generated. (by app or the driver). Now this is exported by querying the capability of the driver. If we move this into the app, the infrastructure will be changed a lot. Infrastructure of what? The application? No, it doesn't really change anything but generating the extra packed slice header. Currently the app will query the capability of encoding and determine which kind of header should be passed by the upper application(For example: PPS/SPS). If the Slice_header info is exposed, the driver will assume that it is the responsibility of generating the slice header info. In such case it is difficult to mix the two working modes without a lot of changes.(One is that the driver generates the slice header info. Another is that the app generates the slice header info). Then the driver needs to be fixed to not assume anything. The API and operation points look clear enough, aren't they? Sorry that something is not described very clearly in my previous email because of one typo error. If the Slice_header info is exposed, app should be responsible of generating the slice header info and then passed it into the driver. If the slice_header info is not exposed, the driver will be responsible of generating the slice header info. And it is difficult to mix the above two working modes without a lot of changes. Another thing is for CBR, the QP might be changed per frame in the driver and app/middleware doesn't not know the final QP. How does app get the right slice_qp_delta to build the right packed slice header ? does app always use a fixed slice_qp_delta ? The changes to the codec layer are minimal, and this was done already, in multiple instances of codec layers, and with another driver. Another problem is that it is possible that one frame has multiple slices. In such case the user should pass the multiple slice header info. This is not an an issue. The API mandates that the packed headers are provided in order. Regards, Gwenole. ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH 2/2] batch: fix MI_FLUSH_DW for Broadwell.
On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote: The MI_FLUSH_DW command contains 5 dwords on Broadwell, i.e. one extra dword for the high order bits of the Address field. Thanks for your patch. What is wrong if this is applied? The 4 dwords are still ok to Broadwell. And the address/data field are not used in the command of MI_FLUSH_DW. So the current command is correct for Broadwell. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/intel_batchbuffer.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c index bc564d5..01e04d5 100644 --- a/src/intel_batchbuffer.c +++ b/src/intel_batchbuffer.c @@ -247,13 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct intel_batchbuffer *batch) } } else { +const uint32_t extra_dword = !!IS_GEN8(intel-device_id); uint32_t cmd = MI_FLUSH_DW; if (batch-flag == I915_EXEC_BSD) cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE; -__BEGIN_BATCH(batch, 4, batch-flag); -__OUT_BATCH(batch, cmd); +__BEGIN_BATCH(batch, 4|extra_dword, batch-flag); +__OUT_BATCH(batch, cmd|extra_dword); __OUT_BATCH(batch, 0); +if (extra_dword) +__OUT_BATCH(batch, 0); __OUT_BATCH(batch, 0); __OUT_BATCH(batch, 0); __ADVANCE_BATCH(batch); ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH 2/2] batch: fix MI_FLUSH_DW for Broadwell.
Hi, 2014-05-14 10:10 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote: The MI_FLUSH_DW command contains 5 dwords on Broadwell, i.e. one extra dword for the high order bits of the Address field. Thanks for your patch. What is wrong if this is applied? The 4 dwords are still ok to Broadwell. And the address/data field are not used in the command of MI_FLUSH_DW. So the current command is correct for Broadwell. Yes, it's still correct... until we use post-sync ops. But anyway, if you want to keep at the minimum similar semantics to what used to be used, then it's preferred to have the required provisions for handling that. Before, we used the defaults (QWORD immediate, if any). Now, the default semantics changed. (Un)fortunately, a command stream verifier won't notice that, but if you want to strictly implement the specs for consistency, an Address is always 64-bits on Broadwell. If you don't want to follow the specs, that's your call, just remember that in the future and update the code accordingly. Regards, Gwenole. --- src/intel_batchbuffer.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c index bc564d5..01e04d5 100644 --- a/src/intel_batchbuffer.c +++ b/src/intel_batchbuffer.c @@ -247,13 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct intel_batchbuffer *batch) } } else { +const uint32_t extra_dword = !!IS_GEN8(intel-device_id); uint32_t cmd = MI_FLUSH_DW; if (batch-flag == I915_EXEC_BSD) cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE; -__BEGIN_BATCH(batch, 4, batch-flag); -__OUT_BATCH(batch, cmd); +__BEGIN_BATCH(batch, 4|extra_dword, batch-flag); +__OUT_BATCH(batch, cmd|extra_dword); __OUT_BATCH(batch, 0); +if (extra_dword) +__OUT_BATCH(batch, 0); __OUT_BATCH(batch, 0); __OUT_BATCH(batch, 0); __ADVANCE_BATCH(batch); ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH 1/2] batch: factor out MI_FLUSH_DW batch buffer construction.
Hi, 2014-05-14 10:05 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote: Simplify the construction of the MI_FLUSH_DW command stream. Use ring buffer generic variants of BEGIN, OUT, ADVANCE batch functions. Hi, Gwenole Thanks for your patch. But I don't think that we need factor out of MI_FLUSH_DW batch buffer construction. 1. Currently it is very lucky that BSD/BLT/VEBOX are using the similar MI_FLUSH_DW. But the command definition is different on these rings. we are not sure whether we will always use the similar command. When some bits are enabled on the corresponding rings, we will have to split it again. 2. we are considering the redefinition of batch-flag especially for the dual VDBOX for BDW. By clean-up, I meant something that would not cause you to have a bunch of if()s either. You can use hooks while initializing the batch buffer for a particular ring. Besides, there are higher order optimizations possible in the whole intel_batchbuffer.[ch], which forked off ancient versions that were used in other projects before. Either way, I don't care but the second patch for behavioural consistency. :) Thanks, Gwenole. diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c index 9dc496d..bc564d5 100644 --- a/src/intel_batchbuffer.c +++ b/src/intel_batchbuffer.c @@ -247,29 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct intel_batchbuffer *batch) } } else { -if (batch-flag == I915_EXEC_BLT) { -BEGIN_BLT_BATCH(batch, 4); -OUT_BLT_BATCH(batch, MI_FLUSH_DW); -OUT_BLT_BATCH(batch, 0); -OUT_BLT_BATCH(batch, 0); -OUT_BLT_BATCH(batch, 0); -ADVANCE_BLT_BATCH(batch); -}else if (batch-flag == I915_EXEC_VEBOX) { -BEGIN_VEB_BATCH(batch, 4); -OUT_VEB_BATCH(batch, MI_FLUSH_DW); -OUT_VEB_BATCH(batch, 0); -OUT_VEB_BATCH(batch, 0); -OUT_VEB_BATCH(batch, 0); -ADVANCE_VEB_BATCH(batch); -} else { -assert(batch-flag == I915_EXEC_BSD); -BEGIN_BCS_BATCH(batch, 4); -OUT_BCS_BATCH(batch, MI_FLUSH_DW | MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE); -OUT_BCS_BATCH(batch, 0); -OUT_BCS_BATCH(batch, 0); -OUT_BCS_BATCH(batch, 0); -ADVANCE_BCS_BATCH(batch); -} +uint32_t cmd = MI_FLUSH_DW; +if (batch-flag == I915_EXEC_BSD) +cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE; + +__BEGIN_BATCH(batch, 4, batch-flag); +__OUT_BATCH(batch, cmd); +__OUT_BATCH(batch, 0); +__OUT_BATCH(batch, 0); +__OUT_BATCH(batch, 0); +__ADVANCE_BATCH(batch); } } else { if (batch-flag == I915_EXEC_RENDER) { ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH 2/2] batch: fix MI_FLUSH_DW for Broadwell.
On Wed, 2014-05-14 at 10:27 +0200, Gwenole Beauchesne wrote: Hi, 2014-05-14 10:10 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Wed, 2014-05-14 at 01:47 -0600, Gwenole Beauchesne wrote: The MI_FLUSH_DW command contains 5 dwords on Broadwell, i.e. one extra dword for the high order bits of the Address field. Thanks for your patch. What is wrong if this is applied? The 4 dwords are still ok to Broadwell. And the address/data field are not used in the command of MI_FLUSH_DW. So the current command is correct for Broadwell. Yes, it's still correct... until we use post-sync ops. But anyway, if you want to keep at the minimum similar semantics to what used to be used, then it's preferred to have the required provisions for handling that. Before, we used the defaults (QWORD immediate, if any). Now, the default semantics changed. (Un)fortunately, a command stream verifier won't notice that, but if you want to strictly implement the specs for consistency, an Address is always 64-bits on Broadwell. If you don't want to follow the specs, that's your call, just remember that in the future and update the code accordingly. Thanks for your reminder. As Haihao mentioned, currently we don't use the post-sync ops. In such case the 4 or 5 dwords follow the spec for BDW. In fact there is no post-sync operation, QW/DW doesn't matter. If the post-sync is essential/required, we will take care of updating the code in future. Thanks. Yakui Regards, Gwenole. --- src/intel_batchbuffer.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/intel_batchbuffer.c b/src/intel_batchbuffer.c index bc564d5..01e04d5 100644 --- a/src/intel_batchbuffer.c +++ b/src/intel_batchbuffer.c @@ -247,13 +247,16 @@ intel_batchbuffer_emit_mi_flush(struct intel_batchbuffer *batch) } } else { +const uint32_t extra_dword = !!IS_GEN8(intel-device_id); uint32_t cmd = MI_FLUSH_DW; if (batch-flag == I915_EXEC_BSD) cmd |= MI_FLUSH_DW_VIDEO_PIPELINE_CACHE_INVALIDATE; -__BEGIN_BATCH(batch, 4, batch-flag); -__OUT_BATCH(batch, cmd); +__BEGIN_BATCH(batch, 4|extra_dword, batch-flag); +__OUT_BATCH(batch, cmd|extra_dword); __OUT_BATCH(batch, 0); +if (extra_dword) +__OUT_BATCH(batch, 0); __OUT_BATCH(batch, 0); __OUT_BATCH(batch, 0); __ADVANCE_BATCH(batch); ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH 2/3] Convert encoding misc param type to index
Hi, This is close. When I meant do the conversion once while parsing/accumulating, this is so that to further optimize the translation/usage process. In this patch, you still have to perform conversions afterwards, this is suboptimal. Here is the exact approach I had in mind: enum { /* This strictly follows the base VAEncMiscParameterTypeXXX */ VAIntelEncMiscParameterTypeFrameRate, VAIntelEncMiscParameterTypeRateControl, ... /* This strictly follows the Intel specific extensions, in order */ VAIntelEncMiscParameterTypeExtBase, VAIntelEncMiscParameterTypeFEIFrameControl = VAIntelEncMiscParameterTypeExtBase, VAIntelEncMiscParameterTypeCount }; Your conversion function: switch (type) { case VAEncMiscParameterTypeFrameRate: case VAEncMiscParameterTypeRateControl: ... intel_type = VAIntelEncMiscParameterTypeFrameRate + (type - VAEncMiscParameterTypeFrameRate); break; case VAEncMiscParameterTypeFEIFrameControlIntel: ... intel_type = VAIntelEncMiscParameterTypeFEIFrameControl + (type - VAEncMiscParameterTypeFEIFrameControlIntel); break; default: /* error out | fix */ /* XXX: or generate a garbage index that will hold them. Since we don't need them, it's fine that unused/unsupported buffers replace older while parsing */ break; } So, once you filled up the internal buffer store in context, you can easily access them as misc_buffer[VAIntelEncMiscParameterTypeXXX]; No more conversion. And the initial conversion function will likely be compiler optimized too with 2 to 3 branches and a couple of subs. Thanks, Gwenole. 2014-05-13 11:20 GMT+02:00 Zhong Li zhong...@intel.com: Add function va_enc_misc_param_type_to_idx(), it is helpful to reduce misc_param of encode_state buffer size. Signed-off-by: Zhong Li zhong...@intel.com --- src/gen6_mfc_common.c |3 ++- src/i965_drv_video.c | 45 ++--- src/i965_drv_video.h |3 +++ 3 files changed, 47 insertions(+), 4 deletions(-) diff --git a/src/gen6_mfc_common.c b/src/gen6_mfc_common.c index 33b9d55..faad699 100644 --- a/src/gen6_mfc_common.c +++ b/src/gen6_mfc_common.c @@ -134,7 +134,8 @@ static void intel_mfc_brc_init(struct encode_state *encode_state, { struct gen6_mfc_context *mfc_context = encoder_context-mfc_context; VAEncSequenceParameterBufferH264 *pSequenceParameter = (VAEncSequenceParameterBufferH264 *)encode_state-seq_param_ext-buffer; -VAEncMiscParameterBuffer* pMiscParamHRD = (VAEncMiscParameterBuffer*)encode_state-misc_param[VAEncMiscParameterTypeHRD]-buffer; +int idx = va_enc_misc_param_type_to_idx(VAEncMiscParameterTypeHRD); +VAEncMiscParameterBuffer* pMiscParamHRD = (VAEncMiscParameterBuffer*)encode_state-misc_param[idx]-buffer; VAEncMiscParameterHRD* pParameterHRD = (VAEncMiscParameterHRD*)pMiscParamHRD-data; double bitrate = pSequenceParameter-bits_per_second; double framerate = (double)pSequenceParameter-time_scale /(2 * (double)pSequenceParameter-num_units_in_tick); diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index e505e4a..808c0d5 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -261,6 +261,43 @@ va_enc_packed_type_to_idx(int packed_type) return idx; } +#define I965_ENC_MISC_PARAM_TYPE_FEI_FRAME_CONTROL_INDEX 14 + +int +va_enc_misc_param_type_to_idx(int misc_param_type) +{ +int idx = 0; + +switch (misc_param_type) { +case VAEncMiscParameterTypeFrameRate: +case VAEncMiscParameterTypeRateControl: +case VAEncMiscParameterTypeMaxSliceSize: +case VAEncMiscParameterTypeAIR: +case VAEncMiscParameterTypeMaxFrameSize: +case VAEncMiscParameterTypeHRD: +case VAEncMiscParameterTypeQualityLevel: +case VAEncMiscParameterTypeRIR: +case VAEncMiscParameterTypeQuantization: +case VAEncMiscParameterTypeSkipFrame: +case VAEncMiscParameterTypeROI: +case VAEncMiscParameterTypeCIR: +idx = misc_param_type; +break; + +case VAEncMiscParameterTypeFEIFrameControlIntel: +idx = I965_ENC_MISC_PARAM_TYPE_FEI_FRAME_CONTROL_INDEX; +break; + +default: +/* Should not get here */ +ASSERT_RET(0, 0); +break; +} + +ASSERT_RET(idx 16, 0); +return idx; +} + VAStatus i965_QueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, /* out */ @@ -2171,17 +2208,19 @@ i965_encoder_render_misc_parameter_buffer(VADriverContextP ctx, { struct encode_state *encode = obj_context-codec_state.encode; VAEncMiscParameterBuffer *param = NULL; +int type_index; ASSERT_RET(obj_buffer-buffer_store-bo == NULL, VA_STATUS_ERROR_INVALID_BUFFER); ASSERT_RET(obj_buffer-buffer_store-buffer, VA_STATUS_ERROR_INVALID_BUFFER); param = (VAEncMiscParameterBuffer *)obj_buffer-buffer_store-buffer; +type_index = va_enc_misc_param_type_to_idx(param-type); -if
Re: [Libva] [PATCH intel-driver 1/4] decoder: h264: don't deallocate surface storage of older frames.
Hi, 2014-05-12 8:17 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Sun, 2014-05-11 at 23:31 -0600, Gwenole Beauchesne wrote: Hi, 2014-05-12 7:21 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Sun, 2014-05-11 at 22:35 -0600, Gwenole Beauchesne wrote: Hi, 2014-05-12 2:55 GMT+02:00 Zhao Yakui yakui.z...@intel.com: On Fri, 2014-05-09 at 08:34 -0600, Gwenole Beauchesne wrote: Drop the optimization whereby surfaces that are no longer marked as reference and that were already displayed are to be destroyed. This is wrong mainly for two reasons: 1. The surface was displayed... once but it may still be needed for subsequent operations like displaying it again, using it for a transcode pipeline (encode) for instance, etc. 2. The new set of ReferenceFrames[] correspond to the active set of reference frames used for decoding the current slice. In presence of Multiview Coding (MVC), that could correspond to the current view, in view order index, but the surface may still be needed for decoding the next view with the same view_id, while also decoding other views with another set of reference frames for them. Hi, Gwenole Thanks for working on the h264 reference frame list. And I have some concerns about the patch set: 1. When will the unused reference surface be destroyed? As it ought to be: at vaDestroySurfaces() time. 2. How to find one free slot in decode_state-reference_surfaces array to store the corresponding reference frame? It's the same algorithm. :) At this time, the reference_surfaces[] contents exactly matches ReferenceFrames[], hole for hole, if there is any. The former holds object_surface, the latter holds the associated VAPictureH264. i.e. pointer to object_surface [i] is really the reference_surface[i] that matches VAPictureH264 at ReferenceFrames[i]. Previously, there could be a mismatch between both. As you know, the current policy in the driver is based on the following flowchart: 1. the driver maintains one array for the hardware-requirement. 2. When one new frame is decoded, it will check whether the array element maintained by the driver is still used by the DPB. If not, the internal data structure will be free and then be used for the later selection. 3. Find the free slot in the array to store the reference frames in DPB. But I don't see how this patch set stores/updates the reference frames of DPB into the driver array and how to decide which slot can be replaced by the reference frames in new DPB. I don't understand your concern. It works the same as before, but replaces ReferenceFrames[] with reference_surface[], which now hold the same contents, in order. The free_slots[] array is filled in order too so that to avoid the extraneous iterations again to look for the same free slot id. My point is that you can describe how you find the free slot for the later update about the reference frame in DPB. OK, will do but there is really zero change in the algorithm beyond the natural and higher order optimization. It will be better that you can describe the keypoint in the patch especially when it updates the content required by the hardware. What contents do you think we need to maintain for the hardware? The hardware is designed to be stateless, so it normally doesn't keep anything. This is necessary for context switching performance. As I said in the previous email, the array of reference surface is one content required by the hardware. I think there was a misunderstanding. The reference_surface[] is not changed (yet), only the reference_objects[]. That's totally different. The former is the frame store to maintain for the HW (and internal info we have in the DMV buffers actually); the latter should be a 1:1 correspondence with VA surface ids in ReferenceFrames[] to the associated object_surface objects (stored in reference_surface). I will provide another patch set with a couple more patches: 1. Definitely fix MVC decode use cases ; 2. Optimize the process for Haswell and newer. (3. if this really interests anyone, I may submit doc fixes and clarifications if I find time to, but the code will be used as reference). Regards, Gwenole. src/i965_decoder_utils.c | 13 - 1 file changed, 13 deletions(-) diff --git a/src/i965_decoder_utils.c b/src/i965_decoder_utils.c index 2533381..9d537ff 100644 --- a/src/i965_decoder_utils.c +++ b/src/i965_decoder_utils.c @@ -382,19 +382,6 @@ intel_update_avc_frame_store_index(VADriverContextP ctx, /* remove it from the internal DPB */ if (!found) { -struct object_surface *obj_surface = frame_store[i].obj_surface; - -obj_surface-flags = ~SURFACE_REFERENCED; - -if ((obj_surface-flags SURFACE_ALL_MASK) == SURFACE_DISPLAYED) { -
Re: [Libva] (H264) Reference Picture selection
On 14.05.2014 10:51, Xiang, Haihao wrote: On Mon, 2014-05-12 at 07:34 +0200, Gwenole Beauchesne wrote: 2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote: Hi, 2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com: On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote: Hi, I have a scheme where the decoder ack's received frames - With this information I could resync without an IDR by carefully selecting the reference frame(s). Will you please describe your idea in detail? long term references would typically be employed as well, but it seems ref-pic-marking was removed may'13? Why do you hope to use the long-term reference? As you know, the current encoding is based on hardware-acceleration. When more the reference frames can be selected, the driver will be more complex and then the encoding speed is also affected. It's for implementing Cisco's clearpath technology (http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa re/clearpath/clearpath_whitepaper.pdf) Basically one would periodically mark a P frame as long term reference, in case of packet loss one can thus refer to this one instead of restarting the stream with an idr. I'm not too concerned about the encoding speed as long as it's above realtime (60fps 1920x1080) Maybe it's more convenient If libva can have a switch for slice_header made between driver and app. Then some specially cases(ref-reorder, long-term-ref) may let app to manage slice-header. Thanks for your suggestion. If the slice-header info is also generated by the app and then be passed to the driver, it can be easy to do the operation of re-order and long-term-ref But this will depend on the policy how the slice header info is generated. (by app or the driver). Now this is exported by querying the capability of the driver. If we move this into the app, the infrastructure will be changed a lot. Infrastructure of what? The application? No, it doesn't really change anything but generating the extra packed slice header. Currently the app will query the capability of encoding and determine which kind of header should be passed by the upper application(For example: PPS/SPS). If the Slice_header info is exposed, the driver will assume that it is the responsibility of generating the slice header info. In such case it is difficult to mix the two working modes without a lot of changes.(One is that the driver generates the slice header info. Another is that the app generates the slice header info). Then the driver needs to be fixed to not assume anything. The API and operation points look clear enough, aren't they? The changes to the codec layer are minimal, and this was done already, in multiple instances of codec layers, and with another driver. Where is your code for codec layer with packed slice header/packed raw data, I want to know the point where is packed slice header / packed raw data inserted into the bitstream in the codec layer. I have these patches for the codec-layer (gstreaemr-vaapi) but not yet integrated to gstreamer-vaapi master. Submitting all packed_headers in the following order just after the submission of VAEncSequenceParameterBuffer to vaRenderPicture() : packed_sps, packed_pps, packed_slice. And then submitting Miscparam, PictureParam and SliceParam. Is it not enough? Anyway the intel driver seems to be using fixed size arrays for referencing packed headers (packed_header_param[4] and packed_header_data[4]) which needs to be changed if there are multiple slices per frame since we have to submit packed headers for each slice. isn't it? Another problem is that it is possible that one frame has multiple slices. In such case the user should pass the multiple slice header info. This is not an an issue. The API mandates that the packed headers are provided in order. Regards, Gwenole. ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva -- Thanks Sree - Intel Finland Oy Registered Address: PL 281, 00181 Helsinki Business Identity Code: 0357606 - 4 Domiciled in Helsinki This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
[Libva] [PATCH v2 intel-driver 3/8] surface: fix vaDeriveImage() for grayscale.
Allow vaDeriveImage() to work with grayscale surfaces by only exposing the luma component. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/i965_drv_video.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index 589c00c..495ad44 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -3222,6 +3222,12 @@ VAStatus i965_DeriveImage(VADriverContextP ctx, image-format.bits_per_pixel = 12; switch (image-format.fourcc) { +case VA_FOURCC_Y800: +image-num_planes = 1; +image-pitches[0] = w_pitch; /* Y */ +image-offsets[0] = 0; +break; + case VA_FOURCC_YV12: image-num_planes = 3; image-pitches[0] = w_pitch; /* Y */ -- 1.7.9.5 ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
[Libva] [PATCH v2 intel-driver 2/8] surface: factor out release of surface buffer storage.
Introduce a new i965_destroy_surface_storage() helper function to unreference the underlying GEM buffer object, and any associated private data, if any. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/i965_decoder_utils.c |6 +- src/i965_drv_video.c | 14 +++--- src/i965_drv_video.h |3 +++ src/i965_output_dri.c|6 +- 4 files changed, 16 insertions(+), 13 deletions(-) diff --git a/src/i965_decoder_utils.c b/src/i965_decoder_utils.c index 617bc15..fac9628 100644 --- a/src/i965_decoder_utils.c +++ b/src/i965_decoder_utils.c @@ -387,14 +387,10 @@ intel_update_avc_frame_store_index(VADriverContextP ctx, obj_surface-flags = ~SURFACE_REFERENCED; if ((obj_surface-flags SURFACE_ALL_MASK) == SURFACE_DISPLAYED) { -dri_bo_unreference(obj_surface-bo); -obj_surface-bo = NULL; obj_surface-flags = ~SURFACE_REF_DIS_MASK; +i965_destroy_surface_storage(obj_surface); } -if (obj_surface-free_private_data) -obj_surface-free_private_data(obj_surface-private_data); - frame_store[i].surface_id = VA_INVALID_ID; frame_store[i].frame_store_id = -1; frame_store[i].obj_surface = NULL; diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index 45a5639..589c00c 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -749,10 +749,11 @@ VAStatus i965_QueryConfigAttributes(VADriverContextP ctx, return vaStatus; } -static void -i965_destroy_surface(struct object_heap *heap, struct object_base *obj) +void +i965_destroy_surface_storage(struct object_surface *obj_surface) { -struct object_surface *obj_surface = (struct object_surface *)obj; +if (!obj_surface) +return; dri_bo_unreference(obj_surface-bo); obj_surface-bo = NULL; @@ -761,7 +762,14 @@ i965_destroy_surface(struct object_heap *heap, struct object_base *obj) obj_surface-free_private_data(obj_surface-private_data); obj_surface-private_data = NULL; } +} + +static void +i965_destroy_surface(struct object_heap *heap, struct object_base *obj) +{ +struct object_surface *obj_surface = (struct object_surface *)obj; +i965_destroy_surface_storage(obj_surface); object_heap_free(heap, obj); } diff --git a/src/i965_drv_video.h b/src/i965_drv_video.h index 535402f..8f2d288 100644 --- a/src/i965_drv_video.h +++ b/src/i965_drv_video.h @@ -423,4 +423,7 @@ extern VAStatus i965_DestroySurfaces(VADriverContextP ctx, #define I965_SURFACE_MEM_GEM_FLINK 1 #define I965_SURFACE_MEM_DRM_PRIME 2 +void +i965_destroy_surface_storage(struct object_surface *obj_surface); + #endif /* _I965_DRV_VIDEO_H_ */ diff --git a/src/i965_output_dri.c b/src/i965_output_dri.c index 717ee9a..fdd69ce 100644 --- a/src/i965_output_dri.c +++ b/src/i965_output_dri.c @@ -209,12 +209,8 @@ i965_put_surface_dri( obj_surface-flags |= SURFACE_DISPLAYED; if ((obj_surface-flags SURFACE_ALL_MASK) == SURFACE_DISPLAYED) { -dri_bo_unreference(obj_surface-bo); -obj_surface-bo = NULL; obj_surface-flags = ~SURFACE_REF_DIS_MASK; - -if (obj_surface-free_private_data) -obj_surface-free_private_data(obj_surface-private_data); +i965_destroy_surface_storage(obj_surface); } _i965UnlockMutex(i965-render_mutex); -- 1.7.9.5 ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
[Libva] [PATCH v2 intel-driver 4/8] config: fix vaGetConfigAttributes() to validate profile/entrypoint.
Factor out code to validate profile/entrypoint per the underlying hardware capabilities. Also fix vaGetConfigAttributes() to really validate the profile/entrypoint pair. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/i965_drv_video.c | 145 +++--- 1 file changed, 78 insertions(+), 67 deletions(-) diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index 495ad44..a7b7e9b 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -512,6 +512,78 @@ i965_QueryConfigEntrypoints(VADriverContextP ctx, return n 0 ? VA_STATUS_SUCCESS : VA_STATUS_ERROR_UNSUPPORTED_PROFILE; } +static VAStatus +i965_validate_config(VADriverContextP ctx, VAProfile profile, +VAEntrypoint entrypoint) +{ +struct i965_driver_data * const i965 = i965_driver_data(ctx); +VAStatus va_status; + +/* Validate profile entrypoint */ +switch (profile) { +case VAProfileMPEG2Simple: +case VAProfileMPEG2Main: +if ((HAS_MPEG2_DECODING(i965) entrypoint == VAEntrypointVLD) || +(HAS_MPEG2_ENCODING(i965) entrypoint == VAEntrypointEncSlice)) { +va_status = VA_STATUS_SUCCESS; +} else { +va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; +} +break; + +case VAProfileH264ConstrainedBaseline: +case VAProfileH264Main: +case VAProfileH264High: +if ((HAS_H264_DECODING(i965) entrypoint == VAEntrypointVLD) || +(HAS_H264_ENCODING(i965) entrypoint == VAEntrypointEncSlice)) { +va_status = VA_STATUS_SUCCESS; +} else { +va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; +} +break; + +case VAProfileVC1Simple: +case VAProfileVC1Main: +case VAProfileVC1Advanced: +if (HAS_VC1_DECODING(i965) entrypoint == VAEntrypointVLD) { +va_status = VA_STATUS_SUCCESS; +} else { +va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; +} +break; + +case VAProfileNone: +if (HAS_VPP(i965) VAEntrypointVideoProc == entrypoint) { +va_status = VA_STATUS_SUCCESS; +} else { +va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; +} +break; + +case VAProfileJPEGBaseline: +if (HAS_JPEG_DECODING(i965) entrypoint == VAEntrypointVLD) { +va_status = VA_STATUS_SUCCESS; +} else { +va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; +} +break; + +case VAProfileVP8Version0_3: +if ((HAS_VP8_DECODING(i965) entrypoint == VAEntrypointVLD) || +(HAS_VP8_ENCODING(i965) entrypoint == VAEntrypointEncSlice)) { +va_status = VA_STATUS_SUCCESS; +} else { +va_status = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; +} +break; + +default: +va_status = VA_STATUS_ERROR_UNSUPPORTED_PROFILE; +break; +} +return va_status; +} + VAStatus i965_GetConfigAttributes(VADriverContextP ctx, VAProfile profile, @@ -519,8 +591,13 @@ i965_GetConfigAttributes(VADriverContextP ctx, VAConfigAttrib *attrib_list, /* in/out */ int num_attribs) { +VAStatus va_status; int i; +va_status = i965_validate_config(ctx, profile, entrypoint); +if (va_status != VA_STATUS_SUCCESS) +return va_status; + /* Other attributes don't seem to be defined */ /* What to do if we don't know the attribute? */ for (i = 0; i num_attribs; i++) { @@ -606,73 +683,7 @@ i965_CreateConfig(VADriverContextP ctx, int i; VAStatus vaStatus; -/* Validate profile entrypoint */ -switch (profile) { -case VAProfileMPEG2Simple: -case VAProfileMPEG2Main: -if ((HAS_MPEG2_DECODING(i965) VAEntrypointVLD == entrypoint) || -(HAS_MPEG2_ENCODING(i965) VAEntrypointEncSlice == entrypoint)) { -vaStatus = VA_STATUS_SUCCESS; -} else { -vaStatus = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; -} -break; - -case VAProfileH264ConstrainedBaseline: -case VAProfileH264Main: -case VAProfileH264High: -if ((HAS_H264_DECODING(i965) VAEntrypointVLD == entrypoint) || -(HAS_H264_ENCODING(i965) VAEntrypointEncSlice == entrypoint)) { -vaStatus = VA_STATUS_SUCCESS; -} else { -vaStatus = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; -} - -break; - -case VAProfileVC1Simple: -case VAProfileVC1Main: -case VAProfileVC1Advanced: -if (HAS_VC1_DECODING(i965) VAEntrypointVLD == entrypoint) { -vaStatus = VA_STATUS_SUCCESS; -} else { -vaStatus = VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT; -} - -break; - -case VAProfileNone: -if (HAS_VPP(i965) VAEntrypointVideoProc == entrypoint) { -vaStatus =
[Libva] [PATCH v2 intel-driver 5/8] config: fix vaCreateConfig() to not override user chroma format.
Only validate the user-defined chroma format (VAConfigAttribRTFormat) attribute, if any. Don't override it. i.e. append a pre-defined value only if it was not defined by the user beforehand. Propertly return VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT if the supplied chroma format is not supported. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/i965_drv_video.c | 90 -- 1 file changed, 66 insertions(+), 24 deletions(-) diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index a7b7e9b..e60760f 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -584,6 +584,20 @@ i965_validate_config(VADriverContextP ctx, VAProfile profile, return va_status; } +static uint32_t +i965_get_default_chroma_formats(VADriverContextP ctx, VAProfile profile, +VAEntrypoint entrypoint) +{ +struct i965_driver_data * const i965 = i965_driver_data(ctx); +uint32_t chroma_formats = VA_RT_FORMAT_YUV420; + +switch (profile) { +default: +break; +} +return chroma_formats; +} + VAStatus i965_GetConfigAttributes(VADriverContextP ctx, VAProfile profile, @@ -603,7 +617,8 @@ i965_GetConfigAttributes(VADriverContextP ctx, for (i = 0; i num_attribs; i++) { switch (attrib_list[i].type) { case VAConfigAttribRTFormat: -attrib_list[i].value = VA_RT_FORMAT_YUV420; +attrib_list[i].value = i965_get_default_chroma_formats(ctx, +profile, entrypoint); break; case VAConfigAttribRateControl: @@ -644,29 +659,49 @@ i965_destroy_config(struct object_heap *heap, struct object_base *obj) object_heap_free(heap, obj); } -static VAStatus -i965_update_attribute(struct object_config *obj_config, VAConfigAttrib *attrib) +static VAConfigAttrib * +i965_lookup_config_attribute(struct object_config *obj_config, +VAConfigAttribType type) { int i; -/* Check existing attrbiutes */ for (i = 0; i obj_config-num_attribs; i++) { -if (obj_config-attrib_list[i].type == attrib-type) { -/* Update existing attribute */ -obj_config-attrib_list[i].value = attrib-value; -return VA_STATUS_SUCCESS; -} +VAConfigAttrib * const attrib = obj_config-attrib_list[i]; +if (attrib-type == type) +return attrib; } +return NULL; +} -if (obj_config-num_attribs I965_MAX_CONFIG_ATTRIBUTES) { -i = obj_config-num_attribs; -obj_config-attrib_list[i].type = attrib-type; -obj_config-attrib_list[i].value = attrib-value; -obj_config-num_attribs++; +static VAStatus +i965_append_config_attribute(struct object_config *obj_config, +const VAConfigAttrib *new_attrib) +{ +VAConfigAttrib *attrib; + +if (obj_config-num_attribs = I965_MAX_CONFIG_ATTRIBUTES) +return VA_STATUS_ERROR_MAX_NUM_EXCEEDED; + +attrib = obj_config-attrib_list[obj_config-num_attribs++]; +attrib-type = new_attrib-type; +attrib-value = new_attrib-value; +return VA_STATUS_SUCCESS; +} + +static VAStatus +i965_ensure_config_attribute(struct object_config *obj_config, +const VAConfigAttrib *new_attrib) +{ +VAConfigAttrib *attrib; + +/* Check for existing attributes */ +attrib = i965_lookup_config_attribute(obj_config, new_attrib-type); +if (attrib) { +/* Update existing attribute */ +attrib-value = new_attrib-value; return VA_STATUS_SUCCESS; } - -return VA_STATUS_ERROR_MAX_NUM_EXCEEDED; +return i965_append_config_attribute(obj_config, new_attrib); } VAStatus @@ -698,16 +733,23 @@ i965_CreateConfig(VADriverContextP ctx, obj_config-profile = profile; obj_config-entrypoint = entrypoint; -obj_config-attrib_list[0].type = VAConfigAttribRTFormat; -obj_config-attrib_list[0].value = VA_RT_FORMAT_YUV420; -obj_config-num_attribs = 1; +obj_config-num_attribs = 0; -for(i = 0; i num_attribs; i++) { -vaStatus = i965_update_attribute(obj_config, (attrib_list[i])); - -if (VA_STATUS_SUCCESS != vaStatus) { +for (i = 0; i num_attribs; i++) { +vaStatus = i965_ensure_config_attribute(obj_config, attrib_list[i]); +if (vaStatus != VA_STATUS_SUCCESS) break; -} +} + +if (vaStatus == VA_STATUS_SUCCESS) { +VAConfigAttrib attrib, *attrib_found; +attrib.type = VAConfigAttribRTFormat; +attrib.value = i965_get_default_chroma_formats(ctx, profile, entrypoint); +attrib_found = i965_lookup_config_attribute(obj_config, attrib.type); +if (!attrib_found || !attrib_found-value) +vaStatus = i965_append_config_attribute(obj_config, attrib); +else if (!(attrib_found-value attrib.value)) +vaStatus = VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT; } /* Error recovery */ -- 1.7.9.5
[Libva] [PATCH v2 intel-driver 7/8] decoder: h264: factor out allocation of reconstructed surfaces.
Add new avc_ensure_surface_bo() helper function to factor out the allocatiion and initialization processes of the reconstructed VA surface buffer stores. Keep preferred native format (NV12) and initialize chroma values to 0.0 (0x80) when needed for fake grayscale (Y800) surfaces implemented on top of existing NV12. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/gen6_mfd.c | 12 +-- src/gen75_mfd.c | 12 +-- src/gen7_mfd.c | 12 +-- src/gen8_mfd.c | 12 +-- src/i965_decoder_utils.c | 54 +- src/i965_decoder_utils.h |8 +++ 6 files changed, 65 insertions(+), 45 deletions(-) diff --git a/src/gen6_mfd.c b/src/gen6_mfd.c index 4a22052..2092f69 100755 --- a/src/gen6_mfd.c +++ b/src/gen6_mfd.c @@ -840,18 +840,8 @@ gen6_mfd_avc_decode_init(VADriverContextP ctx, obj_surface = decode_state-render_object; obj_surface-flags = ~SURFACE_REF_DIS_MASK; obj_surface-flags |= (pic_param-pic_fields.bits.reference_pic_flag ? SURFACE_REFERENCED : 0); -i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420); - -/* initial uv component for YUV400 case */ -if (pic_param-seq_fields.bits.chroma_format_idc == 0) { - unsigned int uv_offset = obj_surface-width * obj_surface-height; - unsigned int uv_size = obj_surface-width * obj_surface-height / 2; - - drm_intel_gem_bo_map_gtt(obj_surface-bo); - memset(obj_surface-bo-virtual + uv_offset, 0x80, uv_size); - drm_intel_gem_bo_unmap_gtt(obj_surface-bo); -} +avc_ensure_surface_bo(ctx, decode_state, obj_surface, pic_param); gen6_mfd_init_avc_surface(ctx, pic_param, obj_surface); dri_bo_unreference(gen6_mfd_context-post_deblocking_output.bo); diff --git a/src/gen75_mfd.c b/src/gen75_mfd.c index 2d4e236..5b023cf 100644 --- a/src/gen75_mfd.c +++ b/src/gen75_mfd.c @@ -1084,18 +1084,8 @@ gen75_mfd_avc_decode_init(VADriverContextP ctx, obj_surface = decode_state-render_object; obj_surface-flags = ~SURFACE_REF_DIS_MASK; obj_surface-flags |= (pic_param-pic_fields.bits.reference_pic_flag ? SURFACE_REFERENCED : 0); -i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420); - -/* initial uv component for YUV400 case */ -if (pic_param-seq_fields.bits.chroma_format_idc == 0) { - unsigned int uv_offset = obj_surface-width * obj_surface-height; - unsigned int uv_size = obj_surface-width * obj_surface-height / 2; - - drm_intel_gem_bo_map_gtt(obj_surface-bo); - memset(obj_surface-bo-virtual + uv_offset, 0x80, uv_size); - drm_intel_gem_bo_unmap_gtt(obj_surface-bo); -} +avc_ensure_surface_bo(ctx, decode_state, obj_surface, pic_param); gen75_mfd_init_avc_surface(ctx, pic_param, obj_surface); dri_bo_unreference(gen7_mfd_context-post_deblocking_output.bo); diff --git a/src/gen7_mfd.c b/src/gen7_mfd.c index e91cfd3..70b1cec 100755 --- a/src/gen7_mfd.c +++ b/src/gen7_mfd.c @@ -758,18 +758,8 @@ gen7_mfd_avc_decode_init(VADriverContextP ctx, obj_surface = decode_state-render_object; obj_surface-flags = ~SURFACE_REF_DIS_MASK; obj_surface-flags |= (pic_param-pic_fields.bits.reference_pic_flag ? SURFACE_REFERENCED : 0); -i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420); - -/* initial uv component for YUV400 case */ -if (pic_param-seq_fields.bits.chroma_format_idc == 0) { - unsigned int uv_offset = obj_surface-width * obj_surface-height; - unsigned int uv_size = obj_surface-width * obj_surface-height / 2; - - drm_intel_gem_bo_map_gtt(obj_surface-bo); - memset(obj_surface-bo-virtual + uv_offset, 0x80, uv_size); - drm_intel_gem_bo_unmap_gtt(obj_surface-bo); -} +avc_ensure_surface_bo(ctx, decode_state, obj_surface, pic_param); gen7_mfd_init_avc_surface(ctx, pic_param, obj_surface); dri_bo_unreference(gen7_mfd_context-post_deblocking_output.bo); diff --git a/src/gen8_mfd.c b/src/gen8_mfd.c index 1742bea..e3e71fb 100644 --- a/src/gen8_mfd.c +++ b/src/gen8_mfd.c @@ -845,18 +845,8 @@ gen8_mfd_avc_decode_init(VADriverContextP ctx, obj_surface = decode_state-render_object; obj_surface-flags = ~SURFACE_REF_DIS_MASK; obj_surface-flags |= (pic_param-pic_fields.bits.reference_pic_flag ? SURFACE_REFERENCED : 0); -i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420); - -/* initial uv component for YUV400 case */ -if (pic_param-seq_fields.bits.chroma_format_idc == 0) { - unsigned int uv_offset = obj_surface-width * obj_surface-height; - unsigned int uv_size = obj_surface-width * obj_surface-height / 2; - - drm_intel_gem_bo_map_gtt(obj_surface-bo); - memset(obj_surface-bo-virtual + uv_offset, 0x80, uv_size); -
[Libva] [PATCH v2 intel-driver 6/8] config: fix supported set of chroma formats for JPEG decode.
If the hardware supports JPEG decoding, then we have to expose the right set of chroma formats for the output (decoded) VA surface. In particular, we could support YUV 4:1:0, 4:2:2 and 4:4:4. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/i965_drv_video.c | 15 +++ src/i965_drv_video.h |2 ++ 2 files changed, 17 insertions(+) diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index e60760f..768469a 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -214,6 +214,10 @@ get_subpic_format(const VAImageFormat *va_format) return NULL; } +/* Extra set of chroma formats supported for JPEG decoding (beyond YUV 4:2:0) */ +#define EXTRA_JPEG_DEC_CHROMA_FORMATS \ +(VA_RT_FORMAT_YUV411 | VA_RT_FORMAT_YUV422 | VA_RT_FORMAT_YUV444) + extern struct hw_context *i965_proc_context_init(VADriverContextP, struct object_config *); extern struct hw_context *g4x_dec_hw_context_init(VADriverContextP, struct object_config *); static struct hw_codec_info g4x_hw_codec_info = { @@ -278,6 +282,8 @@ static struct hw_codec_info gen7_hw_codec_info = { .max_width = 4096, .max_height = 4096, +.jpeg_dec_chroma_formats = EXTRA_JPEG_DEC_CHROMA_FORMATS, + .has_mpeg2_decoding = 1, .has_mpeg2_encoding = 1, .has_h264_decoding = 1, @@ -305,6 +311,8 @@ static struct hw_codec_info gen75_hw_codec_info = { .max_width = 4096, .max_height = 4096, +.jpeg_dec_chroma_formats = EXTRA_JPEG_DEC_CHROMA_FORMATS, + .has_mpeg2_decoding = 1, .has_mpeg2_encoding = 1, .has_h264_decoding = 1, @@ -336,6 +344,8 @@ static struct hw_codec_info gen8_hw_codec_info = { .max_width = 4096, .max_height = 4096, +.jpeg_dec_chroma_formats = EXTRA_JPEG_DEC_CHROMA_FORMATS, + .has_mpeg2_decoding = 1, .has_mpeg2_encoding = 1, .has_h264_decoding = 1, @@ -592,6 +602,11 @@ i965_get_default_chroma_formats(VADriverContextP ctx, VAProfile profile, uint32_t chroma_formats = VA_RT_FORMAT_YUV420; switch (profile) { +case VAProfileJPEGBaseline: +if (HAS_JPEG_DECODING(i965) entrypoint == VAEntrypointVLD) +chroma_formats |= i965-codec_info-jpeg_dec_chroma_formats; +break; + default: break; } diff --git a/src/i965_drv_video.h b/src/i965_drv_video.h index 8f2d288..e70852b 100644 --- a/src/i965_drv_video.h +++ b/src/i965_drv_video.h @@ -289,6 +289,8 @@ struct hw_codec_info int max_width; int max_height; +unsigned int jpeg_dec_chroma_formats; + unsigned int has_mpeg2_decoding:1; unsigned int has_mpeg2_encoding:1; unsigned int has_h264_decoding:1; -- 1.7.9.5 ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
[Libva] [PATCH v2 intel-driver 0/8] Fix support for grayscale
Hi, This patch series fixes and optimizes support for H.grayscale streams encoded in H.264 (Patch8). This is backwards compatible with solutions that incorrectly request for YUV 4:2:0 formats instead of the appropriate Y800 one. Tested with gstreamer-vaapi and FFmpeg/vaapi on Ivybridge. No regression. v2: split into smaller patches, while maintaining regression testability, drop dead code in the Y800 rendering code path. Regards, Gwenole Beauchesne (8): surface: fix geometry (size, layout) of grayscale surfaces. surface: factor out release of surface buffer storage. surface: fix vaDeriveImage() for grayscale. config: fix vaGetConfigAttributes() to validate profile/entrypoint. config: fix vaCreateConfig() to not override user chroma format. config: fix supported set of chroma formats for JPEG decode. decoder: h264: factor out allocation of reconstructed surfaces. decoder: h264: optimize support for grayscale surfaces. src/gen6_mfd.c | 20 ++- src/gen75_mfd.c | 18 +-- src/gen7_mfd.c | 18 +-- src/gen8_mfd.c | 18 +-- src/gen8_render.c|3 + src/i965_decoder_utils.c | 75 +++- src/i965_decoder_utils.h |8 ++ src/i965_drv_video.c | 301 +++--- src/i965_drv_video.h | 14 +++ src/i965_output_dri.c|6 +- src/i965_render.c|3 + 11 files changed, 326 insertions(+), 158 deletions(-) -- 1.7.9.5 ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH v2 intel-driver 3/8] surface: fix vaDeriveImage() for grayscale.
On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote: Allow vaDeriveImage() to work with grayscale surfaces by only exposing the luma component. This is not necessary as the DeriveImage already supports the Y800 fourcc. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/i965_drv_video.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index 589c00c..495ad44 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -3222,6 +3222,12 @@ VAStatus i965_DeriveImage(VADriverContextP ctx, image-format.bits_per_pixel = 12; switch (image-format.fourcc) { +case VA_FOURCC_Y800: +image-num_planes = 1; +image-pitches[0] = w_pitch; /* Y */ +image-offsets[0] = 0; +break; + case VA_FOURCC_YV12: image-num_planes = 3; image-pitches[0] = w_pitch; /* Y */ ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH v2 intel-driver 8/8] decoder: h264: optimize support for grayscale surfaces.
On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote: Optimize support for grayscale surfaces in two aspects: (i) space by only allocating the luminance component ; (ii) speed by avoiding initialization of the (now inexistent) chrominance planes. Keep backward compatibility with older codec layers that only supported YUV 4:2:0 and not grayscale formats properly. As a whole, I am OK to this version patch except two concerns. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/gen6_mfd.c |8 ++-- src/gen75_mfd.c |6 +- src/gen7_mfd.c |6 +- src/gen8_mfd.c |6 +- src/i965_decoder_utils.c | 23 +++ src/i965_drv_video.c | 22 ++ src/i965_drv_video.h |9 + 7 files changed, 71 insertions(+), 9 deletions(-) diff --git a/src/gen6_mfd.c b/src/gen6_mfd.c index 2092f69..f925d98 100755 --- a/src/gen6_mfd.c +++ b/src/gen6_mfd.c @@ -130,7 +130,11 @@ gen6_mfd_surface_state(VADriverContextP ctx, { struct intel_batchbuffer *batch = gen6_mfd_context-base.batch; struct object_surface *obj_surface = decode_state-render_object; - +unsigned int surface_format; + +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + Can it work if it still set the PLANAR_420_8 format for the Y800 surface? If yes, I suggest that this is not updated as the original code follows the spec. BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2)); OUT_BCS_BATCH(batch, 0); @@ -138,7 +142,7 @@ gen6_mfd_surface_state(VADriverContextP ctx, ((obj_surface-orig_height - 1) 19) | ((obj_surface-orig_width - 1) 6)); OUT_BCS_BATCH(batch, - (MFX_SURFACE_PLANAR_420_8 28) | /* 420 planar YUV surface */ + (surface_format 28) | /* 420 planar YUV surface */ (1 27) | /* must be 1 for interleave U/V, hardware requirement */ (0 22) | /* surface object control state, FIXME??? */ ((obj_surface-width - 1) 3) | /* pitch */ diff --git a/src/gen75_mfd.c b/src/gen75_mfd.c index 5b023cf..895b194 100644 --- a/src/gen75_mfd.c +++ b/src/gen75_mfd.c @@ -137,12 +137,16 @@ gen75_mfd_surface_state(VADriverContextP ctx, struct object_surface *obj_surface = decode_state-render_object; unsigned int y_cb_offset; unsigned int y_cr_offset; +unsigned int surface_format; assert(obj_surface); y_cb_offset = obj_surface-y_cb_offset; y_cr_offset = obj_surface-y_cr_offset; +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2)); OUT_BCS_BATCH(batch, 0); @@ -150,7 +154,7 @@ gen75_mfd_surface_state(VADriverContextP ctx, ((obj_surface-orig_height - 1) 18) | ((obj_surface-orig_width - 1) 4)); OUT_BCS_BATCH(batch, - (MFX_SURFACE_PLANAR_420_8 28) | /* 420 planar YUV surface */ + (surface_format 28) | /* 420 planar YUV surface */ ((standard_select != MFX_FORMAT_JPEG) 27) | /* interleave chroma, set to 0 for JPEG */ (0 22) | /* surface object control state, ignored */ ((obj_surface-width - 1) 3) | /* pitch */ diff --git a/src/gen7_mfd.c b/src/gen7_mfd.c index 70b1cec..2e0d653 100755 --- a/src/gen7_mfd.c +++ b/src/gen7_mfd.c @@ -135,12 +135,16 @@ gen7_mfd_surface_state(VADriverContextP ctx, struct object_surface *obj_surface = decode_state-render_object; unsigned int y_cb_offset; unsigned int y_cr_offset; +unsigned int surface_format; assert(obj_surface); y_cb_offset = obj_surface-y_cb_offset; y_cr_offset = obj_surface-y_cr_offset; +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2)); OUT_BCS_BATCH(batch, 0); @@ -148,7 +152,7 @@ gen7_mfd_surface_state(VADriverContextP ctx, ((obj_surface-orig_height - 1) 18) | ((obj_surface-orig_width - 1) 4)); OUT_BCS_BATCH(batch, - (MFX_SURFACE_PLANAR_420_8 28) | /* 420 planar YUV surface */ + (surface_format 28) | /* 420 planar YUV surface */ ((standard_select != MFX_FORMAT_JPEG) 27) | /* interleave chroma, set to 0 for JPEG */ (0 22) | /* surface object control state, ignored */ ((obj_surface-width - 1) 3) | /* pitch */ diff --git a/src/gen8_mfd.c
Re: [Libva] (H264) Reference Picture selection
On Wed, 2014-05-14 at 03:50 -0600, Sreerenj wrote: On 14.05.2014 10:51, Xiang, Haihao wrote: On Mon, 2014-05-12 at 07:34 +0200, Gwenole Beauchesne wrote: 2014-05-12 7:29 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Sun, 2014-05-11 at 22:41 -0600, Gwenole Beauchesne wrote: Hi, 2014-05-12 3:34 GMT+02:00 Zhao Yakui yakui.z...@intel.com: On Fri, 2014-05-09 at 03:06 -0600, Yuan, Feng wrote: Hi, I have a scheme where the decoder ack's received frames - With this information I could resync without an IDR by carefully selecting the reference frame(s). Will you please describe your idea in detail? long term references would typically be employed as well, but it seems ref-pic-marking was removed may'13? Why do you hope to use the long-term reference? As you know, the current encoding is based on hardware-acceleration. When more the reference frames can be selected, the driver will be more complex and then the encoding speed is also affected. It's for implementing Cisco's clearpath technology (http://www.cisco.com/c/dam/en/us/td/docs/telepresence/endpoint/softwa re/clearpath/clearpath_whitepaper.pdf) Basically one would periodically mark a P frame as long term reference, in case of packet loss one can thus refer to this one instead of restarting the stream with an idr. I'm not too concerned about the encoding speed as long as it's above realtime (60fps 1920x1080) Maybe it's more convenient If libva can have a switch for slice_header made between driver and app. Then some specially cases(ref-reorder, long-term-ref) may let app to manage slice-header. Thanks for your suggestion. If the slice-header info is also generated by the app and then be passed to the driver, it can be easy to do the operation of re-order and long-term-ref But this will depend on the policy how the slice header info is generated. (by app or the driver). Now this is exported by querying the capability of the driver. If we move this into the app, the infrastructure will be changed a lot. Infrastructure of what? The application? No, it doesn't really change anything but generating the extra packed slice header. Currently the app will query the capability of encoding and determine which kind of header should be passed by the upper application(For example: PPS/SPS). If the Slice_header info is exposed, the driver will assume that it is the responsibility of generating the slice header info. In such case it is difficult to mix the two working modes without a lot of changes.(One is that the driver generates the slice header info. Another is that the app generates the slice header info). Then the driver needs to be fixed to not assume anything. The API and operation points look clear enough, aren't they? The changes to the codec layer are minimal, and this was done already, in multiple instances of codec layers, and with another driver. Where is your code for codec layer with packed slice header/packed raw data, I want to know the point where is packed slice header / packed raw data inserted into the bitstream in the codec layer. I have these patches for the codec-layer (gstreaemr-vaapi) but not yet integrated to gstreamer-vaapi master. Submitting all packed_headers in the following order just after the submission of VAEncSequenceParameterBuffer to vaRenderPicture() : packed_sps, packed_pps, packed_slice. And then submitting Miscparam, PictureParam and SliceParam. Is it not enough? It is enough under the single slice. But it will still change where the slice header is generated. The current design is based on the following points: 1. The driver exposes the SPS/PPS/MISC and the user-space app is responsible for generating the corresponding packed ata. 2. The driver is responsible for generating the slice header. If the generation of slice header is moved from the driver to the user-space app, the driver needs a lot of changes and some working cases will be affected. Another is the complexity of handling the generation of slice_headers in multislice. Anyway the intel driver seems to be using fixed size arrays for referencing packed headers (packed_header_param[4] and packed_header_data[4]) which needs to be changed if there are multiple slices per frame since we have to submit packed headers for each slice. isn't it? Yes. If more packed data is passed, the driver needs to be changed. Another problem is that it is possible that one frame has multiple slices. In such case the user should pass the multiple slice header info. This is not an an issue. The API mandates that the packed headers are provided in order. Regards, Gwenole. ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva ___ Libva mailing list
Re: [Libva] [PATCH libva-intel-driver master 2/2] Return error when trying to decoding an interlaced VC-1 video
On Fri, 2014-05-09 at 12:20 +0200, Gwenole Beauchesne wrote: Hi, 2014-05-05 7:00 GMT+02:00 Xiang, Haihao haihao.xi...@intel.com: From: Xiang, Haihao haihao.xi...@intel.com https://bugs.freedesktop.org/show_bug.cgi?id=77386 Signed-off-by: Xiang, Haihao haihao.xi...@intel.com --- src/i965_decoder_utils.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/i965_decoder_utils.c b/src/i965_decoder_utils.c index 2533381..617bc15 100644 --- a/src/i965_decoder_utils.c +++ b/src/i965_decoder_utils.c @@ -654,7 +654,12 @@ intel_decoder_check_vc1_parameter(VADriverContextP ctx, VAPictureParameterBufferVC1 *pic_param = (VAPictureParameterBufferVC1 *)decode_state-pic_param-buffer; struct object_surface *obj_surface; int i = 0; - + +if (pic_param-sequence_fields.bits.interlace == 1 +pic_param-picture_fields.bits.frame_coding_mode != 0) { /* frame-interlace or field-interlace */ +return VA_STATUS_ERROR_DECODING_ERROR; +} + if (pic_param-picture_fields.bits.picture_type == 0 || pic_param-picture_fields.bits.picture_type == 3) { } else if (pic_param-picture_fields.bits.picture_type == 1 || -- In general, returning VA_STATUS_ERROR_DECODING_ERROR means returning a valid error for vaQuerySurfaceError() as a next step. If you want to return this type of error, you have to support this interface as well. It is my fault, I didn't go through va.h for VA_STATUS_ERROR_DECODING_ERROR semantics when I tried to find an appropriate error code. Maybe FLAG_NOT_SUPPORTED would be a more appropriate error, or at least the catchall Unknown one? I also considered VA_STATUS_ERROR_FLAG_NOT_SUPPORTED or VA_STATUS_ERROR_UNKNOWN. I think FLAG_NOT_SUPPORTED means driver doesn't support a pre-defined flag in VAAPI which is inappropriate for this error. I will replace the returned error with VA_STATUS_ERROR_UNKNOWN. Another option is we may define some new error codes for those unsupported features in {pic, slice} parameters. Thanks Haihao i.e. return something else, unless you implement vaQuerySurfaceError() interface, which also means extending the API to notify that some of the supplied {pic, slice} param fields are not supported. Thanks, Gwenole. ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH v2 intel-driver 3/8] surface: fix vaDeriveImage() for grayscale.
2014-05-15 2:35 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote: Allow vaDeriveImage() to work with grayscale surfaces by only exposing the luma component. This is not necessary as the DeriveImage already supports the Y800 fourcc. What? Where? I don't call already supports as returning an error. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/i965_drv_video.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index 589c00c..495ad44 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -3222,6 +3222,12 @@ VAStatus i965_DeriveImage(VADriverContextP ctx, image-format.bits_per_pixel = 12; switch (image-format.fourcc) { +case VA_FOURCC_Y800: +image-num_planes = 1; +image-pitches[0] = w_pitch; /* Y */ +image-offsets[0] = 0; +break; + case VA_FOURCC_YV12: image-num_planes = 3; image-pitches[0] = w_pitch; /* Y */ ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH v2 intel-driver 8/8] decoder: h264: optimize support for grayscale surfaces.
Hi, 2014-05-15 3:34 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote: Optimize support for grayscale surfaces in two aspects: (i) space by only allocating the luminance component ; (ii) speed by avoiding initialization of the (now inexistent) chrominance planes. Keep backward compatibility with older codec layers that only supported YUV 4:2:0 and not grayscale formats properly. As a whole, I am OK to this version patch except two concerns. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/gen6_mfd.c |8 ++-- src/gen75_mfd.c |6 +- src/gen7_mfd.c |6 +- src/gen8_mfd.c |6 +- src/i965_decoder_utils.c | 23 +++ src/i965_drv_video.c | 22 ++ src/i965_drv_video.h |9 + 7 files changed, 71 insertions(+), 9 deletions(-) diff --git a/src/gen6_mfd.c b/src/gen6_mfd.c index 2092f69..f925d98 100755 --- a/src/gen6_mfd.c +++ b/src/gen6_mfd.c @@ -130,7 +130,11 @@ gen6_mfd_surface_state(VADriverContextP ctx, { struct intel_batchbuffer *batch = gen6_mfd_context-base.batch; struct object_surface *obj_surface = decode_state-render_object; - +unsigned int surface_format; + +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + Can it work if it still set the PLANAR_420_8 format for the Y800 surface? No, MONO is the only specified and supported format to tell the MFX engine to disregard the chroma components. It does not seem to care of the supplied chroma_format_idc field to AVC_IMG_STATE. BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2)); OUT_BCS_BATCH(batch, 0); @@ -138,7 +142,7 @@ gen6_mfd_surface_state(VADriverContextP ctx, ((obj_surface-orig_height - 1) 19) | ((obj_surface-orig_width - 1) 6)); OUT_BCS_BATCH(batch, - (MFX_SURFACE_PLANAR_420_8 28) | /* 420 planar YUV surface */ + (surface_format 28) | /* 420 planar YUV surface */ (1 27) | /* must be 1 for interleave U/V, hardware requirement */ (0 22) | /* surface object control state, FIXME??? */ ((obj_surface-width - 1) 3) | /* pitch */ diff --git a/src/gen75_mfd.c b/src/gen75_mfd.c index 5b023cf..895b194 100644 --- a/src/gen75_mfd.c +++ b/src/gen75_mfd.c @@ -137,12 +137,16 @@ gen75_mfd_surface_state(VADriverContextP ctx, struct object_surface *obj_surface = decode_state-render_object; unsigned int y_cb_offset; unsigned int y_cr_offset; +unsigned int surface_format; assert(obj_surface); y_cb_offset = obj_surface-y_cb_offset; y_cr_offset = obj_surface-y_cr_offset; +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2)); OUT_BCS_BATCH(batch, 0); @@ -150,7 +154,7 @@ gen75_mfd_surface_state(VADriverContextP ctx, ((obj_surface-orig_height - 1) 18) | ((obj_surface-orig_width - 1) 4)); OUT_BCS_BATCH(batch, - (MFX_SURFACE_PLANAR_420_8 28) | /* 420 planar YUV surface */ + (surface_format 28) | /* 420 planar YUV surface */ ((standard_select != MFX_FORMAT_JPEG) 27) | /* interleave chroma, set to 0 for JPEG */ (0 22) | /* surface object control state, ignored */ ((obj_surface-width - 1) 3) | /* pitch */ diff --git a/src/gen7_mfd.c b/src/gen7_mfd.c index 70b1cec..2e0d653 100755 --- a/src/gen7_mfd.c +++ b/src/gen7_mfd.c @@ -135,12 +135,16 @@ gen7_mfd_surface_state(VADriverContextP ctx, struct object_surface *obj_surface = decode_state-render_object; unsigned int y_cb_offset; unsigned int y_cr_offset; +unsigned int surface_format; assert(obj_surface); y_cb_offset = obj_surface-y_cb_offset; y_cr_offset = obj_surface-y_cr_offset; +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2)); OUT_BCS_BATCH(batch, 0); @@ -148,7 +152,7 @@ gen7_mfd_surface_state(VADriverContextP ctx, ((obj_surface-orig_height - 1) 18) | ((obj_surface-orig_width - 1) 4)); OUT_BCS_BATCH(batch, - (MFX_SURFACE_PLANAR_420_8 28) | /* 420 planar YUV surface */ + (surface_format 28) | /* 420 planar YUV surface */ ((standard_select != MFX_FORMAT_JPEG) 27) | /* interleave chroma, set to 0 for JPEG */
Re: [Libva] [PATCH v2 intel-driver 3/8] surface: fix vaDeriveImage() for grayscale.
On Wed, 2014-05-14 at 22:14 -0600, Gwenole Beauchesne wrote: 2014-05-15 2:35 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote: Allow vaDeriveImage() to work with grayscale surfaces by only exposing the luma component. This is not necessary as the DeriveImage already supports the Y800 fourcc. What? Where? I don't call already supports as returning an error. This is added in the following commit: commit 9f9c505ed5212ae0704f71f45532b9716ac0bd51 Author: Zhong Li zhong...@intel.com Date: Mon Apr 14 02:17:42 2014 -0600 i965_DeriveImage() support JPEG color formats Signed-off-by: Zhong Li zhong...@intel.com Thanks. Yakui Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/i965_drv_video.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/src/i965_drv_video.c b/src/i965_drv_video.c index 589c00c..495ad44 100755 --- a/src/i965_drv_video.c +++ b/src/i965_drv_video.c @@ -3222,6 +3222,12 @@ VAStatus i965_DeriveImage(VADriverContextP ctx, image-format.bits_per_pixel = 12; switch (image-format.fourcc) { +case VA_FOURCC_Y800: +image-num_planes = 1; +image-pitches[0] = w_pitch; /* Y */ +image-offsets[0] = 0; +break; + case VA_FOURCC_YV12: image-num_planes = 3; image-pitches[0] = w_pitch; /* Y */ ___ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva
Re: [Libva] [PATCH v2 intel-driver 8/8] decoder: h264: optimize support for grayscale surfaces.
On Wed, 2014-05-14 at 22:28 -0600, Gwenole Beauchesne wrote: Hi, 2014-05-15 3:34 GMT+02:00 Zhao, Yakui yakui.z...@intel.com: On Wed, 2014-05-14 at 07:13 -0600, Gwenole Beauchesne wrote: Optimize support for grayscale surfaces in two aspects: (i) space by only allocating the luminance component ; (ii) speed by avoiding initialization of the (now inexistent) chrominance planes. Keep backward compatibility with older codec layers that only supported YUV 4:2:0 and not grayscale formats properly. As a whole, I am OK to this version patch except two concerns. Signed-off-by: Gwenole Beauchesne gwenole.beauche...@intel.com --- src/gen6_mfd.c |8 ++-- src/gen75_mfd.c |6 +- src/gen7_mfd.c |6 +- src/gen8_mfd.c |6 +- src/i965_decoder_utils.c | 23 +++ src/i965_drv_video.c | 22 ++ src/i965_drv_video.h |9 + 7 files changed, 71 insertions(+), 9 deletions(-) diff --git a/src/gen6_mfd.c b/src/gen6_mfd.c index 2092f69..f925d98 100755 --- a/src/gen6_mfd.c +++ b/src/gen6_mfd.c @@ -130,7 +130,11 @@ gen6_mfd_surface_state(VADriverContextP ctx, { struct intel_batchbuffer *batch = gen6_mfd_context-base.batch; struct object_surface *obj_surface = decode_state-render_object; - +unsigned int surface_format; + +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + Can it work if it still set the PLANAR_420_8 format for the Y800 surface? No, MONO is the only specified and supported format to tell the MFX engine to disregard the chroma components. It does not seem to care of the supplied chroma_format_idc field to AVC_IMG_STATE. The following is what I got from the spec. For video codec, it should set to 4 always In such case the the chroma_format_idc field will be configured in AVC_IMAGE_STATE to assure that the chroma components of grayscale are not touched in decoding. So the 420_8 format is still appropriate for the grayscale surface. BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2)); OUT_BCS_BATCH(batch, 0); @@ -138,7 +142,7 @@ gen6_mfd_surface_state(VADriverContextP ctx, ((obj_surface-orig_height - 1) 19) | ((obj_surface-orig_width - 1) 6)); OUT_BCS_BATCH(batch, - (MFX_SURFACE_PLANAR_420_8 28) | /* 420 planar YUV surface */ + (surface_format 28) | /* 420 planar YUV surface */ (1 27) | /* must be 1 for interleave U/V, hardware requirement */ (0 22) | /* surface object control state, FIXME??? */ ((obj_surface-width - 1) 3) | /* pitch */ diff --git a/src/gen75_mfd.c b/src/gen75_mfd.c index 5b023cf..895b194 100644 --- a/src/gen75_mfd.c +++ b/src/gen75_mfd.c @@ -137,12 +137,16 @@ gen75_mfd_surface_state(VADriverContextP ctx, struct object_surface *obj_surface = decode_state-render_object; unsigned int y_cb_offset; unsigned int y_cr_offset; +unsigned int surface_format; assert(obj_surface); y_cb_offset = obj_surface-y_cb_offset; y_cr_offset = obj_surface-y_cr_offset; +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2)); OUT_BCS_BATCH(batch, 0); @@ -150,7 +154,7 @@ gen75_mfd_surface_state(VADriverContextP ctx, ((obj_surface-orig_height - 1) 18) | ((obj_surface-orig_width - 1) 4)); OUT_BCS_BATCH(batch, - (MFX_SURFACE_PLANAR_420_8 28) | /* 420 planar YUV surface */ + (surface_format 28) | /* 420 planar YUV surface */ ((standard_select != MFX_FORMAT_JPEG) 27) | /* interleave chroma, set to 0 for JPEG */ (0 22) | /* surface object control state, ignored */ ((obj_surface-width - 1) 3) | /* pitch */ diff --git a/src/gen7_mfd.c b/src/gen7_mfd.c index 70b1cec..2e0d653 100755 --- a/src/gen7_mfd.c +++ b/src/gen7_mfd.c @@ -135,12 +135,16 @@ gen7_mfd_surface_state(VADriverContextP ctx, struct object_surface *obj_surface = decode_state-render_object; unsigned int y_cb_offset; unsigned int y_cr_offset; +unsigned int surface_format; assert(obj_surface); y_cb_offset = obj_surface-y_cb_offset; y_cr_offset = obj_surface-y_cr_offset; +surface_format = obj_surface-fourcc == VA_FOURCC_Y800 ? +MFX_SURFACE_MONOCHROME : MFX_SURFACE_PLANAR_420_8; + BEGIN_BCS_BATCH(batch, 6); OUT_BCS_BATCH(batch, MFX_SURFACE_STATE | (6 - 2));