2014-06-06 3:45 GMT+02:00 Zhao, Yakui <yakui.z...@intel.com>: > On Thu, 2014-06-05 at 10:20 -0600, Gwenole Beauchesne wrote: >> 2014-06-05 17:29 GMT+02:00 Xiang, Haihao <haihao.xi...@intel.com>: >> > >> > >> >> -----Original Message----- >> >> From: Gwenole Beauchesne [mailto:gb.de...@gmail.com] >> >> Sent: Thursday, June 05, 2014 9:29 PM >> >> To: Xiang, Haihao >> >> Cc: Zhao, Yakui; libva@lists.freedesktop.org >> >> Subject: Re: [Libva] [PATCH v3 intel-driver 3/9] decoder: h264: simplify >> >> and >> >> optimize reference frame store updates. >> >> >> >> Hi, >> >> >> >> 2014-06-05 14:44 GMT+02:00 Xiang, Haihao <haihao.xi...@intel.com>: >> >> > >> >> > >> >> >> -----Original Message----- >> >> >> From: Libva [mailto:libva-boun...@lists.freedesktop.org] On Behalf Of >> >> >> Gwenole Beauchesne >> >> >> Sent: Thursday, June 05, 2014 5:20 PM >> >> >> To: Zhao, Yakui >> >> >> Cc: libva@lists.freedesktop.org >> >> >> Subject: Re: [Libva] [PATCH v3 intel-driver 3/9] decoder: h264: >> >> >> simplify and optimize reference frame store updates. >> >> >> >> >> >> 2014-06-04 7:30 GMT+02:00 Zhao, Yakui <yakui.z...@intel.com>: >> >> >> > On Tue, 2014-06-03 at 22:59 -0600, Gwenole Beauchesne wrote: >> >> >> >> 2014-06-04 4:11 GMT+02:00 Zhao, Yakui <yakui.z...@intel.com>: >> >> >> >> > On Tue, 2014-06-03 at 10:43 -0600, Gwenole Beauchesne wrote: >> >> >> >> >> Simplify and optimize the update process of the reference frame >> >> store. >> >> >> >> >> Use less iterations to look up existing objects. Use a cache to >> >> >> >> >> store the free'd slots. >> >> >> >> >> >> >> >> >> >> Prerequisite: the reference_objects[] array was previously >> >> >> >> >> arranged in a way that the element at index i is exactly the >> >> >> >> >> object_surface that corresponds to the VA surface identified by >> >> >> >> >> the VAPictureH264.picture_id located at index i in the >> >> >> >> >> ReferenceFrames[] >> >> >> array. >> >> >> >> > >> >> >> >> > As a whole, this patch looks good to me. Based on the hardware >> >> >> >> > requirement, it will be better that one reference picture later >> >> >> >> > used is not assigned for the other purpose even when it is not >> >> >> >> > used in current reference frame list. In such case this patch >> >> >> >> > does some optimization about DPB, which can defer the usage of >> >> >> >> > one picture later used that is not in current reference >> >> >> >> > frame.especially >> >> when: >> >> >> >> > one picture is not used during decoding frame X while it is used >> >> >> >> > during >> >> >> decoding frame X + 1. >> >> >> >> >> >> >> >> The case you mention was supposed to be fixed with the other patch >> >> >> >> series where the re-factorisation is a pre-requisite. :) A very >> >> >> >> strict GenFrameStoreContext needs to be maintained with the set of >> >> >> >> used_frames[] as it is today, and an additional set of >> >> >> >> free_frames[] *candidates*. Though, this case is not covered by >> >> >> >> the conformance suite, and I haven't found any real world case >> >> >> >> exhibiting that, neither did I try to generate one yet. So we can >> >> >> >> defer this >> >> one. >> >> >> >> >> >> >> >> > Although it still has no improvment for the following scenario, >> >> >> >> > it still is an improvement. >> >> >> >> > >One picture is not used during decoding frame X and X + 1 >> >> >> >> > while it is used during decoding frame X + 2. >> >> >> >> > >> >> >> >> > Of course the perfect solution is that the upper layer can >> >> >> >> > assure that one picture later used is still added into the DPB >> >> >> >> > of current frame although it is not used in current reference >> >> >> >> > list. >> >> >> >> >> >> >> >> No, this is not going to happen. This would be an implementation >> >> >> >> error of the standard. >> >> >> > >> >> >> > In theory it is no problem for normal H264. But it is true for MVC >> >> >> > case as some reference frames for view 1 is not used when decoding >> >> >> > view >> >> 0. >> >> >> >> >> >> When I said this was not going to happen, this concerned the "perfect >> >> >> solution" you proposed, which is not one. That would be a workaround >> >> >> to driver bugs, and an error of implementation of the standard. Of >> >> >> course, the DPB holds all the necessary view components, but the >> >> >> VAPictureH264.ReferenceFrames[] is not exactly the DPB, and it cannot >> >> >> be. >> >> > >> >> > The number of elements in VAPictureH264.ReferenceFrames[] is 16, so it >> >> > is enough to hold All reference surfaces for H.264. and the patch works >> >> > as >> >> before, I am OK for your patch. >> >> > >> >> >> >> >> >> The case you mention is what I said a paragraph above, and the only >> >> "perfect" >> >> >> way to solve it is to correctly track the references in the driver >> >> >> side. This was what the GenFrameStoreContext was for. However, as I >> >> >> mentioned, there is no supporting case in the conformance test suite, >> >> >> and I haven't come by a real-world sample exhibiting this. >> >> > >> >> > The case Yakui mentioned is not for H.264 as a surface should be in >> >> > DPB if it is referenced by later Surface. >> >> > >> >> > Xiaowei told us each view uses a separated DPB buffer for MVC >> >> >> >> There is no such thing in MVC. There is a unique DPB with a size scalable >> >> by a >> >> log2 number of views. >> >> >> >> > A case might occurs as below: >> >> > >> >> > View0: f0, f1, f2, f3, ... >> >> > View1: s0, s1, s2, s3, ... >> >> > >> >> > f0 is referenced by f1, but f0 isn't referenced by other surfaces in >> >> > view0, f1 >> >> is reference by f2. >> >> > F0 is also referenced by s1 , both f0 and f1 are referenced by S2 in >> >> > view1 >> >> > >> >> > When decoding f1, f0 is added into View0's DPB, so it is added into >> >> > frame store too, the frame store index for f0 is *0*. >> >> > >> >> > When decoding s1, f0 is added into view1's DPB, it has been in the frame >> >> store, the frame store index for f0 is still *0*. >> >> > >> >> > When decoding f2, f0 is removed from view0's DPB and f1 is added into >> >> > view0's DPB, so f0 is removed from frame store and f1 is added into >> >> > frame store too, now the frame store index For f1 is 0, >> >> > >> >> > When decoding s2, f0 is still in View1's DPB so it will be added into >> >> > the frame store again, and now the frame store index for f0 is *1* >> >> > >> >> > The frame store index for f0 is changed from 0 to 1 and the decoding >> >> > result >> >> will be wrong. >> >> >> >> That's exactly the thing we are discussing since the beginning. :) >> > >> > The above case doesn't exist if MVC uses a unique DBP buffer as f0 is >> > still in the DBP after >> > Decoding f2, hence the driver won't remove f0 from the frame store. >> >> Not necessarily. Anyway, your described case does not exist at all >> because you can only have inter-view prediction with frames from the >> same access unit. So, you cannot have f0+f1 referenced by s2, only f2 >> and s0+s1 could be referenced by s2. But anyway, that's the idea. A >> valid example could be f1+f2 for s2, then f1+f2 for f3, then f3+s1+s2 >> for s3. >> >> Because of the additional constraint I gave that ReferenceFrames[] >> cannot always be the DPB because MVC is not limited to 2 views to >> begin with, the case I describe is what will give issues. This is >> because you cannot simply always append inter-view reference >> components in the ReferenceFrames[] array because this is not scalable >> for MultiView High with a higher number of views. This could work for >> Stereo High because the DPB is always capped to 16 frames at most. > >> Besides, you'd better see the ReferenceFrames[] array as RefPicList0[] >> U RefPicList1[] rather than the DPB. The DPB, as per the H.264 spec >> definition, could have more than 16 entries for Multiview High >> profile. And I don't really want to workaround/do different things for >> Stereo High profile vs. Multiview High profile. So, the >> ReferenceFrames[] is to be filled in with the reference frames >> effectively used (L0+L1) for decoding the current picture. > > If the ReferenceFrames[] only holds the RefPicList0 + RefPicList1 > instead of DPB, then it will trigger the issue Haihao and I mentioned > several times in previous emails. Some reference picture is not used in > RefPicList0/1 during decoding frame X while it will be used later in > decoding frame X+1. So the upper middles will add it into the > ReferenceFrames[] even although it is not used when decoding the current > frame. If not, the driver still can trigger the corresponding issue > although it can reduce the possibility of triggering issue after some > workaround.
Good, so you start to understand what we were discussing and the constraints around it. So, I am glad you finally see the neccessity to some code refactoring for tracking the refs frames correctly. :) So, see the patch11 of the new series, or go back to v2 with the GenFrameStoreContext addition. Regards, Gwenole. >> >> And there is no way to handle that from the unique GenFrameStore array as >> >> it >> >> is today, even if you delay the marking as "free", or whatever. >> >> You need a more elaborated data struct. And this is what was to be >> >> provided >> >> with the GenFrameStoreContext and additional re-factoring so that I don't >> >> have to copy code around again. >> >> >> >> I am working on an alternative that wouldn't need too many changes at this >> >> time. >> >> >> >> Regards, >> >> Gwenole. >> >> >> >> > >> >> >> >> >> >> >> >> Theory of operations: >> >> >> >> >> >> >> >> >> >> 1. Obsolete entries are removed first, i.e. entries in the >> >> >> >> >> internal DPB >> >> >> >> >> that no longer have a match in the supplied ReferenceFrames[] >> >> array. >> >> >> >> >> That obsolete entry index is stored in a local cache: >> >> >> >> >> free_slots[]. >> >> >> >> >> >> >> >> >> >> 2. This cache is completed with entries considered as "invalid" >> >> >> >> >> or "not >> >> >> >> >> present", sequentially while traversing the frame store for >> >> obsolete >> >> >> >> >> entries. At the end of this removal process, the free_slots[] >> >> >> >> >> array >> >> >> >> >> represents all possible indices in there that could be >> >> >> >> >> re-used for >> >> >> >> >> new reference frames to track. >> >> >> >> >> >> >> >> >> >> 3. The list of ReferenceFrames[] objects is traversed for new >> >> >> >> >> entries >> >> >> >> >> that are not already in the frame store. If an entry needs to >> >> >> >> >> be >> >> >> >> >> added, it is placed at the index obtained from the next >> >> >> >> >> free_slots[] >> >> >> >> >> element. There is no need to traverse the frame store array >> >> >> >> >> again, >> >> >> >> >> the next available slot can be known from that free_slots[] >> >> >> >> >> cache. >> >> >> >> >> >> >> >> >> >> Signed-off-by: Gwenole Beauchesne >> >> >> >> >> <gwenole.beauche...@intel.com> >> >> >> >> >> --- >> >> >> >> >> src/i965_decoder_utils.c | 107 >> >> >> >> >> ++++++++++++++++++++-------------------------- >> >> >> >> >> 1 file changed, 46 insertions(+), 61 deletions(-) >> >> >> >> >> >> >> >> >> >> diff --git a/src/i965_decoder_utils.c >> >> >> >> >> b/src/i965_decoder_utils.c index 2570be9..c686235 100644 >> >> >> >> >> --- a/src/i965_decoder_utils.c >> >> >> >> >> +++ b/src/i965_decoder_utils.c >> >> >> >> >> @@ -418,88 +418,73 @@ gen6_send_avc_ref_idx_state( } >> >> >> >> >> >> >> >> >> >> void >> >> >> >> >> -intel_update_avc_frame_store_index(VADriverContextP ctx, >> >> >> >> >> - struct decode_state >> >> >> *decode_state, >> >> >> >> >> - >> >> >> VAPictureParameterBufferH264 *pic_param, >> >> >> >> >> - GenFrameStore >> >> >> frame_store[MAX_GEN_REFERENCE_FRAMES]) >> >> >> >> >> +intel_update_avc_frame_store_index( >> >> >> >> >> + VADriverContextP ctx, >> >> >> >> >> + struct decode_state *decode_state, >> >> >> >> >> + VAPictureParameterBufferH264 *pic_param, >> >> >> >> >> + GenFrameStore >> >> >> frame_store[MAX_GEN_REFERENCE_FRAMES] >> >> >> >> >> +) >> >> >> >> >> { >> >> >> >> >> - int i, j; >> >> >> >> >> - >> >> >> >> >> - assert(MAX_GEN_REFERENCE_FRAMES == >> >> >> ARRAY_ELEMS(pic_param->ReferenceFrames)); >> >> >> >> >> - >> >> >> >> >> - for (i = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) { >> >> >> >> >> - int found = 0; >> >> >> >> >> - >> >> >> >> >> - if (frame_store[i].surface_id == VA_INVALID_ID || >> >> >> >> >> - frame_store[i].obj_surface == NULL) >> >> >> >> >> + int free_slots[MAX_GEN_REFERENCE_FRAMES]; >> >> >> >> >> + int i, j, n, num_free_slots; >> >> >> >> >> + bool found; >> >> >> >> >> + >> >> >> >> > >> >> >> >> > It will be better to fill the free_slots through the following >> >> >> >> > two loops instead of one loop, which is to try to defer the >> >> >> >> > usage of the reference picture later used that is not used in >> >> >> >> > current reference frame >> >> >> list. >> >> >> >> > a. check the invalid surface_id or NULL surface and then add it >> >> >> >> > into free_slots. >> >> >> >> > b. check whether one picture is used by the current reference >> >> >> >> > frame list. If not, add it into the free_slots. >> >> >> >> >> >> >> >> The previous algorithm was filling in the holes. This one does the >> >> >> >> same. I don't see where your proposal could bring an improvement? >> >> >> > >> >> >> > In your patch you use the free_slots to find one free slot to hold >> >> >> > a new reference picture in DPB. Right? It will firstly try to find >> >> >> > one free_slot with invalid picture_id or surface, which is used to >> >> >> > hold a new reference picture in DPB. >> >> >> >> >> >> No, it will take whatever comes first between: >> >> >> - a slot with an invalid picture_id or surface ; >> >> >> - or a slot with a retired reference picture candidate, i.e. >> >> >> something that cannot be found in the ReferenceFrames[] array. >> >> >> And this works exactly the same way as before. No functional change >> >> >> yet. >> >> >> >> >> >> > It can remove the possibility that one picture later used is >> >> >> > immediately used to hold a new reference picture in DPB. So based >> >> >> > on the above point IMO it is better to find the free_slots through >> >> >> > two steps to hold the new reference picture in DPB. >> >> >> >> >> >> This would only hide further this situation, but OK. >> >> >> >> >> >> Regards, >> >> >> Gwenole. >> >> >> >> >> >> >> >> + /* Remove obsolete entries from the internal DPB */ >> >> >> >> >> + for (i = 0, n = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) { >> >> >> >> >> + GenFrameStore * const fs = &frame_store[i]; >> >> >> >> >> + if (fs->surface_id == VA_INVALID_ID || >> >> >> >> >> !fs->obj_surface) { >> >> >> >> >> + free_slots[n++] = i; >> >> >> >> >> continue; >> >> >> >> >> + } >> >> >> >> >> >> >> >> >> >> - assert(frame_store[i].frame_store_id != -1); >> >> >> >> >> - >> >> >> >> >> - for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) { >> >> >> >> >> - VAPictureH264 *ref_pic = >> >> >> &pic_param->ReferenceFrames[j]; >> >> >> >> >> - if (ref_pic->flags & VA_PICTURE_H264_INVALID) >> >> >> >> >> + // Find whether the current entry is still a valid >> >> >> >> >> + reference >> >> >> frame >> >> >> >> >> + found = false; >> >> >> >> >> + for (j = 0; j < >> >> >> >> >> + ARRAY_ELEMS(decode_state->reference_objects); >> >> >> j++) { >> >> >> >> >> + struct object_surface * const obj_surface = >> >> >> >> >> + decode_state->reference_objects[j]; >> >> >> >> >> + if (!obj_surface) >> >> >> >> >> continue; >> >> >> >> >> - >> >> >> >> >> - if (frame_store[i].surface_id == >> >> >> >> >> ref_pic->picture_id) { >> >> >> >> >> - found = 1; >> >> >> >> >> + if ((found = obj_surface == fs->obj_surface)) >> >> >> >> >> break; >> >> >> >> >> - } >> >> >> >> >> } >> >> >> >> >> >> >> >> >> >> - /* remove it from the internal DPB */ >> >> >> >> >> + // ... or remove it >> >> >> >> >> if (!found) { >> >> >> >> >> - frame_store[i].surface_id = VA_INVALID_ID; >> >> >> >> >> - frame_store[i].frame_store_id = -1; >> >> >> >> >> - frame_store[i].obj_surface = NULL; >> >> >> >> >> + fs->surface_id = VA_INVALID_ID; >> >> >> >> >> + fs->obj_surface = NULL; >> >> >> >> >> + fs->frame_store_id = -1; >> >> >> >> >> + free_slots[n++] = i; >> >> >> >> >> } >> >> >> >> >> } >> >> >> >> >> + num_free_slots = n; >> >> >> >> >> >> >> >> >> >> - for (i = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) { >> >> >> >> >> - VAPictureH264 *ref_pic = &pic_param->ReferenceFrames[i]; >> >> >> >> >> - int found = 0; >> >> >> >> >> - >> >> >> >> >> - if (ref_pic->flags & VA_PICTURE_H264_INVALID || >> >> >> >> >> - ref_pic->picture_id == VA_INVALID_SURFACE || >> >> >> >> >> - decode_state->reference_objects[i] == NULL) >> >> >> >> >> + /* Append the new reference frames */ >> >> >> >> >> + for (i = 0, n = 0; i < >> >> >> >> >> + ARRAY_ELEMS(decode_state->reference_objects); >> >> >> i++) { >> >> >> >> >> + struct object_surface * const obj_surface = >> >> >> >> >> + decode_state->reference_objects[i]; >> >> >> >> >> + if (!obj_surface) >> >> >> >> >> continue; >> >> >> >> >> >> >> >> >> >> + // Find whether the current frame is not already in >> >> >> >> >> + our frame >> >> >> store >> >> >> >> >> + found = false; >> >> >> >> >> for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) { >> >> >> >> >> - if (frame_store[j].surface_id == >> >> >> >> >> ref_pic->picture_id) { >> >> >> >> >> - found = 1; >> >> >> >> >> + if ((found = frame_store[j].obj_surface == >> >> >> >> >> + obj_surface)) >> >> >> >> >> break; >> >> >> >> >> - } >> >> >> >> >> } >> >> >> >> >> >> >> >> >> >> - /* add the new reference frame into the internal DPB */ >> >> >> >> >> + // ... or add it >> >> >> >> >> if (!found) { >> >> >> >> >> - int frame_idx; >> >> >> >> >> - int slot_found; >> >> >> >> >> - struct object_surface *obj_surface = >> >> >> decode_state->reference_objects[i]; >> >> >> >> >> - >> >> >> >> >> - slot_found = 0; >> >> >> >> >> - frame_idx = -1; >> >> >> >> >> - /* Find a free frame store index */ >> >> >> >> >> - for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) { >> >> >> >> >> - if (frame_store[j].surface_id == VA_INVALID_ID >> >> >> >> >> || >> >> >> >> >> - frame_store[j].obj_surface == NULL) { >> >> >> >> >> - frame_idx = j; >> >> >> >> >> - slot_found = 1; >> >> >> >> >> - break; >> >> >> >> >> - } >> >> >> >> >> + if (n < num_free_slots) { >> >> >> >> >> + GenFrameStore * const fs = >> >> >> &frame_store[free_slots[n++]]; >> >> >> >> >> + fs->surface_id = obj_surface->base.id; >> >> >> >> >> + fs->obj_surface = obj_surface; >> >> >> >> >> + fs->frame_store_id = fs - frame_store; >> >> >> >> >> + } >> >> >> >> >> + else { >> >> >> >> >> + WARN_ONCE("No free slot found for DPB >> >> >> >> >> + reference list!!!\n"); >> >> >> >> >> } >> >> >> >> >> - >> >> >> >> >> - >> >> >> >> >> - if (slot_found) { >> >> >> >> >> - frame_store[j].surface_id = ref_pic->picture_id; >> >> >> >> >> - frame_store[j].frame_store_id = frame_idx; >> >> >> >> >> - frame_store[j].obj_surface = obj_surface; >> >> >> >> >> - } else { >> >> >> >> >> - WARN_ONCE("Not free slot for DPB reference >> >> >> list!!!\n"); >> >> >> >> >> - } >> >> >> >> >> } >> >> >> >> >> } >> >> >> >> >> - >> >> >> >> >> } >> >> >> >> >> >> >> >> >> >> void >> >> >> >> > >> >> >> >> > >> >> >> > >> >> >> > >> >> >> _______________________________________________ >> >> >> Libva mailing list >> >> >> Libva@lists.freedesktop.org >> >> >> http://lists.freedesktop.org/mailman/listinfo/libva > > _______________________________________________ Libva mailing list Libva@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libva