On Tue, 4 Jun 2019 09:15:28 +0000
Jonas Karlman <jo...@kwiboo.se> wrote:

> On 2019-06-04 11:06, Thierry Reding wrote:
> > On Tue, Jun 04, 2019 at 10:49:21AM +0200, Boris Brezillon wrote:  
> >> On Tue, 4 Jun 2019 10:31:57 +0200
> >> Thierry Reding <thierry.red...@gmail.com> wrote:
> >>  
> >>>>>>> - Using flags
> >>>>>>>
> >>>>>>> The current MPEG-2 controls have lots of u8 values that can be
> >>>>>>> represented as flags. Using flags also helps with padding.
> >>>>>>> It's unlikely that we'll get more than 64 flags, so using a u64 by
> >>>>>>> default for that sounds fine (we definitely do want to keep some room
> >>>>>>> available and I don't think using 32 bits as a default is good 
> >>>>>>> enough).
> >>>>>>>
> >>>>>>> I think H.264/HEVC per-control flags should also be moved to u64.     
> >>>>>>>  
> >>>>>> There was also some concensus on this, that u64 should be good enough
> >>>>>> for anything out there, though we obviously don't know what the future
> >>>>>> will hold, so perhaps adding some way for possible extending this in 
> >>>>>> the
> >>>>>> future might be good. I guess we'll get new controls for new codecs
> >>>>>> anyway, so we can punt on this until then.
> >>>>>>       
> >>>>>>> - Clear split of controls and terminology
> >>>>>>>
> >>>>>>> Some codecs have explicit NAL units that are good fits to match as
> >>>>>>> controls: e.g. slice header, pps, sps. I think we should stick to the
> >>>>>>> bitstream element names for those.
> >>>>>>>
> >>>>>>> For H.264, that would suggest the following changes:
> >>>>>>> - renaming v4l2_ctrl_h264_decode_param to v4l2_ctrl_h264_slice_header;
> >>>>>>> - killing v4l2_ctrl_h264_decode_param and having the reference lists
> >>>>>>> where they belong, which seems to be slice_header;      
> >>>>> But now here it's being described per slice. When I look at the slice
> >>>>> header, I only see list of modifications and when I look at userspace,
> >>>>> That list is simply built from DPB, the modifications list found in the
> >>>>> slice header seems to be only used to craft the l0/l1 list.    
> >>>> Yes, I think there was a misunderstanding which was then clarified
> >>>> (unfortunately it happened on IRC, so we don't have a trace of this
> >>>> discussion). The reference list should definitely be per-frame, and the
> >>>> L0/L1 slice reflists are referring to the per-frame reference list (it's
> >>>> just a sub-set of the per-frame reflist re-ordered differently).
> >>>>     
> >>>>> There is one thing that come up though, if we enable per-frame decoding
> >>>>> on top of per-slice decoder (like Cedrus), won't we force userspace to
> >>>>> always compute l0/l1 even though the HW might be handling that ?    
> >>>> That's true, the question is, what's the cost of this extra re-ordering? 
> >>>>    
> >>> I think ultimately userspace is already forced to compute these lists
> >>> even if some hardware may be able to do it in hardware. There's going to
> >>> be other hardware that userspace wants to support that can't do it by
> >>> itself, so userspace has to at least have the code anyway. What it could
> >>> do on top of that decide not to run that code if it somehow detects that
> >>> hardware can do it already. On the other hand this means that we have to
> >>> expose a whole lot of capabilities to userspace and userspace has to go
> >>> and detect all of them in order to parameterize all of the code.
> >>>
> >>> Ultimately I suspect many applications will just choose to pass the data
> >>> all the time out of simplicity. I mean drivers that don't need it will
> >>> already ignore it (i.e. they must not break if they get the extra data)
> >>> so other than the potential runtime savings on some hardware, there are
> >>> no advantages.
> >>>
> >>> Given that other APIs don't bother exposing this level of control to
> >>> applications makes me think that it's just not worth it from a
> >>> performance point of view.  
> >> That's not exactly what Nicolas proposed. He was suggesting that we
> >> build those reflists kernel-side: V4L would provide an helper and
> >> drivers that need those lists would use it, others won't. This way we
> >> have no useless computation done, and userspace doesn't even have to
> >> bother checking the device caps to avoid this extra step.  
> > Oh yeah, that sounds much better. I suppose one notable differences to
> > other APIs is that they have to pass in buffers for all the frames in
> > the DPB, so they basically have to build the lists in userspace. Since
> > we'll end up looking up the frames in the kernel, it sounds reasonable
> > to also build the lists in the kernel.  
> 
> Userspace must already process the modification list or it wont have correct 
> DPB for next frame.

Can you point us to the code or the section in the spec that
mentions/proves this dependency? I might have missed something, but my
understanding was that the slice ref lists (or the list of
modifications to apply to the list of long/short refs attached to a
frame) had no impact on the list of long/short refs attached to the
following frame.

Reply via email to