On Tue, 4 Jun 2019 09:15:28 +0000 Jonas Karlman <jo...@kwiboo.se> wrote:
> On 2019-06-04 11:06, Thierry Reding wrote: > > On Tue, Jun 04, 2019 at 10:49:21AM +0200, Boris Brezillon wrote: > >> On Tue, 4 Jun 2019 10:31:57 +0200 > >> Thierry Reding <thierry.red...@gmail.com> wrote: > >> > >>>>>>> - Using flags > >>>>>>> > >>>>>>> The current MPEG-2 controls have lots of u8 values that can be > >>>>>>> represented as flags. Using flags also helps with padding. > >>>>>>> It's unlikely that we'll get more than 64 flags, so using a u64 by > >>>>>>> default for that sounds fine (we definitely do want to keep some room > >>>>>>> available and I don't think using 32 bits as a default is good > >>>>>>> enough). > >>>>>>> > >>>>>>> I think H.264/HEVC per-control flags should also be moved to u64. > >>>>>>> > >>>>>> There was also some concensus on this, that u64 should be good enough > >>>>>> for anything out there, though we obviously don't know what the future > >>>>>> will hold, so perhaps adding some way for possible extending this in > >>>>>> the > >>>>>> future might be good. I guess we'll get new controls for new codecs > >>>>>> anyway, so we can punt on this until then. > >>>>>> > >>>>>>> - Clear split of controls and terminology > >>>>>>> > >>>>>>> Some codecs have explicit NAL units that are good fits to match as > >>>>>>> controls: e.g. slice header, pps, sps. I think we should stick to the > >>>>>>> bitstream element names for those. > >>>>>>> > >>>>>>> For H.264, that would suggest the following changes: > >>>>>>> - renaming v4l2_ctrl_h264_decode_param to v4l2_ctrl_h264_slice_header; > >>>>>>> - killing v4l2_ctrl_h264_decode_param and having the reference lists > >>>>>>> where they belong, which seems to be slice_header; > >>>>> But now here it's being described per slice. When I look at the slice > >>>>> header, I only see list of modifications and when I look at userspace, > >>>>> That list is simply built from DPB, the modifications list found in the > >>>>> slice header seems to be only used to craft the l0/l1 list. > >>>> Yes, I think there was a misunderstanding which was then clarified > >>>> (unfortunately it happened on IRC, so we don't have a trace of this > >>>> discussion). The reference list should definitely be per-frame, and the > >>>> L0/L1 slice reflists are referring to the per-frame reference list (it's > >>>> just a sub-set of the per-frame reflist re-ordered differently). > >>>> > >>>>> There is one thing that come up though, if we enable per-frame decoding > >>>>> on top of per-slice decoder (like Cedrus), won't we force userspace to > >>>>> always compute l0/l1 even though the HW might be handling that ? > >>>> That's true, the question is, what's the cost of this extra re-ordering? > >>>> > >>> I think ultimately userspace is already forced to compute these lists > >>> even if some hardware may be able to do it in hardware. There's going to > >>> be other hardware that userspace wants to support that can't do it by > >>> itself, so userspace has to at least have the code anyway. What it could > >>> do on top of that decide not to run that code if it somehow detects that > >>> hardware can do it already. On the other hand this means that we have to > >>> expose a whole lot of capabilities to userspace and userspace has to go > >>> and detect all of them in order to parameterize all of the code. > >>> > >>> Ultimately I suspect many applications will just choose to pass the data > >>> all the time out of simplicity. I mean drivers that don't need it will > >>> already ignore it (i.e. they must not break if they get the extra data) > >>> so other than the potential runtime savings on some hardware, there are > >>> no advantages. > >>> > >>> Given that other APIs don't bother exposing this level of control to > >>> applications makes me think that it's just not worth it from a > >>> performance point of view. > >> That's not exactly what Nicolas proposed. He was suggesting that we > >> build those reflists kernel-side: V4L would provide an helper and > >> drivers that need those lists would use it, others won't. This way we > >> have no useless computation done, and userspace doesn't even have to > >> bother checking the device caps to avoid this extra step. > > Oh yeah, that sounds much better. I suppose one notable differences to > > other APIs is that they have to pass in buffers for all the frames in > > the DPB, so they basically have to build the lists in userspace. Since > > we'll end up looking up the frames in the kernel, it sounds reasonable > > to also build the lists in the kernel. > > Userspace must already process the modification list or it wont have correct > DPB for next frame. Can you point us to the code or the section in the spec that mentions/proves this dependency? I might have missed something, but my understanding was that the slice ref lists (or the list of modifications to apply to the list of long/short refs attached to a frame) had no impact on the list of long/short refs attached to the following frame.