On 16.11.2011 15:38, Maarten Lankhorst wrote:
Hey,

On 11/16/2011 02:47 PM, Christian König wrote:
On 15.11.2011 17:52, Maarten Lankhorst wrote:
Deleted:
- begin_frame/end_frame: Was only useful for XvMC, should be folded into flush..
I'm not completely happy with the current interface either, but removing this takes away the state tracker's ability to control how many buffers are used, which in turn forces the VDPAU and XvMC state trackers to use the same buffering algorithm. This is quite bad, because for XvMC we are forced to use a buffer for each destination surface, while in the VDPAU/VAAPI case a much simpler method should be used.
I wasn't aware of that; the patch could easily be changed so that for entrypoint <= bitstream only a single decoder buffer is created, and otherwise one is attached to the surface. I still think it should be removed from the state tracker though, and that would still be possible..
I would rather pass another parameter to the decoder create function, like "scatter_decode" or something like that. Younes originally had a parameter just to describe the possible different calling conventions for decode, but that only covered slice vs. frame based buffering and I didn't understand what he was trying to do with it at that time.
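Just to illustrate what I mean (this is only a rough sketch, the name and signature are made up, not a concrete proposal):

    /* hypothetical sketch -- not the real signature, just the idea: the state
     * tracker describes the buffering model once at creation time instead of
     * managing decode buffers itself */
    struct pipe_video_decoder *(*create_video_decoder)(struct pipe_context *pipe,
                                                       enum pipe_video_profile profile,
                                                       enum pipe_video_entrypoint entrypoint,
                                                       enum pipe_video_chroma_format chroma_format,
                                                       unsigned width, unsigned height,
                                                       bool scatter_decode);
    /* scatter_decode = true:  one internal decode buffer per target surface (XvMC)
     * scatter_decode = false: a single internal decode buffer (VDPAU/VA-API) */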

I would also suggest separating the detection of the start of a new frame into its own function and making that function a public interface in vl_mpeg12_decoder.h, so that other drivers can use it for their handling as well.
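As a sketch of what such a helper could look like (name and signature are made up):

    /* vl_mpeg12_decoder.h -- hypothetical public helper, name/signature made up */
    bool
    vl_mpeg12_is_new_frame(const struct pipe_mpeg12_picture_desc *prev,
                           const struct pipe_mpeg12_picture_desc *cur);
    /* returns true when the picture parameters indicate that a new frame
     * starts, so other drivers can reuse the same detection logic */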

- set_quant_matrix/set_reference_frames:
      they should become part of picture_desc,
      not all codecs deal with it in the same way,
      and some may not have all of the calls.
That is true, but also intended. The idea behind it is that it is not necessary 
for all codecs to set all buffer types; instead the calls inform the driver that a 
specific buffer type has changed. This gives the driver the ability to know 
which parameters have changed between calls to decode, and so to only perform the 
(probably expensive) update of those parameters on the hardware side.
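To make the idea a bit more concrete, the driver side could track this with a dirty bitmask, roughly like this (illustrative sketch only, names made up):

    /* illustrative sketch only -- names are made up */
    enum {
       DIRTY_QUANT_MATRIX = 1 << 0,
       DIRTY_REF_FRAMES   = 1 << 1,
    };

    struct example_decoder {
       unsigned dirty;   /* bitmask of DIRTY_* flags, set by the set_* calls */
       /* ... stored quant matrix, reference list, hw state ... */
    };

    /* called once at the start of decode, instead of re-uploading
     * everything on every frame */
    static void example_flush_dirty_state(struct example_decoder *dec)
    {
       if (dec->dirty & DIRTY_QUANT_MATRIX)
          ; /* re-program the (expensive) quant matrix registers here */
       if (dec->dirty & DIRTY_REF_FRAMES)
          ; /* re-program the reference frame addresses here */
       dec->dirty = 0;
    }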
This is why I used the flush call for XvMC as a way to signal parameter changes. If you look at decode_macroblock for g3dvl, it checks if (dec->current_buffer == target->decode_buffer) in begin_frame, so it doesn't flush state.

For VA-API, I haven't had the .. pleasure(?) of playing around with it, and it seems to be only stubs.
That's correct, the VA-API implementation isn't complete, and I wouldn't call it a pleasure to work with. Intel's VA-API specification seems to be a bit in flux, moving structure members around and sometimes even changing calling conventions from one version to another.

But they still have one very nice idea there: set certain parameters once and then never touch them again, for example:

/*
 *  For application, e.g. set a new bitrate
 *    VABufferID buf_id;
 *    VAEncMiscParameterBuffer *misc_param;
 *    VAEncMiscParameterRateControl *misc_rate_ctrl;
 *
 *    vaCreateBuffer(dpy, context, VAEncMiscParameterBufferType,
 *              sizeof(VAEncMiscParameterBuffer) + sizeof(VAEncMiscParameterRateControl),
 *              1, NULL, &buf_id);
 *
 *    vaMapBuffer(dpy, buf_id, (void **)&misc_param);
 *    misc_param->type = VAEncMiscParameterTypeRateControl;
 *    misc_rate_ctrl = (VAEncMiscParameterRateControl *)misc_param->data;
 *    misc_rate_ctrl->bits_per_second = 6400000;
 *    vaUnmapBuffer(dpy, buf_id);
 *    vaRenderPicture(dpy, context, &buf_id, 1);
 */

For me that means a big simplification, because I don't need to check every frame whether certain parameters have changed, which would otherwise lead to a whole bunch of overhead like flushes or even the need to reset the whole GPU block and load new parameters etc.

If the decode_bitstream interface is changed to receive all bitstream buffers at the same time, there wouldn't be any overhead in doing it like this. For a single picture the parameters are supposed to stay constant, so for vdpau the sane way would be: set the picture parameters for the hardware (which includes EVERYTHING), write all bitstream buffers to a hardware bo, and wait until the magic is done. Afaict there isn't even a sane way to only submit partial buffers, so it's just a bunch of overhead for me.
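Roughly what I have in mind, as a sketch (the exact signature is approximate):

    /* sketch only: hand the driver all bitstream buffers of a picture at once */
    void (*decode_bitstream)(struct pipe_video_decoder *decoder,
                             struct pipe_video_buffer *target,
                             struct pipe_picture_desc *picture,
                             unsigned num_buffers,
                             const void *const *buffers,
                             const unsigned *sizes);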

nvidia doesn't support va-api; it handles the entire process from picture parameters to a decoded buffer internally, so it always converts the picture parameters into something the hardware can understand, every frame.
I'm not arguing against removing the scattered calls to decode_bitstream. I just don't want to lose information while passing the parameters from the state tracker down to the driver. But we can also add this information as a flag to the function later, so on second thought that seems to be ok.

- set_picture_parameters: Can be passed to decode_bitstream/macroblock
Same as above, take a look at the enum VABufferType in the VAAPI specification (http://cgit.freedesktop.org/libva/tree/va/va.h).
- set_decode_target: Idem
- create_decode_buffer/set_decode_buffer/destroy_decode_buffer:
      Even if a decoder wants it, the state tracker has no business knowing about it.
Unfortunately that isn't true; only VDPAU lacks a concept of giving the application a certain amount of control over the buffers. XvMC has the implicit association of internal decode buffers with destination surfaces, and VAAPI gives the application both a one-shot copy and a map/unmap interface to provide buffer content.

If we want to avoid unnecessarily copying buffer content around, then this interface has to become even more complicated, with things like new buffer types, creating user buffers, mapping/unmapping of to-be-decoded data buffers etc.
There is no guarantee a user-created video buffer can be used as a reference frame, I honestly don't even know if it's possible.
Oh, I'm not talking about reference frames here, I'm talking about ALL kinds of buffers. As far as I know using a manually uploaded video buffer as a reference frame isn't possible with any of the major hardware vendors, and I can tell you that it definitely doesn't work on Radeon hardware.

Furthermore I don't see any code in state_trackers/va except some stubs, so I can't discuss an interface that isn't even there to begin with.
The VA-API state tracker is only a stub, but the specification is fairly complete, and it was a merge requirement to support at least XvMC, VDPAU, VA-API and even DXVA with it.

flush is changed to only flush a single pipe_video_buffer,
this should reduce the state that needs to be maintained for XvMC otherwise.
I can't really see how that improves the situation; all it does is reduce the interface to what is needed for VDPAU, while totally ignoring the requirements of VAAPI, for example.
See above.

Note: internally you can still use those calls, as long as the *public* api
would be reduced to this, pipe_video_buffer is specific to each driver,
so you can put in a lot of state there if you choose to do so. For example,
quant matrix is not used in vc-1, and h264 will need a different
way for set_reference_frames, see struct VdpReferenceFrameH264. This is why
I want to remove those calls, and put them into picture_desc..
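To make that concrete, roughly the direction I'm thinking of (field names are approximate, just a sketch):

    /* sketch: fold the per-codec state into the picture description instead
     * of separate set_* calls; every codec gets its own variant */
    struct pipe_mpeg12_picture_desc
    {
       struct pipe_picture_desc base;      /* common part */

       const uint8_t *intra_matrix;        /* was set_quant_matrix */
       const uint8_t *non_intra_matrix;

       struct pipe_video_buffer *ref[2];   /* was set_reference_frames */

       /* ... remaining MPEG-2 picture parameters ... */
    };

    /* h264 would get its own desc with per-reference fields mirroring
     * struct VdpReferenceFrameH264 (frame_idx, top/bottom_is_reference, ...) */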
Correct, those calls need to be extended to support more codecs, but just stuffing everything into a single structure also doesn't seem to be a solution that will work across a wider range of state trackers or hardware.
All hardware has its own unique approach with its own api, so it would make sense if there were a separate call for decode_va too, maybe with some bits that tell what changed, so it only has to upload those specific parts..
Huh? Are you kidding? Gallium3D's goal is to provide a unified and abstracted view of today's video hardware, and by introducing a call specific to a single API we definitely aren't working in that direction.

struct pipe_video_buffer I'm less sure about, especially wrt interlacing and alignment; it needs more thought. Height and width are aligned to 64 on nvidia, and require a special tiling flag. The compositor will need to become aware of interlacing, to be able to play back interlaced videos.
Alignment isn't so much of an issue. Take a look at vl_video_buffer_create; it already aligns the width/height to a power of two if the hardware needs that. So it is perfectly legal for a video buffer to be larger than the requested size; this is already tested with the current color conversion code and should work fine with both XvMC and VDPAU.
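Roughly, the logic amounts to something like this (a simplified sketch from memory, not the exact code):

    /* simplified sketch of the alignment vl_video_buffer_create does */
    if (!screen->get_video_param(screen, PIPE_VIDEO_PROFILE_UNKNOWN,
                                 PIPE_VIDEO_CAP_NPOT_TEXTURES)) {
       width  = util_next_power_of_two(width);
       height = util_next_power_of_two(height);
    }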

I honestly haven't read up on interlacing yet, and haven't found a video
to test it with, so that part is just speculative crap, and might change
when I find out more about it.
Regarding deinterlacing I'm also not really sure what we should do about it, but all interfaces (XvMC/VDPAU/VAAPI/DXVA) seem to treat surfaces as progressive. So I think the output of a decode call should already be in a progressive format (or at least something that can be handled as one for sampling).
No, they present the surfaces as progressive, but that doesn't necessarily mean they are; nothing would prevent us from just treating GetBitsYCbCr as a special (slow) case.
Yeah, exactly that was what I had in mind, but since we don't have a consensus about what the internal memory layout of the buffers really is, we don't win anything by making it a public interface.

Considering how different the approach for vdpau and va-api is,
wouldn't it just make sense to handle va-api separately then?
No, I don't think so; the rest of the gallium interfaces are built around mapping/unmapping of buffers with the help of transfer objects. This interface has proven flexible enough to implement a couple of different apis, so I don't really see a need to create another interface just for VDPAU.

Testing interlaced videos that decode correctly with nvidia vdpau would help a lot to figure out what the proper way to handle interlacing would be, so if someone has a bunch that play correctly with nvidia accelerated vdpau & mplayer, could you please link them? ;)
I will try to look around some more, but at least for MPEG2 I haven't found any interlaced videos for testing either. I will leave you a note if I find something.
/**
   * output for decoding / input for displaying
   */
struct pipe_video_buffer
{
     struct pipe_context *context;

     enum pipe_format buffer_format;
     // Note: buffer_format may change as a result of put_bits, or a call to decode_bitstream
     // afaict there is no guarantee a buffer filled with put_bits can be used as a reference
     // frame for decode_bitstream
Altering the video buffer format as a result of the application manually putting video content into it is something very VDPAU specific. So I think it would be better to let the state tracker check whether the new content matches what is currently in the buffer and, if the need arises, recreate the buffer with a new format.
This is only an optimization; I did it for g3dvl because it's faster in the case where PutBitsYCbCr is used when mplayer is not using vdpau. Eventually all buffers will have the correct layout, and it ends up being a few memcpy's. For nvidia the format will have to stay NV12, so I can't change the format in that case..

     enum pipe_video_chroma_format chroma_format;
     unsigned width;
     unsigned height;

     enum pipe_video_interlace_type layout;
     // progressive
     // even and odd lines are split
     // interlaced, top field valid only (half height)
     // interlaced, bottom field valid only
     // I'm really drawing a blank what would be sane here, since interlacing has a ton of
     // codec specific information, and the current design doesn't handle it at all..
The concrete memory layout of the buffer is implementation specific, so I think things like interlacing and/or special tiling modes are something the driver needs to handle and should not be exposed to the state tracker.
I don't think so..

  * \defgroup VdpVideoMixer VdpVideoMixer; Video Post-processing \
  *           and Compositing object
  *
  * VdpVideoMixer can perform some subset of the following
  * post-processing steps on video:
  * - De-interlacing
  *   - Various types, with or without inverse telecine
  * - Noise-reduction
  * - Sharpness adjustment
  * - Color space conversion to RGB
  * - Chroma format upscaling to 4:4:4

Regardless of being va-api/vdpau/xvmc, those steps need to be performed for all hardware. Deinterlacing shouldn't be done in hardware specific bits, and this step could also easily be changed to include frames that are split up too..
Wait a moment, you are mixing two different things here:

It's one thing to provide a top/bottom/planar view to use a video buffer as a source for a shader; that's what get_sampler_view_planes is good for (we still have to add that parameter to that function).

And it is a completely different thing to have an internal memory layout of separated top and bottom lines, because this layout is hardware specific: for example you could have all top lines first (luma and chroma) and then the bottom lines, or you could have luma top, then luma bottom, then chroma top, then chroma bottom etc... Paired with different tiling modes that gives you probably a couple of hundred different memory layouts. It doesn't make sense to export this to a state tracker.
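So the parameter I mean would look something like this (the enum and its values are hypothetical, just a sketch):

    /* sketch only: let the state tracker ask for a progressive, top field or
     * bottom field view without knowing how the fields are stored in memory */
    enum vl_field_view {
       VL_FIELD_VIEW_PROGRESSIVE,
       VL_FIELD_VIEW_TOP,
       VL_FIELD_VIEW_BOTTOM
    };

    struct pipe_sampler_view **
    (*get_sampler_view_planes)(struct pipe_video_buffer *buffer,
                               enum vl_field_view view);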

     /**
      * destroy this video buffer
      */
     void (*destroy)(struct pipe_video_buffer *buffer);

     /**
      * get an individual sampler view for each component
      */
     struct pipe_sampler_view **(*get_sampler_view_components)(struct pipe_video_buffer *buffer);
     // Note: for progressive split in 2 halves, would probably need up to 6...
     // top Y, bottom Y, top CbCr, bottom CbCr

     /**
      * get an individual surface for each plane
      */
     struct pipe_surface **(*get_surfaces)(struct pipe_video_buffer *buffer);

     /**
      * write bits to a video buffer, possibly altering the format of this video buffer
      */
     void (*put_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
                      void const *const *source_data, uint32_t const *source_pitches);

     /**
      * read bits from a video buffer
      */
     void (*get_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
                      void *const *source_data, uint32_t const *source_pitches);
};
Having put_bits and get_bits implementations is actually a bad idea, because they assume that the state tracker has a pointer to where the data is stored. But some state trackers (especially VAAPI) need a map/unmap-like interface, where the driver returns a pointer and pitch to some linearly aligned memory. Like with tiling, that sometimes requires the buffer to be copied into a linear memory layout; that was at least the way we handled it inside gallium so far.

Overall it is good to know that somebody is still looking into it.
No, it's a good idea if you have hardware that requires a specific layout in order to use the video buffer as a reference frame...
No, a buffer that was filled with put_bits isn't usable as a reference frame, so all you have to do is keep internal information about the buffer layout and use that while creating the samplers: a decode operation on a buffer sets the layout to whatever your decoder outputs, and mapping a buffer sets the layout to linear. That's at least how it is handled for tiling, and it seems to work fine there.
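In other words, driver-internal bookkeeping along these lines (illustrative only, names made up, never exposed to the state tracker):

    /* illustrative only -- driver-internal, names made up */
    enum example_layout {
       EXAMPLE_LAYOUT_LINEAR,   /* after a map or put_bits */
       EXAMPLE_LAYOUT_DECODED   /* whatever the decoder block writes (tiled, fields, ...) */
    };

    struct example_video_buffer {
       struct pipe_video_buffer base;
       enum example_layout layout;   /* decode sets DECODED, mapping sets LINEAR;
                                        sampler views are created according to it */
    };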

Regards,
Christian.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
