On 16.11.2011 15:38, Maarten Lankhorst wrote:
Hey,

On 11/16/2011 02:47 PM, Christian König wrote:
On 15.11.2011 17:52, Maarten Lankhorst wrote:
Deleted:
- begin_frame/end_frame: Was only useful for XvMC, should be folded into flush..
I'm not completely happy with the current interface either, but removing this takes away the state tracker's ability to control how many buffers are used, which in turn forces the VDPAU and XvMC state trackers to use the same buffering algorithm. This is quite bad, because for XvMC we are forced to use a buffer for each destination surface, while in the VDPAU/VAAPI case a much simpler method should be used.
I wasn't aware of that; the patch could easily be changed so that for entrypoint <= bitstream only a single decoder buffer is created, and otherwise one is attached to the surface. I still think it should be removed from the state tracker though, and that would still be possible..
I would rather pass another parameter to the decoder create function, like "scatter_decode" or something like that. Younes originally had a parameter just to describe the possible different calling conventions for decode, but that only covered slice vs. frame based buffering and I didn't understand what he was trying to do with it at that time.
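Just to illustrate what I mean (this is only a rough sketch, the name and signature are made up, not a concrete proposal):

    /* hypothetical sketch -- not the real signature, just the idea: the state
     * tracker describes the buffering model once at creation time instead of
     * managing decode buffers itself */
    struct pipe_video_decoder *(*create_video_decoder)(struct pipe_context *pipe,
                                                       enum pipe_video_profile profile,
                                                       enum pipe_video_entrypoint entrypoint,
                                                       enum pipe_video_chroma_format chroma_format,
                                                       unsigned width, unsigned height,
                                                       bool scatter_decode);
    /* scatter_decode = true:  one internal decode buffer per target surface (XvMC)
     * scatter_decode = false: a single internal decode buffer (VDPAU/VA-API) */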

I would also suggest separating the detection of the start of a new frame into its own function and making that function a public interface in vl_mpeg12_decoder.h, so that other drivers can use it for their handling as well.
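As a sketch of what such a helper could look like (name and signature are made up):

    /* vl_mpeg12_decoder.h -- hypothetical public helper, name/signature made up */
    bool
    vl_mpeg12_is_new_frame(const struct pipe_mpeg12_picture_desc *prev,
                           const struct pipe_mpeg12_picture_desc *cur);
    /* returns true when the picture parameters indicate that a new frame
     * starts, so other drivers can reuse the same detection logic */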

- set_quant_matrix/set_reference_frames:
      they should become part of picture_desc,
      not all codecs deal with it in the same way,
      and some may not have all of the calls.
That is true, but also intended. The idea behind it is that it is not necessary 
for all codecs to set all buffer types; instead the calls inform the driver that a 
specific buffer type has changed. This gives the driver the ability to know 
which parameters have changed between calls to decode, and so to only perform the 
(probably expensive) update of those parameters on the hardware side.
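To make the idea a bit more concrete, the driver side could track this with a dirty bitmask, roughly like this (illustrative sketch only, names made up):

    /* illustrative sketch only -- names are made up */
    enum {
       DIRTY_QUANT_MATRIX = 1 << 0,
       DIRTY_REF_FRAMES   = 1 << 1,
    };

    struct example_decoder {
       unsigned dirty;   /* bitmask of DIRTY_* flags, set by the set_* calls */
       /* ... stored quant matrix, reference list, hw state ... */
    };

    /* called once at the start of decode, instead of re-uploading
     * everything on every frame */
    static void example_flush_dirty_state(struct example_decoder *dec)
    {
       if (dec->dirty & DIRTY_QUANT_MATRIX)
          ; /* re-program the (expensive) quant matrix registers here */
       if (dec->dirty & DIRTY_REF_FRAMES)
          ; /* re-program the reference frame addresses here */
       dec->dirty = 0;
    }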
This is why I used the flush call for XvMC as a way to signal parameter changes. If you look at decode_macroblock for g3dvl, it checks if (dec->current_buffer == target->decode_buffer) in begin_frame, so it doesn't flush state.

For VA-API, I haven't had the .. pleasure(?) of playing around with it, and it seems to be only stubs.
That's correct, the VA-API implementation isn't complete, and I wouldn't call it a pleasure to work with. Intel's VA-API specification seems to be a bit in flux, moving structure members around and sometimes even changing calling conventions from one version to another.

But they still have one very nice idea there: set certain parameters once and then never touch them again, for example:

/*
 *  For application, e.g. set a new bitrate
 *    VABufferID buf_id;
 *    VAEncMiscParameterBuffer *misc_param;
 *    VAEncMiscParameterRateControl *misc_rate_ctrl;
 *
 *    vaCreateBuffer(dpy, context, VAEncMiscParameterBufferType,
 *              sizeof(VAEncMiscParameterBuffer) + sizeof(VAEncMiscParameterRateControl),
 *              1, NULL, &buf_id);
 *
 *    vaMapBuffer(dpy, buf_id, (void **)&misc_param);
 *    misc_param->type = VAEncMiscParameterTypeRateControl;
 *    misc_rate_ctrl = (VAEncMiscParameterRateControl *)misc_param->data;
 *    misc_rate_ctrl->bits_per_second = 6400000;
 *    vaUnmapBuffer(dpy, buf_id);
 *    vaRenderPicture(dpy, context, &buf_id, 1);
 */

For me that means a big simplification, because I don't need to check every frame whether certain parameters have changed, which would otherwise lead to a whole bunch of overhead like flushes or even the need to reset the whole GPU block and load new parameters etc.

If the decode_bitstream interface is changed to receive all bitstream buffers at the same time, there wouldn't be any overhead in doing it like this. For a single picture the parameters are supposed to stay constant, so for vdpau the sane way would be: set the picture parameters for the hardware (which includes EVERYTHING), write all bitstream buffers to a hardware bo, and wait until the magic is done. Afaict there isn't even a sane way to only submit partial buffers, so it's just a bunch of overhead for me.
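Roughly what I have in mind, as a sketch (the exact signature is approximate):

    /* sketch only: hand the driver all bitstream buffers of a picture at once */
    void (*decode_bitstream)(struct pipe_video_decoder *decoder,
                             struct pipe_video_buffer *target,
                             struct pipe_picture_desc *picture,
                             unsigned num_buffers,
                             const void *const *buffers,
                             const unsigned *sizes);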

nvidia doesn't support va-api; it handles the entire process from picture parameters to a decoded buffer internally, so it always converts the picture parameters into something the hardware can understand, every frame.
I'm not arguing against removing the scattered calls to decode_bitstream. I just don't want to lose information while passing the parameters from the state tracker down to the driver. But we can also add this information as a flag to the function later, so on second thought that seems to be ok.

- set_picture_parameters: Can be passed to decode_bitstream/macroblock
Same as above, take a look at the enum VABufferType in the VAAPI specification (http://cgit.freedesktop.org/libva/tree/va/va.h).
- set_decode_target: Idem
- create_decode_buffer/set_decode_buffer/destroy_decode_buffer:
      Even if a decoder wants it, the state tracker has no business knowing about it.
Unfortunately that isn't true; only VDPAU lacks a concept of giving the application a certain amount of control over the buffers. XvMC has the implicit association of internal decode buffers with destination surfaces, and VAAPI gives the application both a one-shot copy and a map/unmap interface to provide buffer content.

If we want to avoid unnecessarily copying buffer content around, then this interface has to become even more complicated, with things like new buffer types, creating user buffers, mapping/unmapping of to-be-decoded data buffers etc.
There is no guarantee a user-created video buffer can be used as a reference frame, I honestly don't even know if it's possible.
Oh, I'm not talking about reference frames here, I'm talking about ALL kinds of buffers. As far as I know using a manually uploaded video buffer as a reference frame isn't possible with any of the major hardware vendors, and I can tell you that it definitely doesn't work on Radeon hardware.

Furthermore I don't see any code in state_trackers/va except some stubs, so I can't discuss an interface that isn't even there to begin with.
The VA-API state tracker is only a stub, but the specification is fairly complete, and it was a merge requirement to support at least XvMC, VDPAU, VA-API and even DXVA with it.

flush is changed to only flush a single pipe_video_buffer,
this should reduce the state that needs to be maintained for XvMC otherwise.
I can't really see how that improves the situation; all it does is reduce the interface to what is needed for VDPAU, while totally ignoring the requirements of VAAPI, for example.
See above.

Note: internally you can still use those calls, as long as the *public* api
would be reduced to this, pipe_video_buffer is specific to each driver,
so you can put in a lot of state there if you choose to do so. For example,
quant matrix is not used in vc-1, and h264 will need a different
way for set_reference_frames, see struct VdpReferenceFrameH264. This is why
I want to remove those calls, and put them into picture_desc..
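To make that concrete, roughly the direction I'm thinking of (field names are approximate, just a sketch):

    /* sketch: fold the per-codec state into the picture description instead
     * of separate set_* calls; every codec gets its own variant */
    struct pipe_mpeg12_picture_desc
    {
       struct pipe_picture_desc base;      /* common part */

       const uint8_t *intra_matrix;        /* was set_quant_matrix */
       const uint8_t *non_intra_matrix;

       struct pipe_video_buffer *ref[2];   /* was set_reference_frames */

       /* ... remaining MPEG-2 picture parameters ... */
    };

    /* h264 would get its own desc with per-reference fields mirroring
     * struct VdpReferenceFrameH264 (frame_idx, top/bottom_is_reference, ...) */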
Correct, those calls need to be extended to support more codecs, but just stuffing everything into a single structure also doesn't seem to be a solution that will work across a wider range of state trackers or hardware.
All hardware has its own unique approach with its own api, so it would make sense if there were a separate call for decode_va too, maybe with some bits that tell what changed, so it only has to upload those specific parts..
Huh? Are you kidding? Gallium3D's goal is to provide a unified and abstracted view of today's video hardware, and by introducing a call specific to a single API we definitely aren't working in that direction.

struct pipe_video_buffer I'm less sure about, especially wrt interlacing and alignment; it needs more thought. Height and width are aligned to 64 on nvidia, and require a special tiling flag. The compositor will need to become aware of interlacing, to be able to play back interlaced videos.
Alignment isn't so much of an issue. Take a look at vl_video_buffer_create; it already aligns the width/height to a power of two if the hardware needs that. So it is perfectly legal for a video buffer to be larger than the requested size; this is already tested with the current color conversion code and should work fine with both XvMC and VDPAU.
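Roughly, the logic amounts to something like this (a simplified sketch from memory, not the exact code):

    /* simplified sketch of the alignment vl_video_buffer_create does */
    if (!screen->get_video_param(screen, PIPE_VIDEO_PROFILE_UNKNOWN,
                                 PIPE_VIDEO_CAP_NPOT_TEXTURES)) {
       width  = util_next_power_of_two(width);
       height = util_next_power_of_two(height);
    }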

I honestly haven't read up on interlacing yet, and haven't found a video
to test it with, so that part is just speculative crap, and might change
when I find out more about it.
Regarding deinterlacing I'm also not really sure what we should do about it, but all interfaces (XvMC/VDPAU/VAAPI/DXVA) seem to treat surfaces as progressive. So I think the output of a decode call should already be in a progressive format (or at least something that can be handled as one for sampling).
No, they present the surfaces as progressive, but that doesn't necessarily mean they are; nothing would prevent us from just treating GetBitsYCbCr as a special (slow) case.
Yeah, exactly that was what I had in mind, but since we don't have a consensus about what the internal memory layout of the buffers really is, we don't win anything by making it a public interface.

Considering how different the approach for vdpau and va-api is,
wouldn't it just make sense to handle va-api separately then?
No, I don't think so; the rest of the gallium interfaces are built around mapping/unmapping of buffers with the help of transfer objects. This interface has proven flexible enough to implement a couple of different apis, so I don't really see a need to create another interface just for VDPAU.

Testing interlaced videos that decode correctly with nvidia vdpau would help a lot to figure out what the proper way to handle interlacing would be, so if someone has a bunch that play correctly with nvidia accelerated vdpau & mplayer, could you please link them? ;)
I will try to look around some more, but at least for MPEG2 I haven't found any interlaced videos for testing either. I will leave you a note if I find something.
/**
   * output for decoding / input for displaying
   */
struct pipe_video_buffer
{
     struct pipe_context *context;

     enum pipe_format buffer_format;
     // Note: buffer_format may change as a result of put_bits, or a call to decode_bitstream
     // afaict there is no guarantee a buffer filled with put_bits can be used as a reference
     // frame for decode_bitstream
Altering the video buffer format as a result of the application manually putting video content into it is something very VDPAU specific. So I think it would be better to let the state tracker check whether the new content matches what is currently in the buffer and, if the need arises, recreate the buffer with a new format.
This is only an optimization; I did it for g3dvl because it's faster in the case where PutBitsYCbCr is used when mplayer is not using vdpau. Eventually all buffers will have the correct layout, and it ends up being a few memcpy's. For nvidia the format will have to stay NV12, so I can't change the format in that case..

     enum pipe_video_chroma_format chroma_format;
     unsigned width;
     unsigned height;

     enum pipe_video_interlace_type layout;
     // progressive
     // even and odd lines are split
     // interlaced, top field valid only (half height)
     // interlaced, bottom field valid only
     // I'm really drawing a blank what would be sane here, since interlacing has a ton of
     // codec specific information, and the current design doesn't handle it at all..
The concrete memory layout of the buffer is implementation specific, so I think things like interlacing and/or special tiling modes are something the driver needs to handle and should not be exposed to the state tracker.
I don't think so..

  * \defgroup VdpVideoMixer VdpVideoMixer; Video Post-processing \
  *           and Compositing object
  *
  * VdpVideoMixer can perform some subset of the following
  * post-processing steps on video:
  * - De-interlacing
  *   - Various types, with or without inverse telecine
  * - Noise-reduction
  * - Sharpness adjustment
  * - Color space conversion to RGB
  * - Chroma format upscaling to 4:4:4

Regardless of being va-api/vdpau/xvmc, those steps need to be performed for all hardware. Deinterlacing shouldn't be done in hardware specific bits, and this step could also easily be changed to include frames that are split up too..
Wait a moment, you are mixing two different things here:

It's one thing to provide a top/bottom/planar view to use a video buffer as a source for a shader; that's what get_sampler_view_planes is good for (we still have to add that parameter to that function).

And it is a completely different thing to have an internal memory layout of separated top and bottom lines, because this layout is hardware specific: for example you could have all top lines first (luma and chroma) and then the bottom lines, or you could have luma top, then luma bottom, then chroma top, then chroma bottom etc... Paired with different tiling modes that gives you probably a couple of hundred different memory layouts. It doesn't make sense to export this to a state tracker.
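So the parameter I mean would look something like this (the enum and its values are hypothetical, just a sketch):

    /* sketch only: let the state tracker ask for a progressive, top field or
     * bottom field view without knowing how the fields are stored in memory */
    enum vl_field_view {
       VL_FIELD_VIEW_PROGRESSIVE,
       VL_FIELD_VIEW_TOP,
       VL_FIELD_VIEW_BOTTOM
    };

    struct pipe_sampler_view **
    (*get_sampler_view_planes)(struct pipe_video_buffer *buffer,
                               enum vl_field_view view);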

     /**
      * destroy this video buffer
      */
     void (*destroy)(struct pipe_video_buffer *buffer);

     /**
      * get an individual sampler view for each component
      */
     struct pipe_sampler_view **(*get_sampler_view_components)(struct pipe_video_buffer *buffer);
     // Note: for progressive split in 2 halves, would probably need up to 6...
     // top Y, bottom Y, top CbCr, bottom CbCr

     /**
      * get an individual surface for each plane
      */
     struct pipe_surface **(*get_surfaces)(struct pipe_video_buffer *buffer);

     /**
      * write bits to a video buffer, possibly altering the format of this video buffer
      */
     void (*put_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
                      void const *const *source_data, uint32_t const *source_pitches);

     /**
      * read bits from a video buffer
      */
     void (*get_bits)(struct pipe_video_buffer *buffer, enum pipe_format format,
                      void *const *source_data, uint32_t const *source_pitches);
};
Having put_bits and get_bits implementations is actually a bad idea, because they assume that the state tracker has a pointer to where the data is stored. But some state trackers (especially VAAPI) need a map/unmap-like interface, where the driver returns a pointer and pitch to some linearly aligned memory. Like with tiling, that sometimes requires the buffer to be copied into a linear memory layout; that was at least the way we handled it inside gallium so far.

Overall it is good to know that somebody is still looking into it.
No, it's a good idea if you have hardware that requires a specific layout in order to use the video buffer as a reference frame...
No, a buffer that was filled with put_bits isn't usable as a reference frame, so all you have to do is keep internal information about the buffer layout and use that while creating the samplers: a decode operation on a buffer sets the layout to whatever your decoder outputs, and mapping a buffer sets the layout to linear. That's at least how it is handled for tiling, and it seems to work fine there.
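In other words, driver-internal bookkeeping along these lines (illustrative only, names made up, never exposed to the state tracker):

    /* illustrative only -- driver-internal, names made up */
    enum example_layout {
       EXAMPLE_LAYOUT_LINEAR,   /* after a map or put_bits */
       EXAMPLE_LAYOUT_DECODED   /* whatever the decoder block writes (tiled, fields, ...) */
    };

    struct example_video_buffer {
       struct pipe_video_buffer base;
       enum example_layout layout;   /* decode sets DECODED, mapping sets LINEAR;
                                        sampler views are created according to it */
    };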

Regards,
Christian.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
