Re: [FFmpeg-devel] [PATCH] lavf/mov: ignore ctts entries that do not apply to a least one sample
On Fri, Jun 17, 2016 at 01:26:10AM +0200, Michael Niedermayer wrote:
> On Thu, Jun 16, 2016 at 05:26:14PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > Fixes packet pts of samples which contain ctts entries with count=0.
> > ---
> >
> > Hello,
> >
> > The following patch fixes packet pts of samples which contain ctts values
> > with count=0 (so the ctts entry does not apply to any sample if I
> > understand correctly). Such samples are produced by a LG G4 phone. I don't
> > have any sample I can share at the moment (and thus no fate test following
> > this patch yet).
> >
> > An alternative to this patch is to remove directly the entry when the ctts
> > atom is parsed. Would you prefer this alternative ?
>
> i dont know what is preferred but i agree about either solution
>
> removing them on load would avoid any issues with ctts_count > 0
> and no real entries, i dont know though if that ever matters

I've attached the alternative patch that removes the CTTS entries with
count <= 0 at parsing time. I think it's better in the end (I first liked the
idea to keep the ctts table as is in memory but after some thoughts I think
it's really useful).

Anyway I'll go with whatever patch you prefer.

Matthieu

[...]

From 3bf2a6a81b8cca09bee4c0b6ef6f6ce78e276f0d Mon Sep 17 00:00:00 2001
From: Matthieu Bouron
Date: Thu, 16 Jun 2016 13:16:52 +0200
Subject: [PATCH] lavf/mov: ignore ctts that do not apply to a least one sample

Fixes packet pts of samples which contain ctts values with count <= 0.
---
 libavformat/mov.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 57a0354..8eab34c 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -2574,7 +2574,7 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 {
     AVStream *st;
     MOVStreamContext *sc;
-    unsigned int i, entries;
+    unsigned int i, entries, ctts_count = 0;
 
     if (c->fc->nb_streams < 1)
         return 0;
@@ -2600,8 +2600,16 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
         int count    =avio_rb32(pb);
         int duration =avio_rb32(pb);
 
-        sc->ctts_data[i].count   = count;
-        sc->ctts_data[i].duration= duration;
+        if (count <= 0) {
+            av_log(c->fc, AV_LOG_TRACE,
+                   "ignoring CTTS entry with count=%d duration=%d\n",
+                   count, duration);
+            continue;
+        }
+
+        sc->ctts_data[ctts_count].count    = count;
+        sc->ctts_data[ctts_count].duration = duration;
+        ctts_count++;
 
         av_log(c->fc, AV_LOG_TRACE, "count=%d, duration=%d\n",
                 count, duration);
@@ -2617,7 +2625,7 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
             mov_update_dts_shift(sc, duration);
     }
 
-    sc->ctts_count = i;
+    sc->ctts_count = ctts_count;
 
     if (pb->eof_reached)
         return AVERROR_EOF;
-- 
2.8.3

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
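The parse-time approach can be illustrated outside of libavformat. The block below is a minimal sketch, not the code from the patch: `CttsEntry` and `ctts_drop_empty` are hypothetical names, and the table is compacted in place after it has been read, whereas the patch filters entries while reading them from the atom.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ctts table entry (not the libavformat type). */
typedef struct CttsEntry {
    int count;    /* number of consecutive samples the offset applies to */
    int duration; /* composition offset for those samples */
} CttsEntry;

/*
 * Compact the table in place, keeping only entries with count > 0.
 * Returns the number of entries kept; their relative order is preserved.
 */
static size_t ctts_drop_empty(CttsEntry *table, size_t entries)
{
    size_t kept = 0;

    assert(table != NULL || entries == 0);

    for (size_t i = 0; i < entries; i++) {
        if (table[i].count <= 0)
            continue;             /* entry applies to no sample: skip it */
        table[kept++] = table[i]; /* kept <= i, so nothing unread is lost */
    }
    return kept;
}
```

The returned value plays the role of `ctts_count` in the patch: it is the number of usable entries, which can be smaller than the number of entries declared in the atom.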
Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec{, _h264}: set FF_CODEC_CAP_SETS_PKT_DTS capability
On Sun, Jun 19, 2016 at 06:01:49PM +0200, Matthieu Bouron wrote:
> On Fri, Jun 17, 2016 at 09:47:35AM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > And sets frames pkt_dts to AV_NOPTS_VALUE as we do not want lavc/utils
> > to overwrite the field with incorrect values as the decoder is
> > asynchronous.
>
> If there is no objection, I will push the patch in one day.

Pushed.

[...]
Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS
On Mon, Jun 13, 2016 at 02:37:29PM +0200, Matthieu Bouron wrote:
> On Mon, Jun 13, 2016 at 12:23:07PM +0200, Hendrik Leppkes wrote:
> > On Mon, Jun 13, 2016 at 11:51 AM, Matthieu Bouron wrote:
> > > On Fri, Jun 10, 2016 at 03:08:48PM +0200, Matthieu Bouron wrote:
> > >> From: Matthieu Bouron
> > >>
> > >> Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
> > >> to be set in their respective csd-{0,1} buffers.
> > >> ---
> > >>
> > >> Hello,
> > >>
> > >> The attached patch fixes playback of HLS streams on MediaTek devices
> > >> which requires PPS/SPS to be set in their respective csd-{0,1} buffers
> > >> (instead of having sps+pps in the csd-0 which works on other devices).
> > >>
> > >> I'm not sure if I can use the ff_h264_decode_extradata this way (or at
> > >> least initialize the H264Context with zeroes minus the avctx field).
> > >
> > > Rebased patch (after the h264 ps merged) attached.
> > >
> > > I still have the same question, is my use of
> > > H264Context + ff_h264_decode_extradata correct ?
> >
> > Using H264 decoder internals seems to be a rather unfortunate
> > solution, as it's prone to breakage, often subtle, as the h264 decoder
> > gets changed and not all inter-module dependencies are known.
> > So if possible at all, not using something that uses H264Context for
> > example would be nice.
> >
> > For the record, ff_h264_decode_extradata is scheduled for refactoring
> > to make it independent of H264Context so it can be more easily shared
> > with the h264 decoder and the h264 parser.
> > Once that is done, it may give you a cleaner interface to use it from
> > mediacodec as well.
>
> Ok. I can wait for the refactor to be merged but the MediaCodec decoder
> will remain broken on those devices. I'm not too happy about that if a
> release is to be made in the following days. Do we have an ETA ? I'm
> also not too happy to write the same parsing code as I did before for the
> AVCC format to split/extract the PPS/SPS.
>
> Or ..., I can push this code (if its use is valid) and update it when the
> merge lands (I'm helping Clément with the merges, so I will take care
> about this part).

Updated patch attached (using the new ff_h264_decode_extradata API).

Matthieu

From 30d70187e10f09231a59a255204c810d1662336b Mon Sep 17 00:00:00 2001
From: Matthieu Bouron
Date: Fri, 10 Jun 2016 13:16:09 +0200
Subject: [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS

Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
to be set in their respective csd-{0,1} buffers.
---
 configure                       |   2 +-
 libavcodec/mediacodecdec_h264.c | 140 +---
 2 files changed, 30 insertions(+), 112 deletions(-)

diff --git a/configure b/configure
index a220fa1..eb08478 100755
--- a/configure
+++ b/configure
@@ -2548,7 +2548,7 @@ h264_d3d11va_hwaccel_select="h264_decoder"
 h264_dxva2_hwaccel_deps="dxva2"
 h264_dxva2_hwaccel_select="h264_decoder"
 h264_mediacodec_decoder_deps="mediacodec"
-h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_parser"
+h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_decoder h264_parser"
 h264_mmal_decoder_deps="mmal"
 h264_mmal_decoder_select="mmal"
 h264_mmal_hwaccel_deps="mmal"
diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 52e48ae..b63b395 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -32,6 +32,7 @@
 #include "libavutil/atomic.h"
 
 #include "avcodec.h"
+#include "h264.h"
 #include "internal.h"
 #include "mediacodecdec.h"
 #include "mediacodec_wrapper.h"
@@ -50,104 +51,6 @@ typedef struct MediaCodecH264DecContext {
 
 } MediaCodecH264DecContext;
 
-static int h264_extradata_to_annexb_sps_pps(AVCodecContext *avctx,
-        uint8_t **extradata_annexb, int *extradata_annexb_size,
-        int *sps_offset, int *sps_size,
-        int *pps_offset, int *pps_size)
-{
-    uint16_t unit_size;
-    uint64_t total_size = 0;
-
-    uint8_t i, j, unit_nb;
-    uint8_t sps_seen = 0;
-    uint8_t pps_seen = 0;
-
-    const uint8_t *extradata;
-    static const uint8_t nalu_header[4] = { 0x00, 0x00, 0x00, 0x01 };
-
-    if (avctx->extradata_size < 8) {
-        av_log(avctx, AV_LOG_ERROR,
-            "Too small extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec{, _h264}: set FF_CODEC_CAP_SETS_PKT_DTS capability
On Fri, Jun 17, 2016 at 09:47:35AM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron
>
> And sets frames pkt_dts to AV_NOPTS_VALUE as we do not want lavc/utils
> to overwrite the field with incorrect values as the decoder is
> asynchronous.

If there is no objection, I will push the patch in one day.

Matthieu

[...]
[FFmpeg-devel] [PATCH] lavc/mediacodecdec{, _h264}: set FF_CODEC_CAP_SETS_PKT_DTS capability
From: Matthieu Bouron

And sets frames pkt_dts to AV_NOPTS_VALUE as we do not want lavc/utils
to overwrite the field with incorrect values as the decoder is
asynchronous.
---
 libavcodec/mediacodecdec.c      | 1 +
 libavcodec/mediacodecdec_h264.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/libavcodec/mediacodecdec.c b/libavcodec/mediacodecdec.c
index 0b08f020..68df885 100644
--- a/libavcodec/mediacodecdec.c
+++ b/libavcodec/mediacodecdec.c
@@ -162,6 +162,7 @@ static int mediacodec_wrap_buffer(AVCodecContext *avctx,
      *   * N avpackets can be pushed before 1 frame is actually returned
      *   * 0-sized avpackets are pushed to flush remaining frames at EOS */
     frame->pkt_pts = info->presentationTimeUs;
+    frame->pkt_dts = AV_NOPTS_VALUE;
 
     av_log(avctx, AV_LOG_DEBUG,
             "Frame: width=%d stride=%d height=%d slice-height=%d "
diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 52e48ae..0f90606 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -344,4 +344,5 @@ AVCodec ff_h264_mediacodec_decoder = {
     .flush          = mediacodec_decode_flush,
     .close          = mediacodec_decode_close,
     .capabilities   = CODEC_CAP_DELAY,
+    .caps_internal  = FF_CODEC_CAP_SETS_PKT_DTS,
 };
-- 
2.8.3
[FFmpeg-devel] [PATCH] lavf/mov: ignore ctts entries that do not apply to a least one sample
From: Matthieu Bouron

Fixes packet pts of samples which contain ctts entries with count=0.
---

Hello,

The following patch fixes packet pts of samples which contain ctts values with
count=0 (so the ctts entry does not apply to any sample if I understand
correctly). Such samples are produced by a LG G4 phone. I don't have any
sample I can share at the moment (and thus no fate test following this patch
yet).

An alternative to this patch is to remove directly the entry when the ctts
atom is parsed. Would you prefer this alternative ?

What happens without the patch is that the ctts_index is never incremented
if the current ctts entry count is 0.

Matthieu

---
 libavformat/mov.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 57a0354..7fbad22 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -5175,6 +5175,11 @@ static int mov_read_packet(AVFormatContext *s, AVPacket *pkt)
 
     pkt->stream_index = sc->ffindex;
     pkt->dts = sample->timestamp;
+
+    if (sc->ctts_data && sc->ctts_index < sc->ctts_count &&
+        sc->ctts_data[sc->ctts_index].count == 0)
+        sc->ctts_index++;
+
     if (sc->ctts_data && sc->ctts_index < sc->ctts_count) {
         pkt->pts = pkt->dts + sc->dts_shift + sc->ctts_data[sc->ctts_index].duration;
         /* update ctts context */
-- 
2.8.3
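To make the stall concrete, here is a small standalone model of the read-time walk over the ctts table. It is a sketch with hypothetical names (`CttsEntry`, `CttsCursor`, `next_pts`), it leaves out `dts_shift`, and it differs from the patch in two labeled ways: it skips any run of empty entries with a loop (the patch skips a single entry with an `if`), and it returns the dts unchanged once the table is exhausted, which is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct CttsEntry {
    int count;    /* samples covered by this entry */
    int duration; /* composition offset added to dts */
} CttsEntry;

typedef struct CttsCursor {
    const CttsEntry *data;
    size_t nb_entries;
    size_t index;  /* current entry */
    int sample;    /* samples already consumed from the current entry */
} CttsCursor;

/* Return the pts for the next sample, advancing the cursor. */
static int64_t next_pts(CttsCursor *cur, int64_t dts)
{
    int64_t pts;

    assert(cur != NULL);

    /* Without this skip, an entry with count == 0 is never left: the
     * per-entry sample counter goes 1, 2, 3... and never equals 0. */
    while (cur->index < cur->nb_entries && cur->data[cur->index].count <= 0)
        cur->index++;

    if (cur->index >= cur->nb_entries)
        return dts; /* assumption: no offset once the table is exhausted */

    pts = dts + cur->data[cur->index].duration;

    if (++cur->sample == cur->data[cur->index].count) {
        cur->index++;
        cur->sample = 0;
    }
    return pts;
}
```

With the table `{1,100}, {0,999}, {2,200}` the second sample picks up offset 200 rather than staying on the empty entry and reusing 999 for every following sample.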
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: refactor ff_AMediaCodecList_getCodecByType
On Mon, Jun 13, 2016 at 02:47:45PM +0200, Matthieu Bouron wrote:
> On Wed, Jun 08, 2016 at 11:19:51PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > Allows to select a codec (encoder or decoder) only if it supports a
> > specific profile.
> >
> > Adds ff_AMediaCodecProfile_getProfileFromAVCodecContext to convert an
> > AVCodecContext profile to a MediaCodec profile. It only supports H264
> > for now.
> >
> > The codepath using MediaCodecList.findDecoderForFormat() (Android >= 5.0)
> > has been dropped as this method does not allow to select a decoder
> > compatible with a specific profile.
> > ---
>
> If there is no objection, I will push this patch in one day.

Pushed.

[...]
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: refactor ff_AMediaCodecList_getCodecByType
On Wed, Jun 08, 2016 at 11:19:51PM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron
>
> Allows to select a codec (encoder or decoder) only if it supports a
> specific profile.
>
> Adds ff_AMediaCodecProfile_getProfileFromAVCodecContext to convert an
> AVCodecContext profile to a MediaCodec profile. It only supports H264
> for now.
>
> The codepath using MediaCodecList.findDecoderForFormat() (Android >= 5.0)
> has been dropped as this method does not allow to select a decoder
> compatible with a specific profile.
> ---

If there is no objection, I will push this patch in one day.

Matthieu

[...]
Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS
On Mon, Jun 13, 2016 at 12:23:07PM +0200, Hendrik Leppkes wrote:
> On Mon, Jun 13, 2016 at 11:51 AM, Matthieu Bouron wrote:
> > On Fri, Jun 10, 2016 at 03:08:48PM +0200, Matthieu Bouron wrote:
> >> From: Matthieu Bouron
> >>
> >> Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
> >> to be set in their respective csd-{0,1} buffers.
> >> ---
> >>
> >> Hello,
> >>
> >> The attached patch fixes playback of HLS streams on MediaTek devices which
> >> requires PPS/SPS to be set in their respective csd-{0,1} buffers (instead of
> >> having sps+pps in the csd-0 which works on other devices).
> >>
> >> I'm not sure if I can use the ff_h264_decode_extradata this way (or at
> >> least initialize the H264Context with zeroes minus the avctx field).
> >
> > Rebased patch (after the h264 ps merged) attached.
> >
> > I still have the same question, is my use of
> > H264Context + ff_h264_decode_extradata correct ?
>
> Using H264 decoder internals seems to be a rather unfortunate
> solution, as it's prone to breakage, often subtle, as the h264 decoder
> gets changed and not all inter-module dependencies are known.
> So if possible at all, not using something that uses H264Context for
> example would be nice.
>
> For the record, ff_h264_decode_extradata is scheduled for refactoring
> to make it independent of H264Context so it can be more easily shared
> with the h264 decoder and the h264 parser.
> Once that is done, it may give you a cleaner interface to use it from
> mediacodec as well.

Ok. I can wait for the refactor to be merged but the MediaCodec decoder
will remain broken on those devices. I'm not too happy about that if a
release is to be made in the following days. Do we have an ETA ? I'm
also not too happy to write the same parsing code as I did before for the
AVCC format to split/extract the PPS/SPS.

Or ..., I can push this code (if its use is valid) and update it when the
merge lands (I'm helping Clément with the merges, so I will take care
about this part).

Matthieu

[...]
Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS
On Fri, Jun 10, 2016 at 03:08:48PM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron
>
> Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
> to be set in their respective csd-{0,1} buffers.
> ---
>
> Hello,
>
> The attached patch fixes playback of HLS streams on MediaTek devices which
> requires PPS/SPS to be set in their respective csd-{0,1} buffers (instead of
> having sps+pps in the csd-0 which works on other devices).
>
> I'm not sure if I can use the ff_h264_decode_extradata this way (or at least
> initialize the H264Context with zeroes minus the avctx field).

Rebased patch (after the h264 ps merged) attached.

I still have the same question, is my use of
H264Context + ff_h264_decode_extradata correct ?

Thanks in advance,
Matthieu

[...]

From 30d70187e10f09231a59a255204c810d1662336b Mon Sep 17 00:00:00 2001
From: Matthieu Bouron
Date: Fri, 10 Jun 2016 13:16:09 +0200
Subject: [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS

Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
to be set in their respective csd-{0,1} buffers.
---
 configure                       |   2 +-
 libavcodec/mediacodecdec_h264.c | 140 +---
 2 files changed, 30 insertions(+), 112 deletions(-)

diff --git a/configure b/configure
index a220fa1..eb08478 100755
--- a/configure
+++ b/configure
@@ -2548,7 +2548,7 @@ h264_d3d11va_hwaccel_select="h264_decoder"
 h264_dxva2_hwaccel_deps="dxva2"
 h264_dxva2_hwaccel_select="h264_decoder"
 h264_mediacodec_decoder_deps="mediacodec"
-h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_parser"
+h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_decoder h264_parser"
 h264_mmal_decoder_deps="mmal"
 h264_mmal_decoder_select="mmal"
 h264_mmal_hwaccel_deps="mmal"
diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 52e48ae..b63b395 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -32,6 +32,7 @@
 #include "libavutil/atomic.h"
 
 #include "avcodec.h"
+#include "h264.h"
 #include "internal.h"
 #include "mediacodecdec.h"
 #include "mediacodec_wrapper.h"
@@ -50,104 +51,6 @@ typedef struct MediaCodecH264DecContext {
 
 } MediaCodecH264DecContext;
 
-static int h264_extradata_to_annexb_sps_pps(AVCodecContext *avctx,
-        uint8_t **extradata_annexb, int *extradata_annexb_size,
-        int *sps_offset, int *sps_size,
-        int *pps_offset, int *pps_size)
-{
-    uint16_t unit_size;
-    uint64_t total_size = 0;
-
-    uint8_t i, j, unit_nb;
-    uint8_t sps_seen = 0;
-    uint8_t pps_seen = 0;
-
-    const uint8_t *extradata;
-    static const uint8_t nalu_header[4] = { 0x00, 0x00, 0x00, 0x01 };
-
-    if (avctx->extradata_size < 8) {
-        av_log(avctx, AV_LOG_ERROR,
-            "Too small extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
-        return AVERROR(EINVAL);
-    }
-
-    *extradata_annexb = NULL;
-    *extradata_annexb_size = 0;
-
-    *sps_offset = *sps_size = 0;
-    *pps_offset = *pps_size = 0;
-
-    extradata = avctx->extradata + 4;
-
-    /* skip length size */
-    extradata++;
-
-    for (j = 0; j < 2; j ++) {
-
-        if (j == 0) {
-            /* number of sps unit(s) */
-            unit_nb = *extradata++ & 0x1f;
-        } else {
-            /* number of pps unit(s) */
-            unit_nb = *extradata++;
-        }
-
-        for (i = 0; i < unit_nb; i++) {
-            int err;
-
-            unit_size = AV_RB16(extradata);
-            total_size += unit_size + 4;
-
-            if (total_size > INT_MAX) {
-                av_log(avctx, AV_LOG_ERROR,
-                    "Too big extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
-                av_freep(extradata_annexb);
-                return AVERROR(EINVAL);
-            }
-
-            if (extradata + 2 + unit_size > avctx->extradata + avctx->extradata_size) {
-                av_log(avctx, AV_LOG_ERROR, "Packet header is not contained in global extradata, "
-                    "corrupted stream or invalid MP4/AVCC bitstream\n");
-                av_freep(extradata_annexb);
-                return AVERROR(EINVAL);
-            }
-
-            if ((err = av_reallocp(extradata_annexb, total_size)) < 0) {
-                return err;
-            }
-
-            memcpy(*extradata_annexb + total_size - unit_size - 4, nalu_header, 4);
-            memcpy(*extradata_annexb + total_size - unit_size, extradata + 2, unit_size);
-            extradata += 2 + unit_size;
-        }
-
-        if (unit_nb) {
-            if (j == 0) {
-                sps_seen = 1;
-                *sps_size = total_
[FFmpeg-devel] [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS
From: Matthieu Bouron

Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
to be set in their respective csd-{0,1} buffers.
---

Hello,

The attached patch fixes playback of HLS streams on MediaTek devices which
requires PPS/SPS to be set in their respective csd-{0,1} buffers (instead of
having sps+pps in the csd-0 which works on other devices).

I'm not sure if I can use the ff_h264_decode_extradata this way (or at least
initialize the H264Context with zeroes minus the avctx field).

Matthieu

---
 configure                       |   2 +-
 libavcodec/mediacodecdec_h264.c | 140 +---
 2 files changed, 30 insertions(+), 112 deletions(-)

diff --git a/configure b/configure
index 7c463a5..508affe 100755
--- a/configure
+++ b/configure
@@ -2544,7 +2544,7 @@ h264_d3d11va_hwaccel_select="h264_decoder"
 h264_dxva2_hwaccel_deps="dxva2"
 h264_dxva2_hwaccel_select="h264_decoder"
 h264_mediacodec_decoder_deps="mediacodec"
-h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_parser"
+h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_decoder h264_parser"
 h264_mmal_decoder_deps="mmal"
 h264_mmal_decoder_select="mmal"
 h264_mmal_hwaccel_deps="mmal"
diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 52e48ae..69e9122 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -32,6 +32,7 @@
 #include "libavutil/atomic.h"
 
 #include "avcodec.h"
+#include "h264.h"
 #include "internal.h"
 #include "mediacodecdec.h"
 #include "mediacodec_wrapper.h"
@@ -50,104 +51,6 @@ typedef struct MediaCodecH264DecContext {
 
 } MediaCodecH264DecContext;
 
-static int h264_extradata_to_annexb_sps_pps(AVCodecContext *avctx,
-        uint8_t **extradata_annexb, int *extradata_annexb_size,
-        int *sps_offset, int *sps_size,
-        int *pps_offset, int *pps_size)
-{
-    uint16_t unit_size;
-    uint64_t total_size = 0;
-
-    uint8_t i, j, unit_nb;
-    uint8_t sps_seen = 0;
-    uint8_t pps_seen = 0;
-
-    const uint8_t *extradata;
-    static const uint8_t nalu_header[4] = { 0x00, 0x00, 0x00, 0x01 };
-
-    if (avctx->extradata_size < 8) {
-        av_log(avctx, AV_LOG_ERROR,
-            "Too small extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
-        return AVERROR(EINVAL);
-    }
-
-    *extradata_annexb = NULL;
-    *extradata_annexb_size = 0;
-
-    *sps_offset = *sps_size = 0;
-    *pps_offset = *pps_size = 0;
-
-    extradata = avctx->extradata + 4;
-
-    /* skip length size */
-    extradata++;
-
-    for (j = 0; j < 2; j ++) {
-
-        if (j == 0) {
-            /* number of sps unit(s) */
-            unit_nb = *extradata++ & 0x1f;
-        } else {
-            /* number of pps unit(s) */
-            unit_nb = *extradata++;
-        }
-
-        for (i = 0; i < unit_nb; i++) {
-            int err;
-
-            unit_size = AV_RB16(extradata);
-            total_size += unit_size + 4;
-
-            if (total_size > INT_MAX) {
-                av_log(avctx, AV_LOG_ERROR,
-                    "Too big extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
-                av_freep(extradata_annexb);
-                return AVERROR(EINVAL);
-            }
-
-            if (extradata + 2 + unit_size > avctx->extradata + avctx->extradata_size) {
-                av_log(avctx, AV_LOG_ERROR, "Packet header is not contained in global extradata, "
-                    "corrupted stream or invalid MP4/AVCC bitstream\n");
-                av_freep(extradata_annexb);
-                return AVERROR(EINVAL);
-            }
-
-            if ((err = av_reallocp(extradata_annexb, total_size)) < 0) {
-                return err;
-            }
-
-            memcpy(*extradata_annexb + total_size - unit_size - 4, nalu_header, 4);
-            memcpy(*extradata_annexb + total_size - unit_size, extradata + 2, unit_size);
-            extradata += 2 + unit_size;
-        }
-
-        if (unit_nb) {
-            if (j == 0) {
-                sps_seen = 1;
-                *sps_size = total_size;
-            } else {
-                pps_seen = 1;
-                *pps_size = total_size - *sps_size;
-                *pps_offset = *sps_size;
-            }
-        }
-    }
-
-    *extradata_annexb_size = total_size;
-
-    if (!sps_seen)
-        av_log(avctx, AV_LOG_WARNING,
-               "Warning: SPS NALU missing or invalid. "
-               "The resulting stream may not play.\n");
-
-    if (!pps_seen)
-        av_log(avctx, AV_LOG_WARNING,
-               "Warning: PPS NALU missing or invalid. "
-               "The resulting stream may not play.\n");
-
-    return 0;
-}
-
 static
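For readers who want to experiment with the split outside of FFmpeg, the job of the helper removed above can be sketched without any libavcodec dependency. This is an illustration under assumptions, not the replacement code from the patch (which relies on ff_h264_decode_extradata): `ParamSets` and `avcc_to_annexb` are hypothetical names, the output buffer is supplied by the caller instead of being reallocated, and error reporting is reduced to a -1 return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte ranges of the SPS and PPS blocks inside the Annex B output. */
typedef struct ParamSets {
    size_t sps_offset, sps_size;
    size_t pps_offset, pps_size;
    size_t total_size;
} ParamSets;

/*
 * Convert AVCC extradata into Annex B (each unit prefixed by 00 00 00 01).
 * Returns 0 on success, -1 on truncated input or when out is too small.
 */
static int avcc_to_annexb(const uint8_t *in, size_t in_size,
                          uint8_t *out, size_t out_cap, ParamSets *ps)
{
    static const uint8_t start_code[4] = { 0x00, 0x00, 0x00, 0x01 };
    size_t pos = 5;   /* skip version, profile, compat, level, length size */
    size_t total = 0;

    assert(in != NULL && out != NULL && ps != NULL);
    memset(ps, 0, sizeof(*ps));
    if (in_size < 7)  /* 5 header bytes + SPS count + PPS count */
        return -1;

    for (int pass = 0; pass < 2; pass++) {  /* pass 0: SPS, pass 1: PPS */
        size_t block_start = total;
        unsigned nb_units;

        if (pos >= in_size)
            return -1;
        /* the SPS count uses the low 5 bits, the PPS count the whole byte */
        nb_units = pass == 0 ? (in[pos++] & 0x1fu) : in[pos++];

        for (unsigned i = 0; i < nb_units; i++) {
            size_t unit_size;

            if (in_size - pos < 2)
                return -1;
            unit_size = ((size_t)in[pos] << 8) | in[pos + 1]; /* big-endian */
            pos += 2;
            if (in_size - pos < unit_size || out_cap - total < unit_size + 4)
                return -1;

            memcpy(out + total, start_code, 4);
            memcpy(out + total + 4, in + pos, unit_size);
            total += unit_size + 4;
            pos += unit_size;
        }

        if (pass == 0) {
            ps->sps_offset = block_start;
            ps->sps_size   = total - block_start;
        } else {
            ps->pps_offset = block_start;
            ps->pps_size   = total - block_start;
        }
    }

    ps->total_size = total;
    return 0;
}
```

On the devices discussed in this thread, the bytes at `sps_offset` would go into csd-0 and the bytes at `pps_offset` into csd-1, while other devices accept the whole buffer as csd-0.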
Re: [FFmpeg-devel] FFmpeg 3.1
On Mon, Jun 06, 2016 at 10:23:20AM +0200, Matthieu Bouron wrote:
> On Mon, Jun 06, 2016 at 03:28:19AM +0200, Michael Niedermayer wrote:
> > Hi all
> >
> > its time for making the next major release
> > If you want something in dont forget to push it to git master
>
> I'd like to have the 3 pending MediaCodec patches merged before the
> release.

I'd like to have an upcoming MediaCodec patch which fixes playback of HLS
streams (and more generally annex-b streams) on MediaTek devices. The issue
has been reported by a user.

Thanks,
Matthieu
[FFmpeg-devel] [PATCH] lavc/mediacodec: refactor ff_AMediaCodecList_getCodecByType
From: Matthieu Bouron

Allows to select a codec (encoder or decoder) only if it supports a
specific profile.

Adds ff_AMediaCodecProfile_getProfileFromAVCodecContext to convert an
AVCodecContext profile to a MediaCodec profile. It only supports H264
for now.

The codepath using MediaCodecList.findDecoderForFormat() (Android >= 5.0)
has been dropped as this method does not allow to select a decoder
compatible with a specific profile.
---
 libavcodec/mediacodec_wrapper.c | 277 ++--
 libavcodec/mediacodec_wrapper.h |   4 +-
 libavcodec/mediacodecdec.c      |   8 +-
 3 files changed, 216 insertions(+), 73 deletions(-)

diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
index c05b6fd..b87e62a 100644
--- a/libavcodec/mediacodec_wrapper.c
+++ b/libavcodec/mediacodec_wrapper.c
@@ -26,6 +26,7 @@
 #include "libavutil/mem.h"
 #include "libavutil/avstring.h"
 
+#include "avcodec.h"
 #include "ffjni.h"
 #include "version.h"
 #include "mediacodec_wrapper.h"
@@ -41,9 +42,26 @@ struct JNIAMediaCodecListFields {
 
     jclass mediacodec_info_class;
     jmethodID get_name_id;
+    jmethodID get_codec_capabilities_id;
     jmethodID get_supported_types_id;
     jmethodID is_encoder_id;
 
+    jclass codec_capabilities_class;
+    jfieldID color_formats_id;
+    jfieldID profile_levels_id;
+
+    jclass codec_profile_level_class;
+    jfieldID profile_id;
+    jfieldID level_id;
+
+    jfieldID avc_profile_baseline_id;
+    jfieldID avc_profile_main_id;
+    jfieldID avc_profile_extended_id;
+    jfieldID avc_profile_high_id;
+    jfieldID avc_profile_high10_id;
+    jfieldID avc_profile_high422_id;
+    jfieldID avc_profile_high444_id;
+
 } JNIAMediaCodecListFields;
 
 static const struct FFJniField jni_amediacodeclist_mapping[] = {
@@ -56,9 +74,26 @@ static const struct FFJniField jni_amediacodeclist_mapping[] = {
 
     { "android/media/MediaCodecInfo", NULL, NULL, FF_JNI_CLASS, offsetof(struct JNIAMediaCodecListFields, mediacodec_info_class), 1 },
         { "android/media/MediaCodecInfo", "getName", "()Ljava/lang/String;", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, get_name_id), 1 },
+        { "android/media/MediaCodecInfo", "getCapabilitiesForType", "(Ljava/lang/String;)Landroid/media/MediaCodecInfo$CodecCapabilities;", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, get_codec_capabilities_id), 1 },
         { "android/media/MediaCodecInfo", "getSupportedTypes", "()[Ljava/lang/String;", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, get_supported_types_id), 1 },
         { "android/media/MediaCodecInfo", "isEncoder", "()Z", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, is_encoder_id), 1 },
 
+    { "android/media/MediaCodecInfo$CodecCapabilities", NULL, NULL, FF_JNI_CLASS, offsetof(struct JNIAMediaCodecListFields, codec_capabilities_class), 1 },
+        { "android/media/MediaCodecInfo$CodecCapabilities", "colorFormats", "[I", FF_JNI_FIELD, offsetof(struct JNIAMediaCodecListFields, color_formats_id), 1 },
+        { "android/media/MediaCodecInfo$CodecCapabilities", "profileLevels", "[Landroid/media/MediaCodecInfo$CodecProfileLevel;", FF_JNI_FIELD, offsetof(struct JNIAMediaCodecListFields, profile_levels_id), 1 },
+
+    { "android/media/MediaCodecInfo$CodecProfileLevel", NULL, NULL, FF_JNI_CLASS, offsetof(struct JNIAMediaCodecListFields, codec_profile_level_class), 1 },
+        { "android/media/MediaCodecInfo$CodecProfileLevel", "profile", "I", FF_JNI_FIELD, offsetof(struct JNIAMediaCodecListFields, profile_id), 1 },
+        { "android/media/MediaCodecInfo$CodecProfileLevel", "level", "I", FF_JNI_FIELD, offsetof(struct JNIAMediaCodecListFields, level_id), 1 },
+
+        { "android/media/MediaCodecInfo$CodecProfileLevel", "AVCProfileBaseline", "I", FF_JNI_STATIC_FIELD, offsetof(struct JNIAMediaCodecListFields, avc_profile_baseline_id), 1 },
+        { "android/media/MediaCodecInfo$CodecProfileLevel", "AVCProfileMain", "I", FF_JNI_STATIC_FIELD, offsetof(struct JNIAMediaCodecListFields, avc_profile_main_id), 1 },
+        { "android/media/MediaCodecInfo$CodecProfileLevel", "AVCProfileExtended", "I", FF_JNI_STATIC_FIELD, offsetof(struct JNIAMediaCodecListFields, avc_profile_extended_id), 1 },
+        { "android/media/MediaCodecInfo$CodecProfileLevel", "AVCProfileHigh", "I", FF_JNI_STATIC_FIELD, offsetof(struct JNIAMediaCodecListFields, avc_profile_high_id), 1 },
+        { "android/media/MediaCodecInfo$CodecProfileL
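The profile conversion described in the commit message can be sketched without JNI. The code below is a hedged illustration, not the function added by the patch: `h264_profile_to_mediacodec` and `codec_supports_profile` are hypothetical names, and the MediaCodec constants are hardcoded to the values documented for `MediaCodecInfo.CodecProfileLevel`, whereas the patch looks the static fields up at runtime through the JNI mapping table.

```c
#include <assert.h>
#include <stddef.h>

/* H.264 profile_idc values (the numbers behind FF_PROFILE_H264_*). */
enum {
    H264_PROFILE_BASELINE = 66,
    H264_PROFILE_MAIN     = 77,
    H264_PROFILE_EXTENDED = 88,
    H264_PROFILE_HIGH     = 100,
    H264_PROFILE_HIGH_10  = 110,
    H264_PROFILE_HIGH_422 = 122,
    H264_PROFILE_HIGH_444 = 144,
};

/* Values documented for android.media.MediaCodecInfo.CodecProfileLevel. */
enum {
    MC_AVC_PROFILE_BASELINE = 0x01,
    MC_AVC_PROFILE_MAIN     = 0x02,
    MC_AVC_PROFILE_EXTENDED = 0x04,
    MC_AVC_PROFILE_HIGH     = 0x08,
    MC_AVC_PROFILE_HIGH10   = 0x10,
    MC_AVC_PROFILE_HIGH422  = 0x20,
    MC_AVC_PROFILE_HIGH444  = 0x40,
};

/* Map a stream profile to a MediaCodec profile, or -1 if there is none. */
static int h264_profile_to_mediacodec(int profile_idc)
{
    switch (profile_idc) {
    case H264_PROFILE_BASELINE: return MC_AVC_PROFILE_BASELINE;
    case H264_PROFILE_MAIN:     return MC_AVC_PROFILE_MAIN;
    case H264_PROFILE_EXTENDED: return MC_AVC_PROFILE_EXTENDED;
    case H264_PROFILE_HIGH:     return MC_AVC_PROFILE_HIGH;
    case H264_PROFILE_HIGH_10:  return MC_AVC_PROFILE_HIGH10;
    case H264_PROFILE_HIGH_422: return MC_AVC_PROFILE_HIGH422;
    case H264_PROFILE_HIGH_444: return MC_AVC_PROFILE_HIGH444;
    default:                    return -1;
    }
}

/* Return 1 if the codec's advertised profile list contains `wanted`. */
static int codec_supports_profile(const int *supported, size_t nb_supported,
                                  int wanted)
{
    assert(supported != NULL || nb_supported == 0);

    for (size_t i = 0; i < nb_supported; i++)
        if (supported[i] == wanted)
            return 1;
    return 0;
}
```

A codec lookup along these lines would reject a decoder whose `profileLevels` list lacks the stream's profile, so the failure shows up while picking the codec rather than after decoding has started.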
Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder
On Mon, Jun 06, 2016 at 11:41:41AM +0200, Matthieu Bouron wrote:
> On Mon, Jun 06, 2016 at 11:29:03AM +0200, Hendrik Leppkes wrote:
> > On Mon, Jun 6, 2016 at 9:54 AM, Matthieu Bouron wrote:
> > > On Tue, May 31, 2016 at 05:41:16PM +0200, Matthieu Bouron wrote:
> > >> On Tue, May 31, 2016 at 03:51:20PM +0200, Matthieu Bouron wrote:
> > >> > On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> > >> > > On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron wrote:
> > >> > > > From: Matthieu Bouron
> > >> > > >
> > >> > > > Codec width/height restrictions seem hardcoded at the OMX level
> > >> > > > and seem arbitrary. Bypassing those restrictions allows a device
> > >> > > > to decode streams at higher resolutions.
> > >> > > >
> > >> > > > For example it allows a Nexus 5 to decode h264 streams with a
> > >> > > > resolution higher than 1920x1080.
> > >> > >
> > >> > > What happens if the resolution actually exceeds the devices
> > >> > > capabilities?
> > >> >
> > >> > The patch has been tested on various devices and it has been working
> > >> > so far. When the resolution actually exceeds the device capabilities
> > >> > the codec just fails to configure itself.
> > >> >
> > >> > However I did not try to craft samples with really high resolutions
> > >> > (higher than ~4K) to test the patch against.
> > >> >
> > >> > I will double check what is happening with both SW output and surface
> > >> > output.
> > >>
> > >> I tested on a bunch of devices with different chipsets and they all
> > >> fail at the configuration step.
> > >
> > > If there is no objection, I will push the patchset in one day.
> >
> > If you have confirmed that it still fails gracefully but accepts more
> > streams, then LGTM.
>
> Thanks.

Pushed a different version of the patchset (struct declarations have been
moved at the beginning of the file so MediaFormat methods can be re-used in
ff_AMediaCodecList_getCodecByName (and are not redeclared)).

Matthieu
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: improve error messages
On Mon, Jun 06, 2016 at 10:08:10PM +0200, Michael Niedermayer wrote:
> On Mon, Jun 06, 2016 at 10:05:38AM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > ---
> >  libavcodec/mediacodecdec.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
>
> LGTM
>
> thx

Pushed. Thanks.

Matthieu

[...]
Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder
On Mon, Jun 06, 2016 at 11:29:03AM +0200, Hendrik Leppkes wrote: > On Mon, Jun 6, 2016 at 9:54 AM, Matthieu Bouron > wrote: > > On Tue, May 31, 2016 at 05:41:16PM +0200, Matthieu Bouron wrote: > >> On Tue, May 31, 2016 at 03:51:20PM +0200, Matthieu Bouron wrote: > >> > On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote: > >> > > On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron > >> > > wrote: > >> > > > From: Matthieu Bouron > >> > > > > >> > > > Codec width/height restrictions seem hardcoded at the OMX level and > >> > > > seem arbitrary. Bypassing those restrictions allows a device to > >> > > > decode > >> > > > streams at higher resolutions. > >> > > > > >> > > > For example it allows a Nexus 5 to decode h264 streams with a > >> > > > resolution > >> > > > higher than 1920x1080. > >> > > > >> > > > >> > > What happens if the resolution actually exceeds the devices > >> > > capabilities? > >> > > >> > The patch has been tested on various devices and it has been working so > >> > far. When the resolution actually exceeds the device capabilities the > >> > codec just fails to configure itself. > >> > > >> > However I did not try to craft samples with really high resolutions > >> > (higher > >> > than ~4K) to test the patch against. > >> > > >> > I will double check what is happening with both SW output and surface > >> > output. > >> > >> I tested on a bunch of devices with different chipsets and they all fail at > >> the configuration step. > >> > > > > If there is no objection, I will push the patchset in one day. > > > > If you have confirmed that it still fails gracefully but accepts more > streams, then LGTM. Thanks. I'm working on an another patch to check what profile the codec supports and fail at init time if the stream profile is too high (currently the init passes but it fails afterwards while trying to decode frames, which is annoying if an application wants to do some kind of fallback at init time). Matthieu [...] 
Re: [FFmpeg-devel] FFmpeg 3.1
On Mon, Jun 06, 2016 at 03:28:19AM +0200, Michael Niedermayer wrote:
> Hi all
>
> It's time to make the next major release.
> If you want something in, don't forget to push it to git master.

I'd like to have the 3 pending MediaCodec patches merged before the release. I'll re-send the MediaCodec hwaccel patch to the ml after the release.

Thanks,
Matthieu
[FFmpeg-devel] [PATCH] lavc/mediacodec: improve error messages
From: Matthieu Bouron

---
 libavcodec/mediacodecdec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/mediacodecdec.c b/libavcodec/mediacodecdec.c
index 712f984..676ade7 100644
--- a/libavcodec/mediacodecdec.c
+++ b/libavcodec/mediacodecdec.c
@@ -198,7 +198,7 @@ static int mediacodec_wrap_buffer(AVCodecContext *avctx,
 done:
     status = ff_AMediaCodec_releaseOutputBuffer(s->codec, index, 0);
     if (status < 0) {
-        av_log(NULL, AV_LOG_ERROR, "Failed to release output buffer\n");
+        av_log(avctx, AV_LOG_ERROR, "Failed to release output buffer\n");
         ret = AVERROR_EXTERNAL;
     }
 
@@ -539,7 +539,7 @@ int ff_mediacodec_dec_flush(AVCodecContext *avctx, MediaCodecDecContext *s)
 
     status = ff_AMediaCodec_flush(codec);
     if (status < 0) {
-        av_log(NULL, AV_LOG_ERROR, "Failed to flush MediaCodec %p", codec);
+        av_log(avctx, AV_LOG_ERROR, "Failed to flush codec\n");
         return AVERROR_EXTERNAL;
     }
 
-- 
2.8.3
Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder
On Tue, May 31, 2016 at 05:41:16PM +0200, Matthieu Bouron wrote:
> On Tue, May 31, 2016 at 03:51:20PM +0200, Matthieu Bouron wrote:
> > On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> > > On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron wrote:
> > > > From: Matthieu Bouron
> > > >
> > > > Codec width/height restrictions seem hardcoded at the OMX level and
> > > > seem arbitrary. Bypassing those restrictions allows a device to decode
> > > > streams at higher resolutions.
> > > >
> > > > For example it allows a Nexus 5 to decode h264 streams with a resolution
> > > > higher than 1920x1080.
> > >
> > > What happens if the resolution actually exceeds the device's capabilities?
> >
> > The patch has been tested on various devices and it has been working so
> > far. When the resolution actually exceeds the device capabilities the
> > codec just fails to configure itself.
> >
> > However I did not try to craft samples with really high resolutions (higher
> > than ~4K) to test the patch against.
> >
> > I will double check what is happening with both SW output and surface
> > output.
>
> I tested on a bunch of devices with different chipsets and they all fail at
> the configuration step.

If there is no objection, I will push the patchset in one day.

Matthieu
Re: [FFmpeg-devel] [PATCH] vaapi_encode_h26[45]: Reject bitrate targets higher than 2^31
On Thu, Jun 02, 2016 at 10:36:21PM +0100, Mark Thompson wrote:
> On 02/06/16 22:00, Matthieu Bouron wrote:
> > On Thu, Jun 02, 2016 at 07:13:39PM +0100, Mark Thompson wrote:
> >> ---
> >> ... something like this.
> >>
> >>  libavcodec/vaapi_encode_h264.c | 6 ++
> >>  libavcodec/vaapi_encode_h265.c | 6 ++
> >>  2 files changed, 12 insertions(+)
> >>
> >> diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
> >> index 0a99bb1..019ed1f 100644
> >> --- a/libavcodec/vaapi_encode_h264.c
> >> +++ b/libavcodec/vaapi_encode_h264.c
> >> @@ -731,6 +731,12 @@ static av_cold int vaapi_encode_h264_init_constant_bitrate(AVCodecContext *avctx
> >>      int hrd_buffer_size;
> >>      int hrd_initial_buffer_fullness;
> >>
> >> +    if (avctx->bit_rate >= 1u << 31) {
> >
> > Wouldn't INT32_MAX be more appropriate?
>
> Hmm. No preference - I went for 1u << 31 to match the 2^31 in the error
> message, but maybe INT32_MAX makes the code constraint slightly clearer.

I think it's clearer to use INT32_MAX, but as you are the maintainer of those encoders, it's up to you to decide.

[...]
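For readers comparing the two spellings discussed above, here is a minimal sketch, assuming a 64-bit signed bit rate (as avctx->bit_rate is printed with PRId64 elsewhere in this thread) and a 32-bit int. The two helper names are hypothetical and not part of FFmpeg; they only illustrate that both comparisons reject the same values, so the choice is about readability rather than behaviour.

```c
#include <stdint.h>

/* Hypothetical helper, not an FFmpeg function: the form used in the patch.
 * With a 32-bit int, 1u << 31 is 2147483648 (unsigned) and is converted to
 * int64_t for the comparison, so negative bit rates are not rejected. */
int rejected_by_shift(int64_t bit_rate)
{
    return bit_rate >= 1u << 31;
}

/* Hypothetical helper, not an FFmpeg function: the form suggested in review.
 * For integers, x > INT32_MAX is the same condition as x >= 2^31. */
int rejected_by_int32_max(int64_t bit_rate)
{
    return bit_rate > INT32_MAX;
}
```

Either form leaves values up to 2147483647 bps accepted, which is the largest value a 32-bit signed field can hold.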
Re: [FFmpeg-devel] [PATCH] lavc/vaapi_encoder_{h264, h265}: fix bad format warning
On Thu, Jun 02, 2016 at 07:09:16PM +0100, Mark Thompson wrote:
> On 02/06/16 17:20, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > ---
> >  libavcodec/vaapi_encode_h264.c | 2 +-
> >  libavcodec/vaapi_encode_h265.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
> > index 0a99bb1..dc7774b 100644
> > --- a/libavcodec/vaapi_encode_h264.c
> > +++ b/libavcodec/vaapi_encode_h264.c
> > @@ -769,7 +769,7 @@ static av_cold int vaapi_encode_h264_init_constant_bitrate(AVCodecContext *avctx
> >      priv->fixed_qp_p = 26;
> >      priv->fixed_qp_b = 26;
> >
> > -    av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %d bps.\n",
> > +    av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %"PRId64" bps.\n",
> >             avctx->bit_rate);
> >      return 0;
> >  }
> > diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
> > index 05d3aa4..17cd900 100644
> > --- a/libavcodec/vaapi_encode_h265.c
> > +++ b/libavcodec/vaapi_encode_h265.c
> > @@ -1196,7 +1196,7 @@ static av_cold int vaapi_encode_h265_init_constant_bitrate(AVCodecContext *avctx
> >      priv->fixed_qp_p = 30;
> >      priv->fixed_qp_b = 30;
> >
> > -    av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %d bps.\n",
> > +    av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %"PRId64" bps.\n",
> >             avctx->bit_rate);
> >      return 0;
> >  }
> >
>
> LGTM to fix the warning.
>
> I didn't realise that bit_rate has a different type in the two tines - I
> think a bit more is needed here to just reject higher numbers because all of
> the relevant fields in va.h structures are 32-bit anyway...

Pushed.

Thanks.

Matthieu

[...]
Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API
On Wed, Jun 01, 2016 at 11:25:07AM +0200, Matthieu Bouron wrote: > On Tue, May 31, 2016 at 10:13:40AM +0200, Matthieu Bouron wrote: > > On Sun, May 29, 2016 at 10:15:44AM +0200, Matthieu Bouron wrote: > > > On Fri, May 27, 2016 at 10:13:20AM +0200, Matthieu Bouron wrote: > > > > From: Matthieu Bouron > > > > > > > > --- > > > > libavcodec/mediacodecdec_h264.c | 61 > > > > + > > > > 1 file changed, 37 insertions(+), 24 deletions(-) > > > > > > > > diff --git a/libavcodec/mediacodecdec_h264.c > > > > b/libavcodec/mediacodecdec_h264.c > > > > index 2d1d525..7f764e9 100644 > > > > --- a/libavcodec/mediacodecdec_h264.c > > > > +++ b/libavcodec/mediacodecdec_h264.c > > > > @@ -23,6 +23,7 @@ > > > > #include > > > > #include > > > > > > > > +#include "libavutil/avassert.h" > > > > #include "libavutil/common.h" > > > > #include "libavutil/fifo.h" > > > > #include "libavutil/opt.h" > > > > @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext { > > > > > > > > MediaCodecDecContext ctx; > > > > > > > > -AVBitStreamFilterContext *bsf; > > > > +AVBSFContext *bsf; > > > > > > > > AVFifoBuffer *fifo; > > > > > > > > -AVPacket input_ref; > > > > AVPacket filtered_pkt; > > > > -uint8_t *filtered_data; > > > > > > > > } MediaCodecH264DecContext; > > > > > > > > @@ -156,8 +155,9 @@ static av_cold int > > > > mediacodec_decode_close(AVCodecContext *avctx) > > > > ff_mediacodec_dec_close(avctx, &s->ctx); > > > > > > > > av_fifo_free(s->fifo); > > > > +av_bsf_free(&s->bsf); > > > > > > > > -av_bitstream_filter_close(s->bsf); > > > > +av_packet_unref(&s->filtered_pkt); > > > > > > > > return 0; > > > > } > > > > @@ -211,12 +211,23 @@ static av_cold int > > > > mediacodec_decode_init(AVCodecContext *avctx) > > > > goto done; > > > > } > > > > > > > > -s->bsf = av_bitstream_filter_init("h264_mp4toannexb"); > > > > -if (!s->bsf) { > > > > -ret = AVERROR(ENOMEM); > > > > +const AVBitStreamFilter *bsf = > > > > av_bsf_get_by_name("h264_mp4toannexb"); > > > > +if(!bsf) { > > > > +ret = 
AVERROR_BSF_NOT_FOUND; > > > > goto done; > > > > } > > > > > > > > +if ((ret = av_bsf_alloc(bsf, &s->bsf))) { > > > > +goto done; > > > > +} > > > > + > > > > +if (((ret = avcodec_parameters_from_context(s->bsf->par_in, > > > > avctx)) < 0) || > > > > +((ret = av_bsf_init(s->bsf)) < 0)) { > > > > + goto done; > > > > +} > > > > + > > > > +av_init_packet(&s->filtered_pkt); > > > > + > > > > done: > > > > if (format) { > > > > ff_AMediaFormat_delete(format); > > > > @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext > > > > *avctx, void *data, > > > > while (!*got_frame) { > > > > /* prepare the input data -- convert to Annex B if needed */ > > > > if (s->filtered_pkt.size <= 0) { > > > > -int size; > > > > +AVPacket input_pkt = { 0 }; > > > > + > > > > +av_packet_unref(&s->filtered_pkt); > > > > > > > > /* no more data */ > > > > if (av_fifo_size(s->fifo) < sizeof(AVPacket)) { > > > > @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext > > > > *avctx, void *data, > > > > ff_mediacodec_dec_decode(avctx, &s->ctx, frame, > > > > got_frame, avpkt); > > > > } > > > > > > > > -if (s->filtered_data != s->input_ref.data) > > > > -av_freep(&s->filtered_data); > > > > -
Re: [FFmpeg-devel] [PATCH] vaapi_encode_h26[45]: Reject bitrate targets higher than 2^31
On Thu, Jun 02, 2016 at 07:13:39PM +0100, Mark Thompson wrote:
> ---
> ... something like this.
>
>  libavcodec/vaapi_encode_h264.c | 6 ++
>  libavcodec/vaapi_encode_h265.c | 6 ++
>  2 files changed, 12 insertions(+)
>
> diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
> index 0a99bb1..019ed1f 100644
> --- a/libavcodec/vaapi_encode_h264.c
> +++ b/libavcodec/vaapi_encode_h264.c
> @@ -731,6 +731,12 @@ static av_cold int vaapi_encode_h264_init_constant_bitrate(AVCodecContext *avctx
>      int hrd_buffer_size;
>      int hrd_initial_buffer_fullness;
>
> +    if (avctx->bit_rate >= 1u << 31) {

Wouldn't INT32_MAX be more appropriate?

> +        av_log(avctx, AV_LOG_ERROR, "Target bitrate of 2^31 bps or "
> +               "higher is not supported.\n");
> +        return AVERROR(EINVAL);
> +    }
> +
>      if (avctx->rc_buffer_size)
>          hrd_buffer_size = avctx->rc_buffer_size;
>      else
> diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
> index 05d3aa4..060c7b7 100644
> --- a/libavcodec/vaapi_encode_h265.c
> +++ b/libavcodec/vaapi_encode_h265.c
> @@ -1158,6 +1158,12 @@ static av_cold int vaapi_encode_h265_init_constant_bitrate(AVCodecContext *avctx
>      int hrd_buffer_size;
>      int hrd_initial_buffer_fullness;
>
> +    if (avctx->bit_rate >= 1u << 31) {

Same comment as above.

> +        av_log(avctx, AV_LOG_ERROR, "Target bitrate of 2^31 bps or "
> +               "higher is not supported.\n");
> +        return AVERROR(EINVAL);
> +    }
> +
>      if (avctx->rc_buffer_size)
>          hrd_buffer_size = avctx->rc_buffer_size;
>      else
> --
> 2.8.1
[FFmpeg-devel] [PATCH] lavc/vaapi_encoder_{h264, h265}: fix bad format warning
From: Matthieu Bouron

---
 libavcodec/vaapi_encode_h264.c | 2 +-
 libavcodec/vaapi_encode_h265.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
index 0a99bb1..dc7774b 100644
--- a/libavcodec/vaapi_encode_h264.c
+++ b/libavcodec/vaapi_encode_h264.c
@@ -769,7 +769,7 @@ static av_cold int vaapi_encode_h264_init_constant_bitrate(AVCodecContext *avctx
     priv->fixed_qp_p = 26;
     priv->fixed_qp_b = 26;
 
-    av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %d bps.\n",
+    av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %"PRId64" bps.\n",
            avctx->bit_rate);
     return 0;
 }
diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
index 05d3aa4..17cd900 100644
--- a/libavcodec/vaapi_encode_h265.c
+++ b/libavcodec/vaapi_encode_h265.c
@@ -1196,7 +1196,7 @@ static av_cold int vaapi_encode_h265_init_constant_bitrate(AVCodecContext *avctx
     priv->fixed_qp_p = 30;
     priv->fixed_qp_b = 30;
 
-    av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %d bps.\n",
+    av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %"PRId64" bps.\n",
            avctx->bit_rate);
     return 0;
 }
-- 
2.8.3
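As a small illustration of the warning being fixed, here is a sketch that formats a 64-bit bit rate with the PRId64 macro from <inttypes.h>. It assumes bit_rate is an int64_t, as the patch implies. format_bit_rate is a hypothetical helper, not an FFmpeg function, and snprintf stands in for av_log so the snippet builds without libavutil.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not an FFmpeg function: writes the log message into
 * buf and returns what snprintf returns. PRId64 expands to the conversion
 * specifier matching int64_t on the target platform, so the argument type
 * and the format agree; a plain "%d" expects an int instead. */
int format_bit_rate(char *buf, size_t size, int64_t bit_rate)
{
    return snprintf(buf, size, "Using constant-bitrate = %" PRId64 " bps.",
                    bit_rate);
}
```

With "%d", a value such as 5000000000 would not be printed correctly, since the argument is wider than what the specifier expects.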
Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API
On Tue, May 31, 2016 at 10:13:40AM +0200, Matthieu Bouron wrote: > On Sun, May 29, 2016 at 10:15:44AM +0200, Matthieu Bouron wrote: > > On Fri, May 27, 2016 at 10:13:20AM +0200, Matthieu Bouron wrote: > > > From: Matthieu Bouron > > > > > > --- > > > libavcodec/mediacodecdec_h264.c | 61 > > > + > > > 1 file changed, 37 insertions(+), 24 deletions(-) > > > > > > diff --git a/libavcodec/mediacodecdec_h264.c > > > b/libavcodec/mediacodecdec_h264.c > > > index 2d1d525..7f764e9 100644 > > > --- a/libavcodec/mediacodecdec_h264.c > > > +++ b/libavcodec/mediacodecdec_h264.c > > > @@ -23,6 +23,7 @@ > > > #include > > > #include > > > > > > +#include "libavutil/avassert.h" > > > #include "libavutil/common.h" > > > #include "libavutil/fifo.h" > > > #include "libavutil/opt.h" > > > @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext { > > > > > > MediaCodecDecContext ctx; > > > > > > -AVBitStreamFilterContext *bsf; > > > +AVBSFContext *bsf; > > > > > > AVFifoBuffer *fifo; > > > > > > -AVPacket input_ref; > > > AVPacket filtered_pkt; > > > -uint8_t *filtered_data; > > > > > > } MediaCodecH264DecContext; > > > > > > @@ -156,8 +155,9 @@ static av_cold int > > > mediacodec_decode_close(AVCodecContext *avctx) > > > ff_mediacodec_dec_close(avctx, &s->ctx); > > > > > > av_fifo_free(s->fifo); > > > +av_bsf_free(&s->bsf); > > > > > > -av_bitstream_filter_close(s->bsf); > > > +av_packet_unref(&s->filtered_pkt); > > > > > > return 0; > > > } > > > @@ -211,12 +211,23 @@ static av_cold int > > > mediacodec_decode_init(AVCodecContext *avctx) > > > goto done; > > > } > > > > > > -s->bsf = av_bitstream_filter_init("h264_mp4toannexb"); > > > -if (!s->bsf) { > > > -ret = AVERROR(ENOMEM); > > > +const AVBitStreamFilter *bsf = > > > av_bsf_get_by_name("h264_mp4toannexb"); > > > +if(!bsf) { > > > +ret = AVERROR_BSF_NOT_FOUND; > > > goto done; > > > } > > > > > > +if ((ret = av_bsf_alloc(bsf, &s->bsf))) { > > > +goto done; > > > +} > > > + > > > +if (((ret = 
avcodec_parameters_from_context(s->bsf->par_in, avctx)) > > > < 0) || > > > +((ret = av_bsf_init(s->bsf)) < 0)) { > > > + goto done; > > > +} > > > + > > > +av_init_packet(&s->filtered_pkt); > > > + > > > done: > > > if (format) { > > > ff_AMediaFormat_delete(format); > > > @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext > > > *avctx, void *data, > > > while (!*got_frame) { > > > /* prepare the input data -- convert to Annex B if needed */ > > > if (s->filtered_pkt.size <= 0) { > > > -int size; > > > +AVPacket input_pkt = { 0 }; > > > + > > > +av_packet_unref(&s->filtered_pkt); > > > > > > /* no more data */ > > > if (av_fifo_size(s->fifo) < sizeof(AVPacket)) { > > > @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext > > > *avctx, void *data, > > > ff_mediacodec_dec_decode(avctx, &s->ctx, frame, > > > got_frame, avpkt); > > > } > > > > > > -if (s->filtered_data != s->input_ref.data) > > > -av_freep(&s->filtered_data); > > > -s->filtered_data = NULL; > > > -av_packet_unref(&s->input_ref); > > > +av_fifo_generic_read(s->fifo, &input_pkt, sizeof(input_pkt), > > > NULL); > > > + > > > +ret = av_bsf_send_packet(s->bsf, &input_pkt); > > > +if (ret < 0) { > > > +return ret; > > > +} > > > + > > > +ret = av_bsf_receive_packet(s->bsf, &s->filtered_pkt); > > > +
Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder
On Tue, May 31, 2016 at 03:51:20PM +0200, Matthieu Bouron wrote:
> On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> > On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron wrote:
> > > From: Matthieu Bouron
> > >
> > > Codec width/height restrictions seem hardcoded at the OMX level and
> > > seem arbitrary. Bypassing those restrictions allows a device to decode
> > > streams at higher resolutions.
> > >
> > > For example it allows a Nexus 5 to decode h264 streams with a resolution
> > > higher than 1920x1080.
> >
> > What happens if the resolution actually exceeds the device's capabilities?
>
> The patch has been tested on various devices and it has been working so
> far. When the resolution actually exceeds the device capabilities the
> codec just fails to configure itself.
>
> However I did not try to craft samples with really high resolutions (higher
> than ~4K) to test the patch against.
>
> I will double check what is happening with both SW output and surface
> output.

I tested on a bunch of devices with different chipsets and they all fail at the configuration step.

[...]
Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder
On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > Codec width/height restrictions seem hardcoded at the OMX level and
> > seem arbitrary. Bypassing those restrictions allows a device to decode
> > streams at higher resolutions.
> >
> > For example it allows a Nexus 5 to decode h264 streams with a resolution
> > higher than 1920x1080.
>
> What happens if the resolution actually exceeds the device's capabilities?

The patch has been tested on various devices and it has been working so far. When the resolution actually exceeds the device capabilities the codec just fails to configure itself.

However I did not try to craft samples with really high resolutions (higher than ~4K) to test the patch against.

I will double check what is happening with both SW output and surface output.

Matthieu

[...]
[FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder
From: Matthieu Bouron Codec width/height restrictions seem hardcoded at the OMX level and seem arbitrary. Bypassing those restrictions allows a device to decode streams at higher resolutions. For example it allows a Nexus 5 to decode h264 streams with a resolution higher than 1920x1080. --- libavcodec/mediacodec_wrapper.c | 31 ++- libavcodec/mediacodec_wrapper.h | 2 +- libavcodec/mediacodecdec.c | 2 +- 3 files changed, 28 insertions(+), 7 deletions(-) diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c index c847a11..2e3fcef 100644 --- a/libavcodec/mediacodec_wrapper.c +++ b/libavcodec/mediacodec_wrapper.c @@ -33,7 +33,8 @@ struct JNIAMediaCodecListFields { jclass mediaformat_class; -jmethodID create_video_format_id; +jmethodID mediaformat_init_id; +jmethodID set_string_id; jclass mediacodec_list_class; jmethodID init_id; @@ -51,7 +52,8 @@ struct JNIAMediaCodecListFields { static const struct FFJniField jfields_mapping[] = { { "android/media/MediaFormat", NULL, NULL, FF_JNI_CLASS, offsetof(struct JNIAMediaCodecListFields, mediaformat_class), 1 }, -{ "android/media/MediaFormat", "createVideoFormat", "(Ljava/lang/String;II)Landroid/media/MediaFormat;", FF_JNI_STATIC_METHOD, offsetof(struct JNIAMediaCodecListFields, create_video_format_id), 1 }, +{ "android/media/MediaFormat", "", "()V", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, mediaformat_init_id), 1}, +{ "android/media/MediaFormat", "setString", "(Ljava/lang/String;Ljava/lang/String;)V", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, set_string_id), 1}, { "android/media/MediaCodecList", NULL, NULL, FF_JNI_CLASS, offsetof(struct JNIAMediaCodecListFields, mediacodec_list_class), 1 }, { "android/media/MediaCodecList", "", "(I)V", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, init_id), 0 }, @@ -87,7 +89,7 @@ static const struct FFJniField jfields_mapping[] = { ff_jni_detach_env(log_ctx);\ } while (0) -char *ff_AMediaCodecList_getCodecNameByType(const char 
*mime, int width, int height, void *log_ctx) +char *ff_AMediaCodecList_getCodecNameByType(const char *mime, void *log_ctx) { int ret; char *name = NULL; @@ -99,6 +101,7 @@ char *ff_AMediaCodecList_getCodecNameByType(const char *mime, int width, int hei jobject format = NULL; jobject codec = NULL; +jstring key = NULL; jstring tmp = NULL; jobject info = NULL; @@ -112,15 +115,29 @@ char *ff_AMediaCodecList_getCodecNameByType(const char *mime, int width, int hei } if (jfields.init_id && jfields.find_decoder_for_format_id) { +key = ff_jni_utf_chars_to_jstring(env, "mime", log_ctx); +if (!key) { +goto done; +} + tmp = ff_jni_utf_chars_to_jstring(env, mime, log_ctx); if (!tmp) { goto done; } -format = (*env)->CallStaticObjectMethod(env, jfields.mediaformat_class, jfields.create_video_format_id, tmp, width, height); +format = (*env)->NewObject(env, jfields.mediaformat_class, jfields.mediaformat_init_id); +if (ff_jni_exception_check(env, 1, log_ctx) < 0) { +goto done; +} + +(*env)->CallVoidMethod(env, format, jfields.set_string_id, key, tmp); if (ff_jni_exception_check(env, 1, log_ctx) < 0) { goto done; } + +(*env)->DeleteLocalRef(env, key); +key = NULL; + (*env)->DeleteLocalRef(env, tmp); tmp = NULL; @@ -135,7 +152,7 @@ char *ff_AMediaCodecList_getCodecNameByType(const char *mime, int width, int hei } if (!tmp) { av_log(NULL, AV_LOG_ERROR, "Could not find decoder in media codec list " - "for format { mime=%s width=%d height=%d }\n", mime, width, height); + "for format { mime=%s }\n", mime); goto done; } @@ -232,6 +249,10 @@ done: (*env)->DeleteLocalRef(env, codec); } +if (key) { +(*env)->DeleteLocalRef(env, key); +} + if (tmp) { (*env)->DeleteLocalRef(env, tmp); } diff --git a/libavcodec/mediacodec_wrapper.h b/libavcodec/mediacodec_wrapper.h index a804b61..36cd258 100644 --- a/libavcodec/mediacodec_wrapper.h +++ b/libavcodec/mediacodec_wrapper.h @@ -52,7 +52,7 @@ * */ -char *ff_AMediaCodecList_getCodecNameByType(const char *mime, int width, int height, void *log_ctx); 
+char *ff_AMediaCodecList_getCodecNameByType(const char *mime, void *log_ctx); struct FFAMediaFormat; typedef struct FFAMediaFormat FFAMediaFormat; diff --git a/li
[FFmpeg-devel] [PATCH 1/2] lavc/mediacodec: do not delete a local reference twice in case of error
From: Matthieu Bouron

---
 libavcodec/mediacodec_wrapper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
index 053c164..c847a11 100644
--- a/libavcodec/mediacodec_wrapper.c
+++ b/libavcodec/mediacodec_wrapper.c
@@ -122,6 +122,7 @@ char *ff_AMediaCodecList_getCodecNameByType(const char *mime, int width, int hei
             goto done;
         }
         (*env)->DeleteLocalRef(env, tmp);
+        tmp = NULL;
 
         codec = (*env)->NewObject(env, jfields.mediacodec_list_class, jfields.init_id, 0);
         if (ff_jni_exception_check(env, 1, log_ctx) < 0) {
-- 
2.8.3
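The one-line fix above follows a common cleanup pattern: once a resource has been released early, its handle is cleared so that the shared cleanup label does not release it again. Below is a minimal sketch of that pattern, assuming malloc/free as a stand-in for JNI local references. release_ref, release_count and lookup are hypothetical names used only for illustration; they do not exist in FFmpeg.

```c
#include <stdlib.h>

/* Illustration only: counts how many times release_ref() has been called. */
int release_count = 0;

/* Hypothetical stand-in for (*env)->DeleteLocalRef(env, ref). */
void release_ref(void *ref)
{
    release_count++;
    free(ref);
}

/* Hypothetical function shaped like the patched code path: tmp is released
 * early, then a later step may fail and jump to the shared cleanup label. */
int lookup(int fail_late)
{
    int ret = -1;
    void *tmp = malloc(16);

    if (!tmp)
        goto done;

    release_ref(tmp);
    tmp = NULL; /* without this, the cleanup below would release tmp twice */

    if (fail_late)
        goto done;

    ret = 0;

done:
    if (tmp)
        release_ref(tmp);
    return ret;
}
```

Whether the later step fails or not, tmp is released exactly once per call.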
Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API
On Sun, May 29, 2016 at 10:15:44AM +0200, Matthieu Bouron wrote: > On Fri, May 27, 2016 at 10:13:20AM +0200, Matthieu Bouron wrote: > > From: Matthieu Bouron > > > > --- > > libavcodec/mediacodecdec_h264.c | 61 > > + > > 1 file changed, 37 insertions(+), 24 deletions(-) > > > > diff --git a/libavcodec/mediacodecdec_h264.c > > b/libavcodec/mediacodecdec_h264.c > > index 2d1d525..7f764e9 100644 > > --- a/libavcodec/mediacodecdec_h264.c > > +++ b/libavcodec/mediacodecdec_h264.c > > @@ -23,6 +23,7 @@ > > #include > > #include > > > > +#include "libavutil/avassert.h" > > #include "libavutil/common.h" > > #include "libavutil/fifo.h" > > #include "libavutil/opt.h" > > @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext { > > > > MediaCodecDecContext ctx; > > > > -AVBitStreamFilterContext *bsf; > > +AVBSFContext *bsf; > > > > AVFifoBuffer *fifo; > > > > -AVPacket input_ref; > > AVPacket filtered_pkt; > > -uint8_t *filtered_data; > > > > } MediaCodecH264DecContext; > > > > @@ -156,8 +155,9 @@ static av_cold int > > mediacodec_decode_close(AVCodecContext *avctx) > > ff_mediacodec_dec_close(avctx, &s->ctx); > > > > av_fifo_free(s->fifo); > > +av_bsf_free(&s->bsf); > > > > -av_bitstream_filter_close(s->bsf); > > +av_packet_unref(&s->filtered_pkt); > > > > return 0; > > } > > @@ -211,12 +211,23 @@ static av_cold int > > mediacodec_decode_init(AVCodecContext *avctx) > > goto done; > > } > > > > -s->bsf = av_bitstream_filter_init("h264_mp4toannexb"); > > -if (!s->bsf) { > > -ret = AVERROR(ENOMEM); > > +const AVBitStreamFilter *bsf = av_bsf_get_by_name("h264_mp4toannexb"); > > +if(!bsf) { > > +ret = AVERROR_BSF_NOT_FOUND; > > goto done; > > } > > > > +if ((ret = av_bsf_alloc(bsf, &s->bsf))) { > > +goto done; > > +} > > + > > +if (((ret = avcodec_parameters_from_context(s->bsf->par_in, avctx)) < > > 0) || > > +((ret = av_bsf_init(s->bsf)) < 0)) { > > + goto done; > > +} > > + > > +av_init_packet(&s->filtered_pkt); > > + > > done: > > if (format) { > > 
ff_AMediaFormat_delete(format); > > @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext > > *avctx, void *data, > > while (!*got_frame) { > > /* prepare the input data -- convert to Annex B if needed */ > > if (s->filtered_pkt.size <= 0) { > > -int size; > > +AVPacket input_pkt = { 0 }; > > + > > +av_packet_unref(&s->filtered_pkt); > > > > /* no more data */ > > if (av_fifo_size(s->fifo) < sizeof(AVPacket)) { > > @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext > > *avctx, void *data, > > ff_mediacodec_dec_decode(avctx, &s->ctx, frame, > > got_frame, avpkt); > > } > > > > -if (s->filtered_data != s->input_ref.data) > > -av_freep(&s->filtered_data); > > -s->filtered_data = NULL; > > -av_packet_unref(&s->input_ref); > > +av_fifo_generic_read(s->fifo, &input_pkt, sizeof(input_pkt), > > NULL); > > + > > +ret = av_bsf_send_packet(s->bsf, &input_pkt); > > +if (ret < 0) { > > +return ret; > > +} > > + > > +ret = av_bsf_receive_packet(s->bsf, &s->filtered_pkt); > > +if (ret == AVERROR(EAGAIN)) { > > +goto done; > > +} > > + > > +/* h264_mp4toannexb is used here and does not require flushing > > */ > > +av_assert0(ret != AVERROR_EOF); > > > > -av_fifo_generic_read(s->fifo, &s->input_ref, > > sizeof(s->input_ref), NULL); > > -ret = av_bitstream_filter_filter(s->bsf, avctx, NULL, > > - &s->filtered_data, &size, > > - s
Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API
On Fri, May 27, 2016 at 10:13:20AM +0200, Matthieu Bouron wrote: > From: Matthieu Bouron > > --- > libavcodec/mediacodecdec_h264.c | 61 > + > 1 file changed, 37 insertions(+), 24 deletions(-) > > diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c > index 2d1d525..7f764e9 100644 > --- a/libavcodec/mediacodecdec_h264.c > +++ b/libavcodec/mediacodecdec_h264.c > @@ -23,6 +23,7 @@ > #include > #include > > +#include "libavutil/avassert.h" > #include "libavutil/common.h" > #include "libavutil/fifo.h" > #include "libavutil/opt.h" > @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext { > > MediaCodecDecContext ctx; > > -AVBitStreamFilterContext *bsf; > +AVBSFContext *bsf; > > AVFifoBuffer *fifo; > > -AVPacket input_ref; > AVPacket filtered_pkt; > -uint8_t *filtered_data; > > } MediaCodecH264DecContext; > > @@ -156,8 +155,9 @@ static av_cold int mediacodec_decode_close(AVCodecContext > *avctx) > ff_mediacodec_dec_close(avctx, &s->ctx); > > av_fifo_free(s->fifo); > +av_bsf_free(&s->bsf); > > -av_bitstream_filter_close(s->bsf); > +av_packet_unref(&s->filtered_pkt); > > return 0; > } > @@ -211,12 +211,23 @@ static av_cold int > mediacodec_decode_init(AVCodecContext *avctx) > goto done; > } > > -s->bsf = av_bitstream_filter_init("h264_mp4toannexb"); > -if (!s->bsf) { > -ret = AVERROR(ENOMEM); > +const AVBitStreamFilter *bsf = av_bsf_get_by_name("h264_mp4toannexb"); > +if(!bsf) { > +ret = AVERROR_BSF_NOT_FOUND; > goto done; > } > > +if ((ret = av_bsf_alloc(bsf, &s->bsf))) { > +goto done; > +} > + > +if (((ret = avcodec_parameters_from_context(s->bsf->par_in, avctx)) < 0) > || > +((ret = av_bsf_init(s->bsf)) < 0)) { > + goto done; > +} > + > +av_init_packet(&s->filtered_pkt); > + > done: > if (format) { > ff_AMediaFormat_delete(format); > @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, > void *data, > while (!*got_frame) { > /* prepare the input data -- convert to Annex B if needed */ > if (s->filtered_pkt.size <= 
0) { > -int size; > +AVPacket input_pkt = { 0 }; > + > +av_packet_unref(&s->filtered_pkt); > > /* no more data */ > if (av_fifo_size(s->fifo) < sizeof(AVPacket)) { > @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext > *avctx, void *data, > ff_mediacodec_dec_decode(avctx, &s->ctx, frame, > got_frame, avpkt); > } > > -if (s->filtered_data != s->input_ref.data) > -av_freep(&s->filtered_data); > -s->filtered_data = NULL; > -av_packet_unref(&s->input_ref); > +av_fifo_generic_read(s->fifo, &input_pkt, sizeof(input_pkt), > NULL); > + > +ret = av_bsf_send_packet(s->bsf, &input_pkt); > +if (ret < 0) { > +return ret; > +} > + > +ret = av_bsf_receive_packet(s->bsf, &s->filtered_pkt); > +if (ret == AVERROR(EAGAIN)) { > +goto done; > +} > + > +/* h264_mp4toannexb is used here and does not require flushing */ > +av_assert0(ret != AVERROR_EOF); > > -av_fifo_generic_read(s->fifo, &s->input_ref, > sizeof(s->input_ref), NULL); > -ret = av_bitstream_filter_filter(s->bsf, avctx, NULL, > - &s->filtered_data, &size, > - s->input_ref.data, > s->input_ref.size, 0); > if (ret < 0) { > -s->filtered_data = s->input_ref.data; > -size = s->input_ref.size; > +return ret; > } > -s->filtered_pkt = s->input_ref; > -s->filtered_pkt.data = s->filtered_data; > -s->filtered_pkt.size = size; > } > > ret = mediacodec_process_data(avctx, frame, got_frame, > &s->filtered_pkt); > @@ -298,7 +313,7 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, > void *data, >
[FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API
From: Matthieu Bouron --- libavcodec/mediacodecdec_h264.c | 61 + 1 file changed, 37 insertions(+), 24 deletions(-) diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c index 2d1d525..7f764e9 100644 --- a/libavcodec/mediacodecdec_h264.c +++ b/libavcodec/mediacodecdec_h264.c @@ -23,6 +23,7 @@ #include #include +#include "libavutil/avassert.h" #include "libavutil/common.h" #include "libavutil/fifo.h" #include "libavutil/opt.h" @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext { MediaCodecDecContext ctx; -AVBitStreamFilterContext *bsf; +AVBSFContext *bsf; AVFifoBuffer *fifo; -AVPacket input_ref; AVPacket filtered_pkt; -uint8_t *filtered_data; } MediaCodecH264DecContext; @@ -156,8 +155,9 @@ static av_cold int mediacodec_decode_close(AVCodecContext *avctx) ff_mediacodec_dec_close(avctx, &s->ctx); av_fifo_free(s->fifo); +av_bsf_free(&s->bsf); -av_bitstream_filter_close(s->bsf); +av_packet_unref(&s->filtered_pkt); return 0; } @@ -211,12 +211,23 @@ static av_cold int mediacodec_decode_init(AVCodecContext *avctx) goto done; } -s->bsf = av_bitstream_filter_init("h264_mp4toannexb"); -if (!s->bsf) { -ret = AVERROR(ENOMEM); +const AVBitStreamFilter *bsf = av_bsf_get_by_name("h264_mp4toannexb"); +if(!bsf) { +ret = AVERROR_BSF_NOT_FOUND; goto done; } +if ((ret = av_bsf_alloc(bsf, &s->bsf))) { +goto done; +} + +if (((ret = avcodec_parameters_from_context(s->bsf->par_in, avctx)) < 0) || +((ret = av_bsf_init(s->bsf)) < 0)) { + goto done; +} + +av_init_packet(&s->filtered_pkt); + done: if (format) { ff_AMediaFormat_delete(format); @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, void *data, while (!*got_frame) { /* prepare the input data -- convert to Annex B if needed */ if (s->filtered_pkt.size <= 0) { -int size; +AVPacket input_pkt = { 0 }; + +av_packet_unref(&s->filtered_pkt); /* no more data */ if (av_fifo_size(s->fifo) < sizeof(AVPacket)) { @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext 
*avctx, void *data, ff_mediacodec_dec_decode(avctx, &s->ctx, frame, got_frame, avpkt); } -if (s->filtered_data != s->input_ref.data) -av_freep(&s->filtered_data); -s->filtered_data = NULL; -av_packet_unref(&s->input_ref); +av_fifo_generic_read(s->fifo, &input_pkt, sizeof(input_pkt), NULL); + +ret = av_bsf_send_packet(s->bsf, &input_pkt); +if (ret < 0) { +return ret; +} + +ret = av_bsf_receive_packet(s->bsf, &s->filtered_pkt); +if (ret == AVERROR(EAGAIN)) { +goto done; +} + +/* h264_mp4toannexb is used here and does not require flushing */ +av_assert0(ret != AVERROR_EOF); -av_fifo_generic_read(s->fifo, &s->input_ref, sizeof(s->input_ref), NULL); -ret = av_bitstream_filter_filter(s->bsf, avctx, NULL, - &s->filtered_data, &size, - s->input_ref.data, s->input_ref.size, 0); if (ret < 0) { -s->filtered_data = s->input_ref.data; -size = s->input_ref.size; +return ret; } -s->filtered_pkt = s->input_ref; -s->filtered_pkt.data = s->filtered_data; -s->filtered_pkt.size = size; } ret = mediacodec_process_data(avctx, frame, got_frame, &s->filtered_pkt); @@ -298,7 +313,7 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, void *data, s->filtered_pkt.size -= ret; s->filtered_pkt.data += ret; } - +done: return avpkt->size; } @@ -313,8 +328,6 @@ static void mediacodec_decode_flush(AVCodecContext *avctx) } av_fifo_reset(s->fifo); -av_packet_unref(&s->input_ref); - av_init_packet(&s->filtered_pkt); s->filtered_pkt.data = NULL; s->filtered_pkt.size = 0; -- 2.8.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
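For readers who have not used the send/receive flow this patch switches to, here is a minimal self-contained sketch of that contract. It deliberately does not use libavcodec: `ToyPacket`, `ToyFilter` and the `toy_*` functions are hypothetical stand-ins, and `-EAGAIN` models `AVERROR(EAGAIN)`. It only illustrates the control flow (send one packet, try to receive, treat EAGAIN as "needs more input"), not the real filter.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for AVPacket and AVBSFContext. */
typedef struct ToyPacket {
    const unsigned char *data;
    int size;
} ToyPacket;

typedef struct ToyFilter {
    ToyPacket pending;   /* packet waiting to be received */
    int has_pending;     /* non-zero while 'pending' is valid */
} ToyFilter;

/* Queue one input packet; refuse if the previous output was not drained. */
static int toy_send_packet(ToyFilter *f, const ToyPacket *in)
{
    if (f->has_pending)
        return -EAGAIN;
    f->pending = *in;
    f->has_pending = 1;
    return 0;
}

/* Fetch one output packet; -EAGAIN means more input is required. */
static int toy_receive_packet(ToyFilter *f, ToyPacket *out)
{
    if (!f->has_pending)
        return -EAGAIN;
    *out = f->pending;
    f->has_pending = 0;
    return 0;
}

/* One decode-loop step: send, then receive, propagating errors. */
static int toy_filter_one(ToyFilter *f, const ToyPacket *in, ToyPacket *out)
{
    int ret = toy_send_packet(f, in);
    if (ret < 0)
        return ret;
    return toy_receive_packet(f, out);
}
```

In the patch, the EAGAIN case jumps to `done` and returns `avpkt->size`, so the caller simply feeds the next packet.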
[FFmpeg-devel] [PATCH 2/2] lavc/mediacodecdec_h264: rename input_ref to input_pkt
From: Matthieu Bouron

---
 libavcodec/mediacodecdec_h264.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 7f764e9..3a31798 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -257,19 +257,19 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, void *data,
 
     /* buffer the input packet */
     if (avpkt->size) {
-        AVPacket input_ref = { 0 };
+        AVPacket input_pkt = { 0 };
 
-        if (av_fifo_space(s->fifo) < sizeof(input_ref)) {
+        if (av_fifo_space(s->fifo) < sizeof(input_pkt)) {
             ret = av_fifo_realloc2(s->fifo,
-                                   av_fifo_size(s->fifo) + sizeof(input_ref));
+                                   av_fifo_size(s->fifo) + sizeof(input_pkt));
             if (ret < 0)
                 return ret;
         }
 
-        ret = av_packet_ref(&input_ref, avpkt);
+        ret = av_packet_ref(&input_pkt, avpkt);
         if (ret < 0)
             return ret;
-        av_fifo_generic_write(s->fifo, &input_ref, sizeof(input_ref), NULL);
+        av_fifo_generic_write(s->fifo, &input_pkt, sizeof(input_pkt), NULL);
     }
 
     /* process buffered data */
-- 
2.8.2
Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: add missing MediaCodec.Get{Input, Output}Buffer() checks
On Thu, May 19, 2016 at 11:46:22AM +0200, Matthieu Bouron wrote:
> On Tue, May 17, 2016 at 03:20:54PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > ---
> >  libavcodec/mediacodec_wrapper.c | 8
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
> > index 8ce3b32..5c047ea 100644
> > --- a/libavcodec/mediacodec_wrapper.c
> > +++ b/libavcodec/mediacodec_wrapper.c
> > @@ -1056,6 +1056,10 @@ FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
> >          goto fail;
> >      }
> >
> > +    if (codec->jfields.get_input_buffer_id && codec->jfields.get_output_buffer_id) {
> > +        codec->has_get_i_o_buffer = 1;
> > +    }
> > +
> >      JNI_DETACH_ENV(attached, codec);
> >
> >      return codec;
> > @@ -1178,6 +1182,10 @@ FFAMediaCodec* ff_AMediaCodec_createEncoderByType(const char *mime)
> >          goto fail;
> >      }
> >
> > +    if (codec->jfields.get_input_buffer_id && codec->jfields.get_output_buffer_id) {
> > +        codec->has_get_i_o_buffer = 1;
> > +    }
> > +
> >      JNI_DETACH_ENV(attached, NULL);
> >
> >      return codec;
>
> I will push both patches in one day if there is no objection.

Pushed.
Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodec: factorize static fields initialization
On Tue, May 17, 2016 at 04:44:57PM +0200, Matthieu Bouron wrote:
> On Tue, May 17, 2016 at 03:20:53PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > ---
> >  libavcodec/mediacodec_wrapper.c | 167 ++--
> >  1 file changed, 57 insertions(+), 110 deletions(-)
> >
> > diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
> > index 6b3f905..8ce3b32 100644
> > --- a/libavcodec/mediacodec_wrapper.c
> > +++ b/libavcodec/mediacodec_wrapper.c
> > @@ -958,83 +958,101 @@ struct FFAMediaCodec {
> >      int has_get_i_o_buffer;
> >  };
> >
> > -FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
> > +static int ff_AMediaCodec_init_static_fields(FFAMediaCodec *codec)
>
> ff_ prefix removed locally as this function is not meant to be exported.

Pushed.
Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: add missing MediaCodec.Get{Input, Output}Buffer() checks
On Tue, May 17, 2016 at 03:20:54PM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron
>
> ---
>  libavcodec/mediacodec_wrapper.c | 8
>  1 file changed, 8 insertions(+)
>
> diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
> index 8ce3b32..5c047ea 100644
> --- a/libavcodec/mediacodec_wrapper.c
> +++ b/libavcodec/mediacodec_wrapper.c
> @@ -1056,6 +1056,10 @@ FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
>          goto fail;
>      }
>
> +    if (codec->jfields.get_input_buffer_id && codec->jfields.get_output_buffer_id) {
> +        codec->has_get_i_o_buffer = 1;
> +    }
> +
>      JNI_DETACH_ENV(attached, codec);
>
>      return codec;
> @@ -1178,6 +1182,10 @@ FFAMediaCodec* ff_AMediaCodec_createEncoderByType(const char *mime)
>          goto fail;
>      }
>
> +    if (codec->jfields.get_input_buffer_id && codec->jfields.get_output_buffer_id) {
> +        codec->has_get_i_o_buffer = 1;
> +    }
> +
>      JNI_DETACH_ENV(attached, NULL);
>
>      return codec;

I will push both patches in one day if there is no objection.

Matthieu
Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodec: factorize static fields initialization
On Tue, May 17, 2016 at 03:20:53PM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron
>
> ---
>  libavcodec/mediacodec_wrapper.c | 167 ++--
>  1 file changed, 57 insertions(+), 110 deletions(-)
>
> diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
> index 6b3f905..8ce3b32 100644
> --- a/libavcodec/mediacodec_wrapper.c
> +++ b/libavcodec/mediacodec_wrapper.c
> @@ -958,83 +958,101 @@ struct FFAMediaCodec {
>      int has_get_i_o_buffer;
>  };
>
> -FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
> +static int ff_AMediaCodec_init_static_fields(FFAMediaCodec *codec)

ff_ prefix removed locally as this function is not meant to be exported.
[FFmpeg-devel] [PATCH 1/2] lavc/mediacodec: factorize static fields initialization
From: Matthieu Bouron --- libavcodec/mediacodec_wrapper.c | 167 ++-- 1 file changed, 57 insertions(+), 110 deletions(-) diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c index 6b3f905..8ce3b32 100644 --- a/libavcodec/mediacodec_wrapper.c +++ b/libavcodec/mediacodec_wrapper.c @@ -958,83 +958,101 @@ struct FFAMediaCodec { int has_get_i_o_buffer; }; -FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name) +static int ff_AMediaCodec_init_static_fields(FFAMediaCodec *codec) { +int ret = 0; int attached = 0; JNIEnv *env = NULL; -FFAMediaCodec *codec = NULL; -jstring codec_name = NULL; -codec = av_mallocz(sizeof(FFAMediaCodec)); -if (!codec) { -return NULL; -} -codec->class = &amediacodec_class; +JNI_ATTACH_ENV_OR_RETURN(env, &attached, codec, AVERROR_EXTERNAL); -env = ff_jni_attach_env(&attached, codec); -if (!env) { -av_freep(&codec); -return NULL; +codec->INFO_TRY_AGAIN_LATER = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.info_try_again_later_id); +if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) { +goto fail; } -if (ff_jni_init_jfields(env, &codec->jfields, jni_amediacodec_mapping, 1, codec) < 0) { +codec->BUFFER_FLAG_CODEC_CONFIG = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_codec_config_id); +if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) { goto fail; } -codec_name = ff_jni_utf_chars_to_jstring(env, name, codec); -if (!codec_name) { +codec->BUFFER_FLAG_END_OF_STREAM = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_end_of_stream_id); +if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) { goto fail; } -codec->object = (*env)->CallStaticObjectMethod(env, codec->jfields.mediacodec_class, codec->jfields.create_by_codec_name_id, codec_name); -if (ff_jni_exception_check(env, 1, codec) < 0) { -goto fail; +if (codec->jfields.buffer_flag_key_frame_id) { +codec->BUFFER_FLAG_KEY_FRAME = 
(*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_key_frame_id); +if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) { +goto fail; +} } -codec->object = (*env)->NewGlobalRef(env, codec->object); -if (!codec->object) { +codec->CONFIGURE_FLAG_ENCODE = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.configure_flag_encode_id); +if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) { goto fail; } codec->INFO_TRY_AGAIN_LATER = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.info_try_again_later_id); -if (ff_jni_exception_check(env, 1, codec) < 0) { +if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) { goto fail; } -codec->BUFFER_FLAG_CODEC_CONFIG = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_codec_config_id); -if (ff_jni_exception_check(env, 1, codec) < 0) { +codec->INFO_OUTPUT_BUFFERS_CHANGED = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.info_output_buffers_changed_id); +if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) { goto fail; } -codec->BUFFER_FLAG_END_OF_STREAM = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_end_of_stream_id); -if (ff_jni_exception_check(env, 1, codec) < 0) { +codec->INFO_OUTPUT_FORMAT_CHANGED = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.info_output_format_changed_id); +if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) { goto fail; } -if (codec->jfields.buffer_flag_key_frame_id) { -codec->BUFFER_FLAG_KEY_FRAME = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_key_frame_id); -if (ff_jni_exception_check(env, 1, codec) < 0) { -goto fail; -} +fail: +JNI_DETACH_ENV(attached, NULL); + +return ret; +} + +FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name) +{ +int attached = 0; +JNIEnv *env = NULL; +FFAMediaCodec 
*codec = NULL; +jstring codec_name = NULL; + +codec = av_mallocz(sizeof(FFAMediaCodec)); +if (!codec) { +return NULL; } +codec->class = &amediacodec_class; -codec->CONFIGURE_FLAG_ENCODE = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec
[FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: add missing MediaCodec.Get{Input, Output}Buffer() checks
From: Matthieu Bouron

---
 libavcodec/mediacodec_wrapper.c | 8
 1 file changed, 8 insertions(+)

diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
index 8ce3b32..5c047ea 100644
--- a/libavcodec/mediacodec_wrapper.c
+++ b/libavcodec/mediacodec_wrapper.c
@@ -1056,6 +1056,10 @@ FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
         goto fail;
     }
 
+    if (codec->jfields.get_input_buffer_id && codec->jfields.get_output_buffer_id) {
+        codec->has_get_i_o_buffer = 1;
+    }
+
     JNI_DETACH_ENV(attached, codec);
 
     return codec;
@@ -1178,6 +1182,10 @@ FFAMediaCodec* ff_AMediaCodec_createEncoderByType(const char *mime)
         goto fail;
     }
 
+    if (codec->jfields.get_input_buffer_id && codec->jfields.get_output_buffer_id) {
+        codec->has_get_i_o_buffer = 1;
+    }
+
     JNI_DETACH_ENV(attached, NULL);
 
     return codec;
-- 
2.8.2
Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon
On Thu, May 12, 2016 at 3:50 PM, Benoit Fouet wrote: > Hi, > > > On 12/05/2016 15:22, Matthieu Bouron wrote: > >> On Thu, May 12, 2016 at 10:01 AM, Benoit Fouet >> wrote: >> >> Hi, >>> >>> I mostly have nits remarks. >>> >>> On 11/05/2016 18:39, Matthieu Bouron wrote: >>> >>> From: Matthieu Bouron >>>> >>>> >>>> [...] >>> >>> diff --git a/libswresample/arm/resample.S b/libswresample/arm/resample.S >>> >>>> new file mode 100644 >>>> index 000..13462e3 >>>> --- /dev/null >>>> +++ b/libswresample/arm/resample.S >>>> @@ -0,0 +1,77 @@ >>>> >>>> [...] >>>> >>>> +function ff_resample_common_apply_filter_x4_float_neon, export=1 >>>> +vmov.f32q0, #0.0 >>>> @ >>>> accumulator >>>> +1: vld1.32 {q1}, [r1]! >>>> @ >>>> src >>>> +vld1.32 {q2}, [r2]! >>>> @ >>>> filter >>>> +vmla.f32q0, q1, q2 >>>> @ >>>> src + {0..3} * filter + {0..3} >>>> >>>> nit: the comment could be "accu += src[0..3] . filter[0..3]" >>> same for the other ones below >>> >>> [...] >>> >>> +subsr3, #4 @ >>> >>>> filter_length -= 4 >>>> +bgt 1b >>>> @ >>>> loop until filter_length >>>> +vpadd.f32 d0, d0, d1 >>>> @ >>>> pair adding of the 4x32-bit accumulated values >>>> +vpadd.f32 d0, d0, d0 >>>> @ >>>> pair adding of the 4x32-bit accumulator values >>>> +vst1.32 {d0[0]}, [r0] >>>> @ >>>> write accumulator >>>> +mov pc, lr >>>> +endfunc >>>> + >>>> +function ff_resample_common_apply_filter_x8_float_neon, export=1 >>>> +vmov.f32q0, #0.0 >>>> @ >>>> accumulator >>>> +1: vld1.32 {q1}, [r1]! >>>> @ >>>> src1 >>>> +vld1.32 {q2}, [r2]! >>>> @ >>>> filter1 >>>> +vld1.32 {q8}, [r1]! >>>> @ >>>> src2 >>>> +vld1.32 {q9}, [r2]! >>>> @ >>>> filter2 >>>> +vmla.f32q0, q1, q2 >>>> @ >>>> src1 + {0..3} * filter1 + {0..3} >>>> +vmla.f32q0, q8, q9 >>>> @ >>>> src2 + {0..3} * filter2 + {0..3} >>>> >>>> instead of using src1 and src2, you may want to use src[0..3] and >>> src[4..7] >>> so, if I reuse the formulation I proposed above: >>> accu += src[0..3] . filter[0..3] >>> accu += src[4..7] . 
filter[4..7] >>> >>> Fixed locally (as well as the other case you mentionned) with: >> -vmla.f32q0, q1, q2 @ >> src1 + {0..3} * filter1 + {0..3} >> -vmla.f32q0, q8, q9 @ >> src2 + {0..3} * filter2 + {0..3} >> +vmla.f32q0, q1, q2 @ >> accumulator += src1 + {0..3} * filter1 + {0..3} >> +vmla.f32q0, q8, q9 @ >> accumulator += src2 + {4..7} * filter2 + {4..7} >> >> I prefer to use + {0..3} instead of [0..3] to make the comments consistent >> with what has been done in swscale/arm. >> >> > Fine for me (I chose the "[]" notation to be consistent with the "." > notation also, in order to do as if it were a dot product between two > vectors). > > > +subsr3, #8 @ >>> >>>> filter_length -= 4 >>>> >>>> -= 8 >>> >>> Fixed locally. >> >> >> [...] >>> >>> diff --git a/libswresample/arm/resample_init.c >>> >>>> b/libswresample/arm/resample_init.c >>>> new file mode 100644 >>>> index 000..c817d03 >>>> --- /dev/null >>>> +++ b/libswresample/arm/resample_init.c >>>> >>>> [...] >>>> >>>> +static int ff
Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon
On Thu, May 12, 2016 at 10:01 AM, Benoit Fouet wrote: > Hi, > > I mostly have nits remarks. > > On 11/05/2016 18:39, Matthieu Bouron wrote: > >> From: Matthieu Bouron >> >> > [...] > > diff --git a/libswresample/arm/resample.S b/libswresample/arm/resample.S >> new file mode 100644 >> index 000..13462e3 >> --- /dev/null >> +++ b/libswresample/arm/resample.S >> @@ -0,0 +1,77 @@ >> >> [...] >> >> +function ff_resample_common_apply_filter_x4_float_neon, export=1 >> +vmov.f32q0, #0.0 @ >> accumulator >> +1: vld1.32 {q1}, [r1]!@ >> src >> +vld1.32 {q2}, [r2]!@ >> filter >> +vmla.f32q0, q1, q2 @ >> src + {0..3} * filter + {0..3} >> > > nit: the comment could be "accu += src[0..3] . filter[0..3]" > same for the other ones below > > [...] > > +subsr3, #4 @ >> filter_length -= 4 >> +bgt 1b @ >> loop until filter_length >> +vpadd.f32 d0, d0, d1 @ >> pair adding of the 4x32-bit accumulated values >> +vpadd.f32 d0, d0, d0 @ >> pair adding of the 4x32-bit accumulator values >> +vst1.32 {d0[0]}, [r0] @ >> write accumulator >> +mov pc, lr >> +endfunc >> + >> +function ff_resample_common_apply_filter_x8_float_neon, export=1 >> +vmov.f32q0, #0.0 @ >> accumulator >> +1: vld1.32 {q1}, [r1]!@ >> src1 >> +vld1.32 {q2}, [r2]!@ >> filter1 >> +vld1.32 {q8}, [r1]!@ >> src2 >> +vld1.32 {q9}, [r2]!@ >> filter2 >> +vmla.f32q0, q1, q2 @ >> src1 + {0..3} * filter1 + {0..3} >> +vmla.f32q0, q8, q9 @ >> src2 + {0..3} * filter2 + {0..3} >> > > instead of using src1 and src2, you may want to use src[0..3] and src[4..7] > so, if I reuse the formulation I proposed above: > accu += src[0..3] . filter[0..3] > accu += src[4..7] . 
filter[4..7] > Fixed locally (as well as the other case you mentionned) with: -vmla.f32q0, q1, q2 @ src1 + {0..3} * filter1 + {0..3} -vmla.f32q0, q8, q9 @ src2 + {0..3} * filter2 + {0..3} +vmla.f32q0, q1, q2 @ accumulator += src1 + {0..3} * filter1 + {0..3} +vmla.f32q0, q8, q9 @ accumulator += src2 + {4..7} * filter2 + {4..7} I prefer to use + {0..3} instead of [0..3] to make the comments consistent with what has been done in swscale/arm. > > +subsr3, #8 @ >> filter_length -= 4 >> > > -= 8 > Fixed locally. > > [...] > > diff --git a/libswresample/arm/resample_init.c >> b/libswresample/arm/resample_init.c >> new file mode 100644 >> index 000..c817d03 >> --- /dev/null >> +++ b/libswresample/arm/resample_init.c >> >> [...] >> >> +static int ff_resample_common_##TYPE##_neon(ResampleContext *c, void >> *dest, const void *source, \ >> +int n, int update_ctx) >> \ >> +{ >> \ >> +DELEM *dst = dest; >> \ >> +const DELEM *src = source; >> \ >> +int dst_index; >> \ >> +int index= c->index; >> \ >> +int frac= c->frac; >> \ >> +int sample_index = index >> c->phase_shift; >> \ >> +int x4_aligned_filter_length = c->filter_length & ~3; >> \ >> +int x8_aligned_filter_length = c->filter_length & ~7; >> \ >> + >> \ >> +index &= c-&g
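The `x4_aligned_filter_length` and `x8_aligned_filter_length` expressions quoted above round the filter length down to a multiple of 4 or 8 by clearing the low bits. A minimal sketch of just that rounding follows. The helper names are hypothetical, and how the patch then chooses between the x4 and x8 routines is not visible in the quoted excerpt, so it is not modelled here.

```c
/* Largest multiple of 4 that does not exceed len (len >= 0). */
static int align_down_4(int len)
{
    return len & ~3;
}

/* Largest multiple of 8 that does not exceed len (len >= 0). */
static int align_down_8(int len)
{
    return len & ~7;
}

/* Number of taps left over after the 4-wide part has been processed. */
static int tail_after_x4(int len)
{
    return len - align_down_4(len);
}
```

For a filter length of 32, which is mentioned elsewhere in this thread as the common case, both masks leave the length unchanged, so no leftover taps remain.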
Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon
On 11 May 2016 at 6:39 PM, "Matthieu Bouron" wrote:
>
> From: Matthieu Bouron
>
> ---
>
> Hello,
>
> Here are some benchmark on a rpi2 of the attached patch.
>
> ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=fltp,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -
>
> With patch:    avg=0.001159 speed=44,1x
> Without patch: avg=0.001297 speed=40,8x
>
> ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=s16p,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -
>
> With patch:    avg=0.001374 speed=45,6x
> Without patch: avg=0.000782 speed=64,6x

The s16 results above are swapped. The correct figures are:

Without patch: avg=0.001374 speed=45,6x
With patch:    avg=0.000782 speed=64,6x
Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon
On Wed, May 11, 2016 at 9:04 PM, Reimar Döffinger wrote:
> On 11.05.2016, at 20:37, Michael Niedermayer wrote:
> > On Wed, May 11, 2016 at 06:39:20PM +0200, Matthieu Bouron wrote:
> >> From: Matthieu Bouron
> >>
> >> ---
> >>
> >> Hello,
> >>
> >> Here are some benchmark on a rpi2 of the attached patch.
> >>
> >> ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=fltp,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -
> >>
> >> With patch:    avg=0.001159 speed=44,1x
> >> Without patch: avg=0.001297 speed=40,8x
> >>
> >> ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=s16p,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -
> >>
> >> With patch:    avg=0.001374 speed=45,6x
> >> Without patch: avg=0.000782 speed=64,6x
> >
> > so its slower ? or am i misreading this ?
>
> Yes, that seems weird.
> Also, what are common filter lengths?

Sorry, I inverted the two results; the neon version is actually faster:

With*out* patch: avg=0.001374 speed=45,6x
With patch:      avg=0.000782 speed=64,6x

> Because for a length of 4 or 8 or 16 I'd think this would be much better
> fully unrolled.
> And for longer ones at least partially unrolled.

The common filter length seems to be 32, but it might depend.

Regarding the small performance gain on the float version, it seems to be due
to the switch between vfp instructions and neon instructions (I'm not 100%
sure).

Matthieu
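To put the corrected numbers in perspective, the ratio of the two `avg` values can be computed directly. This is only a sketch, and it assumes `avg` is a time, so lower is better. Under that assumption the s16 path comes out roughly 1.76x faster and the float path roughly 1.12x faster. The function name is made up for illustration.

```c
/* Speedup factor of the patched build, given two average timings
 * measured under the same conditions (same unit, both > 0). */
static double speedup(double avg_without_patch, double avg_with_patch)
{
    return avg_without_patch / avg_with_patch;
}
```

Note that the `speed=` figures give a smaller ratio (64.6 / 45.6 is about 1.42), presumably because they cover the whole command rather than only the section between the two abench markers.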
[FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon
From: Matthieu Bouron --- Hello, Here are some benchmark on a rpi2 of the attached patch. ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=fltp,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null - With patch:avg=0.001159 speed=44,1x Without patch: avg=0.001297 speed=40,8x ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=s16p,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null - With patch:avg=0.001374 speed=45,6x Without patch: avg=0.000782 speed=64,6x Matthieu --- libswresample/arm/Makefile| 7 ++- libswresample/arm/resample.S | 77 libswresample/arm/resample_init.c | 120 ++ libswresample/resample.h | 1 + libswresample/resample_dsp.c | 1 + 5 files changed, 204 insertions(+), 2 deletions(-) create mode 100644 libswresample/arm/resample.S create mode 100644 libswresample/arm/resample_init.c diff --git a/libswresample/arm/Makefile b/libswresample/arm/Makefile index 60f3f6d..53ab462 100644 --- a/libswresample/arm/Makefile +++ b/libswresample/arm/Makefile @@ -1,5 +1,8 @@ -OBJS += arm/audio_convert_init.o +OBJS += arm/audio_convert_init.o \ + arm/resample_init.o + OBJS-$(CONFIG_NEON_CLOBBER_TEST) += arm/neontest.o -NEON-OBJS += arm/audio_convert_neon.o +NEON-OBJS += arm/audio_convert_neon.o \ + arm/resample.o diff --git a/libswresample/arm/resample.S b/libswresample/arm/resample.S new file mode 100644 index 000..13462e3 --- /dev/null +++ b/libswresample/arm/resample.S @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2016 Matthieu Bouron + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/arm/asm.S" + +function ff_resample_common_apply_filter_x4_float_neon, export=1 +vmov.f32q0, #0.0 @ accumulator +1: vld1.32 {q1}, [r1]!@ src +vld1.32 {q2}, [r2]!@ filter +vmla.f32q0, q1, q2 @ src + {0..3} * filter + {0..3} +subsr3, #4 @ filter_length -= 4 +bgt 1b @ loop until filter_length +vpadd.f32 d0, d0, d1 @ pair adding of the 4x32-bit accumulated values +vpadd.f32 d0, d0, d0 @ pair adding of the 4x32-bit accumulator values +vst1.32 {d0[0]}, [r0] @ write accumulator +mov pc, lr +endfunc + +function ff_resample_common_apply_filter_x8_float_neon, export=1 +vmov.f32q0, #0.0 @ accumulator +1: vld1.32 {q1}, [r1]!@ src1 +vld1.32 {q2}, [r2]!@ filter1 +vld1.32 {q8}, [r1]!@ src2 +vld1.32 {q9}, [r2]!@ filter2 +vmla.f32q0, q1, q2 @ src1 + {0..3} * filter1 + {0..3} +vmla.f32q0, q8, q9 @ src2 + {0..3} * filter2 + {0..3} +subsr3, #8 @ filter_length -= 4 +bgt 1b @ loop until filter_length +vpadd.f32 d0, d0, d1 @ pair adding of the 4x32-bit accumulated values +vpadd.f32 d0, d0, d0 @ pair adding of the 4x32-bit accumulator values +vst1.32 {d0[0]}, [r0] @ write accumulator +mov pc, lr +endfunc + +function ff_resample_common_apply_filter_x4_s16_neon, export=1 +vmov.s32q0, #0 @ accumulator +1: vld1.16 {d2}, [r1]!@ src +vld1.16
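As a reading aid for the float routines above, here is a scalar C sketch of what the x4 loop appears to compute: four independent lane accumulators (the `vmla.f32` on `q0`), followed by two pairwise additions (the two `vpadd.f32`). The function name is hypothetical and this is not the reference implementation from libswresample. It assumes `filter_length` is a positive multiple of 4.

```c
/* Scalar model of a 4-lane multiply-accumulate followed by a
 * horizontal reduction. filter_length must be a multiple of 4. */
static float apply_filter_x4_float_ref(const float *src,
                                       const float *filter,
                                       int filter_length)
{
    float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    int i, lane;

    /* accumulator += src[i..i+3] * filter[i..i+3], lane by lane */
    for (i = 0; i < filter_length; i += 4)
        for (lane = 0; lane < 4; lane++)
            acc[lane] += src[i + lane] * filter[i + lane];

    /* first pairwise add: (acc0 + acc1), (acc2 + acc3); second: the sum */
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

One difference worth noting: the assembly loop tests the counter after the first iteration (`subs` then `bgt`), so it always runs at least once, whereas this sketch returns 0 for a zero length.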
[FFmpeg-devel] [PATCH 1/3] lavfi/framepool: rename FFVideoFramePool to FFFramePool
From: Matthieu Bouron --- libavfilter/avfilter.c | 2 +- libavfilter/avfilter.h | 4 ++-- libavfilter/framepool.c | 24 libavfilter/framepool.h | 32 libavfilter/video.c | 20 ++-- 5 files changed, 41 insertions(+), 41 deletions(-) diff --git a/libavfilter/avfilter.c b/libavfilter/avfilter.c index 21f8d9e..2128f69 100644 --- a/libavfilter/avfilter.c +++ b/libavfilter/avfilter.c @@ -170,7 +170,7 @@ void avfilter_link_free(AVFilterLink **link) return; av_frame_free(&(*link)->partial_buf); -ff_video_frame_pool_uninit((FFVideoFramePool**)&(*link)->video_frame_pool); +ff_frame_pool_uninit((FFFramePool**)&(*link)->frame_pool); av_freep(link); } diff --git a/libavfilter/avfilter.h b/libavfilter/avfilter.h index 79227a7..b8585a9 100644 --- a/libavfilter/avfilter.h +++ b/libavfilter/avfilter.h @@ -533,9 +533,9 @@ struct AVFilterLink { int64_t frame_count; /** - * A pointer to a FFVideoFramePool struct. + * A pointer to a FFFramePool struct. */ -void *video_frame_pool; +void *frame_pool; /** * True if a frame is currently wanted on the input of this filter. 
diff --git a/libavfilter/framepool.c b/libavfilter/framepool.c index 6df574e..36c6e8f 100644 --- a/libavfilter/framepool.c +++ b/libavfilter/framepool.c @@ -26,7 +26,7 @@ #include "libavutil/mem.h" #include "libavutil/pixfmt.h" -struct FFVideoFramePool { +struct FFFramePool { int width; int height; @@ -37,20 +37,20 @@ struct FFVideoFramePool { }; -FFVideoFramePool *ff_video_frame_pool_init(AVBufferRef* (*alloc)(int size), - int width, - int height, - enum AVPixelFormat format, - int align) +FFFramePool *ff_frame_pool_video_init(AVBufferRef* (*alloc)(int size), + int width, + int height, + enum AVPixelFormat format, + int align) { int i, ret; -FFVideoFramePool *pool; +FFFramePool *pool; const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(format); if (!desc) return NULL; -pool = av_mallocz(sizeof(FFVideoFramePool)); +pool = av_mallocz(sizeof(FFFramePool)); if (!pool) return NULL; @@ -100,11 +100,11 @@ FFVideoFramePool *ff_video_frame_pool_init(AVBufferRef* (*alloc)(int size), return pool; fail: -ff_video_frame_pool_uninit(&pool); +ff_frame_pool_uninit(&pool); return NULL; } -int ff_video_frame_pool_get_config(FFVideoFramePool *pool, +int ff_frame_pool_get_video_config(FFFramePool *pool, int *width, int *height, enum AVPixelFormat *format, @@ -122,7 +122,7 @@ int ff_video_frame_pool_get_config(FFVideoFramePool *pool, } -AVFrame *ff_video_frame_pool_get(FFVideoFramePool *pool) +AVFrame *ff_frame_pool_get(FFFramePool *pool) { int i; AVFrame *frame; @@ -174,7 +174,7 @@ fail: return NULL; } -void ff_video_frame_pool_uninit(FFVideoFramePool **pool) +void ff_frame_pool_uninit(FFFramePool **pool) { int i; diff --git a/libavfilter/framepool.h b/libavfilter/framepool.h index 2a6c9e8..4824824 100644 --- a/libavfilter/framepool.h +++ b/libavfilter/framepool.h @@ -25,11 +25,11 @@ #include "libavutil/frame.h" /** - * Video frame pool. This structure is opaque and not meant to be accessed - * directly. 
It is allocated with ff_video_frame_pool_init() and freed with - * ff_video_frame_pool_uninit(). + * Frame pool. This structure is opaque and not meant to be accessed + * directly. It is allocated with ff_frame_pool_init() and freed with + * ff_frame_pool_uninit(). */ -typedef struct FFVideoFramePool FFVideoFramePool; +typedef struct FFFramePool FFFramePool; /** * Allocate and initialize a video frame pool. @@ -41,21 +41,21 @@ typedef struct FFVideoFramePool FFVideoFramePool; * @param height height of each frame in this pool * @param format format of each frame in this pool * @param align buffers alignement of each frame in this pool - * @return newly created video frame pool on success, NULL on error. + * @return newly created frame pool on success, NULL on error. */ -FFVideoFramePool *ff_video_frame_pool_init(AVBufferRef* (*alloc)(int size), - int width, - int height, - enum AVPixelFormat format, - int align); +FFFramePool *ff_frame_pool_video_init(AVBufferRef* (*alloc)(int size), + i
[FFmpeg-devel] [PATCH 2/3] lavfi/framepool: add audio support
From: Matthieu Bouron --- libavfilter/framepool.c | 106 libavfilter/framepool.h | 36 +++- 2 files changed, 141 insertions(+), 1 deletion(-) diff --git a/libavfilter/framepool.c b/libavfilter/framepool.c index 36c6e8f..4a63fe9 100644 --- a/libavfilter/framepool.c +++ b/libavfilter/framepool.c @@ -20,6 +20,7 @@ #include "framepool.h" #include "libavutil/avassert.h" +#include "libavutil/avutil.h" #include "libavutil/buffer.h" #include "libavutil/frame.h" #include "libavutil/imgutils.h" @@ -28,8 +29,15 @@ struct FFFramePool { +int type; + int width; int height; + +int planes; +int channels; +int nb_samples; + int format; int align; int linesize[4]; @@ -54,6 +62,7 @@ FFFramePool *ff_frame_pool_video_init(AVBufferRef* (*alloc)(int size), if (!pool) return NULL; +pool->type = AVMEDIA_TYPE_VIDEO; pool->width = width; pool->height = height; pool->format = format; @@ -104,6 +113,49 @@ fail: return NULL; } +FFFramePool *ff_frame_pool_audio_init(AVBufferRef* (*alloc)(int size), + int channels, + int nb_samples, + enum AVSampleFormat format, + int align) +{ +int ret, planar; +FFFramePool *pool; + +pool = av_mallocz(sizeof(FFFramePool)); +if (!pool) +return NULL; + +planar = av_sample_fmt_is_planar(format); + +pool->type = AVMEDIA_TYPE_AUDIO; +pool->planes = planar ? 
channels : 1; +pool->channels = channels; +pool->nb_samples = nb_samples; +pool->format = format; +pool->align = align; + +ret = av_samples_get_buffer_size(&pool->linesize[0], channels, + nb_samples, format, 0); +if (ret < 0) { +goto fail; +} + +pool->pools[0] = av_buffer_pool_init(pool->linesize[0], NULL); +if (!pool->pools[0]) { +ret = AVERROR(ENOMEM); +goto fail; +} + +return pool; + +fail: +ff_frame_pool_uninit(&pool); +return NULL; +} + + + int ff_frame_pool_get_video_config(FFFramePool *pool, int *width, int *height, @@ -121,6 +173,22 @@ int ff_frame_pool_get_video_config(FFFramePool *pool, return 0; } +int ff_frame_pool_get_audio_config(FFFramePool *pool, + int *channels, + int *nb_samples, + enum AVSampleFormat *format, + int *align) +{ +if (!pool) +return AVERROR(EINVAL); + +*channels = pool->channels; +*nb_samples = pool->nb_samples; +*format = pool->format; +*align = pool->align; + +return 0; +} AVFrame *ff_frame_pool_get(FFFramePool *pool) { @@ -133,6 +201,8 @@ AVFrame *ff_frame_pool_get(FFFramePool *pool) return NULL; } +if (pool->type == AVMEDIA_TYPE_VIDEO) { + desc = av_pix_fmt_desc_get(pool->format); if (!desc) { goto fail; @@ -167,6 +237,42 @@ AVFrame *ff_frame_pool_get(FFFramePool *pool) } frame->extended_data = frame->data; +} else if (pool->type == AVMEDIA_TYPE_AUDIO) { +frame->nb_samples = pool->nb_samples; +av_frame_set_channels(frame, pool->channels); +frame->format = pool->format; +frame->linesize[0] = pool->linesize[0]; + +if (pool->planes > AV_NUM_DATA_POINTERS) { +frame->extended_data = av_mallocz_array(pool->planes, + sizeof(*frame->extended_data)); +frame->nb_extended_buf = pool->planes - AV_NUM_DATA_POINTERS; +frame->extended_buf = av_mallocz_array(frame->nb_extended_buf, + sizeof(*frame->extended_buf)); +if (!frame->extended_data || !frame->extended_buf) { +goto fail; +} +} else { +frame->extended_data = frame->data; +av_assert0(frame->nb_extended_buf == 0); +} + +for (i = 0; i < FFMIN(pool->planes, AV_NUM_DATA_POINTERS); i++) { 
+frame->buf[i] = av_buffer_pool_get(pool->pools[0]); +if (!frame->buf[i]) +goto fail; +frame->extended_data[i] = frame->data[i] = frame->buf[i]->data; +} +for (i = 0; i < frame->nb_extended_buf; i++) { +frame->extended_buf[i] = av_buffer_pool_get(pool->pools[0]); +if (!frame->extended_buf[i]) +goto fail; +frame->ex
[FFmpeg-devel] [PATCH 3/3] lavfi: use an audio frame pool for each link of the filtergraph
From: Matthieu Bouron --- libavfilter/audio.c | 51 +-- 1 file changed, 37 insertions(+), 14 deletions(-) diff --git a/libavfilter/audio.c b/libavfilter/audio.c index 51fef03..dbc92d6 100644 --- a/libavfilter/audio.c +++ b/libavfilter/audio.c @@ -28,6 +28,9 @@ #include "avfilter.h" #include "internal.h" +#define BUFFER_ALIGN 0 + + AVFrame *ff_null_get_audio_buffer(AVFilterLink *link, int nb_samples) { return ff_get_audio_buffer(link->dst->outputs[0], nb_samples); @@ -35,29 +38,49 @@ AVFrame *ff_null_get_audio_buffer(AVFilterLink *link, int nb_samples) AVFrame *ff_default_get_audio_buffer(AVFilterLink *link, int nb_samples) { -AVFrame *frame = av_frame_alloc(); +AVFrame *frame = NULL; int channels = link->channels; -int ret; av_assert0(channels == av_get_channel_layout_nb_channels(link->channel_layout) || !av_get_channel_layout_nb_channels(link->channel_layout)); -if (!frame) -return NULL; +if (!link->frame_pool) { +link->frame_pool = ff_frame_pool_audio_init(av_buffer_allocz, channels, +nb_samples, link->format, BUFFER_ALIGN); +if (!link->frame_pool) +return NULL; +} else { +int pool_channels = 0; +int pool_nb_samples = 0; +int pool_align = 0; +enum AVSampleFormat pool_format = AV_SAMPLE_FMT_NONE; -frame->nb_samples = nb_samples; -frame->format = link->format; -av_frame_set_channels(frame, link->channels); -frame->channel_layout = link->channel_layout; -frame->sample_rate= link->sample_rate; -ret = av_frame_get_buffer(frame, 0); -if (ret < 0) { -av_frame_free(&frame); +if (ff_frame_pool_get_audio_config(link->frame_pool, + &pool_channels, &pool_nb_samples, + &pool_format, &pool_align) < 0) { +return NULL; +} + +if (pool_channels != channels || pool_nb_samples < nb_samples || +pool_format != link->format || pool_align != BUFFER_ALIGN) { + +ff_frame_pool_uninit((FFFramePool **)&link->frame_pool); +link->frame_pool = ff_frame_pool_audio_init(av_buffer_allocz, channels, +nb_samples, link->format, BUFFER_ALIGN); +if (!link->frame_pool) +return NULL; +} +} + +frame = 
ff_frame_pool_get(link->frame_pool); +if (!frame) { return NULL; } -av_samples_set_silence(frame->extended_data, 0, nb_samples, channels, - link->format); +frame->nb_samples = nb_samples; +frame->channel_layout = link->channel_layout; +frame->sample_rate = link->sample_rate; +av_samples_set_silence(frame->extended_data, 0, nb_samples, channels, link->format); return frame; } -- 2.8.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] (no subject)
Hello,

The following patchset adds an audio frame pool for each link of the filtergraph. It extends the FFVideoFramePool API to support audio (and renames it to FFFramePool).

The performance gain on a rpi2 is very small: malloc+free goes from 2.50% to 1.84% of CPU time with the following command line:

    perf record ./ffmpeg_g -f lavfi -i sine=440,asetnsamples=4096,aformat=sample_fmts=s16,aresample=48000 -t 500 -f null -

For reference, most of the time (81%) is spent in request_frame and in the resample function.

Given the small performance gain, I'm not sure it's really useful. However, it could be a good idea to have this API so it can be used in both libavcodec (replacing what is being done in lavc/utils.c) and libavfilter, for audio and video, if it is made public or semi-public later on.

Best regards,
Matthieu
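The reuse rule applied in patch 3/3 of this series (libavfilter/audio.c) can be sketched in isolation. This is only an illustration of the condition visible in that diff: the `SketchPoolConfig` type and the `sketch_pool_needs_reinit` helper are hypothetical names, and the sample format is reduced to a plain `int` so the snippet does not depend on FFmpeg headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parameters an audio frame pool was created
 * with. This is NOT the FFFramePool structure from the patch. */
typedef struct SketchPoolConfig {
    int channels;
    int nb_samples;
    int format;   /* stand-in for enum AVSampleFormat */
    int align;
} SketchPoolConfig;

/* Returns true when the existing pool cannot serve the request and has to be
 * torn down and re-created. Mirrors the test in the diff: a pool is kept when
 * channels, format and alignment match and it was sized for at least as many
 * samples as are requested now. */
static bool sketch_pool_needs_reinit(const SketchPoolConfig *pool,
                                     int channels, int nb_samples,
                                     int format, int align)
{
    if (!pool)
        return true;                     /* no pool yet: create one */

    return pool->channels   != channels   ||
           pool->nb_samples <  nb_samples ||  /* smaller requests reuse the pool */
           pool->format     != format     ||
           pool->align      != align;
}
```

With a pool created for 2 channels and 4096 samples, a later request for 1024 samples keeps the pool, while a request for 8192 samples or for a different channel count forces a re-initialisation. That asymmetry is why the patch sets `frame->nb_samples` again after fetching a frame from the pool.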
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support
On Thu, Apr 7, 2016 at 4:18 PM, wm4 wrote: > On Fri, 18 Mar 2016 17:50:39 +0100 > Matthieu Bouron wrote: > > > From: Matthieu Bouron > > > > --- > > > > Hello, > > Can't say much about this, so just some minor confused comments. > Thanks for your comments and sorry for the late reply. > > > > > The following patch add hwaccel support to the mediacodec (h264) decoder > by allowing > > the user to render the output frames directly on a surface. > > > > In order to do so the user needs to initialize the hwaccel through the > use of > > av_mediacodec_alloc_context and av_mediacodec_default_init functions. > The later > > takes a reference to an android/view/Surface as parameter. > > > > If the hwaccel successfully initialize, the decoder output frames pix > fmt will be > > AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrate how to > render > > the frames on the surface: > > > > AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3]; > > av_mediacodec_release_buffer(buffer, 1); > > > > The last argument of av_mediacodec_release_buffer enable rendering of the > > buffer on the surface (or not if set to 0). > > > > I don't understand this (at all), but unreferencing the AVFrame should > unref the underlying surface. > In this case, the underlying surface will remain (it is owned by the codec itself) but the output buffer (that should be renderered to the surface) will be discarded. > > > Regarding the internal changes in the mediacodec decoder: > > > > MediaCodec.flush() discards both input and output buffers meaning that if > > MediaCodec.flush() is called all output buffers the user has a reference > on are > > now invalid (and cannot be used). > > This behaviour does not fit well in the avcodec API. > > > > When the decoder is configured to output software buffers, there is no > issue as > > the buffers are copied. 
> > > > Now when the decoder is configured to output to a surface, the user > might not > > want to render all the frames as fast as the decoder can go and might > want to > > control *when* the frame are rendered, so we need to make sure that the > > MediaCodec.flush() call is delayed until all the frames the user retains > has > > been released or rendered. > > > > Delaying the call to MediaCodec.flush() means buffering any inputs that > come > > the decoder until the user has released/renderer the frame he retains. > > > > This is a limitation of this hwaccel implementation, if the user retains > a > > frame (a), then issue a flush command to the decoder, the packets he > feeds to > > the decoder at that point will be queued in the internal decoder packet > queue > > (until he releases the frame (a)). This scenario leads to a memory usage > > increase to say the least. > > > > Currently there is no limitation on the size of the internal decoder > packet > > queue but this is something that can be added easily. Then, if the queue > is > > full, what would be the behaviour of the decoder ? Can it block ? Or > should it > > returns something like AVERROR(EAGAIN) ? > > The current API can't do anything like this. It has to output 0 or 1 > frame per input packet. (If it outputs nothing, the frame is either > discarded or queued internally. The queue can be emptied only when > draining the decoder at the end of the stream.) > > So it looks like all you can do is blocking. (Which could lead to a > deadlock in the API user, depending of how the user's code works?) > Yes if I block at some point, it can lead to a deadlock if the user never releases all the frames. I'm considering buffering a few input packets before blocking. 
> > > > > About the other internal decoder changes I introduced: > > > > The MediaCodecDecContext is now refcounted (using the lavu/atomic api) > since > > the (hwaccel) frames can be retained by the user, we need to delay the > > destruction of the codec until the user has released all the frames he > has a > > reference on. > > The reference counter of the MediaCodecDecContext is incremented each > time an > > (hwaccel) frame is outputted by the decoder and decremented each time a > > (hwaccel) frame is released. > > > > Also, when the decoder is configured to output to a surface the pts that > are > > given to the MediaCodec API are now rescaled based on the codec_timebase > as > > those timestamps values are propagated to the frames rendered on the > surface > > since Andr
Re: [FFmpeg-devel] [PATCH] swscale/arm: add yuv2planeX_8_neon
On Mon, Apr 11, 2016 at 4:18 PM, Matthieu Bouron wrote: > > > On Mon, Apr 11, 2016 at 9:58 AM, Benoit Fouet > wrote: > >> Hi, >> >> (again, thanks to both of you for documenting all this assembly /NEON >> code) >> >> On 09/04/2016 10:22, Matthieu Bouron wrote: >> >>> From: Matthieu Bouron >>> >>> --- >>> >>> Hello, >>> >>> The following patch add yuv2planeX_8_neon function for the arm >>> platform. It is >>> currently restricted to 8-bit per component sources until I fix fate >>> issues >>> with 10-bit sources (the dnxhd-*-10bit tests fail but I haven't figured >>> out yet >>> where it comes from). >>> >>> Matthieu >>> >>> --- >>> libswscale/arm/Makefile | 1 + >>> libswscale/arm/output.S | 78 >>> >>> libswscale/arm/swscale.c | 7 + >>> libswscale/utils.c | 3 +- >>> 4 files changed, 88 insertions(+), 1 deletion(-) >>> create mode 100644 libswscale/arm/output.S >>> >>> [...] >>> >>> diff --git a/libswscale/arm/output.S b/libswscale/arm/output.S >>> new file mode 100644 >>> index 000..4437447 >>> --- /dev/null >>> +++ b/libswscale/arm/output.S >>> @@ -0,0 +1,78 @@ >>> >> >> [...] 
>> >> >> +function ff_yuv2planeX_8_neon, export=1 >>> +push {r4-r12, lr} >>> +vpush {q4-q7} >>> +ldr r4, [sp, #104] >>> @ dstW >>> +ldr r5, [sp, #108] >>> @ dither >>> +ldr r6, [sp, #112] >>> @ offset >>> +vld1.8 {d0}, [r5] >>> @ load 8x8-bit dither values >>> +tst r6, #0 >>> @ check offsetting which can be 0 or 3 only >>> +beq 1f >>> +vext.u8 d0, d0, d0, #3 >>> @ honor offseting which can be 3 only >>> +1: vmovl.u8q0, d0 >>> @ extend dither to 16-bit >>> +vshll.u16 q1, d0, #12 >>> @ extend dither to 32-bit with left shift by 12 (part 1) >>> +vshll.u16 q2, d1, #12 >>> @ extend dither to 32-bit with left shift by 12 (part 2) >>> +mov r7, #0 >>> @ i = 0 >>> +2: vmov.u8 q3, q1 >>> @ initialize accumulator with dithering values (part 1) >>> +vmov.u8 q4, q2 >>> @ initialize accumulator with dithering values (part 2) >>> +mov r8, r1 >>> @ tmpFilterSize = filterSize >>> +mov r9, r2 >>> @ srcp >>> +mov r10, r0 >>> @ filterp >>> +3: ldr r11, [r9], #4 >>> @ get pointer @ src[j] >>> +ldr r12, [r9], #4 >>> @ get pointer @ src[j+1] >>> +add r11, r11, r7, lsl #1 >>> @ &src[j][i] >>> +add r12, r12, r7, lsl #1 >>> @ &src[j+1][i] >>> +vld1.16 {q5}, [r11] >>> @ read 8x16-bit @ src[j ][i + {0..7}]: A,B,C,D,E,F,G,H >>> +vld1.16 {q6}, [r12] >>> @ read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P >>> +ldr r11, [r10], #4 >>> @ read 2x16-bit coeffs (X, Y) at (filter[j], filter[j+1]) >>> +vmov.16 q7, q5 >>> @ copy 8x16-bit @ src[j ][i + {0..7}] for following inplace zip >>> instruction >>> +vmov.16 q8, q6 >>> @ copy 8x16-bit @ src[j+1][i + {0..7}] for following inplace zip >>> instruction >>> +vzip.16 q7, q8 >>> @ A,I,B,J,C,K,D,L,E,M,F,N,G,O,H,L >>> >> >> nit: O,H,P > > > Fixed. > > Patch updated fixing fate issues with 10-bit sources (the code was not > honoring offsetting: tst r6, #0 has been replaced with cmp r6, #0). > If there is no objection, I will push the patch in the next hours. > Patch applied. 
Matthieu
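For readers wondering why replacing `tst r6, #0` with `cmp r6, #0` fixed the offset handling: `tst` sets the Z flag from a bitwise AND, and an AND with 0 is always 0, so the following `beq 1f` was always taken and the `vext.u8` rotation never ran. The C sketch below only models that flag behaviour; the helper names are made up for illustration and nothing here comes from the patch itself.

```c
#include <stdint.h>

/* Z flag after "tst reg, #imm": set when (reg & imm) == 0. */
static int z_after_tst(unsigned reg, unsigned imm)
{
    return (reg & imm) == 0;
}

/* Z flag after "cmp reg, #imm": set when reg - imm == 0, i.e. reg == imm. */
static int z_after_cmp(unsigned reg, unsigned imm)
{
    return reg == imm;
}

/* Models the start of the function: load 8 dither bytes, then skip the
 * rotation when the branch ("beq 1f") is taken. The rotation models
 * "vext.u8 d0, d0, d0, #3", i.e. out[i] = in[(i + 3) & 7].
 * use_cmp selects the fixed (cmp) or the original (tst) instruction. */
static void sketch_load_dither(uint8_t out[8], const uint8_t in[8],
                               unsigned offset, int use_cmp)
{
    int branch_taken = use_cmp ? z_after_cmp(offset, 0)
                               : z_after_tst(offset, 0);
    unsigned rot = branch_taken ? 0 : 3;

    for (unsigned i = 0; i < 8; i++)
        out[i] = in[(i + rot) & 7];
}
```

Calling `sketch_load_dither` with `offset = 3` leaves the dither table unrotated in the `tst` variant and rotated by three in the `cmp` variant, which is the behaviour difference the updated patch describes as "not honoring offsetting".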
Re: [FFmpeg-devel] [PATCH] swscale/arm: add yuv2planeX_8_neon
On Mon, Apr 11, 2016 at 9:58 AM, Benoit Fouet wrote: > Hi, > > (again, thanks to both of you for documenting all this assembly /NEON code) > > On 09/04/2016 10:22, Matthieu Bouron wrote: > >> From: Matthieu Bouron >> >> --- >> >> Hello, >> >> The following patch add yuv2planeX_8_neon function for the arm platform. >> It is >> currently restricted to 8-bit per component sources until I fix fate >> issues >> with 10-bit sources (the dnxhd-*-10bit tests fail but I haven't figured >> out yet >> where it comes from). >> >> Matthieu >> >> --- >> libswscale/arm/Makefile | 1 + >> libswscale/arm/output.S | 78 >> >> libswscale/arm/swscale.c | 7 + >> libswscale/utils.c | 3 +- >> 4 files changed, 88 insertions(+), 1 deletion(-) >> create mode 100644 libswscale/arm/output.S >> >> [...] >> >> diff --git a/libswscale/arm/output.S b/libswscale/arm/output.S >> new file mode 100644 >> index 000..4437447 >> --- /dev/null >> +++ b/libswscale/arm/output.S >> @@ -0,0 +1,78 @@ >> > > [...] > > > +function ff_yuv2planeX_8_neon, export=1 >> +push {r4-r12, lr} >> +vpush {q4-q7} >> +ldr r4, [sp, #104] @ >> dstW >> +ldr r5, [sp, #108] @ >> dither >> +ldr r6, [sp, #112] @ >> offset >> +vld1.8 {d0}, [r5] @ >> load 8x8-bit dither values >> +tst r6, #0 @ >> check offsetting which can be 0 or 3 only >> +beq 1f >> +vext.u8 d0, d0, d0, #3 @ >> honor offseting which can be 3 only >> +1: vmovl.u8q0, d0 @ >> extend dither to 16-bit >> +vshll.u16 q1, d0, #12@ >> extend dither to 32-bit with left shift by 12 (part 1) >> +vshll.u16 q2, d1, #12@ >> extend dither to 32-bit with left shift by 12 (part 2) >> +mov r7, #0 @ >> i = 0 >> +2: vmov.u8 q3, q1 @ >> initialize accumulator with dithering values (part 1) >> +vmov.u8 q4, q2 @ >> initialize accumulator with dithering values (part 2) >> +mov r8, r1 @ >> tmpFilterSize = filterSize >> +mov r9, r2 @ >> srcp >> +mov r10, r0@ >> filterp >> +3: ldr r11, [r9], #4 @ >> get pointer @ src[j] >> +ldr r12, [r9], #4 @ >> get pointer @ src[j+1] >> +add r11, r11, r7, lsl 
#1 @ >> &src[j][i] >> +add r12, r12, r7, lsl #1 @ >> &src[j+1][i] >> +vld1.16 {q5}, [r11]@ >> read 8x16-bit @ src[j ][i + {0..7}]: A,B,C,D,E,F,G,H >> +vld1.16 {q6}, [r12]@ >> read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P >> +ldr r11, [r10], #4 @ >> read 2x16-bit coeffs (X, Y) at (filter[j], filter[j+1]) >> +vmov.16 q7, q5 @ >> copy 8x16-bit @ src[j ][i + {0..7}] for following inplace zip instruction >> +vmov.16 q8, q6 @ >> copy 8x16-bit @ src[j+1][i + {0..7}] for following inplace zip instruction >> +vzip.16 q7, q8 @ >> A,I,B,J,C,K,D,L,E,M,F,N,G,O,H,L >> > > nit: O,H,P Fixed. Patch updated fixing fate issues with 10-bit sources (the code was not honoring offsetting: tst r6, #0 has been replaced with cmp r6, #0). If there is no objection, I will push the patch in the next hours. Thanks for the review, Matthieu From 95186d8459c1cb1615299edd6756292140f7fb68 Mon Sep 17 00:00:00 2001 From: Matthieu Bouron Date: Fri, 8 Apr 2016 15:32:24
[FFmpeg-devel] [PATCH] swscale/arm: add yuv2planeX_8_neon
From: Matthieu Bouron --- Hello, The following patch add yuv2planeX_8_neon function for the arm platform. It is currently restricted to 8-bit per component sources until I fix fate issues with 10-bit sources (the dnxhd-*-10bit tests fail but I haven't figured out yet where it comes from). Matthieu --- libswscale/arm/Makefile | 1 + libswscale/arm/output.S | 78 libswscale/arm/swscale.c | 7 + libswscale/utils.c | 3 +- 4 files changed, 88 insertions(+), 1 deletion(-) create mode 100644 libswscale/arm/output.S diff --git a/libswscale/arm/Makefile b/libswscale/arm/Makefile index b8b0134..792da6b 100644 --- a/libswscale/arm/Makefile +++ b/libswscale/arm/Makefile @@ -4,4 +4,5 @@ OBJS+= arm/swscale.o\ NEON-OBJS += arm/rgb2yuv_neon_32.o NEON-OBJS += arm/rgb2yuv_neon_16.o NEON-OBJS += arm/hscale.o \ + arm/output.o \ arm/yuv2rgb_neon.o \ diff --git a/libswscale/arm/output.S b/libswscale/arm/output.S new file mode 100644 index 000..4437447 --- /dev/null +++ b/libswscale/arm/output.S @@ -0,0 +1,78 @@ +/* + * Copyright (c) 2016 Clément Bœsch + * Copyright (c) 2016 Matthieu Bouron + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/arm/asm.S" + +function ff_yuv2planeX_8_neon, export=1 +push {r4-r12, lr} +vpush {q4-q7} +ldr r4, [sp, #104] @ dstW +ldr r5, [sp, #108] @ dither +ldr r6, [sp, #112] @ offset +vld1.8 {d0}, [r5] @ load 8x8-bit dither values +tst r6, #0 @ check offsetting which can be 0 or 3 only +beq 1f +vext.u8 d0, d0, d0, #3 @ honor offseting which can be 3 only +1: vmovl.u8q0, d0 @ extend dither to 16-bit +vshll.u16 q1, d0, #12@ extend dither to 32-bit with left shift by 12 (part 1) +vshll.u16 q2, d1, #12@ extend dither to 32-bit with left shift by 12 (part 2) +mov r7, #0 @ i = 0 +2: vmov.u8 q3, q1 @ initialize accumulator with dithering values (part 1) +vmov.u8 q4, q2 @ initialize accumulator with dithering values (part 2) +mov r8, r1 @ tmpFilterSize = filterSize +mov r9, r2 @ srcp +mov r10, r0@ filterp +3: ldr r11, [r9], #4 @ get pointer @ src[j] +ldr r12, [r9], #4 @ get pointer @ src[j+1] +add r11, r11, r7, lsl #1 @ &src[j][i] +add r12, r12, r7, lsl #1 @ &src[j+1][i] +vld1.16 {q5}, [r11]@ read 8x16-bit @ src[j ][i + {0..7}]: A,B,C,D,E,F,G,H +vld1.16 {q6}, [r12]@ read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P +ldr r11, [r10], #4 @ read 2x16-bit coeffs (X, Y) at (filter[j], filter[j+1]) +vmov.16 q7, q5 @ copy 8x16-bit @ src[j ][i + {0..7}] for following inplace zip instruction +vmov.16 q8, q6 @ copy 8x16-bit @ src[j+1][i + {0..7}] for following inplace zip instruction +vzip.16 q7, q8 @ A,I,B,J,C,K,D,L,E,M,F,N,G,O,H,L +vdup.32 q15, r11
Re: [FFmpeg-devel] [PATCH] swscale/arm: add ff_hscale_8_to_15_neon
On Fri, Apr 8, 2016 at 10:27 PM, Michael Niedermayer wrote:
> On Fri, Apr 08, 2016 at 12:24:13PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > ---
> >  libswscale/arm/Makefile       |  6 ++--
> >  libswscale/arm/hscale.S       | 70 +++
> >  libswscale/arm/swscale.c      | 37 +++
> >  libswscale/swscale.c          |  2 ++
> >  libswscale/swscale_internal.h |  1 +
> >  5 files changed, 114 insertions(+), 2 deletions(-)
> >  create mode 100644 libswscale/arm/hscale.S
> >  create mode 100644 libswscale/arm/swscale.c
>
> tested, works (fate)

Applied with minor changes in the comments.

Thanks,
Matthieu

[...]
[FFmpeg-devel] [PATCH] swscale/arm: add ff_hscale_8_to_15_neon
From: Matthieu Bouron --- libswscale/arm/Makefile | 6 ++-- libswscale/arm/hscale.S | 70 +++ libswscale/arm/swscale.c | 37 +++ libswscale/swscale.c | 2 ++ libswscale/swscale_internal.h | 1 + 5 files changed, 114 insertions(+), 2 deletions(-) create mode 100644 libswscale/arm/hscale.S create mode 100644 libswscale/arm/swscale.c diff --git a/libswscale/arm/Makefile b/libswscale/arm/Makefile index 9ccec3b..b8b0134 100644 --- a/libswscale/arm/Makefile +++ b/libswscale/arm/Makefile @@ -1,5 +1,7 @@ -OBJS+= arm/swscale_unscaled.o +OBJS+= arm/swscale.o\ + arm/swscale_unscaled.o \ NEON-OBJS += arm/rgb2yuv_neon_32.o NEON-OBJS += arm/rgb2yuv_neon_16.o -NEON-OBJS += arm/yuv2rgb_neon.o +NEON-OBJS += arm/hscale.o \ + arm/yuv2rgb_neon.o \ diff --git a/libswscale/arm/hscale.S b/libswscale/arm/hscale.S new file mode 100644 index 000..d559b3d --- /dev/null +++ b/libswscale/arm/hscale.S @@ -0,0 +1,70 @@ +/* + * Copyright (c) 2016 Clément Bœsch + * Copyright (c) 2016 Matthieu Bouron + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/arm/asm.S" + +function ff_hscale_8_to_15_neon, export=1 +push{r4-r12, lr} +vpush {q4-q7} +ldr r4, [sp, #104] @ filter +ldr r5, [sp, #108] @ filterPos +ldr r6, [sp, #112] @ filterSize +add r10, r4, r6, lsl #1@ filter2 = filter + filterSize * 2 +1: ldr r8, [r5], #4 @ filterPos[0] +ldr r9, [r5], #4 @ filterPos[1] +vmov.s32q4, #0 @ val accumulator +vmov.s32q5, #0 @ val accumulator +mov r7, r6 @ filterSize counter +mov r0, r3 @ srcp +2: add r11, r0, r8@ srcp + filterPos[0] +add r12, r0, r9@ srcp + filterPos[1] +vld1.8 d0, [r11] @ srcp[filterPos[0] + {0..7}] +vld1.8 d2, [r12] @ srcp[filterPos[1] + {0..7}] +vld1.16 {q2}, [r4]!@ load 8x16-bit filter values +vld1.16 {q3}, [r10]! @ load 8x16-bit filter values +vmovl.u8q0, d0 @ unpack src values to 16-bit +vmovl.u8q1, d2 @ unpack src values to 16-bit +vmull.s16 q8, d0, d4 @ srcp[filterPos[0] + {0..7}] * filter[{0..7}] (part 1) +vmull.s16 q9, d1, d5 @ srcp[filterPos[0] + {0..7}] * filter[{0..7}] (part 2) +vmull.s16 q10, d2, d6@ srcp[filterPos[1] + {0..7}] * filter[{0..7}] (part 1) +vmull.s16 q11, d3, d7@ srcp[filterPos[1] + {0..7}] * filter[{0..7}] (part 2) +vpadd.s32 d16, d16, d17 @ horizontal pair adding of the 8x32-bit multiplied values into 4x32-bit (part 1) +vpadd.s32 d17, d18, d19 @ horizontal pair adding of the 8x32-bit multiplied values into 4x32-bit (part 2) +vpadd.s32 d20, d20, d21 @ horizontal pair adding of the 8x32-bit multiplied values into 4x32-bit (part 1) +vpadd.s32 d21, d22, d23 @ horizontal pair adding of the 8x32-bit multiplied values into 4x32-bit (part 2) +vadd.s32
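As a reading aid for the NEON routine above (the patch text is cut off here), the computation its comments describe can be written as a scalar loop: for each output sample, multiply `filterSize` source bytes starting at `filterPos[i]` by the matching 16-bit coefficients and accumulate. The function name below is hypothetical, and the final `>> 7` with a clamp to 15 bits is an assumption based on the "8_to_15" naming, since the store path is not visible in the truncated patch.

```c
#include <stdint.h>

/* Scalar sketch of a horizontal scaler from 8-bit input to 15-bit output.
 * Parameter names follow the comments in the NEON code (filter, filterPos,
 * filterSize); the function itself is illustrative only. */
static void sketch_hscale_8_to_15(int16_t *dst, int dstW, const uint8_t *src,
                                  const int16_t *filter,
                                  const int32_t *filterPos, int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int val = 0;

        /* Row i of the filter starts at filter + filterSize * i, which is
         * what "filter2 = filter + filterSize * 2" (in bytes) expresses for
         * the second of the two outputs handled per NEON iteration. */
        for (int j = 0; j < filterSize; j++)
            val += src[filterPos[i] + j] * filter[filterSize * i + j];

        /* Assumption: scale down by 7 bits and saturate to 15 bits. */
        val >>= 7;
        dst[i] = (int16_t)(val > 32767 ? 32767 : val);
    }
}
```

The NEON version computes two such outputs per outer iteration and eight taps per inner iteration, using `vmull.s16` for the products and `vpadd.s32` to fold the partial sums.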
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support
On Wed, Mar 23, 2016 at 6:16 PM, Matthieu Bouron wrote: > > > On Tue, Mar 22, 2016 at 10:04 AM, Matthieu Bouron < > matthieu.bou...@gmail.com> wrote: > >> >> >> On Fri, Mar 18, 2016 at 5:50 PM, Matthieu Bouron < >> matthieu.bou...@gmail.com> wrote: >> >>> From: Matthieu Bouron >>> >>> --- >>> >>> Hello, >>> >>> The following patch add hwaccel support to the mediacodec (h264) decoder >>> by allowing >>> the user to render the output frames directly on a surface. >>> >>> In order to do so the user needs to initialize the hwaccel through the >>> use of >>> av_mediacodec_alloc_context and av_mediacodec_default_init functions. >>> The later >>> takes a reference to an android/view/Surface as parameter. >>> >>> If the hwaccel successfully initialize, the decoder output frames pix >>> fmt will be >>> AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrate how to >>> render >>> the frames on the surface: >>> >>> AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3]; >>> av_mediacodec_release_buffer(buffer, 1); >>> >>> The last argument of av_mediacodec_release_buffer enable rendering of the >>> buffer on the surface (or not if set to 0). >>> >>> Regarding the internal changes in the mediacodec decoder: >>> >>> MediaCodec.flush() discards both input and output buffers meaning that if >>> MediaCodec.flush() is called all output buffers the user has a reference >>> on are >>> now invalid (and cannot be used). >>> This behaviour does not fit well in the avcodec API. >>> >>> When the decoder is configured to output software buffers, there is no >>> issue as >>> the buffers are copied. >>> >>> Now when the decoder is configured to output to a surface, the user >>> might not >>> want to render all the frames as fast as the decoder can go and might >>> want to >>> control *when* the frame are rendered, so we need to make sure that the >>> MediaCodec.flush() call is delayed until all the frames the user retains >>> has >>> been released or rendered. 
>>> >>> Delaying the call to MediaCodec.flush() means buffering any inputs that >>> come >>> the decoder until the user has released/renderer the frame he retains. >>> >>> This is a limitation of this hwaccel implementation, if the user retains >>> a >>> frame (a), then issue a flush command to the decoder, the packets he >>> feeds to >>> the decoder at that point will be queued in the internal decoder packet >>> queue >>> (until he releases the frame (a)). This scenario leads to a memory usage >>> increase to say the least. >>> >>> Currently there is no limitation on the size of the internal decoder >>> packet >>> queue but this is something that can be added easily. Then, if the queue >>> is >>> full, what would be the behaviour of the decoder ? Can it block ? Or >>> should it >>> returns something like AVERROR(EAGAIN) ? >>> >>> About the other internal decoder changes I introduced: >>> >>> The MediaCodecDecContext is now refcounted (using the lavu/atomic api) >>> since >>> the (hwaccel) frames can be retained by the user, we need to delay the >>> destruction of the codec until the user has released all the frames he >>> has a >>> reference on. >>> The reference counter of the MediaCodecDecContext is incremented each >>> time an >>> (hwaccel) frame is outputted by the decoder and decremented each time a >>> (hwaccel) frame is released. >>> >>> Also, when the decoder is configured to output to a surface the pts that >>> are >>> given to the MediaCodec API are now rescaled based on the codec_timebase >>> as >>> those timestamps values are propagated to the frames rendered on the >>> surface >>> since Android M. Not sure if it's really useful though. >>> >>> On the performance side: >>> >>> On a nexus 5, decoding an h264 stream (main profile) 1080p@60fps: >>> - software output + rgba conversion goes at 59~60fps >>> - surface output + render on a surface goes at 100~110fps >>> >>> >> [...] 
>>
>> Patch updated with the following differences:
>>   * the public mediacodec api is now always built (not only when
>>     mediacodec is available) (and the build when mediacodec is not
>>     available has been fixed)
>>   * the documentation of av_mediacodec_release_buffer has been improved
>>     a bit
>>
>
> Patch updated with the following differences:
> MediaCodecBuffer->released type is now a volatile int (instead of an int*)
> MediaCodecContext->refcount type is now a volatile int (instead of an int*)
>

Ping.
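The lifetime rule described in this thread (the decoder context must outlive every hwaccel frame the user still holds) can be illustrated with a small reference-counting sketch. The patch itself uses the lavu/atomic API on a `volatile int`; the version below swaps in C11 `<stdatomic.h>` so it compiles on its own, and every name in it (`SketchDecoder`, `sketch_decoder_ref`, and so on) is hypothetical rather than taken from the patch.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a decoder context shared between the decoder
 * and the frames it has handed out. */
typedef struct SketchDecoder {
    atomic_int refcount;   /* 1 for the decoder + 1 per frame held by the user */
} SketchDecoder;

static SketchDecoder *sketch_decoder_alloc(void)
{
    SketchDecoder *d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;
    atomic_init(&d->refcount, 1);   /* the decoder's own reference */
    return d;
}

/* Called each time a hwaccel frame is output to the user. */
static void sketch_decoder_ref(SketchDecoder *d)
{
    atomic_fetch_add(&d->refcount, 1);
}

/* Called when the decoder is closed and when a frame is released.
 * Returns 1 if this call dropped the last reference and freed the context,
 * 0 if other holders remain. */
static int sketch_decoder_unref(SketchDecoder *d)
{
    if (atomic_fetch_sub(&d->refcount, 1) == 1) {
        free(d);   /* the real code would also tear down the codec here */
        return 1;
    }
    return 0;
}
```

In this model, closing the decoder while the user still holds one frame only drops the count from 2 to 1; the context is destroyed when that last frame is released.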
Re: [FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part
On Fri, Apr 1, 2016 at 4:15 PM, Matthieu Bouron wrote: > > > On Mon, Mar 28, 2016 at 9:12 PM, Matthieu Bouron < > matthieu.bou...@gmail.com> wrote: > >> >> >> On Sun, Mar 27, 2016 at 5:58 PM, Matthieu Bouron < >> matthieu.bou...@gmail.com> wrote: >> >>> >>> >>> On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron < >>> matthieu.bou...@gmail.com> wrote: >>> >>>> The following patchset aims to make bitexact the yuv->rgba armv7 neon >>>> code path >>>> with the aarch64 one. It also aims to make the two code bases as close >>>> as >>>> possible. >>>> >>>> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path >>>> >>>> The current 32bit code path which is unused is removed. >>>> >>>> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time >>>> >>>> The code process only one line at a time for the yuv420p,nv12 and nv21 >>>> formats >>>> with no regression in performance observed on a rpi2 (I've even >>>> observed a >>>> slight increase of performance for the nv12 and nv21 formats). >>>> >>>> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its >>>> >>>> The last patch of the serie makes the code bitexact with the aarch64 >>>> version. >>>> The increase of precision (which introduces a performance loss) is >>>> compensated >>>> by a refactor/optimisation that saves quite a few mov,vdup and vqdmulh. 
>>>> >>>> ./ffmpeg_g -nostats -f lavfi -i >>>> testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f >>>> null - >>>> >>>> without patchset : >>>> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605 >>>> >>>> with patchset: >>>> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884 >>> >>> >>> I've managed tu run the code on a beagle bone black board, here are the >>> results: >>> >>> nv12->bgra >>> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 >>> max:0.032600 min:0.011513 >>> with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 >>> avg:0.013659 max:0.034427 min:0.013411 >>> with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 >>> avg:0.012751 max:0.034288 min:0.012523 >>> >>> yuv420p->bgra >>> without patchset: [bench @ 0x6d42d0] t:0.012954 avg:0.013159 >>> max:0.033866 min:0.012945 >>> with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 >>> avg:0.015358 max:0.036186 min:0.015134 >>> with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 >>> avg:0.014784 max:0.035487 min:0.014568 >>> >>> So it looks like processing one line at a time as negative effect on >>> performance on this board (as opposed to the rpi2). I'll try to keep the >>> two line processing code and post some result (so we can decide, which >>> version to choose). 
>>>
>>
>> I've managed to update the patchset to keep processing two lines at a time
>> for the nv12, nv21 and yuv420p formats. Here are the results:
>>
>> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>>
>> Beagle bone black:
>> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
>> with patchset v1: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
>> with patchset v2: [bench @ 0x10f92d0] t:0.011239 avg:0.011408 max:0.032124 min:0.011202
>>
>> Nexus5:
>> without patchset: avg: ~2,869ms
>> with patchset v1: avg: ~3,008ms
>> with patchset v2: avg: ~2,702ms
>>
>> RPI2:
>> without patchset: [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>> with patchset v1: [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>> with patchset v2: [bench @ 0xc1b6a0] t:0.020999 avg:0.021203 max:0.052184 min:0.020768
>>
>> Given these results, I will drop the current patchset and
>> submit another one (which keeps processing two lines at a time).
>>
>
> I will push the updated patchset (which takes into account Benoit's
> comments) in about one hour.
>

Pushed.
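To make the benchmark figures above easier to compare, the relative change of each `avg` value against the "without patchset" baseline can be computed as (after - before) / before. The helper below is just that arithmetic; its name is made up for this note and it is not part of any patch.

```c
/* Relative change in percent between two timings.
 * Negative means the patched run was faster than the baseline. */
static double sketch_relative_change_pct(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Applied to the averages quoted in this message, this gives about +8.6% (v1) and -2.9% (v2) on the Beagle bone black, about +4.8% (v1) and -5.8% (v2) on the Nexus5, and about -8.4% (v1) and +1.9% (v2) on the RPI2. In other words, v2 is faster than the baseline on two of the three boards and slightly slower on the RPI2, where v1 had been the faster variant.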
Re: [FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part
On Mon, Mar 28, 2016 at 9:12 PM, Matthieu Bouron wrote:

> On Sun, Mar 27, 2016 at 5:58 PM, Matthieu Bouron <matthieu.bou...@gmail.com> wrote:
>
>> On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron <matthieu.bou...@gmail.com> wrote:
>>
>>> The following patchset aims to make the yuv->rgba armv7 neon code path
>>> bitexact with the aarch64 one. It also aims to make the two code bases
>>> as close as possible.
>>>
>>> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>>>
>>> The current 32bit code path, which is unused, is removed.
>>>
>>> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>>>
>>> The code processes only one line at a time for the yuv420p, nv12 and nv21
>>> formats with no regression in performance observed on a rpi2 (I've even
>>> observed a slight increase of performance for the nv12 and nv21 formats).
>>>
>>> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
>>>
>>> The last patch of the series makes the code bitexact with the aarch64
>>> version. The increase of precision (which introduces a performance loss)
>>> is compensated by a refactor/optimisation that saves quite a few mov,
>>> vdup and vqdmulh.
>>>
>>> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>>>
>>> without patchset:
>>> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>>>
>>> with patchset:
>>> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>>
>> I've managed to run the code on a beagle bone black board, here are the
>> results:
>>
>> nv12->bgra
>> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
>> with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 avg:0.013659 max:0.034427 min:0.013411
>> with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
>>
>> yuv420p->bgra
>> without patchset: [bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866 min:0.012945
>> with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 avg:0.015358 max:0.036186 min:0.015134
>> with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 avg:0.014784 max:0.035487 min:0.014568
>>
>> So it looks like processing one line at a time has a negative effect on
>> performance on this board (as opposed to the rpi2). I'll try to keep the
>> two line processing code and post some results (so we can decide which
>> version to choose).
>
> I've managed to update the patchset to keep processing two lines at a time
> for the nv12, nv21 and yuv420p formats, here are the results:
>
> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>
> Beagle bone black:
> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
> with patchset v1: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
> with patchset v2: [bench @ 0x10f92d0] t:0.011239 avg:0.011408 max:0.032124 min:0.011202
>
> Nexus5:
> without patchset: avg: ~2,869ms
> with patchset v1: avg: ~3,008ms
> with patchset v2: avg: ~2,702ms
>
> RPI2:
> without patchset: [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
> with patchset v1: [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
> with patchset v2: [bench @ 0xc1b6a0] t:0.020999 avg:0.021203 max:0.052184 min:0.020768
>
> Given these results, I will drop the current patchset and
> submit another one (which keeps processing two lines at a time).

I will push the updated patchset (which takes into account Benoit's
comments) in one hour~.

Matthieu
Re: [FFmpeg-devel] [PATCH v2 8/9] swscale/arm/yuv2rgb: save a few instructions by processing the luma line interleaved
On Thu, Mar 31, 2016 at 11:17 AM, Benoit Fouet wrote:

> Hi,
>
> On 28/03/2016 21:19, Matthieu Bouron wrote:
>
>> ---
>>  libswscale/arm/yuv2rgb_neon.S | 88 +--
>>  1 file changed, 34 insertions(+), 54 deletions(-)
>>
>> diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
>> index 124d7d3..6b911c8 100644
>> --- a/libswscale/arm/yuv2rgb_neon.S
>> +++ b/libswscale/arm/yuv2rgb_neon.S
>>
>> [...]
>>
>> @@ -94,25 +67,29 @@
>>  .ifc \ofmt,bgra
>>      compute_rgba d8, d7, d6, d9, d12, d11, d10, d13
>>  .endif
>> +
>> +    vzip.8 d6, d10
>> +    vzip.8 d7, d11
>> +    vzip.8 d8, d12
>> +    vzip.8 d9, d13
>
> Adding a comment to explain the resulting interleaving would be nice

Added locally:

+    vzip.8 d6, d10 @ d6 = R1R2R3R4R5R6R7R8 d10 = R9R10R11R12R13R14R15R16
+    vzip.8 d7, d11 @ d7 = G1G2G3G4G5G6G7G8 d11 = G9G10G11G12G13G14G15G16
+    vzip.8 d8, d12 @ d8 = B1B2B3B4B5B6B7B8 d12 = B9B10B11B12B13B14B15B16
+    vzip.8 d9, d13 @ d9 = A1A2A3A4A5A6A7A8 d13 = A9A10A11A12A13A14A15A16

>>      vst4.8 {q3, q4}, [\dst,:128]!
>>      vst4.8 {q5, q6}, [\dst,:128]!
>> -
>>  .endm
>>  .macro process_1l ofmt
>> -    compute_premult d28, d29, d30, d31
>> -    vld1.8 {q7}, [r4]!
>> -    compute r2, d14, d15, \ofmt
>> +    compute_premult
>> +    vld2.8 {d14, d15}, [r4]!
>> +    compute r2, \ofmt
>>  .endm
>>  .macro process_2l ofmt
>> -    compute_premult d28, d29, d30, d31
>> +    compute_premult
>> -    vld1.8 {q7}, [r4]! @ first line of luma
>> -    compute r2, d14, d15, \ofmt
>> +    vld2.8 {d14, d15}, [r4]! @ q7 = Y (interleaved)
>> +    compute r2, \ofmt
>> -    vld1.8 {q7}, [r12]! @ second line of luma
>> -    compute r11, d14, d15, \ofmt
>> +    vld2.8 {d14, d15}, [r12]! @ q7 = Y (interleaved)
>> +    compute r11, \ofmt
>>  .endm
>
> What about adding a level of macro here? Something like:
> .macro process_1l_internal ofmt src_addr res
>     compute_premult
>     vld2.8 {d14, d15}, [\src_addr]!
>     compute \res, \ofmt
> .endm
>
> (again, the naming could be changed, according to your own taste :-) )
>
> This way, we would get:
> .macro process_1l ofmt
>     process_1l_internal \ofmt, r4, r2
> .endm
>
> .macro process_2l ofmt
>     process_1l_internal \ofmt, r4, r2
>     process_1l_internal \ofmt, r12, r11
> .endm

Added locally: process_1l_16px_internal added to the macro-ify patch and then
renamed to process_1l_internal in a later patch.

Thanks,
Matthieu
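The interleaving performed by the `vld2.8` / `vzip.8` pair discussed above can be modelled in scalar C. This is only an illustrative sketch: `model_vld2_8` and `model_vzip_8` are hypothetical names, and the functions model the documented behaviour of the two NEON instructions on 8-byte (d register) operands rather than anything taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of "vld2.8 {dA, dB}, [src]": 16 consecutive bytes are
 * de-interleaved, even-indexed bytes go to dA and odd-indexed bytes to dB.
 * (Hypothetical helper, for illustration only.) */
static void model_vld2_8(const uint8_t src[16], uint8_t even[8], uint8_t odd[8])
{
    for (int i = 0; i < 8; i++) {
        even[i] = src[2 * i];
        odd[i]  = src[2 * i + 1];
    }
}

/* Scalar model of "vzip.8 dA, dB": the two registers are interleaved in
 * place; dA receives the first 8 interleaved bytes, dB the last 8.
 * (Hypothetical helper, for illustration only.) */
static void model_vzip_8(uint8_t a[8], uint8_t b[8])
{
    uint8_t tmp[16];

    for (int i = 0; i < 8; i++) {
        tmp[2 * i]     = a[i];
        tmp[2 * i + 1] = b[i];
    }
    memcpy(a, tmp, 8);
    memcpy(b, tmp + 8, 8);
}
```

Loading 16 luma bytes with the `vld2.8` model puts even-indexed pixels in one register and odd-indexed pixels in the other; zipping the per-channel results restores pixel order, which is consistent with the `d6 = R1..R8`, `d10 = R9..R16` comments added in the reply.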
Re: [FFmpeg-devel] [PATCH v2 6/9] swscale/arm/yuv2rgb: macro-ify
On Thu, Mar 31, 2016 at 10:48 AM, Benoit Fouet wrote:

> Hi,
>
> (sorry for the first mail, fuzzy fingers...)
>
> On 28/03/2016 21:19, Matthieu Bouron wrote:
>
>> ---
>>  libswscale/arm/yuv2rgb_neon.S | 137 ++
>>  1 file changed, 60 insertions(+), 77 deletions(-)
>>
>> diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
>> index ef7b0a6..e1b68c1 100644
>> --- a/libswscale/arm/yuv2rgb_neon.S
>> +++ b/libswscale/arm/yuv2rgb_neon.S
>> @@ -64,7 +64,7 @@
>>      vmov.u8 \a2, #255
>>  .endm
>> -.macro compute_16px dst y0 y1 ofmt
>> +.macro compute dst y0 y1 ofmt
>>      vmovl.u8 q14, \y0 @ 8px of y
>>      vmovl.u8 q15, \y1 @ 8px of y
>> @@ -99,23 +99,23 @@
>>  .endm
>> -.macro process_1l_16px ofmt
>> +.macro process_1l ofmt
>>      compute_premult d28, d29, d30, d31
>>      vld1.8 {q7}, [r4]!
>> -    compute_16px r2, d14, d15, \ofmt
>> +    compute r2, d14, d15, \ofmt
>>  .endm
>> -.macro process_2l_16px ofmt
>> +.macro process_2l ofmt
>>      compute_premult d28, d29, d30, d31
>>      vld1.8 {q7}, [r4]! @ first line of luma
>> -    compute_16px r2, d14, d15, \ofmt
>> +    compute r2, d14, d15, \ofmt
>>      vld1.8 {q7}, [r12]! @ second line of luma
>> -    compute_16px r11, d14, d15, \ofmt
>> +    compute r11, d14, d15, \ofmt
>>  .endm
>
> This renaming could be split

Split locally.

> [...]
>
>> @@ -232,68 +204,79 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
>>      vld1.8 d3, [r10]! @ d3: chroma blue line
>>      vsubl.u8 q14, d2, d10 @ q14 = U - 128
>>      vsubl.u8 q15, d3, d10 @ q15 = V - 128
>> +.endm
>> -    process_2l_16px \ofmt
>> -.endif
>> -
>> -.ifc \ifmt,yuv422p
>> +.macro load_chroma_yuv422p
>>      pld [r10, #64*3]
>>      vld1.8 d2, [r6]! @ d2: chroma red line
>>      vld1.8 d3, [r10]! @ d3: chroma blue line
>>      vsubl.u8 q14, d2, d10 @ q14 = U - 128
>>      vsubl.u8 q15, d3, d10 @ q15 = V - 128
>> +.endm
>> -    process_1l_16px \ofmt
>> -.endif
>> -
>> -    subs r8, r8, #16 @ width -= 16
>> -    bgt 2b
>> -
>> -    add r2, r2, r3 @ dst += padding
>> -    add r4, r4, r5 @ srcY += paddingY
>> -
>> -.ifc \ifmt,nv12
>> +.macro increment_nv12
>
> How about increment_and_test_nv12? Same for the other ones.
> (I'm not happy with the name I found, but am trying to come up with a
> solution to have a more explicit naming)

Renamed to increment_and_test_* locally.

Thanks,
Matthieu

[...]
Re: [FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats
On Wed, Mar 30, 2016 at 11:36:34PM +0200, Benoit Fouet wrote:
> Hi,

Hi Benoit,

> On 26/03/2016 13:05, Matthieu Bouron wrote:
> >On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer wrote:
> >>On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> >>>From: Matthieu Bouron
> >>>
> >>>---
> >>> libswscale/arm/yuv2rgb_neon.S | 89 ---
> >>> 1 file changed, 24 insertions(+), 65 deletions(-)
> >>
> >>breaks build
> >>
> >>make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/ --cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon -mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux --enable-cross-compile && make -j12
> >>
> >>CC libavutil/arm/float_dsp_init_arm.o
> >>src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
> >>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional instruction should be in IT block -- `subeq r6,r6,r0'
> >>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional instruction should be in IT block -- `addne r6,r7'
> >
> >[...]
> >
> >Patch updated with the relevant it instructions added. It still does build
> >on my rpi2 setup but is not tested on the same setup as yours.
> >Can you confirm it builds/works on your setup ?
> >
> >If it works, i will send an updated version of the next patch (07/10) to
> >resolve the conflicts.
> >
> >Matthieu
> >
> >0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch
> >
> >From 7b3a405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
> >From: Matthieu Bouron
> >Date: Wed, 23 Mar 2016 11:26:13 +
> >Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
> > for the yuv420p and nv{12,21} formats
> >
> >---
> > libswscale/arm/yuv2rgb_neon.S | 92 +--
> > 1 file changed, 27 insertions(+), 65 deletions(-)
> >
> >diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> >index ef7b0a6..6aeccae 100644
> >--- a/libswscale/arm/yuv2rgb_neon.S
> >+++ b/libswscale/arm/yuv2rgb_neon.S
> >@@ -105,16 +105,6 @@
> >     compute_16px r2, d14, d15, \ofmt
> > .endm
> >-.macro process_2l_16px ofmt
> >-    compute_premult d28, d29, d30, d31
> >-
> >-    vld1.8 {q7}, [r4]! @ first line of luma
> >-    compute_16px r2, d14, d15, \ofmt
> >-
> >-    vld1.8 {q7}, [r12]! @ second line of luma
> >-    compute_16px r11, d14, d15, \ofmt
> >-.endm
> >-
> > .macro load_args_nvx
> >     push {r4-r12, lr}
> >     vpush {q4-q7}
> >@@ -127,13 +117,9 @@
> >     ldr r10,[sp, #128] @ r10 = y_coeff
> >     vdup.16 d0, r10 @ d0 = y_coeff
> >     vld1.16 {d1}, [r8] @ d1 = *table
> >-    add r11, r2, r3 @ r11 = dst + linesize (dst2)
> >-    add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
>
> Nit: this leaves r11 and r12 unused by the NV conversions. It should be
> possible not to push/pop them.
> If not (which I would certainly understand), what would you think about
> moving the register saves out of the 'load_args_*' macro?
> It seems weird to have all the push/vpush that are not factored, and the
> pop/vpop that is done in only one place, at the end of each function.

Thanks for the review. Unfortunately I dropped this part of the patch set:
processing only one line at a time proved to be slower on devices other than
the rpi2. (I will keep your remark in mind if I ever switch back to
processing only one line at a time for all formats.)

The v2 patch set was posted in reply to the following thread:
https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/192272.html

Would you mind taking a look at it?

Matthieu

[...]
Re: [FFmpeg-devel] [PATCH 02/10] swscale/arm/yuv2rgb: fix comments and factorize lsl in load_args_yuv422p
On Wed, Mar 30, 2016 at 11:34:59PM +0200, Benoit Fouet wrote:
> Hi,
>
> On 25/03/2016 23:45, Matthieu Bouron wrote:
> >From: Matthieu Bouron
> >
> >---
> > libswscale/arm/yuv2rgb_neon.S | 9 ++++-----
> > 1 file changed, 4 insertions(+), 5 deletions(-)
> >
> >diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> >index f40327b..aac0773 100644
> >--- a/libswscale/arm/yuv2rgb_neon.S
> >+++ b/libswscale/arm/yuv2rgb_neon.S
> >@@ -172,11 +172,10 @@
> >     vdup.16 d0, r10 @ d0 = y_coeff
> >     vld1.16 {d1}, [r8] @ d1 = *table
> >     add r11, r2, r3 @ r11 = dst + linesize (dst2)
> >-    lsl r8, r0, #2
> >-    sub r3, r3, r8 @ r3 = linesize * 2 - width * 4 (padding)
> >-    sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
> >-    sub r7, r7, r0, lsr #1 @ r7 = linesizeU - width / 2 (paddingU)
> >-    sub r12,r12,r0, lsr #1 @ r12 = linesizeV- width / 2 (paddingV)
> >+    sub r3, r3, r0, lsl #2 @ r3 = linesize - width * 4 (padding)
> >+    sub r5, r5, r0 @ r5 = linesizeY - width (paddingY)
> >+    sub r7, r7, r0, lsr #1 @ r7 = linesizeU - width / 2 (paddingU)
> >+    sub r12,r12,r0, lsr #1 @ r12 = linesizeV - width / 2 (paddingV)
> >     ldr r10,[sp, #120] @ r10 = srcV
> > .endm
>
> nit: it would be cool to split: one for the comments and the other one for
> the lsl factorization.

Split locally in the v2 patch set.

Thanks,
Matthieu

[...]
[FFmpeg-devel] [PATCH v2 6/9] swscale/arm/yuv2rgb: macro-ify
--- libswscale/arm/yuv2rgb_neon.S | 137 ++ 1 file changed, 60 insertions(+), 77 deletions(-) diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S index ef7b0a6..e1b68c1 100644 --- a/libswscale/arm/yuv2rgb_neon.S +++ b/libswscale/arm/yuv2rgb_neon.S @@ -64,7 +64,7 @@ vmov.u8 \a2, #255 .endm -.macro compute_16px dst y0 y1 ofmt +.macro compute dst y0 y1 ofmt vmovl.u8q14, \y0 @ 8px of y vmovl.u8q15, \y1 @ 8px of y @@ -99,23 +99,23 @@ .endm -.macro process_1l_16px ofmt +.macro process_1l ofmt compute_premult d28, d29, d30, d31 vld1.8 {q7}, [r4]! -compute_16pxr2, d14, d15, \ofmt +compute r2, d14, d15, \ofmt .endm -.macro process_2l_16px ofmt +.macro process_2l ofmt compute_premult d28, d29, d30, d31 vld1.8 {q7}, [r4]!@ first line of luma -compute_16pxr2, d14, d15, \ofmt +compute r2, d14, d15, \ofmt vld1.8 {q7}, [r12]! @ second line of luma -compute_16pxr11, d14, d15, \ofmt +compute r11, d14, d15, \ofmt .endm -.macro load_args_nvx +.macro load_args_nv12 push{r4-r12, lr} vpush {q4-q7} ldr r4, [sp, #104] @ r4 = srcY @@ -136,6 +136,10 @@ sub r7, r7, r0 @ r7 = linesizeC - width (paddingC) .endm +.macro load_args_nv21 +load_args_nv12 +.endm + .macro load_args_yuv420p push{r4-r12, lr} vpush {q4-q7} @@ -176,55 +180,23 @@ ldr r10,[sp, #120] @ r10 = srcV .endm -.macro declare_func ifmt ofmt -function ff_\ifmt\()_to_\ofmt\()_neon, export=1 - -.ifc \ifmt,nv12 -load_args_nvx -.endif - -.ifc \ifmt,nv21 -load_args_nvx -.endif - -.ifc \ifmt,yuv420p -load_args_yuv420p -.endif - - -.ifc \ifmt,yuv422p -load_args_yuv422p -.endif - -1: -mov r8, r0 @ r8 = width -2: -pld [r6, #64*3] -pld [r4, #64*3] - -vmov.i8 d10, #128 - -.ifc \ifmt,nv12 +.macro load_chroma_nv12 pld [r12, #64*3] vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line vsubl.u8q14, d2, d10 @ q14 = U - 128 vsubl.u8q15, d3, d10 @ q15 = V - 128 +.endm -process_2l_16px \ofmt -.endif - -.ifc \ifmt,nv21 +.macro load_chroma_nv21 pld [r12, #64*3] vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line vsubl.u8q14, d3, d10 
@ q14 = U - 128 vsubl.u8q15, d2, d10 @ q15 = V - 128 +.endm -process_2l_16px \ofmt -.endif - -.ifc \ifmt,yuv420p +.macro load_chroma_yuv420p pld [r10, #64*3] pld [r12, #64*3] @@ -232,68 +204,79 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1 vld1.8 d3, [r10]! @ d3: chroma blue line vsubl.u8q14, d2, d10 @ q14 = U - 128 vsubl.u8q15, d3, d10 @ q15 = V - 128 +.endm -process_2l_16px \ofmt -.endif - -.ifc \ifmt,yuv422p +.macro load_chroma_yuv422p pld [r10, #64*3] vld1.8 d2, [r6]! @ d2: chroma red line vld1.8 d3, [r10]! @ d3: chroma blue line vsubl.u8q14, d2, d10 @ q14 = U - 128 vsubl.u8q15, d3, d10 @ q15 = V - 128 +.endm -process_1l_16px \ofmt -.endif - -subsr8, r8, #16@ width -= 16 -bgt 2b - -add r2, r2, r3 @ dst += padding -add r4, r4, r5 @ srcY += paddingY - -.ifc \ifmt,nv12 +.macro increment_nv12 add r11, r11, r3 @ dst2 += padding add r12, r12, r5 @ srcY2 += paddingY - add r6, r6, r7 @ srcC += paddingC - subsr1, r1, #2 @ height -= 2 -.endif - -.ifc \ifmt,nv21 -add r11, r11, r3 @
[FFmpeg-devel] [PATCH v2 2/9] swscale/arm/yuv2rgb: fix comments and factorize lsl in load_args_yuv422p
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index f40327b..aac0773 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -172,11 +172,10 @@
     vdup.16 d0, r10 @ d0 = y_coeff
     vld1.16 {d1}, [r8] @ d1 = *table
     add r11, r2, r3 @ r11 = dst + linesize (dst2)
-    lsl r8, r0, #2
-    sub r3, r3, r8 @ r3 = linesize * 2 - width * 4 (padding)
-    sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
-    sub r7, r7, r0, lsr #1 @ r7 = linesizeU - width / 2 (paddingU)
-    sub r12,r12,r0, lsr #1 @ r12 = linesizeV- width / 2 (paddingV)
+    sub r3, r3, r0, lsl #2 @ r3 = linesize - width * 4 (padding)
+    sub r5, r5, r0 @ r5 = linesizeY - width (paddingY)
+    sub r7, r7, r0, lsr #1 @ r7 = linesizeU - width / 2 (paddingU)
+    sub r12,r12,r0, lsr #1 @ r12 = linesizeV - width / 2 (paddingV)
     ldr r10,[sp, #120] @ r10 = srcV
 .endm

--
2.7.4
[FFmpeg-devel] [PATCH v2 7/9] swscale/arm/yuv2rgb: re-order compute_rgba macro arguments
---
 libswscale/arm/yuv2rgb_neon.S | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index e1b68c1..124d7d3 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -56,8 +56,8 @@
     vqrshrun.s16 \dst_comp2, q2, #6
 .endm

-.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
-    compute_color \r1, \r2, q8, q9
+.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
+    compute_color \r1, \r2, q8, q9
     compute_color \g1, \g2, q10, q11
     compute_color \b1, \b2, q12, q13
     vmov.u8 \a1, #255
@@ -80,19 +80,19 @@

 .ifc \ofmt,argb
-    compute_rgba d7, d11, d8, d12, d9, d13, d6, d10
+    compute_rgba d7, d8, d9, d6, d11, d12, d13, d10
 .endif

 .ifc \ofmt,rgba
-    compute_rgba d6, d10, d7, d11, d8, d12, d9, d13
+    compute_rgba d6, d7, d8, d9, d10, d11, d12, d13
 .endif

 .ifc \ofmt,abgr
-    compute_rgba d9, d13, d8, d12, d7, d11, d6, d10
+    compute_rgba d9, d8, d7, d6, d13, d12, d11, d10
 .endif

 .ifc \ofmt,bgra
-    compute_rgba d8, d12, d7, d11, d6, d10, d9, d13
+    compute_rgba d8, d7, d6, d9, d12, d11, d10, d13
 .endif
     vst4.8 {q3, q4}, [\dst,:128]!
     vst4.8 {q5, q6}, [\dst,:128]!
--
2.7.4
[FFmpeg-devel] [PATCH v2 4/9] swscale/arm/yuv2rgb: factorize lsl in load_args_yuv420p
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 22864ec..4601a79 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -152,8 +152,7 @@
     add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
     lsl r3, r3, #1
     lsl r5, r5, #1
-    lsl r8, r0, #2
-    sub r3, r3, r8 @ r3 = linesize * 2 - width * 4 (padding)
+    sub r3, r3, r0, lsl #2 @ r3 = linesize * 2 - width * 4 (padding)
     sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
     ldr r10,[sp, #120] @ r10 = srcV
 .endm
--
2.7.4
[FFmpeg-devel] [PATCH v2 5/9] swscale/arm/yuv2rgb: factorize lsl in load_args_nvx
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 4601a79..ef7b0a6 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -131,8 +131,7 @@
     add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
     lsl r3, r3, #1
     lsl r5, r5, #1
-    lsl r8, r0, #2
-    sub r3, r3, r8 @ r3 = linesize * 2 - width * 4 (padding)
+    sub r3, r3, r0, lsl #2 @ r3 = linesize * 2 - width * 4 (padding)
     sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
     sub r7, r7, r0 @ r7 = linesizeC - width (paddingC)
 .endm
--
2.7.4
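The padding arithmetic that this `lsl` factorization leaves unchanged can be sketched in C as follows. The struct and function names are hypothetical and exist only for this illustration; the formulas are the ones given in the assembly comments of load_args_nvx, where two lines are processed per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the three per-iteration pointer adjustments. */
struct nv_padding {
    ptrdiff_t dst;   /* r3: linesize  * 2 - width * 4 */
    ptrdiff_t src_y; /* r5: linesizeY * 2 - width     */
    ptrdiff_t src_c; /* r7: linesizeC     - width     */
};

/* Hypothetical helper mirroring the comments in load_args_nvx. */
static struct nv_padding nv_padding_two_lines(int width, int linesize,
                                              int linesizeY, int linesizeC)
{
    struct nv_padding p;

    assert(width > 0);
    /* "sub r3, r3, r0, lsl #2" subtracts (width << 2) in one instruction,
     * replacing the separate "lsl r8, r0, #2" + "sub r3, r3, r8" pair. */
    p.dst   = (ptrdiff_t)linesize  * 2 - (ptrdiff_t)width * 4;
    p.src_y = (ptrdiff_t)linesizeY * 2 - width;
    p.src_c = (ptrdiff_t)linesizeC     - width;
    return p;
}
```

The result is the same before and after the patch; the only difference is that the shifted operand form avoids the scratch register r8 and one instruction.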
[FFmpeg-devel] [PATCH v2 1/9] swscale/arm/yuv2rgb: remove 32bit code path
From: Matthieu Bouron --- libswscale/arm/swscale_unscaled.c | 72 -- libswscale/arm/yuv2rgb_neon.S | 156 -- 2 files changed, 66 insertions(+), 162 deletions(-) diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c index 8aa933c..149208c 100644 --- a/libswscale/arm/swscale_unscaled.c +++ b/libswscale/arm/swscale_unscaled.c @@ -61,14 +61,14 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[ return 0; } -#define YUV_TO_RGB_TABLE(precision) \ -c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), \ -c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), \ -c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), \ -c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), \ - -#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt, precision) \ -int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, \ +#define YUV_TO_RGB_TABLE \ +c->yuv2rgb_v2r_coeff / (1 << 7), \ +c->yuv2rgb_u2g_coeff / (1 << 7), \ +c->yuv2rgb_v2g_coeff / (1 << 7), \ +c->yuv2rgb_u2b_coeff / (1 << 7), \ + +#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \ +int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \ uint8_t *dst, int linesize, \ const uint8_t *srcY, int linesizeY, \ const uint8_t *srcU, int linesizeU, \ @@ -77,37 +77,34 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, int y_offset, \ int y_coeff); \ \ -static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, const uint8_t *src[], \ +static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], \ int srcStride[], int srcSliceY, int srcSliceH, \ uint8_t *dst[], int dstStride[]) { \ -const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; \ +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \ \ -ff_##ifmt##_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, \ +ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \ dst[0] + srcSliceY * dstStride[0], dstStride[0], \ src[0], srcStride[0], \ src[1], srcStride[1], \ src[2], srcStride[2], \ 
yuv2rgb_table, \ c->yuv2rgb_y_offset >> 9, \ - c->yuv2rgb_y_coeff / ((precision) == 16 ? 1 << 7 : 1));\ + c->yuv2rgb_y_coeff / (1 << 7)); \ \ return 0; \ } \ -#define DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuvx, precision) \ -DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, argb, precision) \ -DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, rgba, precision) \ -DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, abgr, precision) \ -DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, b
[FFmpeg-devel] [PATCH v2 3/9] swscale/arm/yuv2rgb: remove unused store of dst + linesize in load_args_yuv422p
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index aac0773..22864ec 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -171,7 +171,6 @@
     ldr r10,[sp, #136] @ r10 = y_coeff
     vdup.16 d0, r10 @ d0 = y_coeff
     vld1.16 {d1}, [r8] @ d1 = *table
-    add r11, r2, r3 @ r11 = dst + linesize (dst2)
     sub r3, r3, r0, lsl #2 @ r3 = linesize - width * 4 (padding)
     sub r5, r5, r0 @ r5 = linesizeY - width (paddingY)
     sub r7, r7, r0, lsr #1 @ r7 = linesizeU - width / 2 (paddingU)
--
2.7.4
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: fix zero stride for OMX.allwinner.video.decoder.avc
On Mon, Mar 28, 2016 at 07:51:24PM +0300, Kirill Gavrilov wrote:
> ---
>  libavcodec/mediacodecdec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavcodec/mediacodecdec.c b/libavcodec/mediacodecdec.c
> index 5c1368f..c21ceba 100644
> --- a/libavcodec/mediacodecdec.c
> +++ b/libavcodec/mediacodecdec.c
> @@ -247,7 +247,7 @@ static int mediacodec_dec_parse_format(AVCodecContext *avctx, MediaCodecDecConte
>          av_freep(&format);
>          return AVERROR_EXTERNAL;
>      }
> -    s->stride = value >= 0 ? value : s->width;
> +    s->stride = value > 0 ? value : s->width;
>
>      if (!ff_AMediaFormat_getInt32(s->format, "slice-height", &value)) {
>          format = ff_AMediaFormat_toString(s->format);
> --
> 2.6.1.windows.1

Applied, thanks.

Matthieu
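To make the effect of the `>=` to `>` change concrete, here is a minimal sketch in plain C. `pick_stride` is a hypothetical helper written for this illustration, not a function from libavcodec; it only mirrors the ternary in the patch, assuming that a reported stride of 0 (as described for OMX.allwinner.video.decoder.avc) should be treated like a missing value.

```c
#include <assert.h>

/* Hypothetical helper, not part of libavcodec: choose the stride to use
 * from the value reported by the decoder. Any value <= 0 is treated as
 * "not reported" and the frame width is used instead. */
static int pick_stride(int reported_stride, int width)
{
    assert(width > 0);
    return reported_stride > 0 ? reported_stride : width;
}
```

With the previous `>= 0` test a reported stride of 0 would have been kept as is; with `> 0` the width is used as the fallback.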
[FFmpeg-devel] [PATCH v2 9/9] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part
---
 libswscale/arm/swscale_unscaled.c | 18 +-
 libswscale/arm/yuv2rgb_neon.S     | 40 +--
 2 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c
index 149208c..e1597ab 100644
--- a/libswscale/arm/swscale_unscaled.c
+++ b/libswscale/arm/swscale_unscaled.c
@@ -62,10 +62,10 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[
 }

 #define YUV_TO_RGB_TABLE \
-        c->yuv2rgb_v2r_coeff / (1 << 7), \
-        c->yuv2rgb_u2g_coeff / (1 << 7), \
-        c->yuv2rgb_v2g_coeff / (1 << 7), \
-        c->yuv2rgb_u2b_coeff / (1 << 7), \
+        c->yuv2rgb_v2r_coeff, \
+        c->yuv2rgb_u2g_coeff, \
+        c->yuv2rgb_v2g_coeff, \
+        c->yuv2rgb_u2b_coeff, \

 #define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \
 int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
@@ -88,8 +88,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
                                  src[1], srcStride[1], \
                                  src[2], srcStride[2], \
                                  yuv2rgb_table, \
-                                 c->yuv2rgb_y_offset >> 9, \
-                                 c->yuv2rgb_y_coeff / (1 << 7)); \
+                                 c->yuv2rgb_y_offset >> 6, \
+                                 c->yuv2rgb_y_coeff); \
 \
     return 0; \
 } \
@@ -117,12 +117,12 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
                                            uint8_t *dst[], int dstStride[]) { \
     const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \
 \
-    ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
+    ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
                                  dst[0] + srcSliceY * dstStride[0], dstStride[0], \
                                  src[0], srcStride[0], src[1], srcStride[1], \
                                  yuv2rgb_table, \
-                                 c->yuv2rgb_y_offset >> 9, \
-                                 c->yuv2rgb_y_coeff / (1 << 7)); \
+                                 c->yuv2rgb_y_offset >> 6, \
+                                 c->yuv2rgb_y_coeff); \
 \
     return 0; \
 } \
diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 6b911c8..741928d 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -23,17 +23,20 @@

 .macro compute_premult
-    vmul.s16 q8, q15, d1[0] @ q8 = V * v2r
-    vmul.s16 q9, q14, d1[1] @ q9 = U * u2g
-    vmla.s16 q9, q15, d1[2] @ q9 = U * u2g + V * v2g
-    vmul.s16 q10,q14, d1[3] @ q10 = U * u2b
+    vsub.u16 q14,q11 @ q14 = U * (1 << 3) - 128 * (1 << 3)
+    vsub.u16 q15,q11 @ q15 = V * (1 << 3) - 128 * (1 << 3)
+    vqdmulh.s16 q8, q15, d1[0]
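The hunk above (cut off in this archive) replaces `vmul.s16` with `vqdmulh.s16` in compute_premult. For readers unfamiliar with that instruction, here is a scalar model of one 16-bit lane. It is a sketch under stated assumptions: `model_vqdmulh_s16` is a hypothetical name, and the function follows the documented semantics of the instruction (saturating doubling multiply, returning the high half), not code taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of "vqdmulh.s16": compute 2 * a * b, keep the
 * high 16 bits, and saturate. Saturation can only happen when both inputs
 * are INT16_MIN. (Hypothetical helper, for illustration only.) */
static int16_t model_vqdmulh_s16(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    int32_t q;

    if (product == 0x40000000)      /* a == b == INT16_MIN: 2*a*b overflows */
        return INT16_MAX;

    /* (2 * product) >> 16 equals floor(product / 32768). Use a division
     * with an explicit floor so no signed right shift is required. */
    q = product / 32768;
    if (product % 32768 < 0)
        q--;
    return (int16_t)q;
}
```

Seen as Q15 fixed point, the instruction multiplies two fractions and returns a Q15 result, which is why the coefficients no longer need the `/ (1 << 7)` scaling that the C side of the patch removes.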
[FFmpeg-devel] [PATCH v2 8/9] swscale/arm/yuv2rgb: save a few instructions by processing the luma line interleaved
---
 libswscale/arm/yuv2rgb_neon.S | 88 +--
 1 file changed, 34 insertions(+), 54 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 124d7d3..6b911c8 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -22,62 +22,35 @@
 #include "libavutil/arm/asm.S"

-.macro compute_premult half_u1, half_u2, half_v1, half_v2
-    vmov d2, \half_u1 @ copy left q14 to left q1
-    vmov d3, \half_u1 @ copy left q14 to right q1
-    vmov d4, \half_u2 @ copy right q14 to left q2
-    vmov d5, \half_u2 @ copy right q14 to right q2
-
-    vmov d6, \half_v1 @ copy left q15 to left q3
-    vmov d7, \half_v1 @ copy left q15 to right q3
-    vmov d8, \half_v2 @ copy right q15 to left q4
-    vmov d9, \half_v2 @ copy right q15 to right q4
-
-    vzip.16 d2, d3 @ U1U1U2U2U3U3U4U4
-    vzip.16 d4, d5 @ U5U5U6U6U7U7U8U8
-
-    vzip.16 d6, d7 @ V1V1V2V2V3V3V4V4
-    vzip.16 d8, d9 @ V5V5V6V6V7V7V8V8
-
-    vmul.s16 q8, q3, d1[0] @ V * v2r (left, red)
-    vmul.s16 q9, q4, d1[0] @ V * v2r (right, red)
-    vmul.s16 q10, q1, d1[1] @ U * u2g
-    vmul.s16 q11, q2, d1[1] @ U * u2g
-    vmla.s16 q10, q3, d1[2] @ U * u2g + V * v2g (left, green)
-    vmla.s16 q11, q4, d1[2] @ U * u2g + V * v2g (right, green)
-    vmul.s16 q12, q1, d1[3] @ U * u2b (left, blue)
-    vmul.s16 q13, q2, d1[3] @ U * u2b (right, blue)
+.macro compute_premult
+    vmul.s16 q8, q15, d1[0] @ q8 = V * v2r
+    vmul.s16 q9, q14, d1[1] @ q9 = U * u2g
+    vmla.s16 q9, q15, d1[2] @ q9 = U * u2g + V * v2g
+    vmul.s16 q10,q14, d1[3] @ q10 = U * u2b
 .endm

-.macro compute_color dst_comp1 dst_comp2 pre1 pre2
-    vadd.s16 q1, q14, \pre1
-    vadd.s16 q2, q15, \pre2
+.macro compute_color dst_comp1 dst_comp2 pre
+    vadd.s16 q1, q14, \pre
+    vadd.s16 q2, q15, \pre
     vqrshrun.s16 \dst_comp1, q1, #6
     vqrshrun.s16 \dst_comp2, q2, #6
 .endm

 .macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
-    compute_color \r1, \r2, q8, q9
-    compute_color \g1, \g2, q10, q11
-    compute_color \b1, \b2, q12, q13
+    compute_color \r1, \r2, q8
+    compute_color \g1, \g2, q9
+    compute_color \b1, \b2, q10
     vmov.u8 \a1, #255
     vmov.u8 \a2, #255
 .endm

-.macro compute dst y0 y1 ofmt
-    vmovl.u8 q14, \y0 @ 8px of y
-    vmovl.u8 q15, \y1 @ 8px of y
-
-    vdup.16 q5, r9 @ q5 = y_offset
-    vmov d14, d0 @ q7 = y_coeff
-    vmov d15, d0 @ q7 = y_coeff
-
-    vsub.s16 q14, q5
-    vsub.s16 q15, q5
-
-    vmul.s16 q14, q7 @ q14 = (srcY - y_offset) * y_coeff (left)
-    vmul.s16 q15, q7 @ q15 = (srcY - y_offset) * y_coeff (right)
-
+.macro compute dst ofmt
+    vmovl.u8 q14, d14 @ q14 = Y
+    vmovl.u8 q15, d15 @ q15 = Y
+    vsub.s16 q14, q12 @ q14 = (srcY - y_offset)
+    vsub.s16 q15, q12 @ q15 = (srcY - y_offset)
+    vmul.s16 q14, q13 @ q14 = (srcY - y_offset) * y_coeff (left)
+    vmul.s16 q15, q13 @ q15 = (srcY - y_offset) * y_coeff (right)
 .ifc \ofmt,argb
 compute_rgba
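The per-pixel arithmetic that the macros in this patch implement (compute_premult, compute, compute_color) can be written out in scalar C. This is a sketch, not the project's code: the struct and function names are hypothetical, intermediates are held in `int`, so the wraparound of the 16-bit NEON lanes is not modelled, and the model corresponds to the 8/9 version of the code (plain `vmul.s16`, final `vqrshrun.s16 #6`), before the bitexact change.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of "vqrshrun.s16 dst, src, #6": rounding shift
 * right by 6, then saturation to the unsigned 8-bit range. */
static uint8_t round_shift6_sat_u8(int x)
{
    int rounded = x + 32;               /* rounding constant: 1 << (6 - 1) */

    if (rounded < 0)
        return 0;
    rounded >>= 6;
    return (uint8_t)(rounded > 255 ? 255 : rounded);
}

/* Hypothetical coefficient container; the field names follow the comments
 * in the assembly (y_offset, y_coeff, v2r, u2g, v2g, u2b). */
struct yuv2rgb_coeffs {
    int y_offset, y_coeff, v2r, u2g, v2g, u2b;
};

/* Convert one pixel: rgb[0] = R, rgb[1] = G, rgb[2] = B. */
static void yuv_to_rgb_pixel(const struct yuv2rgb_coeffs *c,
                             uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3])
{
    int luma = (y - c->y_offset) * c->y_coeff; /* compute: (srcY - y_offset) * y_coeff */
    int cu   = u - 128;                        /* U - 128 */
    int cv   = v - 128;                        /* V - 128 */

    rgb[0] = round_shift6_sat_u8(luma + cv * c->v2r);               /* q8  */
    rgb[1] = round_shift6_sat_u8(luma + cu * c->u2g + cv * c->v2g); /* q9  */
    rgb[2] = round_shift6_sat_u8(luma + cu * c->u2b);               /* q10 */
}
```

Any coefficient values used to exercise this sketch are placeholders with six fractional bits; the real values come from the SwsContext fields shown in swscale_unscaled.c.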
Re: [FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part
On Sun, Mar 27, 2016 at 5:58 PM, Matthieu Bouron wrote:

> On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron <matthieu.bou...@gmail.com> wrote:
>
>> The following patchset aims to make the yuv->rgba armv7 neon code path
>> bitexact with the aarch64 one. It also aims to make the two code bases
>> as close as possible.
>>
>> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>>
>> The current 32bit code path, which is unused, is removed.
>>
>> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>>
>> The code processes only one line at a time for the yuv420p, nv12 and nv21
>> formats with no regression in performance observed on a rpi2 (I've even
>> observed a slight increase of performance for the nv12 and nv21 formats).
>>
>> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
>>
>> The last patch of the series makes the code bitexact with the aarch64
>> version. The increase of precision (which introduces a performance loss)
>> is compensated by a refactor/optimisation that saves quite a few mov,
>> vdup and vqdmulh.
>>
>> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>>
>> without patchset:
>> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>>
>> with patchset:
>> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>
> I've managed to run the code on a beagle bone black board, here are the
> results:
>
> nv12->bgra
> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
> with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 avg:0.013659 max:0.034427 min:0.013411
> with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
>
> yuv420p->bgra
> without patchset: [bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866 min:0.012945
> with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 avg:0.015358 max:0.036186 min:0.015134
> with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 avg:0.014784 max:0.035487 min:0.014568
>
> So it looks like processing one line at a time has a negative effect on
> performance on this board (as opposed to the rpi2). I'll try to keep the
> two line processing code and post some results (so we can decide which
> version to choose).

I've managed to update the patchset to keep processing two lines at a time
for the nv12, nv21 and yuv420p formats, here are the results:

./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -

Beagle bone black:
without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
with patchset v1: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
with patchset v2: [bench @ 0x10f92d0] t:0.011239 avg:0.011408 max:0.032124 min:0.011202

Nexus5:
without patchset: avg: ~2,869ms
with patchset v1: avg: ~3,008ms
with patchset v2: avg: ~2,702ms

RPI2:
without patchset: [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
with patchset v1: [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
with patchset v2: [bench @ 0xc1b6a0] t:0.020999 avg:0.021203 max:0.052184 min:0.020768

Given these results, I will drop the current patchset and submit another one
(which keeps processing two lines at a time).

Matthieu
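To compare the bench averages quoted above at a glance, the relative change against the "without patchset" baseline can be computed as below. This is a small sketch; `relative_change_percent` is a hypothetical helper name, and the inputs are simply the `avg:` values printed by the bench filter.

```c
#include <assert.h>

/* Relative change of `after` against `before`, in percent. A negative
 * result means the averaged time went down (faster). */
static double relative_change_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Applied to the v2 averages, this gives about -2.85% on the Beagle bone black (0.011743 to 0.011408) and about +1.87% on the RPI2 (0.020813 to 0.021203), which is in line with the decision to keep processing two lines at a time.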
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: fix zero stride for OMX.allwinner.video.decoder.avc
On Sun, Mar 27, 2016 at 11:15 PM, Kirill Gavrilov wrote:

> Hi,
>

Hi,

> on my device ("OMX.allwinner.video.decoder.avc") returned stride property
> is always 0.
> I have found that stride is overridden for "OMX.SEC.avc.dec" and prepared
> the similar patch.
> But probably it is better to change comparison at the line above to
> "value > 0"?
> s->stride = value >= 0 ? value : s->width
>

I think it would be better to change the comparison line. Can you send the
relevant patch?

Matthieu
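The change being discussed can be sketched in isolation as follows. This is an illustration only, not the actual lavc/mediacodec code: `pick_stride` is a hypothetical helper, and it assumes that a reported stride of 0 (as on the Allwinner decoder above) or a negative value should fall back to the frame width.

```c
/* Hypothetical helper (not the FFmpeg implementation): choose the stride
 * to use when the decoder may report 0 or a negative value.
 * With "value >= 0" a reported 0 would be kept; with "value > 0" the
 * width is used as a fallback instead. */
static int pick_stride(int reported_stride, int width)
{
    return reported_stride > 0 ? reported_stride : width;
}
```

With this comparison, `pick_stride(0, 1920)` yields 1920, while a valid reported stride such as 2048 is returned unchanged.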
Re: [FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part
On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron wrote:

> The following patchset aims to make the yuv->rgba armv7 neon code path
> bitexact with the aarch64 one. It also aims to make the two code bases as
> close as possible.
>
> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>
> The current 32bit code path, which is unused, is removed.
>
> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>
> The code processes only one line at a time for the yuv420p, nv12 and nv21
> formats, with no regression in performance observed on a rpi2 (I've even
> observed a slight increase of performance for the nv12 and nv21 formats).
>
> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
> aarch64 counter part
>
> The last patch of the series makes the code bitexact with the aarch64
> version. The increase of precision (which introduces a performance loss)
> is compensated by a refactor/optimisation that saves quite a few mov,
> vdup and vqdmulh.
>
> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>
> without patchset:
> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>
> with patchset:
> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884

I've managed to run the code on a beagle bone black board. Here are the
results:

nv12->bgra
without patchset:              [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 avg:0.013659 max:0.034427 min:0.013411
with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523

yuv420p->bgra
without patchset:              [bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866 min:0.012945
with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 avg:0.015358 max:0.036186 min:0.015134
with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 avg:0.014784 max:0.035487 min:0.014568

So it looks like processing one line at a time has a negative effect on
performance on this board (as opposed to the rpi2). I'll try to keep the
two-line processing code and post some results, so we can decide which
version to choose.

Matthieu
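The cover letter quoted above mentions `vqdmulh`. For readers unfamiliar with that NEON instruction, here is a scalar C model of one 16-bit lane: a saturating, doubling multiply that keeps the high half of the result. It is a sketch of the instruction's documented semantics, not code from the patchset; the function name `vqdmulh_s16_lane` is hypothetical, and how the patchset actually uses the instruction is not shown here.

```c
#include <stdint.h>

/* Scalar model of one lane of NEON vqdmulh.s16 (hypothetical helper name):
 * result = saturate16((2 * a * b) >> 16), truncating (no rounding). */
static int16_t vqdmulh_s16_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    int32_t doubled, high;

    /* Only -32768 * -32768 overflows once doubled; it saturates. */
    if (product == INT32_C(0x40000000))
        return INT16_MAX;

    doubled = 2 * product;

    /* floor(doubled / 65536), written without right-shifting a negative value */
    high = doubled / 65536;
    if (doubled % 65536 < 0)
        high -= 1;

    return (int16_t)high;
}
```

Because only the high half is kept, a coefficient stored with more fractional bits can be multiplied without overflowing a 16-bit lane, which is one plausible reason such an instruction appears in a precision-oriented rewrite.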
Re: [FFmpeg-devel] [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part
On Fri, Mar 25, 2016 at 11:46 PM, Matthieu Bouron wrote: > From: Matthieu Bouron > > --- > libswscale/arm/swscale_unscaled.c | 16 +++ > libswscale/arm/yuv2rgb_neon.S | 89 > +-- > 2 files changed, 47 insertions(+), 58 deletions(-) > > Patch updated (resolve a conflict with the updated version of patch 06/10). From 24b2371eb5ea859b2a68ef1ee3cf9a0098d9375a Mon Sep 17 00:00:00 2001 From: Matthieu Bouron Date: Wed, 23 Mar 2016 16:51:20 + Subject: [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part --- libswscale/arm/swscale_unscaled.c | 16 +++ libswscale/arm/yuv2rgb_neon.S | 89 +-- 2 files changed, 47 insertions(+), 58 deletions(-) diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c index 149208c..1986d65 100644 --- a/libswscale/arm/swscale_unscaled.c +++ b/libswscale/arm/swscale_unscaled.c @@ -62,10 +62,10 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[ } #define YUV_TO_RGB_TABLE\ -c->yuv2rgb_v2r_coeff / (1 << 7),\ -c->yuv2rgb_u2g_coeff / (1 << 7),\ -c->yuv2rgb_v2g_coeff / (1 << 7),\ -c->yuv2rgb_u2b_coeff / (1 << 7),\ +c->yuv2rgb_v2r_coeff, \ +c->yuv2rgb_u2g_coeff, \ +c->yuv2rgb_v2g_coeff, \ +c->yuv2rgb_u2b_coeff, \ #define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \ int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \ @@ -88,8 +88,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], src[1], srcStride[1], \ src[2], srcStride[2], \ yuv2rgb_table, \ - c->yuv2rgb_y_offset >> 9, \ - c->yuv2rgb_y_coeff / (1 << 7));\ + c->yuv2rgb_y_offset >> 6, \ + c->yuv2rgb_y_coeff); \ \ return 0; \ } \ @@ -121,8 +121,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], dst[0] + srcSliceY * dstStride[0], dstStride[0], \ src[0], srcStride[0], src[1], srcStride[1],\ yuv2rgb_table, \ - c->yuv2rgb_y_offset >> 9, \ - c->yuv2rgb_y_coeff / (1 << 7));\ + c->yuv2rgb_y_offset >> 6, \ + c->yuv2rgb_y_coeff); \ \ return 0; \ } \ diff --git 
a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S index 4a5ce11..bd994e3 100644 --- a/libswscale/arm/yuv2rgb_neon.S +++ b/libswscale/arm/yuv2rgb_neon.S @@ -68,14 +68,14 @@ .macro load_chroma_nv12 vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line -vsubl.u8q14, d2, d10 @ q14 = U - 128 -vsubl.u8q15, d3, d10 @ q15 = V - 128 +vshll.u8q14, d2, #3@ q14 = U * (1 << 3) +vshll.u8
Re: [FFmpeg-devel] [PATCH 09/10] swscale/arm/yuv2rgb: re-order arguments of the compute_rgba macro
On Fri, Mar 25, 2016 at 11:46 PM, Matthieu Bouron wrote:

> From: Matthieu Bouron
>
> ---
>  libswscale/arm/yuv2rgb_neon.S | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>

Patch updated (resolves a conflict with the updated version of patch 06/10).

From 41b0ff49706d82ef964faa75888e95d86f69df34 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron
Date: Fri, 25 Mar 2016 15:38:37 +
Subject: [PATCH 09/10] swscale/arm/yuv2rgb: re-order arguments of the compute_rgba macro

---
 libswscale/arm/yuv2rgb_neon.S | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 6a15778..4a5ce11 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -123,7 +123,7 @@
     vqrshrun.s16    \dst_comp2, q2, #6
 .endm

-.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
+.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
     compute_color   \r1, \r2, q8, q9
     compute_color   \g1, \g2, q10, q11
     compute_color   \b1, \b2, q12, q13
@@ -178,19 +178,19 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
     vmul.s16        q15, q7            @ q15 = (srcY - y_offset) * y_coeff (right)

 .ifc \ofmt,argb
-    compute_rgba    d7, d11, d8, d12, d9, d13, d6, d10
+    compute_rgba    d7, d8, d9, d6, d11, d12, d13, d10
 .endif

 .ifc \ofmt,rgba
-    compute_rgba    d6, d10, d7, d11, d8, d12, d9, d13
+    compute_rgba    d6, d7, d8, d9, d10, d11, d12, d13
 .endif

 .ifc \ofmt,abgr
-    compute_rgba    d9, d13, d8, d12, d7, d11, d6, d10
+    compute_rgba    d9, d8, d7, d6, d13, d12, d11, d10
 .endif

 .ifc \ofmt,bgra
-    compute_rgba    d8, d12, d7, d11, d6, d10, d9, d13
+    compute_rgba    d8, d7, d6, d9, d12, d11, d10, d13
 .endif

     vst4.8          {q3, q4}, [r2,:128]!
--
2.7.4
Re: [FFmpeg-devel] [PATCH 08/10] swscale/arm/yuv2rgb: re-organize the code like its aarch64 counter part
On Fri, Mar 25, 2016 at 11:46 PM, Matthieu Bouron wrote: > From: Matthieu Bouron > > --- > libswscale/arm/yuv2rgb_neon.S | 154 > +++--- > 1 file changed, 69 insertions(+), 85 deletions(-) > Patch updated (resolve a conflict with the updated version of patch 06/10). From d06a5437f9042e0b350556e9642d52866284e7a8 Mon Sep 17 00:00:00 2001 From: Matthieu Bouron Date: Wed, 23 Mar 2016 14:10:45 + Subject: [PATCH 08/10] swscale/arm/yuv2rgb: re-organize the code like its aarch64 counter part --- libswscale/arm/yuv2rgb_neon.S | 154 +++--- 1 file changed, 69 insertions(+), 85 deletions(-) diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S index 6279637..6a15778 100644 --- a/libswscale/arm/yuv2rgb_neon.S +++ b/libswscale/arm/yuv2rgb_neon.S @@ -21,90 +21,6 @@ #include "libavutil/arm/asm.S" - -.macro compute_premult half_u1, half_u2, half_v1, half_v2 -vmovd2, \half_u1 @ copy left q14 to left q1 -vmovd3, \half_u1 @ copy left q14 to right q1 -vmovd4, \half_u2 @ copy right q14 to left q2 -vmovd5, \half_u2 @ copy right q14 to right q2 - -vmovd6, \half_v1 @ copy left q15 to left q3 -vmovd7, \half_v1 @ copy left q15 to right q3 -vmovd8, \half_v2 @ copy right q15 to left q4 -vmovd9, \half_v2 @ copy right q15 to right q4 - -vzip.16 d2, d3 @ U1U1U2U2U3U3U4U4 -vzip.16 d4, d5 @ U5U5U6U6U7U7U8U8 - -vzip.16 d6, d7 @ V1V1V2V2V3V3V4V4 -vzip.16 d8, d9 @ V5V5V6V6V7V7V8V8 - -vmul.s16q8, q3, d1[0] @ V * v2r (left, red) -vmul.s16q9, q4, d1[0] @ V * v2r (right, red) -vmul.s16q10, q1, d1[1] @ U * u2g -vmul.s16q11, q2, d1[1] @ U * u2g -vmla.s16q10, q3, d1[2] @ U * u2g + V * v2g (left, green) -vmla.s16q11, q4, d1[2] @ U * u2g + V * v2g (right, green) -vmul.s16q12, q1, d1[3] @ U * u2b (left, blue) -vmul.s16q13, q2, d1[3] @ U * u2b (right, blue) -.endm - -.macro compute_color dst_comp1 dst_comp2 pre1 pre2 -vadd.s16q1, q14, \pre1 -vadd.s16q2, q15, \pre2 -vqrshrun.s16\dst_comp1, q1, #6 -vqrshrun.s16\dst_comp2, q2, #6 -.endm - -.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2 
-compute_color \r1, \r2, q8, q9 -compute_color \g1, \g2, q10, q11 -compute_color \b1, \b2, q12, q13 -vmov.u8 \a1, #255 -vmov.u8 \a2, #255 -.endm - -.macro compute_16px dst y0 y1 ofmt -vmovl.u8q14, \y0 @ 8px of y -vmovl.u8q15, \y1 @ 8px of y - -vdup.16 q5, r9 @ q5 = y_offset -vmovd14, d0@ q7 = y_coeff -vmovd15, d0@ q7 = y_coeff - -vsub.s16q14, q5 -vsub.s16q15, q5 - -vmul.s16q14, q7@ q14 = (srcY - y_offset) * y_coeff (left) -vmul.s16q15, q7@ q15 = (srcY - y_offset) * y_coeff (right) - - -.ifc \ofmt,argb -compute_rgbad7, d11, d8, d12, d9, d13, d6, d10 -.endif - -.ifc \ofmt,rgba -compute_rgbad6, d10, d7, d11, d8, d12, d9, d13 -.endif - -.ifc \ofmt,abgr -compute_rgbad9, d13, d8, d12, d7, d11, d6, d10 -.endif - -.ifc \ofmt,bgra -compute_rgbad8, d12, d7, d11, d6, d10, d9, d13 -.endif -vst4.8 {q3, q4}, [\dst,:128]! -vst4.8 {q5, q6}, [\dst,:128]! - -.endm - -.macro process_1l_16px ofmt -compute_premult d28, d29, d30, d31 -vld1.8 {q7}, [r4]! -compute_16pxr2, d14, d15, \ofmt -.endm - .macro load_args_nv12 push{r4-r12, lr} vpush {q4-q7} @@ -200,6 +116,21 @@
Re: [FFmpeg-devel] [PATCH 07/10] swscale/arm/yuv2rgb: macro-ify
On Fri, Mar 25, 2016 at 11:46 PM, Matthieu Bouron wrote: > From: Matthieu Bouron > > --- > libswscale/arm/yuv2rgb_neon.S | 115 > ++ > 1 file changed, 39 insertions(+), 76 deletions(-) > [...] Patch updated (resolve a conflict with the updated version of patch 06/10). From f8a7db56aba4b38089698c2f87583b071d03bf29 Mon Sep 17 00:00:00 2001 From: Matthieu Bouron Date: Wed, 23 Mar 2016 13:51:10 + Subject: [PATCH 07/10] swscale/arm/yuv2rgb: macro-ify --- libswscale/arm/yuv2rgb_neon.S | 116 ++ 1 file changed, 39 insertions(+), 77 deletions(-) diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S index 6aeccae..6279637 100644 --- a/libswscale/arm/yuv2rgb_neon.S +++ b/libswscale/arm/yuv2rgb_neon.S @@ -105,7 +105,7 @@ compute_16pxr2, d14, d15, \ofmt .endm -.macro load_args_nvx +.macro load_args_nv12 push{r4-r12, lr} vpush {q4-q7} ldr r4, [sp, #104] @ r4 = srcY @@ -122,6 +122,10 @@ sub r7, r7, r0 @ r7 = linesizeC - width (paddingC) .endm +.macro load_args_nv21 +load_args_nv12 +.endm + .macro load_args_yuv420p push{r4-r12, lr} vpush {q4-q7} @@ -146,116 +150,74 @@ load_args_yuv420p .endm -.macro declare_func ifmt ofmt -function ff_\ifmt\()_to_\ofmt\()_neon, export=1 - -.ifc \ifmt,nv12 -load_args_nvx -.endif - -.ifc \ifmt,nv21 -load_args_nvx -.endif - -.ifc \ifmt,yuv420p -load_args_yuv420p -.endif - - -.ifc \ifmt,yuv422p -load_args_yuv422p -.endif - -1: -mov r8, r0 @ r8 = width -2: -pld [r6, #64*3] -pld [r4, #64*3] - -vmov.i8 d10, #128 - -.ifc \ifmt,nv12 +.macro load_chroma_nv12 vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line vsubl.u8q14, d2, d10 @ q14 = U - 128 vsubl.u8q15, d3, d10 @ q15 = V - 128 +.endm -process_1l_16px \ofmt -.endif - -.ifc \ifmt,nv21 +.macro load_chroma_nv21 vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line vsubl.u8q14, d3, d10 @ q14 = U - 128 vsubl.u8q15, d2, d10 @ q15 = V - 128 +.endm -process_1l_16px \ofmt -.endif - -.ifc \ifmt,yuv420p -pld [r10, #64*3] - -vld1.8 d2, [r6]! @ d2: chroma red line -vld1.8 d3, [r10]! 
@ d3: chroma blue line -vsubl.u8q14, d2, d10 @ q14 = U - 128 -vsubl.u8q15, d3, d10 @ q15 = V - 128 - -process_1l_16px \ofmt -.endif - -.ifc \ifmt,yuv422p +.macro load_chroma_yuv420p pld [r10, #64*3] vld1.8 d2, [r6]! @ d2: chroma red line vld1.8 d3, [r10]! @ d3: chroma blue line vsubl.u8q14, d2, d10 @ q14 = U - 128 vsubl.u8q15, d3, d10 @ q15 = V - 128 +.endm -process_1l_16px \ofmt -.endif - -subsr8, r8, #16@ width -= 16 -bgt 2b - -add r2, r2, r3 @ dst += padding -add r4, r4, r5 @ srcY += paddingY - -.ifc \ifmt,nv12 -tst r1, #1 -ite eq -subeq r6, r6, r0 @ if (height % 2 == 0) paddingU -= width -addne r6, r7 @ else paddingU += linesizeU - width - -subsr1, r1, #1 @ height -= 1 -.endif +.macro load_chroma_yuv422p +load_chroma_yuv420p +.endm -.ifc \ifmt,nv21 +.macro increment_nv12 tst r1, #1 ite eq subeq r6, r6, r0 @ if (height % 2 == 0) paddingU -= width addne r6, r7 @ else paddingU += linesizeU - width +.endm -subsr1, r1, #1 @ height -= 1 -.endif +.macro increment_nv21 +increment_nv12 +.endm -.ifc \ifmt,yuv420p +.macro increment_yuv420p tst r1, #1 itete eq subeq
Re: [FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats
On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer wrote:

> On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> > From: Matthieu Bouron
> >
> > ---
> >  libswscale/arm/yuv2rgb_neon.S | 89 ---
> >  1 file changed, 24 insertions(+), 65 deletions(-)
>
> breaks build
>
> make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/ --cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon -mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux --enable-cross-compile && make -j12
>
> CC libavutil/arm/float_dsp_init_arm.o
> src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
> src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional instruction should be in IT block -- `subeq r6,r6,r0'
> src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional instruction should be in IT block -- `addne r6,r7'
> [...]

Patch updated with the relevant IT instructions added. It still builds on my
rpi2 setup, but I have not tested it on the same setup as yours. Can you
confirm it builds and works on your setup?

If it works, I will send an updated version of the next patch (07/10) to
resolve the conflicts.

Matthieu

From 7b3a405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron
Date: Wed, 23 Mar 2016 11:26:13 +
Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12,21} formats

---
 libswscale/arm/yuv2rgb_neon.S | 92 +--
 1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index ef7b0a6..6aeccae 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -105,16 +105,6 @@
     compute_16px    r2, d14, d15, \ofmt
 .endm

-.macro process_2l_16px ofmt
-    compute_premult d28, d29, d30, d31
-
-    vld1.8          {q7}, [r4]!        @ first line of luma
-    compute_16px    r2, d14, d15, \ofmt
-
-    vld1.8          {q7}, [r12]!       @ second line of luma
-    compute_16px    r11, d14, d15, \ofmt
-.endm
-
 .macro load_args_nvx
     push            {r4-r12, lr}
     vpush           {q4-q7}
@@ -127,13 +117,9 @@
     ldr             r10,[sp, #128]     @ r10 = y_coeff
     vdup.16         d0, r10            @ d0 = y_coeff
     vld1.16         {d1}, [r8]         @ d1 = *table
-    add             r11, r2, r3        @ r11 = dst + linesize (dst2)
-    add             r12, r4, r5        @ r12 = srcY + linesizeY (srcY2)
-    lsl             r3, r3, #1
-    lsl             r5, r5, #1
-    sub             r3, r3, r0, lsl #2 @ r3 = linesize * 2 - width * 4 (padding)
-    sub             r5, r5, r0         @ r5 = linesizeY * 2 - width (paddingY)
-    sub             r7, r7, r0         @ r7 = linesizeC - width (paddingC)
+    sub             r3, r3, r0, lsl #2 @ r3 = linesize - width * 4 (padding)
+    sub             r5, r5, r0         @ r5 = linesizeY - width (paddingY)
+    sub             r7, r7, r0         @ r7 = linesizeC - width (paddingC)
 .endm

 .macro load_args_yuv420p
@@ -142,26 +128,6 @@
     ldr             r4, [sp, #104]     @ r4 = srcY
     ldr             r5, [sp, #108]     @ r5 = linesizeY
     ldr             r6, [sp, #112]     @ r6 = srcU
-    ldr             r8, [sp, #128]     @ r8 = table
-    ldr             r9, [sp, #132]     @ r9 = y_offset
-    ldr             r10,[sp, #136]     @ r10 = y_coeff
-    vdup.16         d0, r10            @ d0 = y_coeff
-    vld1.16         {d1}, [r8]         @ d1 = *table
-    add             r11, r2, r3        @ r11 = dst + linesize (dst2)
-    add             r12, r4, r5        @ r12 = srcY + linesizeY (srcY2)
-    lsl             r3, r3, #1
-    lsl             r5, r5, #1
-    sub             r3, r3, r0, lsl #2 @ r3 = linesize * 2 - width * 4 (padding)
-    sub             r5, r5, r0         @ r5 = linesizeY * 2 - width (paddingY)
-    ldr             r10,[sp, #120]
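The two instructions named in the assembler error (`subeq r6,r6,r0` and `addne r6,r7`) are conditional, and in Thumb mode such instructions must be preceded by an IT instruction, which is what the updated patch adds. What the pair computes can be modelled in C. The function below is a sketch under stated assumptions, not code from the patch: the name `nv12_chroma_row_for_line` is hypothetical, `height` is assumed even, and `linesizeC >= width`.

```c
/* Scalar model of the per-line chroma pointer update used when one luma
 * line is processed at a time for nv12 input (hypothetical helper).
 * Returns the chroma row read for luma line `line`, or -1 if out of range.
 * Assumptions: `height` is even and `linesizeC >= width`. */
static int nv12_chroma_row_for_line(int width, int height, int linesizeC, int line)
{
    int offset = 0;   /* byte offset of the chroma pointer (r6) from srcC */
    int h = height;   /* remaining lines (r1) */
    int y;

    for (y = 0; y < height; y++, h--) {
        if (y == line)
            return offset / linesizeC;

        offset += width;                 /* one interleaved chroma line consumed */

        if ((h & 1) == 0)
            offset -= width;             /* subeq r6, r6, r0: rewind, reuse the same chroma line */
        else
            offset += linesizeC - width; /* addne r6, r7: skip the padding to the next chroma line */
    }
    return -1;
}
```

For a 1920-wide frame with a chroma line size of 2048 and a height of 4, luma lines 0 and 1 map to chroma row 0 and lines 2 and 3 map to chroma row 1, which is the 4:2:0 vertical subsampling the parity test is there to preserve.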
[FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats
From: Matthieu Bouron --- libswscale/arm/yuv2rgb_neon.S | 89 --- 1 file changed, 24 insertions(+), 65 deletions(-) diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S index ef7b0a6..8abb986 100644 --- a/libswscale/arm/yuv2rgb_neon.S +++ b/libswscale/arm/yuv2rgb_neon.S @@ -105,16 +105,6 @@ compute_16pxr2, d14, d15, \ofmt .endm -.macro process_2l_16px ofmt -compute_premult d28, d29, d30, d31 - -vld1.8 {q7}, [r4]!@ first line of luma -compute_16pxr2, d14, d15, \ofmt - -vld1.8 {q7}, [r12]! @ second line of luma -compute_16pxr11, d14, d15, \ofmt -.endm - .macro load_args_nvx push{r4-r12, lr} vpush {q4-q7} @@ -127,13 +117,9 @@ ldr r10,[sp, #128] @ r10 = y_coeff vdup.16 d0, r10@ d0 = y_coeff vld1.16 {d1}, [r8] @ d1 = *table -add r11, r2, r3@ r11 = dst + linesize (dst2) -add r12, r4, r5@ r12 = srcY + linesizeY (srcY2) -lsl r3, r3, #1 -lsl r5, r5, #1 -sub r3, r3, r0, lsl #2 @ r3 = linesize * 2 - width * 4 (padding) -sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY) -sub r7, r7, r0 @ r7 = linesizeC - width (paddingC) +sub r3, r3, r0, lsl #2 @ r3 = linesize - width * 4 (padding) +sub r5, r5, r0 @ r5 = linesizeY - width (paddingY) +sub r7, r7, r0 @ r7 = linesizeC - width (paddingC) .endm .macro load_args_yuv420p @@ -142,26 +128,6 @@ ldr r4, [sp, #104] @ r4 = srcY ldr r5, [sp, #108] @ r5 = linesizeY ldr r6, [sp, #112] @ r6 = srcU -ldr r8, [sp, #128] @ r8 = table -ldr r9, [sp, #132] @ r9 = y_offset -ldr r10,[sp, #136] @ r10 = y_coeff -vdup.16 d0, r10@ d0 = y_coeff -vld1.16 {d1}, [r8] @ d1 = *table -add r11, r2, r3@ r11 = dst + linesize (dst2) -add r12, r4, r5@ r12 = srcY + linesizeY (srcY2) -lsl r3, r3, #1 -lsl r5, r5, #1 -sub r3, r3, r0, lsl #2 @ r3 = linesize * 2 - width * 4 (padding) -sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY) -ldr r10,[sp, #120] @ r10 = srcV -.endm - -.macro load_args_yuv422p -push{r4-r12, lr} -vpush {q4-q7} -ldr r4, [sp, #104] @ r4 = srcY -ldr r5, [sp, #108] @ r5 = linesizeY -ldr r6, [sp, #112] @ r6 = srcU ldr r7, 
[sp, #116] @ r7 = linesizeU ldr r12,[sp, #124] @ r12 = linesizeV ldr r8, [sp, #128] @ r8 = table @@ -176,6 +142,10 @@ ldr r10,[sp, #120] @ r10 = srcV .endm +.macro load_args_yuv422p +load_args_yuv420p +.endm + .macro declare_func ifmt ofmt function ff_\ifmt\()_to_\ofmt\()_neon, export=1 @@ -205,35 +175,30 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1 vmov.i8 d10, #128 .ifc \ifmt,nv12 -pld [r12, #64*3] - vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line vsubl.u8q14, d2, d10 @ q14 = U - 128 vsubl.u8q15, d3, d10 @ q15 = V - 128 -process_2l_16px \ofmt +process_1l_16px \ofmt .endif .ifc \ifmt,nv21 -pld [r12, #64*3] - vld2.8 {d2, d3}, [r6]!@ q1
[FFmpeg-devel] [PATCH 04/10] swscale/arm/yuv2rgb: factorize lsl in load_args_yuv420p
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 22864ec..4601a79 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -152,8 +152,7 @@
     add             r12, r4, r5        @ r12 = srcY + linesizeY (srcY2)
     lsl             r3, r3, #1
     lsl             r5, r5, #1
-    lsl             r8, r0, #2
-    sub             r3, r3, r8         @ r3 = linesize * 2 - width * 4 (padding)
+    sub             r3, r3, r0, lsl #2 @ r3 = linesize * 2 - width * 4 (padding)
     sub             r5, r5, r0         @ r5 = linesizeY * 2 - width (paddingY)
     ldr             r10,[sp, #120]     @ r10 = srcV
 .endm
--
2.7.4
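The patch folds the separate `lsl` into the shifted operand of `sub`, so the value computed is unchanged. As a sketch of that arithmetic (the helper name `dst_padding_two_lines` is hypothetical and not part of the code base), the destination padding when two output lines are written per iteration is:

```c
/* Hypothetical helper: bytes to add to the destination pointer after
 * writing `width` 32-bit pixels, when two lines are handled per iteration.
 * Mirrors: lsl r3, r3, #1 ; sub r3, r3, r0, lsl #2 */
static int dst_padding_two_lines(int linesize, int width)
{
    return (linesize << 1) - (width << 2);
}
```

For a tightly packed 1920-pixel-wide BGRA destination (`linesize` of 7680 bytes), the result is 7680, i.e. exactly one full line to skip, since the second line of the pair is written through a separate pointer.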
[FFmpeg-devel] [PATCH 03/10] swscale/arm/yuv2rgb: remove unused store of dst + linesize in load_args_yuv422p
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index aac0773..22864ec 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -171,7 +171,6 @@
     ldr             r10,[sp, #136]     @ r10 = y_coeff
     vdup.16         d0, r10            @ d0 = y_coeff
     vld1.16         {d1}, [r8]         @ d1 = *table
-    add             r11, r2, r3        @ r11 = dst + linesize (dst2)
     sub             r3, r3, r0, lsl #2 @ r3 = linesize - width * 4 (padding)
     sub             r5, r5, r0         @ r5 = linesizeY - width (paddingY)
     sub             r7, r7, r0, lsr #1 @ r7 = linesizeU - width / 2 (paddingU)
--
2.7.4
[FFmpeg-devel] [PATCH 08/10] swscale/arm/yuv2rgb: re-organize the code like its aarch64 counter part
From: Matthieu Bouron --- libswscale/arm/yuv2rgb_neon.S | 154 +++--- 1 file changed, 69 insertions(+), 85 deletions(-) diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S index f77f534..03d15cb 100644 --- a/libswscale/arm/yuv2rgb_neon.S +++ b/libswscale/arm/yuv2rgb_neon.S @@ -21,90 +21,6 @@ #include "libavutil/arm/asm.S" - -.macro compute_premult half_u1, half_u2, half_v1, half_v2 -vmovd2, \half_u1 @ copy left q14 to left q1 -vmovd3, \half_u1 @ copy left q14 to right q1 -vmovd4, \half_u2 @ copy right q14 to left q2 -vmovd5, \half_u2 @ copy right q14 to right q2 - -vmovd6, \half_v1 @ copy left q15 to left q3 -vmovd7, \half_v1 @ copy left q15 to right q3 -vmovd8, \half_v2 @ copy right q15 to left q4 -vmovd9, \half_v2 @ copy right q15 to right q4 - -vzip.16 d2, d3 @ U1U1U2U2U3U3U4U4 -vzip.16 d4, d5 @ U5U5U6U6U7U7U8U8 - -vzip.16 d6, d7 @ V1V1V2V2V3V3V4V4 -vzip.16 d8, d9 @ V5V5V6V6V7V7V8V8 - -vmul.s16q8, q3, d1[0] @ V * v2r (left, red) -vmul.s16q9, q4, d1[0] @ V * v2r (right, red) -vmul.s16q10, q1, d1[1] @ U * u2g -vmul.s16q11, q2, d1[1] @ U * u2g -vmla.s16q10, q3, d1[2] @ U * u2g + V * v2g (left, green) -vmla.s16q11, q4, d1[2] @ U * u2g + V * v2g (right, green) -vmul.s16q12, q1, d1[3] @ U * u2b (left, blue) -vmul.s16q13, q2, d1[3] @ U * u2b (right, blue) -.endm - -.macro compute_color dst_comp1 dst_comp2 pre1 pre2 -vadd.s16q1, q14, \pre1 -vadd.s16q2, q15, \pre2 -vqrshrun.s16\dst_comp1, q1, #6 -vqrshrun.s16\dst_comp2, q2, #6 -.endm - -.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2 -compute_color \r1, \r2, q8, q9 -compute_color \g1, \g2, q10, q11 -compute_color \b1, \b2, q12, q13 -vmov.u8 \a1, #255 -vmov.u8 \a2, #255 -.endm - -.macro compute_16px dst y0 y1 ofmt -vmovl.u8q14, \y0 @ 8px of y -vmovl.u8q15, \y1 @ 8px of y - -vdup.16 q5, r9 @ q5 = y_offset -vmovd14, d0@ q7 = y_coeff -vmovd15, d0@ q7 = y_coeff - -vsub.s16q14, q5 -vsub.s16q15, q5 - -vmul.s16q14, q7@ q14 = (srcY - y_offset) * y_coeff (left) -vmul.s16q15, q7@ q15 = (srcY - y_offset) * y_coeff 
(right) - - -.ifc \ofmt,argb -compute_rgbad7, d11, d8, d12, d9, d13, d6, d10 -.endif - -.ifc \ofmt,rgba -compute_rgbad6, d10, d7, d11, d8, d12, d9, d13 -.endif - -.ifc \ofmt,abgr -compute_rgbad9, d13, d8, d12, d7, d11, d6, d10 -.endif - -.ifc \ofmt,bgra -compute_rgbad8, d12, d7, d11, d6, d10, d9, d13 -.endif -vst4.8 {q3, q4}, [\dst,:128]! -vst4.8 {q5, q6}, [\dst,:128]! - -.endm - -.macro process_1l_16px ofmt -compute_premult d28, d29, d30, d31 -vld1.8 {q7}, [r4]! -compute_16pxr2, d14, d15, \ofmt -.endm - .macro load_args_nv12 push{r4-r12, lr} vpush {q4-q7} @@ -198,6 +114,21 @@ add r10,r10,r12@ srcV += paddingV .endm +.macro compute_color dst_comp1 dst_comp2 pre1 pre2 +vadd.s16q1, q14, \pre1 +vadd.s16q2, q15, \pre2 +vqrshrun.s16\dst_comp1, q1, #6 +vqrshrun.s16\dst_comp2, q2, #6 +.endm + +.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2 +compute_color \r1, \r2, q8, q9 +compute_color \g1, \g2, q10, q11 +compute_color \b1, \b2, q12, q13 +
[FFmpeg-devel] [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
From: Matthieu Bouron --- libswscale/arm/swscale_unscaled.c | 72 -- libswscale/arm/yuv2rgb_neon.S | 156 -- 2 files changed, 66 insertions(+), 162 deletions(-) diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c index 8aa933c..149208c 100644 --- a/libswscale/arm/swscale_unscaled.c +++ b/libswscale/arm/swscale_unscaled.c @@ -61,14 +61,14 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[ return 0; } -#define YUV_TO_RGB_TABLE(precision) \ -c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), \ -c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), \ -c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), \ -c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), \ - -#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt, precision) \ -int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, \ +#define YUV_TO_RGB_TABLE \ +c->yuv2rgb_v2r_coeff / (1 << 7), \ +c->yuv2rgb_u2g_coeff / (1 << 7), \ +c->yuv2rgb_v2g_coeff / (1 << 7), \ +c->yuv2rgb_u2b_coeff / (1 << 7), \ + +#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \ +int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \ uint8_t *dst, int linesize, \ const uint8_t *srcY, int linesizeY, \ const uint8_t *srcU, int linesizeU, \ @@ -77,37 +77,34 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, int y_offset, \ int y_coeff); \ \ -static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, const uint8_t *src[], \ +static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], \ int srcStride[], int srcSliceY, int srcSliceH, \ uint8_t *dst[], int dstStride[]) { \ -const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; \ +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \ \ -ff_##ifmt##_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, \ +ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \ dst[0] + srcSliceY * dstStride[0], dstStride[0], \ src[0], srcStride[0], \ src[1], srcStride[1], \ src[2], srcStride[2], \ 
yuv2rgb_table, \ c->yuv2rgb_y_offset >> 9, \ - c->yuv2rgb_y_coeff / ((precision) == 16 ? 1 << 7 : 1));\ + c->yuv2rgb_y_coeff / (1 << 7)); \ \ return 0; \ } \ -#define DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuvx, precision) \ -DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, argb, precision) \ -DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, rgba, precision) \ -DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, abgr, precision) \ -DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, b
[FFmpeg-devel] [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part
From: Matthieu Bouron --- libswscale/arm/swscale_unscaled.c | 16 +++ libswscale/arm/yuv2rgb_neon.S | 89 +-- 2 files changed, 47 insertions(+), 58 deletions(-) diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c index 149208c..1986d65 100644 --- a/libswscale/arm/swscale_unscaled.c +++ b/libswscale/arm/swscale_unscaled.c @@ -62,10 +62,10 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[ } #define YUV_TO_RGB_TABLE \ -c->yuv2rgb_v2r_coeff / (1 << 7), \ -c->yuv2rgb_u2g_coeff / (1 << 7), \ -c->yuv2rgb_v2g_coeff / (1 << 7), \ -c->yuv2rgb_u2b_coeff / (1 << 7), \ +c->yuv2rgb_v2r_coeff, \ +c->yuv2rgb_u2g_coeff, \ +c->yuv2rgb_v2g_coeff, \ +c->yuv2rgb_u2b_coeff, \ #define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \ int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \ @@ -88,8 +88,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], src[1], srcStride[1], \ src[2], srcStride[2], \ yuv2rgb_table, \ - c->yuv2rgb_y_offset >> 9, \ - c->yuv2rgb_y_coeff / (1 << 7)); \ + c->yuv2rgb_y_offset >> 6, \ + c->yuv2rgb_y_coeff); \ \ return 0; \ } \ @@ -121,8 +121,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], dst[0] + srcSliceY * dstStride[0], dstStride[0], \ src[0], srcStride[0], src[1], srcStride[1], \ yuv2rgb_table, \ - c->yuv2rgb_y_offset >> 9, \ - c->yuv2rgb_y_coeff / (1 << 7)); \ + c->yuv2rgb_y_offset >> 6, \ + c->yuv2rgb_y_coeff); \ \ return 0; \ } \ diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S index fe5dd04..9345bae 100644 --- a/libswscale/arm/yuv2rgb_neon.S +++ b/libswscale/arm/yuv2rgb_neon.S @@ -68,14 +68,14 @@ .macro load_chroma_nv12 vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line -vsubl.u8q14, d2, d10 @ q14 = U - 128 -vsubl.u8q15, d3, d10 @ q15 = V - 128 +vshll.u8q14, d2, #3@ q14 = U * (1 << 3) +vshll.u8q15, d3, #3@ q15 = V * (1 << 3) .endm .macro load_chroma_nv21 vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line 
-vsubl.u8q14, d3, d10 @ q14 = U - 128 -vsubl.u8q15, d2, d10 @ q15 = V - 128 +vshll.u8q14, d3, #3@ q14 = U * (1 << 3) +vshll.u8q15, d2, #3
[FFmpeg-devel] [PATCH 07/10] swscale/arm/yuv2rgb: macro-ify
From: Matthieu Bouron --- libswscale/arm/yuv2rgb_neon.S | 115 ++ 1 file changed, 39 insertions(+), 76 deletions(-) diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S index 8abb986..f77f534 100644 --- a/libswscale/arm/yuv2rgb_neon.S +++ b/libswscale/arm/yuv2rgb_neon.S @@ -105,7 +105,7 @@ compute_16pxr2, d14, d15, \ofmt .endm -.macro load_args_nvx +.macro load_args_nv12 push{r4-r12, lr} vpush {q4-q7} ldr r4, [sp, #104] @ r4 = srcY @@ -122,6 +122,10 @@ sub r7, r7, r0 @ r7 = linesizeC - width (paddingC) .endm +.macro load_args_nv21 +load_args_nv12 +.endm + .macro load_args_yuv420p push{r4-r12, lr} vpush {q4-q7} @@ -146,113 +150,72 @@ load_args_yuv420p .endm -.macro declare_func ifmt ofmt -function ff_\ifmt\()_to_\ofmt\()_neon, export=1 - -.ifc \ifmt,nv12 -load_args_nvx -.endif - -.ifc \ifmt,nv21 -load_args_nvx -.endif - -.ifc \ifmt,yuv420p -load_args_yuv420p -.endif - - -.ifc \ifmt,yuv422p -load_args_yuv422p -.endif - -1: -mov r8, r0 @ r8 = width -2: -pld [r6, #64*3] -pld [r4, #64*3] - -vmov.i8 d10, #128 - -.ifc \ifmt,nv12 +.macro load_chroma_nv12 vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line vsubl.u8q14, d2, d10 @ q14 = U - 128 vsubl.u8q15, d3, d10 @ q15 = V - 128 +.endm -process_1l_16px \ofmt -.endif - -.ifc \ifmt,nv21 +.macro load_chroma_nv21 vld2.8 {d2, d3}, [r6]!@ q1: interleaved chroma line vsubl.u8q14, d3, d10 @ q14 = U - 128 vsubl.u8q15, d2, d10 @ q15 = V - 128 +.endm -process_1l_16px \ofmt -.endif - -.ifc \ifmt,yuv420p -pld [r10, #64*3] - -vld1.8 d2, [r6]! @ d2: chroma red line -vld1.8 d3, [r10]! @ d3: chroma blue line -vsubl.u8q14, d2, d10 @ q14 = U - 128 -vsubl.u8q15, d3, d10 @ q15 = V - 128 - -process_1l_16px \ofmt -.endif - -.ifc \ifmt,yuv422p +.macro load_chroma_yuv420p pld [r10, #64*3] vld1.8 d2, [r6]! @ d2: chroma red line vld1.8 d3, [r10]! 
@ d3: chroma blue line vsubl.u8q14, d2, d10 @ q14 = U - 128 vsubl.u8q15, d3, d10 @ q15 = V - 128 +.endm -process_1l_16px \ofmt -.endif - -subsr8, r8, #16@ width -= 16 -bgt 2b - -add r2, r2, r3 @ dst += padding -add r4, r4, r5 @ srcY += paddingY - -.ifc \ifmt,nv12 -tst r1, #1 -subeq r6, r6, r0 @ if (height % 2 == 0) paddingU -= width -addne r6, r7 @ else paddingU += linesizeU - width - -subsr1, r1, #1 @ height -= 1 -.endif +.macro load_chroma_yuv422p +load_chroma_yuv420p +.endm -.ifc \ifmt,nv21 +.macro increment_nv12 tst r1, #1 subeq r6, r6, r0 @ if (height % 2 == 0) paddingU -= width addne r6, r7 @ else paddingU += linesizeU - width +.endm -subsr1, r1, #1 @ height -= 1 -.endif +.macro increment_nv21 +increment_nv12 +.endm -.ifc \ifmt,yuv420p +.macro increment_yuv420p tst r1, #1 subeq r6, r6, r0, lsr #1 @ if (height % 2 == 0) paddingU -= (width / 2) addne r6, r7 @ else paddingU += linesizeU - (width / 2) subeq r10, r10, r0, lsr #1 @ if (height % 2 == 0) paddingU -= (width / 2) addne r10, r12 @ else paddingV = linesizeV - (width / 2) +.endm -subsr1, r1, #1 @ height -= 1 -.endif
[FFmpeg-devel] [PATCH 09/10] swscale/arm/yuv2rgb: re-order arguments of the compute_rgba macro
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 03d15cb..fe5dd04 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -121,7 +121,7 @@
     vqrshrun.s16        \dst_comp2, q2, #6
 .endm
 
-.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
+.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
     compute_color       \r1, \r2, q8, q9
     compute_color       \g1, \g2, q10, q11
     compute_color       \b1, \b2, q12, q13
@@ -176,19 +176,19 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
     vmul.s16            q15, q7 @ q15 = (srcY - y_offset) * y_coeff (right)
 
 .ifc \ofmt,argb
-    compute_rgba        d7, d11, d8, d12, d9, d13, d6, d10
+    compute_rgba        d7, d8, d9, d6, d11, d12, d13, d10
 .endif
 
 .ifc \ofmt,rgba
-    compute_rgba        d6, d10, d7, d11, d8, d12, d9, d13
+    compute_rgba        d6, d7, d8, d9, d10, d11, d12, d13
 .endif
 
 .ifc \ofmt,abgr
-    compute_rgba        d9, d13, d8, d12, d7, d11, d6, d10
+    compute_rgba        d9, d8, d7, d6, d13, d12, d11, d10
 .endif
 
 .ifc \ofmt,bgra
-    compute_rgba        d8, d12, d7, d11, d6, d10, d9, d13
+    compute_rgba        d8, d7, d6, d9, d12, d11, d10, d13
 .endif
 
     vst4.8              {q3, q4}, [r2,:128]!
-- 
2.7.4
[FFmpeg-devel] [PATCH 02/10] swscale/arm/yuv2rgb: fix comments and factorize lsl in load_args_yuv422p
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index f40327b..aac0773 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -172,11 +172,10 @@
     vdup.16 d0, r10                @ d0 = y_coeff
     vld1.16 {d1}, [r8]             @ d1 = *table
     add     r11, r2, r3            @ r11 = dst + linesize (dst2)
-    lsl     r8, r0, #2
-    sub     r3, r3, r8             @ r3 = linesize * 2 - width * 4 (padding)
-    sub     r5, r5, r0             @ r5 = linesizeY * 2 - width (paddingY)
-    sub     r7, r7, r0, lsr #1     @ r7 = linesizeU - width / 2 (paddingU)
-    sub     r12,r12,r0, lsr #1     @ r12 = linesizeV- width / 2 (paddingV)
+    sub     r3, r3, r0, lsl #2     @ r3 = linesize - width * 4 (padding)
+    sub     r5, r5, r0             @ r5 = linesizeY - width (paddingY)
+    sub     r7, r7, r0, lsr #1     @ r7 = linesizeU - width / 2 (paddingU)
+    sub     r12,r12,r0, lsr #1     @ r12 = linesizeV - width / 2 (paddingV)
     ldr     r10,[sp, #120]         @ r10 = srcV
 .endm
-- 
2.7.4
[FFmpeg-devel] [PATCH 05/10] swscale/arm/yuv2rgb: factorize lsl in load_args_nvx
From: Matthieu Bouron

---
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 4601a79..ef7b0a6 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -131,8 +131,7 @@
     add     r12, r4, r5            @ r12 = srcY + linesizeY (srcY2)
     lsl     r3, r3, #1
     lsl     r5, r5, #1
-    lsl     r8, r0, #2
-    sub     r3, r3, r8             @ r3 = linesize * 2 - width * 4 (padding)
+    sub     r3, r3, r0, lsl #2     @ r3 = linesize * 2 - width * 4 (padding)
     sub     r5, r5, r0             @ r5 = linesizeY * 2 - width (paddingY)
     sub     r7, r7, r0             @ r7 = linesizeC - width (paddingC)
 .endm
-- 
2.7.4
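For readers less familiar with the ARM shifted-operand form, the change in the patch above can be modelled in C. This is only a sketch of the arithmetic: the function names are hypothetical and not part of the patch, and the values are assumed to be non-negative. Both functions compute the destination padding for two lines; the second one folds the shift into the subtraction, which is what "sub r3, r3, r0, lsl #2" does in a single instruction without needing the r8 scratch register.

```c
#include <stdint.h>

/* Old form: lsl r8, r0, #2 ; sub r3, r3, r8
 * The shifted width goes through a temporary (r8 in the assembly). */
static uint32_t padding_two_steps(uint32_t linesize, uint32_t width)
{
    uint32_t doubled = linesize << 1;   /* lsl r3, r3, #1 */
    uint32_t tmp     = width << 2;      /* lsl r8, r0, #2 */
    return doubled - tmp;               /* sub r3, r3, r8 */
}

/* New form: sub r3, r3, r0, lsl #2
 * The shift is applied to the second operand as part of the subtraction. */
static uint32_t padding_folded(uint32_t linesize, uint32_t width)
{
    uint32_t doubled = linesize << 1;   /* lsl r3, r3, #1 */
    return doubled - (width << 2);      /* sub r3, r3, r0, lsl #2 */
}
```

The result is identical in both cases; the folded form only saves one instruction and one register.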
[FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part
The following patchset aims to make the yuv->rgba armv7 neon code path bitexact with the aarch64 one. It also aims to make the two code bases as close as possible.

[PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path

The current 32bit code path, which is unused, is removed.

[PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time

The code processes only one line at a time for the yuv420p, nv12 and nv21 formats, with no performance regression observed on a rpi2 (I've even observed a slight performance increase for the nv12 and nv21 formats).

[PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its

The last patch of the series makes the code bitexact with the aarch64 version. The increase in precision (which introduces a performance loss) is compensated by a refactor/optimisation that saves quite a few mov, vdup and vqdmulh instructions.

./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -

without patchset:
[bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605

with patchset:
[bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.018846

Matthieu
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support
On Tue, Mar 22, 2016 at 10:04 AM, Matthieu Bouron wrote: > > > On Fri, Mar 18, 2016 at 5:50 PM, Matthieu Bouron < > matthieu.bou...@gmail.com> wrote: > >> From: Matthieu Bouron >> >> --- >> >> Hello, >> >> The following patch add hwaccel support to the mediacodec (h264) decoder >> by allowing >> the user to render the output frames directly on a surface. >> >> In order to do so the user needs to initialize the hwaccel through the >> use of >> av_mediacodec_alloc_context and av_mediacodec_default_init functions. The >> later >> takes a reference to an android/view/Surface as parameter. >> >> If the hwaccel successfully initialize, the decoder output frames pix fmt >> will be >> AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrate how to >> render >> the frames on the surface: >> >> AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3]; >> av_mediacodec_release_buffer(buffer, 1); >> >> The last argument of av_mediacodec_release_buffer enable rendering of the >> buffer on the surface (or not if set to 0). >> >> Regarding the internal changes in the mediacodec decoder: >> >> MediaCodec.flush() discards both input and output buffers meaning that if >> MediaCodec.flush() is called all output buffers the user has a reference >> on are >> now invalid (and cannot be used). >> This behaviour does not fit well in the avcodec API. >> >> When the decoder is configured to output software buffers, there is no >> issue as >> the buffers are copied. >> >> Now when the decoder is configured to output to a surface, the user might >> not >> want to render all the frames as fast as the decoder can go and might >> want to >> control *when* the frame are rendered, so we need to make sure that the >> MediaCodec.flush() call is delayed until all the frames the user retains >> has >> been released or rendered. >> >> Delaying the call to MediaCodec.flush() means buffering any inputs that >> come >> the decoder until the user has released/renderer the frame he retains. 
>> >> This is a limitation of this hwaccel implementation, if the user retains a >> frame (a), then issue a flush command to the decoder, the packets he >> feeds to >> the decoder at that point will be queued in the internal decoder packet >> queue >> (until he releases the frame (a)). This scenario leads to a memory usage >> increase to say the least. >> >> Currently there is no limitation on the size of the internal decoder >> packet >> queue but this is something that can be added easily. Then, if the queue >> is >> full, what would be the behaviour of the decoder ? Can it block ? Or >> should it >> returns something like AVERROR(EAGAIN) ? >> >> About the other internal decoder changes I introduced: >> >> The MediaCodecDecContext is now refcounted (using the lavu/atomic api) >> since >> the (hwaccel) frames can be retained by the user, we need to delay the >> destruction of the codec until the user has released all the frames he >> has a >> reference on. >> The reference counter of the MediaCodecDecContext is incremented each >> time an >> (hwaccel) frame is outputted by the decoder and decremented each time a >> (hwaccel) frame is released. >> >> Also, when the decoder is configured to output to a surface the pts that >> are >> given to the MediaCodec API are now rescaled based on the codec_timebase >> as >> those timestamps values are propagated to the frames rendered on the >> surface >> since Android M. Not sure if it's really useful though. >> >> On the performance side: >> >> On a nexus 5, decoding an h264 stream (main profile) 1080p@60fps: >> - software output + rgba conversion goes at 59~60fps >> - surface output + render on a surface goes at 100~110fps >> >> > [...] 
>
> Patch updated with the following differences:
>   * the public mediacodec api is now always built (not only when
>     mediacodec is available) (and the build when mediacodec is not
>     available has been fixed)
>   * the documentation of av_mediacodec_release_buffer has been improved
>     a bit
>

Patch updated with the following differences:
  * MediaCodecBuffer->released type is now a volatile int (instead of an int*)
  * MediaCodecContext->refcount type is now a volatile int (instead of an int*)

Matthieu

[...]

From fdbc9e38816be8ce3af2d4a85383203588f1dd7a Mon Sep 17 00:00:00 2001
From: Matthieu Bouron
Date: Fri, 11 Mar
Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support
On Fri, Mar 18, 2016 at 5:50 PM, Matthieu Bouron wrote: > From: Matthieu Bouron > > --- > > Hello, > > The following patch add hwaccel support to the mediacodec (h264) decoder > by allowing > the user to render the output frames directly on a surface. > > In order to do so the user needs to initialize the hwaccel through the use > of > av_mediacodec_alloc_context and av_mediacodec_default_init functions. The > later > takes a reference to an android/view/Surface as parameter. > > If the hwaccel successfully initialize, the decoder output frames pix fmt > will be > AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrate how to > render > the frames on the surface: > > AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3]; > av_mediacodec_release_buffer(buffer, 1); > > The last argument of av_mediacodec_release_buffer enable rendering of the > buffer on the surface (or not if set to 0). > > Regarding the internal changes in the mediacodec decoder: > > MediaCodec.flush() discards both input and output buffers meaning that if > MediaCodec.flush() is called all output buffers the user has a reference > on are > now invalid (and cannot be used). > This behaviour does not fit well in the avcodec API. > > When the decoder is configured to output software buffers, there is no > issue as > the buffers are copied. > > Now when the decoder is configured to output to a surface, the user might > not > want to render all the frames as fast as the decoder can go and might want > to > control *when* the frame are rendered, so we need to make sure that the > MediaCodec.flush() call is delayed until all the frames the user retains > has > been released or rendered. > > Delaying the call to MediaCodec.flush() means buffering any inputs that > come > the decoder until the user has released/renderer the frame he retains. 
> > This is a limitation of this hwaccel implementation, if the user retains a > frame (a), then issue a flush command to the decoder, the packets he feeds > to > the decoder at that point will be queued in the internal decoder packet > queue > (until he releases the frame (a)). This scenario leads to a memory usage > increase to say the least. > > Currently there is no limitation on the size of the internal decoder packet > queue but this is something that can be added easily. Then, if the queue is > full, what would be the behaviour of the decoder ? Can it block ? Or > should it > returns something like AVERROR(EAGAIN) ? > > About the other internal decoder changes I introduced: > > The MediaCodecDecContext is now refcounted (using the lavu/atomic api) > since > the (hwaccel) frames can be retained by the user, we need to delay the > destruction of the codec until the user has released all the frames he has > a > reference on. > The reference counter of the MediaCodecDecContext is incremented each time > an > (hwaccel) frame is outputted by the decoder and decremented each time a > (hwaccel) frame is released. > > Also, when the decoder is configured to output to a surface the pts that > are > given to the MediaCodec API are now rescaled based on the codec_timebase as > those timestamps values are propagated to the frames rendered on the > surface > since Android M. Not sure if it's really useful though. > > On the performance side: > > On a nexus 5, decoding an h264 stream (main profile) 1080p@60fps: > - software output + rgba conversion goes at 59~60fps > - surface output + render on a surface goes at 100~110fps > > [...] 
Patch updated with the following differences: * the public mediacodec api is now always built (not only when mediacodec is available) (and the build when mediacodec is not available has been fixed) * the documentation of av_mediacodec_release_buffer has been improved a bit The development branch is located here: https://github.com/mbouron/FFmpeg/tree/feature/mediacodec-hwaccel From 26b21e16a93e6580ee75cc94d71fca23c111ad5b Mon Sep 17 00:00:00 2001 From: Matthieu Bouron Date: Fri, 11 Mar 2016 17:21:04 +0100 Subject: [PATCH] lavc: add mediacodec hwaccel support --- configure | 1 + libavcodec/Makefile | 6 +- libavcodec/allcodecs.c | 1 + libavcodec/mediacodec.c | 133 libavcodec/mediacodec.h | 88 + libavcodec/mediacodec_surface.c | 66 ++ libavcodec/mediacodec_surface.h | 31 + libavcodec/mediacodec_wrapper.c | 5 +- libavcodec/mediacodecdec.c | 272 +--- libavcodec/mediacodecdec.h | 17 +++ libavcodec/mediacodecdec_h264.c | 23 libavutil/pixdesc.c | 4 + libavutil/pixfmt.h
[FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support
From: Matthieu Bouron

---

Hello,

The following patch adds hwaccel support to the mediacodec (h264) decoder by allowing the user to render the output frames directly on a surface.

In order to do so, the user needs to initialize the hwaccel through the av_mediacodec_alloc_context and av_mediacodec_default_init functions. The latter takes a reference to an android/view/Surface as parameter.

If the hwaccel initializes successfully, the pix fmt of the decoder output frames will be AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrates how to render the frames on the surface:

    AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3];
    av_mediacodec_release_buffer(buffer, 1);

The last argument of av_mediacodec_release_buffer enables rendering of the buffer on the surface (or not if set to 0).

Regarding the internal changes in the mediacodec decoder:

MediaCodec.flush() discards both input and output buffers, meaning that if MediaCodec.flush() is called, all output buffers the user holds a reference on are now invalid (and cannot be used). This behaviour does not fit well in the avcodec API.

When the decoder is configured to output software buffers, there is no issue as the buffers are copied.

Now, when the decoder is configured to output to a surface, the user might not want to render all the frames as fast as the decoder can go and might want to control *when* the frames are rendered, so we need to make sure that the MediaCodec.flush() call is delayed until all the frames the user retains have been released or rendered.

Delaying the call to MediaCodec.flush() means buffering any input that comes to the decoder until the user has released/rendered the frames he retains.

This is a limitation of this hwaccel implementation: if the user retains a frame (a), then issues a flush command to the decoder, the packets he feeds to the decoder at that point will be queued in the internal decoder packet queue (until he releases the frame (a)). This scenario leads to a memory usage increase, to say the least.

Currently there is no limit on the size of the internal decoder packet queue, but this is something that can be added easily. Then, if the queue is full, what would be the behaviour of the decoder? Can it block? Or should it return something like AVERROR(EAGAIN)?

About the other internal decoder changes I introduced:

The MediaCodecDecContext is now refcounted (using the lavu/atomic api): since the (hwaccel) frames can be retained by the user, we need to delay the destruction of the codec until the user has released all the frames he holds a reference on. The reference counter of the MediaCodecDecContext is incremented each time an (hwaccel) frame is output by the decoder and decremented each time an (hwaccel) frame is released.

Also, when the decoder is configured to output to a surface, the pts that are given to the MediaCodec API are now rescaled based on the codec_timebase, as those timestamp values are propagated to the frames rendered on the surface since Android M. Not sure if it's really useful though.
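The refcounting scheme described above can be sketched in a few lines of C. This is a minimal illustration and not the code from the patch: it uses C11 <stdatomic.h> instead of the lavu/atomic api, and DecContext, dec_alloc, dec_ref and dec_unref are hypothetical names. The idea is that the decoder owns one reference, each frame handed to the user takes another one, and the context is only destroyed when the last reference goes away.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for MediaCodecDecContext (not the patch's struct). */
typedef struct DecContext {
    atomic_int refcount;  /* 1 for the decoder + 1 per frame retained by the user */
    bool *destroyed;      /* illustration only: set when the context is freed */
} DecContext;

/* Allocate a context; the decoder itself holds the first reference. */
static DecContext *dec_alloc(bool *destroyed_flag)
{
    DecContext *s = malloc(sizeof(*s));
    if (!s)
        return NULL;
    atomic_init(&s->refcount, 1);
    s->destroyed = destroyed_flag;
    *destroyed_flag = false;
    return s;
}

/* Called each time a (hwaccel) frame is output to the user. */
static void dec_ref(DecContext *s)
{
    atomic_fetch_add(&s->refcount, 1);
}

/* Called when the decoder is closed and each time a frame is released.
 * atomic_fetch_sub returns the previous value, so 1 means this call
 * dropped the last reference and the context can be destroyed. */
static void dec_unref(DecContext *s)
{
    if (atomic_fetch_sub(&s->refcount, 1) == 1) {
        *s->destroyed = true;
        free(s);
    }
}
```

With this layout, closing the decoder while the user still holds a frame only drops the decoder's reference; the actual teardown happens when that frame is released.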
On the performance side: On a nexus 5, decoding an h264 stream (main profile) 1080p@60fps: - software output + rgba conversion goes at 59~60fps - surface output + render on a surface goes at 100~110fps Matthieu --- configure | 1 + libavcodec/Makefile | 6 +- libavcodec/allcodecs.c | 1 + libavcodec/mediacodec.c | 125 ++ libavcodec/mediacodec.h | 85 + libavcodec/mediacodec_surface.c | 66 ++ libavcodec/mediacodec_surface.h | 31 + libavcodec/mediacodec_wrapper.c | 5 +- libavcodec/mediacodecdec.c | 272 +--- libavcodec/mediacodecdec.h | 17 +++ libavcodec/mediacodecdec_h264.c | 23 libavutil/pixdesc.c | 4 + libavutil/pixfmt.h | 2 + 13 files changed, 586 insertions(+), 52 deletions(-) create mode 100644 libavcodec/mediacodec.c create mode 100644 libavcodec/mediacodec.h create mode 100644 libavcodec/mediacodec_surface.c create mode 100644 libavcodec/mediacodec_surface.h diff --git a/configure b/configure index e5de306..4d66673 100755 --- a/configure +++ b/configure @@ -2530,6 +2530,7 @@ h264_d3d11va_hwaccel_select="h264_decoder" h264_dxva2_hwaccel_deps="dxva2" h264_dxva2_hwaccel_select="h264_decoder" h264_mediacodec_decoder_deps="mediacodec" +h264_mediacodec_hwaccel_deps="mediacodec" h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_parser" h264_mmal_decoder_deps="mmal" h264_mmal_decoder_select="mmal" diff --git a/libavcodec/Makefile b/libavcodec/Makefile index 6bb1af1..a3dad7e 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -10,6 +10,7 @@ HEADERS = avcodec.h \
Re: [FFmpeg-devel] [PATCH] lavc/ffjni: remove use of private JniInvocation API to retreive the Java VM
On Sun, Mar 13, 2016 at 08:48:21PM +0100, Matthieu Bouron wrote: > On Fri, Mar 11, 2016 at 09:36:41PM +0100, Matthieu Bouron wrote: [...] > > If nobody objects, I will push the patch (with #include removed) > tomorrow. > Pushed. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] lavc/ffjni: remove use of private JniInvocation API to retreive the Java VM
On Fri, Mar 11, 2016 at 09:36:41PM +0100, Matthieu Bouron wrote: > From: Matthieu Bouron > > Android N will prevent users from loading non-public APIs. > > Users should only rely on the av_jni_set_java_vm function to set the > Java VM. > --- > libavcodec/ffjni.c | 88 > ++ > 1 file changed, 3 insertions(+), 85 deletions(-) > > diff --git a/libavcodec/ffjni.c b/libavcodec/ffjni.c > index da13699..54f3122 100644 > --- a/libavcodec/ffjni.c > +++ b/libavcodec/ffjni.c > @@ -35,80 +35,6 @@ > static JavaVM *java_vm = NULL; > static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; > > -/** > - * Check if JniInvocation has been initialized. Only available on > - * Android >= 4.4. > - * > - * @param log_ctx context used for logging, can be NULL > - * @return 0 on success, < 0 otherwise > - */ > -static int check_jni_invocation(void *log_ctx) > -{ > -int ret = AVERROR_EXTERNAL; > -void *handle = NULL; > -void **jni_invocation = NULL; > - > -handle = dlopen(NULL, RTLD_LOCAL); > -if (!handle) { > -goto done; > -} > - > -jni_invocation = (void **)dlsym(handle, > "_ZN13JniInvocation15jni_invocation_E"); > -if (!jni_invocation) { > -av_log(log_ctx, AV_LOG_ERROR, "Could not find > JniInvocation::jni_invocation_ symbol\n"); > -goto done; > -} > - > -ret = !(jni_invocation != NULL && *jni_invocation != NULL); > - > -done: > -if (handle) { > -dlclose(handle); > -} > - > -return ret; > -} > - > -/** > - * Return created Java virtual machine using private JNI_GetCreatedJavaVMs > - * function from the specified library name. 
> - * > - * @param name library name used for symbol lookups, can be NULL > - * @param log_ctx context used for logging, can be NULL > - * @return the current Java virtual machine in use > - */ > -static JavaVM *get_java_vm(const char *name, void *log_ctx) > -{ > -JavaVM *vm = NULL; > -jsize nb_vm = 0; > - > -void *handle = NULL; > -jint (*get_created_java_vms) (JavaVM ** vmBuf, jsize bufLen, jsize > *nVMs) = NULL; > - > -handle = dlopen(name, RTLD_LOCAL); > -if (!handle) { > -return NULL; > -} > - > -get_created_java_vms = (jint (*)(JavaVM **, jsize, jsize *)) > dlsym(handle, "JNI_GetCreatedJavaVMs"); > -if (!get_created_java_vms) { > -av_log(log_ctx, AV_LOG_ERROR, "Could not find JNI_GetCreatedJavaVMs > symbol in library '%s'\n", name); > -goto done; > -} > - > -if (get_created_java_vms(&vm, 1, &nb_vm) != JNI_OK) { > -av_log(log_ctx, AV_LOG_ERROR, "Could not get created Java virtual > machines\n"); > -goto done; > -} > - > -done: > -if (handle) { > -dlclose(handle); > -} > - > -return vm; > -} > - > JNIEnv *ff_jni_attach_env(int *attached, void *log_ctx) > { > int ret = 0; > @@ -117,21 +43,13 @@ JNIEnv *ff_jni_attach_env(int *attached, void *log_ctx) > *attached = 0; > > pthread_mutex_lock(&lock); > -if (java_vm == NULL && (java_vm = av_jni_get_java_vm(log_ctx)) == NULL) { > - > -av_log(log_ctx, AV_LOG_INFO, "Retrieving current Java virtual > machine using Android JniInvocation wrapper\n"); > -if (check_jni_invocation(log_ctx) == 0) { > -if ((java_vm = get_java_vm(NULL, log_ctx)) != NULL || > -(java_vm = get_java_vm("libdvm.so", log_ctx)) != NULL || > -(java_vm = get_java_vm("libart.so", log_ctx)) != NULL) { > -av_log(log_ctx, AV_LOG_INFO, "Found Java virtual machine > using Android JniInvocation wrapper\n"); > -} > -} > +if (java_vm == NULL) { > +java_vm = av_jni_get_java_vm(log_ctx); > } > pthread_mutex_unlock(&lock); > > if (!java_vm) { > -av_log(log_ctx, AV_LOG_ERROR, "Could not retrieve a Java virtual > machine\n"); > +av_log(log_ctx, AV_LOG_ERROR, "No Java 
virtual machine has been > registered\n"); > return NULL; > } > If nobody objects, I will push the patch (with #include removed) tomorrow. Matthieu ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH] lavc/ffjni: remove use of private JniInvocation API to retreive the Java VM
From: Matthieu Bouron Android N will prevent users from loading non-public APIs. Users should only rely on the av_jni_set_java_vm function to set the Java VM. --- libavcodec/ffjni.c | 88 ++ 1 file changed, 3 insertions(+), 85 deletions(-) diff --git a/libavcodec/ffjni.c b/libavcodec/ffjni.c index da13699..54f3122 100644 --- a/libavcodec/ffjni.c +++ b/libavcodec/ffjni.c @@ -35,80 +35,6 @@ static JavaVM *java_vm = NULL; static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; -/** - * Check if JniInvocation has been initialized. Only available on - * Android >= 4.4. - * - * @param log_ctx context used for logging, can be NULL - * @return 0 on success, < 0 otherwise - */ -static int check_jni_invocation(void *log_ctx) -{ -int ret = AVERROR_EXTERNAL; -void *handle = NULL; -void **jni_invocation = NULL; - -handle = dlopen(NULL, RTLD_LOCAL); -if (!handle) { -goto done; -} - -jni_invocation = (void **)dlsym(handle, "_ZN13JniInvocation15jni_invocation_E"); -if (!jni_invocation) { -av_log(log_ctx, AV_LOG_ERROR, "Could not find JniInvocation::jni_invocation_ symbol\n"); -goto done; -} - -ret = !(jni_invocation != NULL && *jni_invocation != NULL); - -done: -if (handle) { -dlclose(handle); -} - -return ret; -} - -/** - * Return created Java virtual machine using private JNI_GetCreatedJavaVMs - * function from the specified library name. 
- * - * @param name library name used for symbol lookups, can be NULL - * @param log_ctx context used for logging, can be NULL - * @return the current Java virtual machine in use - */ -static JavaVM *get_java_vm(const char *name, void *log_ctx) -{ -JavaVM *vm = NULL; -jsize nb_vm = 0; - -void *handle = NULL; -jint (*get_created_java_vms) (JavaVM ** vmBuf, jsize bufLen, jsize *nVMs) = NULL; - -handle = dlopen(name, RTLD_LOCAL); -if (!handle) { -return NULL; -} - -get_created_java_vms = (jint (*)(JavaVM **, jsize, jsize *)) dlsym(handle, "JNI_GetCreatedJavaVMs"); -if (!get_created_java_vms) { -av_log(log_ctx, AV_LOG_ERROR, "Could not find JNI_GetCreatedJavaVMs symbol in library '%s'\n", name); -goto done; -} - -if (get_created_java_vms(&vm, 1, &nb_vm) != JNI_OK) { -av_log(log_ctx, AV_LOG_ERROR, "Could not get created Java virtual machines\n"); -goto done; -} - -done: -if (handle) { -dlclose(handle); -} - -return vm; -} - JNIEnv *ff_jni_attach_env(int *attached, void *log_ctx) { int ret = 0; @@ -117,21 +43,13 @@ JNIEnv *ff_jni_attach_env(int *attached, void *log_ctx) *attached = 0; pthread_mutex_lock(&lock); -if (java_vm == NULL && (java_vm = av_jni_get_java_vm(log_ctx)) == NULL) { - -av_log(log_ctx, AV_LOG_INFO, "Retrieving current Java virtual machine using Android JniInvocation wrapper\n"); -if (check_jni_invocation(log_ctx) == 0) { -if ((java_vm = get_java_vm(NULL, log_ctx)) != NULL || -(java_vm = get_java_vm("libdvm.so", log_ctx)) != NULL || -(java_vm = get_java_vm("libart.so", log_ctx)) != NULL) { -av_log(log_ctx, AV_LOG_INFO, "Found Java virtual machine using Android JniInvocation wrapper\n"); -} -} +if (java_vm == NULL) { +java_vm = av_jni_get_java_vm(log_ctx); } pthread_mutex_unlock(&lock); if (!java_vm) { -av_log(log_ctx, AV_LOG_ERROR, "Could not retrieve a Java virtual machine\n"); +av_log(log_ctx, AV_LOG_ERROR, "No Java virtual machine has been registered\n"); return NULL; } -- 2.7.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org 
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
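After this patch, ff_jni_attach_env only works with a Java VM that the application has registered itself (with av_jni_set_java_vm, typically from its JNI_OnLoad). The resulting behaviour can be modelled with the self-contained sketch below. It is an assumption-laden illustration, not the ffjni.c code: the VM is represented by an opaque pointer, and set_vm, get_vm and attach_env are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Opaque pointer standing in for the JavaVM registered by the application. */
static void *registered_vm = NULL;
static pthread_mutex_t vm_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register the VM once; registering a different VM later is refused. */
static int set_vm(void *vm)
{
    int ret = 0;

    pthread_mutex_lock(&vm_lock);
    if (registered_vm == NULL)
        registered_vm = vm;
    else if (registered_vm != vm)
        ret = -1;
    pthread_mutex_unlock(&vm_lock);

    return ret;
}

/* Return the registered VM, or NULL if none has been registered. */
static void *get_vm(void)
{
    void *vm;

    pthread_mutex_lock(&vm_lock);
    vm = registered_vm;
    pthread_mutex_unlock(&vm_lock);

    return vm;
}

/* No more dlopen()/dlsym() fallback: without a registered VM this fails. */
static void *attach_env(void)
{
    void *vm = get_vm();

    if (!vm)
        return NULL; /* "No Java virtual machine has been registered" */
    return vm;       /* the real code would attach the current thread here */
}
```

The point of the sketch is the failure mode: an application that never registers a VM now gets NULL back instead of a VM discovered through private symbols.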
[FFmpeg-devel] [PATCH] lavf/img2dec: disable parsing if frame_size is specified
From: Matthieu Bouron

---

Hello,

The following patch disables parsing if the frame_size option is specified. The main purpose here is to disable the use of parsers (which have a huge performance cost on embedded platforms) for single images when their size is known in advance.

The patch sounds hackish to me though, but others might consider it OK (or not). The performance of the jpeg parser still needs to be addressed at some point.

Matthieu

---
 libavformat/img2dec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/img2dec.c b/libavformat/img2dec.c
index fe0e346..9aa6dd7 100644
--- a/libavformat/img2dec.c
+++ b/libavformat/img2dec.c
@@ -206,7 +206,7 @@ int ff_img_read_header(AVFormatContext *s1)
         s->is_pipe = 0;
     else {
         s->is_pipe = 1;
-        st->need_parsing = AVSTREAM_PARSE_FULL;
+        st->need_parsing = s->frame_size > 0 ? AVSTREAM_PARSE_NONE : AVSTREAM_PARSE_FULL;
     }
 
     if (s->ts_from_file == 2) {
-- 
2.7.2
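The decision made by the one-line change above can be written out as a small standalone function. This is only a sketch: ParseMode and pick_parse_mode are hypothetical names, and the non-pipe case returning "none" is an assumption (the patch does not touch need_parsing in that branch).

```c
/* Hypothetical stand-ins for AVSTREAM_PARSE_NONE / AVSTREAM_PARSE_FULL. */
enum ParseMode {
    PARSE_NONE,
    PARSE_FULL
};

/* Pick the parsing mode for an image stream.
 * is_pipe:    non-zero when the input is read from a pipe
 * frame_size: value of the frame_size option, 0 when not specified */
static enum ParseMode pick_parse_mode(int is_pipe, int frame_size)
{
    if (!is_pipe)
        return PARSE_NONE;   /* assumption: left at its default in this branch */

    /* With a known frame size each read already yields one full image,
     * so the (expensive) parser can be skipped. */
    return frame_size > 0 ? PARSE_NONE : PARSE_FULL;
}
```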
Re: [FFmpeg-devel] [PATCH] lavc/mjpegdec: avoid unneeded allocation if the frame is to be skipped
On Tue, Mar 01, 2016 at 08:53:33PM +0100, Paul B Mahol wrote: > On 3/1/16, Matthieu Bouron wrote: > > From: Matthieu Bouron > > > > --- > > libavcodec/mjpegdec.c | 7 +++ > > 1 file changed, 7 insertions(+) > > > > probbably ok Pushed. Thanks. [...] ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/2] lavc: add h264 mediacodec decoder
On Thu, Mar 03, 2016 at 02:03:01PM +0100, Matthieu Bouron wrote: [...] > > Patch updated with the following differences: > * ff_set_dimensions return code is now used > * add missing exception when trying to call the MediaCodec object > constructor > * remove leftover avctx_internal field from MediaCodecH264DecContext > * add ff_AMediaCodec_getName function > > The dev branch can be found here: > https://github.com/mbouron/FFmpeg/tree/feature/mediacodec-support-v7 > > If nobody objects I would like to push the patchset in 3 days. > Pushed. [...] ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 1/2] lavc: add JNI support
On Thu, Mar 03, 2016 at 01:56:16PM +0100, Matthieu Bouron wrote: [...] > > New patch attached with the following differences: > * added myself as a maintainer of jni* and ffjni* > Pushed. [...] ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel