Re: [FFmpeg-devel] [PATCH] lavf/mov: ignore ctts entries that do not apply to a least one sample

2016-06-20 Thread Matthieu Bouron
On Fri, Jun 17, 2016 at 01:26:10AM +0200, Michael Niedermayer wrote:
> On Thu, Jun 16, 2016 at 05:26:14PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > Fixes packet pts of samples which contain ctts entries with count=0.
> > ---
> > 
> > Hello,
> > 
> > The following patch fixes packet pts of samples which contain ctts values
> > with count=0 (so the ctts entry does not apply to any sample, if I
> > understand correctly). Such samples are produced by an LG G4 phone. I don't
> > have any sample I can share at the moment (and thus no fate test following
> > this patch yet).
> >
> > An alternative to this patch is to remove the entry directly when the ctts
> > atom is parsed. Would you prefer this alternative?
> 
> i dont know what is preferred but i agree about either solution
> 
> removing them on load would avoid any issues with ctts_count > 0
> and no real entries, i dont know though if that ever matters

I've attached the alternative patch, which removes the CTTS entries with
count <= 0 at parsing time. I think it's better in the end (I first liked
the idea of keeping the ctts table as is in memory, but on reflection I
don't think that is really useful). Anyway, I'll go with whichever patch you
prefer.
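
For illustration, a minimal standalone sketch of the parse-time approach:
entries whose count is <= 0 are dropped while the table is compacted in
place. The CttsEntry type and the ctts_drop_empty() helper are hypothetical
names used only for this sketch; they are not part of libavformat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the demuxer's ctts entry type. */
typedef struct CttsEntry {
    int count;    /* number of consecutive samples the offset applies to */
    int duration; /* composition offset for those samples */
} CttsEntry;

/*
 * Compact the table in place, dropping entries with count <= 0.
 * Returns the number of entries kept.
 */
static size_t ctts_drop_empty(CttsEntry *entries, size_t nb_entries)
{
    size_t kept = 0;

    assert(entries != NULL || nb_entries == 0);

    for (size_t i = 0; i < nb_entries; i++) {
        if (entries[i].count <= 0)
            continue; /* applies to no sample: skip it */
        entries[kept++] = entries[i];
    }
    return kept;
}
```

With this shape, the reader never sees an empty entry, so the code that
walks the table at packet-read time needs no special case.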

Matthieu

[...]
From 3bf2a6a81b8cca09bee4c0b6ef6f6ce78e276f0d Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Thu, 16 Jun 2016 13:16:52 +0200
Subject: [PATCH] lavf/mov: ignore ctts that do not apply to a least one sample

Fixes packet pts of samples which contain ctts values with count <= 0.
---
 libavformat/mov.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 57a0354..8eab34c 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -2574,7 +2574,7 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 {
     AVStream *st;
     MOVStreamContext *sc;
-    unsigned int i, entries;
+    unsigned int i, entries, ctts_count = 0;
 
     if (c->fc->nb_streams < 1)
         return 0;
@@ -2600,8 +2600,16 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
         int count    =avio_rb32(pb);
         int duration =avio_rb32(pb);
 
-        sc->ctts_data[i].count   = count;
-        sc->ctts_data[i].duration= duration;
+        if (count <= 0) {
+            av_log(c->fc, AV_LOG_TRACE,
+                   "ignoring CTTS entry with count=%d duration=%d\n",
+                   count, duration);
+            continue;
+        }
+
+        sc->ctts_data[ctts_count].count    = count;
+        sc->ctts_data[ctts_count].duration = duration;
+        ctts_count++;
 
         av_log(c->fc, AV_LOG_TRACE, "count=%d, duration=%d\n",
                 count, duration);
@@ -2617,7 +2625,7 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
             mov_update_dts_shift(sc, duration);
     }
 
-    sc->ctts_count = i;
+    sc->ctts_count = ctts_count;
 
     if (pb->eof_reached)
         return AVERROR_EOF;
-- 
2.8.3

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec{, _h264}: set FF_CODEC_CAP_SETS_PKT_DTS capability

2016-06-20 Thread Matthieu Bouron
On Sun, Jun 19, 2016 at 06:01:49PM +0200, Matthieu Bouron wrote:
> On Fri, Jun 17, 2016 at 09:47:35AM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > And sets frames pkt_dts to AV_NOPTS_VALUE as we do not want lavc/utils
> > to overwrite the field with incorrect values as the decoder is
> > asynchronous.
> 
> If there is no objection, I will push the patch in one day.

Pushed.

[...]


Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS

2016-06-19 Thread Matthieu Bouron
On Mon, Jun 13, 2016 at 02:37:29PM +0200, Matthieu Bouron wrote:
> On Mon, Jun 13, 2016 at 12:23:07PM +0200, Hendrik Leppkes wrote:
> > On Mon, Jun 13, 2016 at 11:51 AM, Matthieu Bouron
> >  wrote:
> > > On Fri, Jun 10, 2016 at 03:08:48PM +0200, Matthieu Bouron wrote:
> > >> From: Matthieu Bouron 
> > >>
> > >> Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
> > >> to be set in their respective csd-{0,1} buffers.
> > >> ---
> > >>
> > >> Hello,
> > >>
> > >> The attached patch fixes playback of HLS streams on MediaTek devices,
> > >> which require PPS/SPS to be set in their respective csd-{0,1} buffers
> > >> (instead of having sps+pps in csd-0, which works on other devices).
> > >>
> > >> I'm not sure if I can use ff_h264_decode_extradata this way (or at
> > >> least initialize the H264Context with zeroes minus the avctx field).
> > >
> > > Rebased patch (after the h264 ps merged) attached.
> > >
> > > I still have the same question, is my use of
> > > H264Context + ff_h264_decode_extradata correct ?
> > >
> > 
> > Using H264 decoder internals seems to be a rather unfortunate
> > solution, as its prone to breakage, often subtle, as the h264 decoder
> > gets changed and not all inter-module dependencies are known.
> > So if possible at all, not using something that uses H264Context for
> > example would be nice.
> > 
> > For the record, ff_h264_decode_extradata is scheduled for refactoring
> > to make it independent of H264Context so it can be more easily shared
> > with the h264 decoder and the h264 parser.
> > Once that is done, it may give you a cleaner interface to use it from
> > mediacodec as well.
> 
> Ok. I can wait for the refactor to be merged, but the MediaCodec decoder
> will remain broken on those devices in the meantime. I'm not too happy
> about that if a release is to be made in the coming days. Do we have an
> ETA? I'm also not keen on rewriting the same parsing code I wrote before
> for the AVCC format to split/extract the PPS/SPS.
> 
> Or I can push this code (if its use is valid) and update it when the
> merge lands (I'm helping Clément with the merges, so I will take care
> of this part).

Updated patch attached (using the new ff_h264_decode_extradata API).

Matthieu
From 30d70187e10f09231a59a255204c810d1662336b Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Fri, 10 Jun 2016 13:16:09 +0200
Subject: [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to
 extract PPS/SPS

Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
to be set in their respective csd-{0,1} buffers.
---
 configure   |   2 +-
 libavcodec/mediacodecdec_h264.c | 140 +---
 2 files changed, 30 insertions(+), 112 deletions(-)

diff --git a/configure b/configure
index a220fa1..eb08478 100755
--- a/configure
+++ b/configure
@@ -2548,7 +2548,7 @@ h264_d3d11va_hwaccel_select="h264_decoder"
 h264_dxva2_hwaccel_deps="dxva2"
 h264_dxva2_hwaccel_select="h264_decoder"
 h264_mediacodec_decoder_deps="mediacodec"
-h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_parser"
+h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_decoder h264_parser"
 h264_mmal_decoder_deps="mmal"
 h264_mmal_decoder_select="mmal"
 h264_mmal_hwaccel_deps="mmal"
diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 52e48ae..b63b395 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -32,6 +32,7 @@
 #include "libavutil/atomic.h"
 
 #include "avcodec.h"
+#include "h264.h"
 #include "internal.h"
 #include "mediacodecdec.h"
 #include "mediacodec_wrapper.h"
@@ -50,104 +51,6 @@ typedef struct MediaCodecH264DecContext {
 
 } MediaCodecH264DecContext;
 
-static int h264_extradata_to_annexb_sps_pps(AVCodecContext *avctx,
-uint8_t **extradata_annexb, int *extradata_annexb_size,
-int *sps_offset, int *sps_size,
-int *pps_offset, int *pps_size)
-{
-uint16_t unit_size;
-uint64_t total_size = 0;
-
-uint8_t i, j, unit_nb;
-uint8_t sps_seen = 0;
-uint8_t pps_seen = 0;
-
-const uint8_t *extradata;
-static const uint8_t nalu_header[4] = { 0x00, 0x00, 0x00, 0x01 };
-
-if (avctx->extradata_size < 8) {
-av_log(avctx, AV_LOG_ERROR,
-"Too small extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
- 

Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec{, _h264}: set FF_CODEC_CAP_SETS_PKT_DTS capability

2016-06-19 Thread Matthieu Bouron
On Fri, Jun 17, 2016 at 09:47:35AM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron 
> 
> And sets frames pkt_dts to AV_NOPTS_VALUE as we do not want lavc/utils
> to overwrite the field with incorrect values as the decoder is
> asynchronous.

If there is no objection, I will push the patch in one day.

Matthieu

[...]


[FFmpeg-devel] [PATCH] lavc/mediacodecdec{, _h264}: set FF_CODEC_CAP_SETS_PKT_DTS capability

2016-06-17 Thread Matthieu Bouron
From: Matthieu Bouron 

And sets frames pkt_dts to AV_NOPTS_VALUE as we do not want lavc/utils
to overwrite the field with incorrect values as the decoder is
asynchronous.
---
 libavcodec/mediacodecdec.c  | 1 +
 libavcodec/mediacodecdec_h264.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/libavcodec/mediacodecdec.c b/libavcodec/mediacodecdec.c
index 0b08f020..68df885 100644
--- a/libavcodec/mediacodecdec.c
+++ b/libavcodec/mediacodecdec.c
@@ -162,6 +162,7 @@ static int mediacodec_wrap_buffer(AVCodecContext *avctx,
  *   * N avpackets can be pushed before 1 frame is actually returned
  *   * 0-sized avpackets are pushed to flush remaining frames at EOS */
 frame->pkt_pts = info->presentationTimeUs;
+frame->pkt_dts = AV_NOPTS_VALUE;
 
 av_log(avctx, AV_LOG_DEBUG,
 "Frame: width=%d stride=%d height=%d slice-height=%d "
diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 52e48ae..0f90606 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -344,4 +344,5 @@ AVCodec ff_h264_mediacodec_decoder = {
 .flush  = mediacodec_decode_flush,
 .close  = mediacodec_decode_close,
 .capabilities   = CODEC_CAP_DELAY,
+.caps_internal  = FF_CODEC_CAP_SETS_PKT_DTS,
 };
-- 
2.8.3



[FFmpeg-devel] [PATCH] lavf/mov: ignore ctts entries that do not apply to a least one sample

2016-06-16 Thread Matthieu Bouron
From: Matthieu Bouron 

Fixes packet pts of samples which contain ctts entries with count=0.
---

Hello,

The following patch fixes packet pts of samples which contain ctts values with
count=0 (so the ctts entry does not apply to any sample if I understand
correctly). Such samples are produced by a LG G4 phone. I don't have any
sample I can share at the moment (and thus no fate test following this patch
yet).

An alternative to this patch is to remove directly the entry when the ctts atom
is parsed. Would you prefer this alternative ?

What happens without the patch is that the ctts_index is never incremented if
the current ctts entry count is 0.
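
To make the failure mode concrete, here is a simplified standalone model of
the cursor update that happens when a packet is read. It is only a sketch:
CttsEntry, CttsCursor and ctts_next_offset() are hypothetical names, and the
skip_empty flag stands in for the check added by the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct CttsEntry {
    int count;    /* number of samples covered by this entry */
    int duration; /* composition offset for those samples */
} CttsEntry;

typedef struct CttsCursor {
    size_t index; /* current entry in the table */
    int sample;   /* samples already consumed from the current entry */
} CttsCursor;

/* Return the offset for the next sample, or 0 once the table is exhausted. */
static int ctts_next_offset(const CttsEntry *e, size_t nb_entries,
                            CttsCursor *cur, int skip_empty)
{
    int offset;

    assert(e != NULL && cur != NULL);

    /* Model of the patch: step over one entry that covers no sample. */
    if (skip_empty && cur->index < nb_entries && e[cur->index].count == 0)
        cur->index++;

    if (cur->index >= nb_entries)
        return 0;

    offset = e[cur->index].duration;

    /* Advance to the next entry once this one has been fully consumed.
     * With count == 0 the counter goes 1, 2, 3, ... and never matches,
     * so the cursor stays on the empty entry. */
    cur->sample++;
    if (e[cur->index].count == cur->sample) {
        cur->index++;
        cur->sample = 0;
    }
    return offset;
}
```

For the table {1,10} {0,99} {1,20} {1,30}, calling this with skip_empty = 0
returns 10 and then 99 for every following sample, while skip_empty = 1
returns 10, 20, 30.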

Matthieu

---
 libavformat/mov.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 57a0354..7fbad22 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -5175,6 +5175,11 @@ static int mov_read_packet(AVFormatContext *s, AVPacket *pkt)
 
     pkt->stream_index = sc->ffindex;
     pkt->dts = sample->timestamp;
+
+    if (sc->ctts_data && sc->ctts_index < sc->ctts_count &&
+        sc->ctts_data[sc->ctts_index].count == 0)
+        sc->ctts_index++;
+
     if (sc->ctts_data && sc->ctts_index < sc->ctts_count) {
         pkt->pts = pkt->dts + sc->dts_shift + sc->ctts_data[sc->ctts_index].duration;
         /* update ctts context */
-- 
2.8.3



Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: refactor ff_AMediaCodecList_getCodecByType

2016-06-15 Thread Matthieu Bouron
On Mon, Jun 13, 2016 at 02:47:45PM +0200, Matthieu Bouron wrote:
> On Wed, Jun 08, 2016 at 11:19:51PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > Allows to select a codec (encoder or decoder) only if it supports a
> > specific profile.
> > 
> > Adds ff_AMediaCodecProfile_getProfileFromAVCodecContext to convert an
> > AVCodecContext profile to a MediaCodec profile. It only supports H264
> > for now.
> > 
> > The codepath using MediaCodecList.findDecoderForFormat() (Android >= 5.0)
> > has been dropped as this method does not allow to select a decoder
> > compatible with a specific profile.
> > ---
> 
> If there is no objection, I will push this patch in one day.

Pushed.

[...]


Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: refactor ff_AMediaCodecList_getCodecByType

2016-06-13 Thread Matthieu Bouron
On Wed, Jun 08, 2016 at 11:19:51PM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron 
> 
> Allows to select a codec (encoder or decoder) only if it supports a
> specific profile.
> 
> Adds ff_AMediaCodecProfile_getProfileFromAVCodecContext to convert an
> AVCodecContext profile to a MediaCodec profile. It only supports H264
> for now.
> 
> The codepath using MediaCodecList.findDecoderForFormat() (Android >= 5.0)
> has been dropped as this method does not allow to select a decoder
> compatible with a specific profile.
> ---

If there is no objection, I will push this patch in one day.

Matthieu

[...]


Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS

2016-06-13 Thread Matthieu Bouron
On Mon, Jun 13, 2016 at 12:23:07PM +0200, Hendrik Leppkes wrote:
> On Mon, Jun 13, 2016 at 11:51 AM, Matthieu Bouron
>  wrote:
> > On Fri, Jun 10, 2016 at 03:08:48PM +0200, Matthieu Bouron wrote:
> >> From: Matthieu Bouron 
> >>
> >> Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
> >> to be set in their respective csd-{0,1} buffers.
> >> ---
> >>
> >> Hello,
> >>
> >> The attached patch fixes playback of HLS streams on MediaTek devices,
> >> which require PPS/SPS to be set in their respective csd-{0,1} buffers
> >> (instead of having sps+pps in csd-0, which works on other devices).
> >>
> >> I'm not sure if I can use ff_h264_decode_extradata this way (or at
> >> least initialize the H264Context with zeroes minus the avctx field).
> >
> > Rebased patch (after the h264 ps merged) attached.
> >
> > I still have the same question, is my use of
> > H264Context + ff_h264_decode_extradata correct ?
> >
> 
> Using H264 decoder internals seems to be a rather unfortunate
> solution, as its prone to breakage, often subtle, as the h264 decoder
> gets changed and not all inter-module dependencies are known.
> So if possible at all, not using something that uses H264Context for
> example would be nice.
> 
> For the record, ff_h264_decode_extradata is scheduled for refactoring
> to make it independent of H264Context so it can be more easily shared
> with the h264 decoder and the h264 parser.
> Once that is done, it may give you a cleaner interface to use it from
> mediacodec as well.

Ok. I can wait for the refactor to be merged, but the MediaCodec decoder
will remain broken on those devices in the meantime. I'm not too happy
about that if a release is to be made in the coming days. Do we have an
ETA? I'm also not keen on rewriting the same parsing code I wrote before
for the AVCC format to split/extract the PPS/SPS.

Or I can push this code (if its use is valid) and update it when the
merge lands (I'm helping Clément with the merges, so I will take care
of this part).

Matthieu

[...]


Re: [FFmpeg-devel] [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS

2016-06-13 Thread Matthieu Bouron
On Fri, Jun 10, 2016 at 03:08:48PM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron 
> 
> Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
> to be set in their respective csd-{0,1} buffers.
> ---
> 
> Hello,
> 
> The attached patch fixes playback of HLS streams on MediaTek devices,
> which require PPS/SPS to be set in their respective csd-{0,1} buffers
> (instead of having sps+pps in csd-0, which works on other devices).
> 
> I'm not sure if I can use ff_h264_decode_extradata this way (or at least
> initialize the H264Context with zeroes minus the avctx field).

Rebased patch (after the h264 ps merged) attached.

I still have the same question, is my use of
H264Context + ff_h264_decode_extradata correct ?

Thanks in advance,
Matthieu

[...]
From 30d70187e10f09231a59a255204c810d1662336b Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Fri, 10 Jun 2016 13:16:09 +0200
Subject: [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to
 extract PPS/SPS

Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
to be set in their respective csd-{0,1} buffers.
---
 configure   |   2 +-
 libavcodec/mediacodecdec_h264.c | 140 +---
 2 files changed, 30 insertions(+), 112 deletions(-)

diff --git a/configure b/configure
index a220fa1..eb08478 100755
--- a/configure
+++ b/configure
@@ -2548,7 +2548,7 @@ h264_d3d11va_hwaccel_select="h264_decoder"
 h264_dxva2_hwaccel_deps="dxva2"
 h264_dxva2_hwaccel_select="h264_decoder"
 h264_mediacodec_decoder_deps="mediacodec"
-h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_parser"
+h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_decoder h264_parser"
 h264_mmal_decoder_deps="mmal"
 h264_mmal_decoder_select="mmal"
 h264_mmal_hwaccel_deps="mmal"
diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 52e48ae..b63b395 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -32,6 +32,7 @@
 #include "libavutil/atomic.h"
 
 #include "avcodec.h"
+#include "h264.h"
 #include "internal.h"
 #include "mediacodecdec.h"
 #include "mediacodec_wrapper.h"
@@ -50,104 +51,6 @@ typedef struct MediaCodecH264DecContext {
 
 } MediaCodecH264DecContext;
 
-static int h264_extradata_to_annexb_sps_pps(AVCodecContext *avctx,
-uint8_t **extradata_annexb, int *extradata_annexb_size,
-int *sps_offset, int *sps_size,
-int *pps_offset, int *pps_size)
-{
-uint16_t unit_size;
-uint64_t total_size = 0;
-
-uint8_t i, j, unit_nb;
-uint8_t sps_seen = 0;
-uint8_t pps_seen = 0;
-
-const uint8_t *extradata;
-static const uint8_t nalu_header[4] = { 0x00, 0x00, 0x00, 0x01 };
-
-if (avctx->extradata_size < 8) {
-av_log(avctx, AV_LOG_ERROR,
-"Too small extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
-return AVERROR(EINVAL);
-}
-
-*extradata_annexb = NULL;
-*extradata_annexb_size = 0;
-
-*sps_offset = *sps_size = 0;
-*pps_offset = *pps_size = 0;
-
-extradata = avctx->extradata + 4;
-
-/* skip length size */
-extradata++;
-
-for (j = 0; j < 2; j ++) {
-
-if (j == 0) {
-/* number of sps unit(s) */
-unit_nb = *extradata++ & 0x1f;
-} else {
-/* number of pps unit(s) */
-unit_nb = *extradata++;
-}
-
-for (i = 0; i < unit_nb; i++) {
-int err;
-
-unit_size   = AV_RB16(extradata);
-total_size += unit_size + 4;
-
-if (total_size > INT_MAX) {
-av_log(avctx, AV_LOG_ERROR,
-"Too big extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
-av_freep(extradata_annexb);
-return AVERROR(EINVAL);
-}
-
-if (extradata + 2 + unit_size > avctx->extradata + avctx->extradata_size) {
-av_log(avctx, AV_LOG_ERROR, "Packet header is not contained in global extradata, "
-"corrupted stream or invalid MP4/AVCC bitstream\n");
-av_freep(extradata_annexb);
-return AVERROR(EINVAL);
-}
-
-if ((err = av_reallocp(extradata_annexb, total_size)) < 0) {
-return err;
-}
-
-memcpy(*extradata_annexb + total_size - unit_size - 4, nalu_header, 4);
-memcpy(*extradata_annexb + total_size - unit_size, extradata + 2, unit_size);
-extradata += 2 + unit_size;
-}
-
-if (unit_nb) {
-if (j == 0) {
-sps_seen = 1;
-*sps_size = total_

[FFmpeg-devel] [PATCH] lavc/mediacodecdec_h264: use ff_h264_decode_extradata to extract PPS/SPS

2016-06-10 Thread Matthieu Bouron
From: Matthieu Bouron 

Fixes playback of HLS streams on MediaTek devices which requires PPS/SPS
to be set in their respective csd-{0,1} buffers.
---

Hello,

The attached patch fixes playback of HLS streams on MediaTek devices, which
require PPS/SPS to be set in their respective csd-{0,1} buffers (instead of
having sps+pps in csd-0, which works on other devices).
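
As a rough illustration of what "separate csd-{0,1} buffers" implies, the
sketch below locates the first SPS (NAL type 7) and the first PPS (NAL type
8) in an Annex B buffer so that each could be handed over on its own. It is
not the code from the patch, which relies on ff_h264_decode_extradata;
NalSpan, find_start_code() and split_sps_pps() are hypothetical names, and
each reported span includes its start code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct NalSpan {
    size_t offset; /* offset of the start code in the buffer */
    size_t size;   /* start code + NAL unit size, 0 if not found */
} NalSpan;

/* Find the next 3- or 4-byte start code at or after pos.
 * Returns its offset (or len if none) and stores its length in *sc_len. */
static size_t find_start_code(const uint8_t *buf, size_t len, size_t pos,
                              size_t *sc_len)
{
    for (size_t i = pos; i + 3 <= len; i++) {
        if (buf[i] != 0 || buf[i + 1] != 0)
            continue;
        if (buf[i + 2] == 1) {
            *sc_len = 3;
            return i;
        }
        if (i + 4 <= len && buf[i + 2] == 0 && buf[i + 3] == 1) {
            *sc_len = 4;
            return i;
        }
    }
    *sc_len = 0;
    return len;
}

/* Record the first SPS and the first PPS; returns 1 if both were found. */
static int split_sps_pps(const uint8_t *buf, size_t len,
                         NalSpan *sps, NalSpan *pps)
{
    size_t sc_len, pos;

    assert(buf != NULL && sps != NULL && pps != NULL);

    sps->offset = sps->size = 0;
    pps->offset = pps->size = 0;

    pos = find_start_code(buf, len, 0, &sc_len);
    while (pos < len) {
        size_t next_len;
        size_t hdr  = pos + sc_len; /* index of the NAL header byte */
        size_t next = find_start_code(buf, len, hdr, &next_len);

        if (hdr < len) {
            int type = buf[hdr] & 0x1f; /* nal_unit_type */
            if (type == 7 && !sps->size) {
                sps->offset = pos;
                sps->size   = next - pos;
            } else if (type == 8 && !pps->size) {
                pps->offset = pos;
                pps->size   = next - pos;
            }
        }
        pos    = next;
        sc_len = next_len;
    }
    return sps->size && pps->size;
}
```

A caller would then copy buf + sps.offset (sps.size bytes) into one buffer
and buf + pps.offset (pps.size bytes) into the other.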

I'm not sure if I can use the ff_h264_decode_extradata this way (or at least
initialize the H264Context with zeroes minus the avctx field).

Matthieu

---
 configure   |   2 +-
 libavcodec/mediacodecdec_h264.c | 140 +---
 2 files changed, 30 insertions(+), 112 deletions(-)

diff --git a/configure b/configure
index 7c463a5..508affe 100755
--- a/configure
+++ b/configure
@@ -2544,7 +2544,7 @@ h264_d3d11va_hwaccel_select="h264_decoder"
 h264_dxva2_hwaccel_deps="dxva2"
 h264_dxva2_hwaccel_select="h264_decoder"
 h264_mediacodec_decoder_deps="mediacodec"
-h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_parser"
+h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_decoder h264_parser"
 h264_mmal_decoder_deps="mmal"
 h264_mmal_decoder_select="mmal"
 h264_mmal_hwaccel_deps="mmal"
diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 52e48ae..69e9122 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -32,6 +32,7 @@
 #include "libavutil/atomic.h"
 
 #include "avcodec.h"
+#include "h264.h"
 #include "internal.h"
 #include "mediacodecdec.h"
 #include "mediacodec_wrapper.h"
@@ -50,104 +51,6 @@ typedef struct MediaCodecH264DecContext {
 
 } MediaCodecH264DecContext;
 
-static int h264_extradata_to_annexb_sps_pps(AVCodecContext *avctx,
-uint8_t **extradata_annexb, int *extradata_annexb_size,
-int *sps_offset, int *sps_size,
-int *pps_offset, int *pps_size)
-{
-uint16_t unit_size;
-uint64_t total_size = 0;
-
-uint8_t i, j, unit_nb;
-uint8_t sps_seen = 0;
-uint8_t pps_seen = 0;
-
-const uint8_t *extradata;
-static const uint8_t nalu_header[4] = { 0x00, 0x00, 0x00, 0x01 };
-
-if (avctx->extradata_size < 8) {
-av_log(avctx, AV_LOG_ERROR,
-"Too small extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
-return AVERROR(EINVAL);
-}
-
-*extradata_annexb = NULL;
-*extradata_annexb_size = 0;
-
-*sps_offset = *sps_size = 0;
-*pps_offset = *pps_size = 0;
-
-extradata = avctx->extradata + 4;
-
-/* skip length size */
-extradata++;
-
-for (j = 0; j < 2; j ++) {
-
-if (j == 0) {
-/* number of sps unit(s) */
-unit_nb = *extradata++ & 0x1f;
-} else {
-/* number of pps unit(s) */
-unit_nb = *extradata++;
-}
-
-for (i = 0; i < unit_nb; i++) {
-int err;
-
-unit_size   = AV_RB16(extradata);
-total_size += unit_size + 4;
-
-if (total_size > INT_MAX) {
-av_log(avctx, AV_LOG_ERROR,
-"Too big extradata size, corrupted stream or invalid MP4/AVCC bitstream\n");
-av_freep(extradata_annexb);
-return AVERROR(EINVAL);
-}
-
-if (extradata + 2 + unit_size > avctx->extradata + avctx->extradata_size) {
-av_log(avctx, AV_LOG_ERROR, "Packet header is not contained in global extradata, "
-"corrupted stream or invalid MP4/AVCC bitstream\n");
-av_freep(extradata_annexb);
-return AVERROR(EINVAL);
-}
-
-if ((err = av_reallocp(extradata_annexb, total_size)) < 0) {
-return err;
-}
-
-memcpy(*extradata_annexb + total_size - unit_size - 4, nalu_header, 4);
-memcpy(*extradata_annexb + total_size - unit_size, extradata + 2, unit_size);
-extradata += 2 + unit_size;
-}
-
-if (unit_nb) {
-if (j == 0) {
-sps_seen = 1;
-*sps_size = total_size;
-} else {
-pps_seen = 1;
-*pps_size = total_size - *sps_size;
-*pps_offset = *sps_size;
-}
-}
-}
-
-*extradata_annexb_size = total_size;
-
-if (!sps_seen)
-av_log(avctx, AV_LOG_WARNING,
-   "Warning: SPS NALU missing or invalid. "
-   "The resulting stream may not play.\n");
-
-if (!pps_seen)
-av_log(avctx, AV_LOG_WARNING,
-   "Warning: PPS NALU missing or invalid. "
-   "The resulting stream may not play.\n");
-
-return 0;
-}
-
 static

Re: [FFmpeg-devel] FFmpeg 3.1

2016-06-10 Thread Matthieu Bouron
On Mon, Jun 06, 2016 at 10:23:20AM +0200, Matthieu Bouron wrote:
> On Mon, Jun 06, 2016 at 03:28:19AM +0200, Michael Niedermayer wrote:
> > Hi all
> > 
> > its time for making the next major release
> > If you want something in dont forget to push it to git master
> 
> I'd like to have the 3 pending MediaCodec patches merged before the
> release.

I'd also like to get in an upcoming MediaCodec patch which fixes playback of
HLS streams (and more generally Annex B streams) on MediaTek devices. The
issue has been reported by a user.

Thanks,
Matthieu


[FFmpeg-devel] [PATCH] lavc/mediacodec: refactor ff_AMediaCodecList_getCodecByType

2016-06-08 Thread Matthieu Bouron
From: Matthieu Bouron 

Allows to select a codec (encoder or decoder) only if it supports a
specific profile.

Adds ff_AMediaCodecProfile_getProfileFromAVCodecContext to convert an
AVCodecContext profile to a MediaCodec profile. It only supports H264
for now.

The codepath using MediaCodecList.findDecoderForFormat() (Android >= 5.0)
has been dropped as this method does not allow to select a decoder
compatible with a specific profile.
---
 libavcodec/mediacodec_wrapper.c | 277 ++--
 libavcodec/mediacodec_wrapper.h |   4 +-
 libavcodec/mediacodecdec.c  |   8 +-
 3 files changed, 216 insertions(+), 73 deletions(-)

diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
index c05b6fd..b87e62a 100644
--- a/libavcodec/mediacodec_wrapper.c
+++ b/libavcodec/mediacodec_wrapper.c
@@ -26,6 +26,7 @@
 #include "libavutil/mem.h"
 #include "libavutil/avstring.h"
 
+#include "avcodec.h"
 #include "ffjni.h"
 #include "version.h"
 #include "mediacodec_wrapper.h"
@@ -41,9 +42,26 @@ struct JNIAMediaCodecListFields {
 
 jclass mediacodec_info_class;
 jmethodID get_name_id;
+jmethodID get_codec_capabilities_id;
 jmethodID get_supported_types_id;
 jmethodID is_encoder_id;
 
+jclass codec_capabilities_class;
+jfieldID color_formats_id;
+jfieldID profile_levels_id;
+
+jclass codec_profile_level_class;
+jfieldID profile_id;
+jfieldID level_id;
+
+jfieldID avc_profile_baseline_id;
+jfieldID avc_profile_main_id;
+jfieldID avc_profile_extended_id;
+jfieldID avc_profile_high_id;
+jfieldID avc_profile_high10_id;
+jfieldID avc_profile_high422_id;
+jfieldID avc_profile_high444_id;
+
 } JNIAMediaCodecListFields;
 
 static const struct FFJniField jni_amediacodeclist_mapping[] = {
@@ -56,9 +74,26 @@ static const struct FFJniField jni_amediacodeclist_mapping[] = {
 
 { "android/media/MediaCodecInfo", NULL, NULL, FF_JNI_CLASS, offsetof(struct JNIAMediaCodecListFields, mediacodec_info_class), 1 },
 { "android/media/MediaCodecInfo", "getName", "()Ljava/lang/String;", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, get_name_id), 1 },
+{ "android/media/MediaCodecInfo", "getCapabilitiesForType", "(Ljava/lang/String;)Landroid/media/MediaCodecInfo$CodecCapabilities;", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, get_codec_capabilities_id), 1 },
 { "android/media/MediaCodecInfo", "getSupportedTypes", "()[Ljava/lang/String;", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, get_supported_types_id), 1 },
 { "android/media/MediaCodecInfo", "isEncoder", "()Z", FF_JNI_METHOD, offsetof(struct JNIAMediaCodecListFields, is_encoder_id), 1 },
 
+{ "android/media/MediaCodecInfo$CodecCapabilities", NULL, NULL, FF_JNI_CLASS, offsetof(struct JNIAMediaCodecListFields, codec_capabilities_class), 1 },
+{ "android/media/MediaCodecInfo$CodecCapabilities", "colorFormats", "[I", FF_JNI_FIELD, offsetof(struct JNIAMediaCodecListFields, color_formats_id), 1 },
+{ "android/media/MediaCodecInfo$CodecCapabilities", "profileLevels", "[Landroid/media/MediaCodecInfo$CodecProfileLevel;", FF_JNI_FIELD, offsetof(struct JNIAMediaCodecListFields, profile_levels_id), 1 },
+
+{ "android/media/MediaCodecInfo$CodecProfileLevel", NULL, NULL, FF_JNI_CLASS, offsetof(struct JNIAMediaCodecListFields, codec_profile_level_class), 1 },
+{ "android/media/MediaCodecInfo$CodecProfileLevel", "profile", "I", FF_JNI_FIELD, offsetof(struct JNIAMediaCodecListFields, profile_id), 1 },
+{ "android/media/MediaCodecInfo$CodecProfileLevel", "level", "I", FF_JNI_FIELD, offsetof(struct JNIAMediaCodecListFields, level_id), 1 },
+
+{ "android/media/MediaCodecInfo$CodecProfileLevel", "AVCProfileBaseline", "I", FF_JNI_STATIC_FIELD, offsetof(struct JNIAMediaCodecListFields, avc_profile_baseline_id), 1 },
+{ "android/media/MediaCodecInfo$CodecProfileLevel", "AVCProfileMain", "I", FF_JNI_STATIC_FIELD, offsetof(struct JNIAMediaCodecListFields, avc_profile_main_id), 1 },
+{ "android/media/MediaCodecInfo$CodecProfileLevel", "AVCProfileExtended", "I", FF_JNI_STATIC_FIELD, offsetof(struct JNIAMediaCodecListFields, avc_profile_extended_id), 1 },
+{ "android/media/MediaCodecInfo$CodecProfileLevel", "AVCProfileHigh", "I", FF_JNI_STATIC_FIELD, offsetof(struct JNIAMediaCodecListFields, avc_profile_high_id), 1 },
+{ "android/media/MediaCodecInfo$CodecProfileL

Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder

2016-06-07 Thread Matthieu Bouron
On Mon, Jun 06, 2016 at 11:41:41AM +0200, Matthieu Bouron wrote:
> On Mon, Jun 06, 2016 at 11:29:03AM +0200, Hendrik Leppkes wrote:
> > On Mon, Jun 6, 2016 at 9:54 AM, Matthieu Bouron
> >  wrote:
> > > On Tue, May 31, 2016 at 05:41:16PM +0200, Matthieu Bouron wrote:
> > >> On Tue, May 31, 2016 at 03:51:20PM +0200, Matthieu Bouron wrote:
> > >> > On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> > >> > > On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron
> > >> > >  wrote:
> > >> > > > From: Matthieu Bouron 
> > >> > > >
> > >> > > > Codec width/height restrictions seem hardcoded at the OMX level and
> > >> > > > seem arbitrary. Bypassing those restrictions allows a device to 
> > >> > > > decode
> > >> > > > streams at higher resolutions.
> > >> > > >
> > >> > > > For example it allows a Nexus 5 to decode h264 streams with a 
> > >> > > > resolution
> > >> > > > higher than 1920x1080.
> > >> > >
> > >> > >
> > >> > > What happens if the resolution actually exceeds the devices 
> > >> > > capabilities?
> > >> >
> > >> > The patch has been tested on various devices and it has been working so
> > >> > far. When the resolution actually exceeds the device capabilities the
> > >> > codec just fails to configure itself.
> > >> >
> > >> > However I did not try to craft samples with really high resolutions 
> > >> > (higher
> > >> > than ~4K) to test the patch against.
> > >> >
> > >> > I will double check what is happening with both SW output and surface
> > >> > output.
> > >>
> > >> I tested on a bunch of devices with different chipsets and they all fail 
> > >> at
> > >> the configuration step.
> > >>
> > >
> > > If there is no objection, I will push the patchset in one day.
> > >
> > 
> > If you have confirmed that it still fails gracefully but accepts more
> > streams, then LGTM.
> 
> Thanks.

Pushed a different version of the patchset: the struct declarations have
been moved to the beginning of the file so that the MediaFormat methods can
be re-used in ff_AMediaCodecList_getCodecByName without being redeclared.

Matthieu


Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: improve error messages

2016-06-07 Thread Matthieu Bouron
On Mon, Jun 06, 2016 at 10:08:10PM +0200, Michael Niedermayer wrote:
> On Mon, Jun 06, 2016 at 10:05:38AM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > ---
> >  libavcodec/mediacodecdec.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> LGTM
> 
> thx

Pushed. Thanks.

Matthieu

[...]


Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder

2016-06-06 Thread Matthieu Bouron
On Mon, Jun 06, 2016 at 11:29:03AM +0200, Hendrik Leppkes wrote:
> On Mon, Jun 6, 2016 at 9:54 AM, Matthieu Bouron
>  wrote:
> > On Tue, May 31, 2016 at 05:41:16PM +0200, Matthieu Bouron wrote:
> >> On Tue, May 31, 2016 at 03:51:20PM +0200, Matthieu Bouron wrote:
> >> > On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> >> > > On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron
> >> > >  wrote:
> >> > > > From: Matthieu Bouron 
> >> > > >
> >> > > > Codec width/height restrictions seem hardcoded at the OMX level and
> >> > > > seem arbitrary. Bypassing those restrictions allows a device to decode
> >> > > > streams at higher resolutions.
> >> > > >
> >> > > > For example it allows a Nexus 5 to decode h264 streams with a resolution
> >> > > > higher than 1920x1080.
> >> > >
> >> > >
> >> > > What happens if the resolution actually exceeds the devices capabilities?
> >> >
> >> > The patch has been tested on various devices and it has been working so
> >> > far. When the resolution actually exceeds the device capabilities the
> >> > codec just fails to configure itself.
> >> >
> >> > However I did not try to craft samples with really high resolutions (higher
> >> > than ~4K) to test the patch against.
> >> >
> >> > I will double check what is happening with both SW output and surface
> >> > output.
> >>
> >> I tested on a bunch of devices with different chipsets and they all fail at
> >> the configuration step.
> >>
> >
> > If there is no objection, I will push the patchset in one day.
> >
> 
> If you have confirmed that it still fails gracefully but accepts more
> streams, then LGTM.

Thanks.

I'm working on another patch to check what profile the codec supports
and fail at init time if the stream profile is too high (currently the
init passes but it fails afterwards while trying to decode frames, which
is annoying if an application wants to do some kind of fallback at init
time).

Matthieu

[...]


Re: [FFmpeg-devel] FFmpeg 3.1

2016-06-06 Thread Matthieu Bouron
On Mon, Jun 06, 2016 at 03:28:19AM +0200, Michael Niedermayer wrote:
> Hi all
> 
> its time for making the next major release
> If you want something in dont forget to push it to git master

I'd like to have the 3 pending MediaCodec patches merged before the
release.
I'll re-send to the ml the MediaCodec hwaccel patch after the release.

Thanks,
Matthieu


[FFmpeg-devel] [PATCH] lavc/mediacodec: improve error messages

2016-06-06 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavcodec/mediacodecdec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/mediacodecdec.c b/libavcodec/mediacodecdec.c
index 712f984..676ade7 100644
--- a/libavcodec/mediacodecdec.c
+++ b/libavcodec/mediacodecdec.c
@@ -198,7 +198,7 @@ static int mediacodec_wrap_buffer(AVCodecContext *avctx,
 done:
 status = ff_AMediaCodec_releaseOutputBuffer(s->codec, index, 0);
 if (status < 0) {
-av_log(NULL, AV_LOG_ERROR, "Failed to release output buffer\n");
+av_log(avctx, AV_LOG_ERROR, "Failed to release output buffer\n");
 ret = AVERROR_EXTERNAL;
 }
 
@@ -539,7 +539,7 @@ int ff_mediacodec_dec_flush(AVCodecContext *avctx, 
MediaCodecDecContext *s)
 
 status = ff_AMediaCodec_flush(codec);
 if (status < 0) {
-av_log(NULL, AV_LOG_ERROR, "Failed to flush MediaCodec %p", codec);
+av_log(avctx, AV_LOG_ERROR, "Failed to flush codec\n");
 return AVERROR_EXTERNAL;
 }
 
-- 
2.8.3



Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder

2016-06-06 Thread Matthieu Bouron
On Tue, May 31, 2016 at 05:41:16PM +0200, Matthieu Bouron wrote:
> On Tue, May 31, 2016 at 03:51:20PM +0200, Matthieu Bouron wrote:
> > On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> > > On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron
> > >  wrote:
> > > > From: Matthieu Bouron 
> > > >
> > > > Codec width/height restrictions seem hardcoded at the OMX level and
> > > > seem arbitrary. Bypassing those restrictions allows a device to decode
> > > > streams at higher resolutions.
> > > >
> > > > For example it allows a Nexus 5 to decode h264 streams with a resolution
> > > > higher than 1920x1080.
> > > 
> > > 
> > > What happens if the resolution actually exceeds the devices capabilities?
> > 
> > The patch has been tested on various devices and it has been working so
> > far. When the resolution actually exceeds the device capabilities the
> > codec just fails to configure itself.
> > 
> > However I did not try to craft samples with really high resolutions (higher
> > than ~4K) to test the patch against.
> > 
> > I will double check what is happening with both SW output and surface
> > output.
> 
> I tested on a bunch of devices with different chipsets and they all fail at
> the configuration step.
> 

If there is no objection, I will push the patchset in one day.

Matthieu


Re: [FFmpeg-devel] [PATCH] vaapi_encode_h26[45]: Reject bitrate targets higher than 2^31

2016-06-03 Thread Matthieu Bouron
On Thu, Jun 02, 2016 at 10:36:21PM +0100, Mark Thompson wrote:
> On 02/06/16 22:00, Matthieu Bouron wrote:
> > On Thu, Jun 02, 2016 at 07:13:39PM +0100, Mark Thompson wrote:
> >> ---
> >> ... something like this.
> >>
> >>  libavcodec/vaapi_encode_h264.c | 6 ++
> >>  libavcodec/vaapi_encode_h265.c | 6 ++
> >>  2 files changed, 12 insertions(+)
> >>
> >> diff --git a/libavcodec/vaapi_encode_h264.c 
> >> b/libavcodec/vaapi_encode_h264.c
> >> index 0a99bb1..019ed1f 100644
> >> --- a/libavcodec/vaapi_encode_h264.c
> >> +++ b/libavcodec/vaapi_encode_h264.c
> >> @@ -731,6 +731,12 @@ static av_cold int 
> >> vaapi_encode_h264_init_constant_bitrate(AVCodecContext *avctx
> >>  int hrd_buffer_size;
> >>  int hrd_initial_buffer_fullness;
> >>
> >> +if (avctx->bit_rate >= 1u << 31) {
> > 
> > > Wouldn't INT32_MAX be more appropriate?
> 
> Hmm.  No preference - I went for 1u << 31 to match the 2^31 in the error 
> message, but maybe INT32_MAX makes the code constraint slightly clearer.

IMHO, I think it's clearer to use INT32_MAX but as you are the maintainer
of those encoders, it's up to you to decide.
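
For readers comparing the two spellings discussed above, here is a minimal
sketch, assuming bit_rate is a 64-bit signed integer as the PRId64 fix in
the related thread implies. The helper names are hypothetical and are not
part of FFmpeg; the sketch only shows that both tests reject the same values.

```c
#include <stdint.h>

/* Hypothetical helpers (not FFmpeg API): each returns 1 when the bitrate
 * would be rejected, 0 when it fits in a signed 32-bit field. */

/* Spelling used in the patch: matches the "2^31" wording of the message. */
static int rejects_with_shift(int64_t bit_rate)
{
    return bit_rate >= 1u << 31;
}

/* Spelling suggested in review: names the 32-bit limit directly. */
static int rejects_with_max(int64_t bit_rate)
{
    return bit_rate > INT32_MAX;
}
```

Since INT32_MAX is 2^31 - 1, "greater than INT32_MAX" and "at least 2^31"
describe the same set of values, so the choice is about readability only.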

[...]


Re: [FFmpeg-devel] [PATCH] lavc/vaapi_encoder_{h264, h265}: fix bad format warning

2016-06-03 Thread Matthieu Bouron
On Thu, Jun 02, 2016 at 07:09:16PM +0100, Mark Thompson wrote:
> On 02/06/16 17:20, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > ---
> >  libavcodec/vaapi_encode_h264.c | 2 +-
> >  libavcodec/vaapi_encode_h265.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
> > index 0a99bb1..dc7774b 100644
> > --- a/libavcodec/vaapi_encode_h264.c
> > +++ b/libavcodec/vaapi_encode_h264.c
> > @@ -769,7 +769,7 @@ static av_cold int 
> > vaapi_encode_h264_init_constant_bitrate(AVCodecContext *avctx
> >  priv->fixed_qp_p   = 26;
> >  priv->fixed_qp_b   = 26;
> >  
> > -av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %d bps.\n",
> > +av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %"PRId64" 
> > bps.\n",
> > avctx->bit_rate);
> >  return 0;
> >  }
> > diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
> > index 05d3aa4..17cd900 100644
> > --- a/libavcodec/vaapi_encode_h265.c
> > +++ b/libavcodec/vaapi_encode_h265.c
> > @@ -1196,7 +1196,7 @@ static av_cold int 
> > vaapi_encode_h265_init_constant_bitrate(AVCodecContext *avctx
> >  priv->fixed_qp_p   = 30;
> >  priv->fixed_qp_b   = 30;
> >  
> > -av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %d bps.\n",
> > +av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %"PRId64" 
> > bps.\n",
> > avctx->bit_rate);
> >  return 0;
> >  }
> > 
> 
> LGTM to fix the warning.
> 
> I didn't realise that bit_rate has a different type in the two tines - I 
> think a bit more is needed here to just reject higher numbers because all of 
> the relevant fields in va.h structures are 32-bit anyway...

Pushed. Thanks.

Matthieu
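
As an illustration of the warning fixed by this patch, here is a minimal
standalone sketch, assuming a 64-bit signed bitrate value. It uses plain
snprintf rather than av_log, and describe_bitrate is a hypothetical helper
introduced only for this example.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper (not FFmpeg API): formats a 64-bit bitrate with the
 * PRId64 macro, which expands to the right conversion for int64_t on the
 * target platform. Passing an int64_t to "%d" is a format mismatch.
 * Returns a pointer to a static buffer, so it is not thread-safe. */
static const char *describe_bitrate(int64_t bit_rate)
{
    static char buf[64];

    snprintf(buf, sizeof(buf), "Using constant-bitrate = %" PRId64 " bps.",
             bit_rate);
    return buf;
}
```

With "%d" a value such as 5000000000 could not be printed correctly even
where the call happens to work, because it does not fit in a 32-bit int.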

[...]


Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API

2016-06-03 Thread Matthieu Bouron
On Wed, Jun 01, 2016 at 11:25:07AM +0200, Matthieu Bouron wrote:
> On Tue, May 31, 2016 at 10:13:40AM +0200, Matthieu Bouron wrote:
> > On Sun, May 29, 2016 at 10:15:44AM +0200, Matthieu Bouron wrote:
> > > On Fri, May 27, 2016 at 10:13:20AM +0200, Matthieu Bouron wrote:
> > > > From: Matthieu Bouron 
> > > > 
> > > > ---
> > > >  libavcodec/mediacodecdec_h264.c | 61 
> > > > +
> > > >  1 file changed, 37 insertions(+), 24 deletions(-)
> > > > 
> > > > diff --git a/libavcodec/mediacodecdec_h264.c 
> > > > b/libavcodec/mediacodecdec_h264.c
> > > > index 2d1d525..7f764e9 100644
> > > > --- a/libavcodec/mediacodecdec_h264.c
> > > > +++ b/libavcodec/mediacodecdec_h264.c
> > > > @@ -23,6 +23,7 @@
> > > >  #include 
> > > >  #include 
> > > >  
> > > > +#include "libavutil/avassert.h"
> > > >  #include "libavutil/common.h"
> > > >  #include "libavutil/fifo.h"
> > > >  #include "libavutil/opt.h"
> > > > @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext {
> > > >  
> > > >  MediaCodecDecContext ctx;
> > > >  
> > > > -AVBitStreamFilterContext *bsf;
> > > > +AVBSFContext *bsf;
> > > >  
> > > >  AVFifoBuffer *fifo;
> > > >  
> > > > -AVPacket input_ref;
> > > >  AVPacket filtered_pkt;
> > > > -uint8_t *filtered_data;
> > > >  
> > > >  } MediaCodecH264DecContext;
> > > >  
> > > > @@ -156,8 +155,9 @@ static av_cold int 
> > > > mediacodec_decode_close(AVCodecContext *avctx)
> > > >  ff_mediacodec_dec_close(avctx, &s->ctx);
> > > >  
> > > >  av_fifo_free(s->fifo);
> > > > +av_bsf_free(&s->bsf);
> > > >  
> > > > -av_bitstream_filter_close(s->bsf);
> > > > +av_packet_unref(&s->filtered_pkt);
> > > >  
> > > >  return 0;
> > > >  }
> > > > @@ -211,12 +211,23 @@ static av_cold int 
> > > > mediacodec_decode_init(AVCodecContext *avctx)
> > > >  goto done;
> > > >  }
> > > >  
> > > > -s->bsf = av_bitstream_filter_init("h264_mp4toannexb");
> > > > -if (!s->bsf) {
> > > > -ret = AVERROR(ENOMEM);
> > > > +const AVBitStreamFilter *bsf = 
> > > > av_bsf_get_by_name("h264_mp4toannexb");
> > > > +if(!bsf) {
> > > > +ret = AVERROR_BSF_NOT_FOUND;
> > > >  goto done;
> > > >  }
> > > >  
> > > > +if ((ret = av_bsf_alloc(bsf, &s->bsf))) {
> > > > +goto done;
> > > > +}
> > > > +
> > > > +if (((ret = avcodec_parameters_from_context(s->bsf->par_in, 
> > > > avctx)) < 0) ||
> > > > +((ret = av_bsf_init(s->bsf)) < 0)) {
> > > > +  goto done;
> > > > +}
> > > > +
> > > > +av_init_packet(&s->filtered_pkt);
> > > > +
> > > >  done:
> > > >  if (format) {
> > > >  ff_AMediaFormat_delete(format);
> > > > @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext 
> > > > *avctx, void *data,
> > > >  while (!*got_frame) {
> > > >  /* prepare the input data -- convert to Annex B if needed */
> > > >  if (s->filtered_pkt.size <= 0) {
> > > > -int size;
> > > > +AVPacket input_pkt = { 0 };
> > > > +
> > > > +av_packet_unref(&s->filtered_pkt);
> > > >  
> > > >  /* no more data */
> > > >  if (av_fifo_size(s->fifo) < sizeof(AVPacket)) {
> > > > @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext 
> > > > *avctx, void *data,
> > > >  ff_mediacodec_dec_decode(avctx, &s->ctx, frame, 
> > > > got_frame, avpkt);
> > > >  }
> > > >  
> > > > -if (s->filtered_data != s->input_ref.data)
> > > > -av_freep(&s->filtered_data);
> > > > -  

Re: [FFmpeg-devel] [PATCH] vaapi_encode_h26[45]: Reject bitrate targets higher than 2^31

2016-06-02 Thread Matthieu Bouron
On Thu, Jun 02, 2016 at 07:13:39PM +0100, Mark Thompson wrote:
> ---
> ... something like this.
> 
>  libavcodec/vaapi_encode_h264.c | 6 ++
>  libavcodec/vaapi_encode_h265.c | 6 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
> index 0a99bb1..019ed1f 100644
> --- a/libavcodec/vaapi_encode_h264.c
> +++ b/libavcodec/vaapi_encode_h264.c
> @@ -731,6 +731,12 @@ static av_cold int 
> vaapi_encode_h264_init_constant_bitrate(AVCodecContext *avctx
>  int hrd_buffer_size;
>  int hrd_initial_buffer_fullness;
> 
> +if (avctx->bit_rate >= 1u << 31) {

Wouldn't INT32_MAX be more appropriate?

> +av_log(avctx, AV_LOG_ERROR, "Target bitrate of 2^31 bps or "
> +   "higher is not supported.\n");
> +return AVERROR(EINVAL);
> +}
> +
>  if (avctx->rc_buffer_size)
>  hrd_buffer_size = avctx->rc_buffer_size;
>  else
> diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
> index 05d3aa4..060c7b7 100644
> --- a/libavcodec/vaapi_encode_h265.c
> +++ b/libavcodec/vaapi_encode_h265.c
> @@ -1158,6 +1158,12 @@ static av_cold int 
> vaapi_encode_h265_init_constant_bitrate(AVCodecContext *avctx
>  int hrd_buffer_size;
>  int hrd_initial_buffer_fullness;
> 
> +if (avctx->bit_rate >= 1u << 31) {

Same comment as above.

> +av_log(avctx, AV_LOG_ERROR, "Target bitrate of 2^31 bps or "
> +   "higher is not supported.\n");
> +return AVERROR(EINVAL);
> +}
> +
>  if (avctx->rc_buffer_size)
>  hrd_buffer_size = avctx->rc_buffer_size;
>  else
> -- 
> 2.8.1
> 


[FFmpeg-devel] [PATCH] lavc/vaapi_encoder_{h264, h265}: fix bad format warning

2016-06-02 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavcodec/vaapi_encode_h264.c | 2 +-
 libavcodec/vaapi_encode_h265.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
index 0a99bb1..dc7774b 100644
--- a/libavcodec/vaapi_encode_h264.c
+++ b/libavcodec/vaapi_encode_h264.c
@@ -769,7 +769,7 @@ static av_cold int 
vaapi_encode_h264_init_constant_bitrate(AVCodecContext *avctx
 priv->fixed_qp_p   = 26;
 priv->fixed_qp_b   = 26;
 
-av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %d bps.\n",
+av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %"PRId64" bps.\n",
avctx->bit_rate);
 return 0;
 }
diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
index 05d3aa4..17cd900 100644
--- a/libavcodec/vaapi_encode_h265.c
+++ b/libavcodec/vaapi_encode_h265.c
@@ -1196,7 +1196,7 @@ static av_cold int 
vaapi_encode_h265_init_constant_bitrate(AVCodecContext *avctx
 priv->fixed_qp_p   = 30;
 priv->fixed_qp_b   = 30;
 
-av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %d bps.\n",
+av_log(avctx, AV_LOG_DEBUG, "Using constant-bitrate = %"PRId64" bps.\n",
avctx->bit_rate);
 return 0;
 }
-- 
2.8.3



Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API

2016-06-01 Thread Matthieu Bouron
On Tue, May 31, 2016 at 10:13:40AM +0200, Matthieu Bouron wrote:
> On Sun, May 29, 2016 at 10:15:44AM +0200, Matthieu Bouron wrote:
> > On Fri, May 27, 2016 at 10:13:20AM +0200, Matthieu Bouron wrote:
> > > From: Matthieu Bouron 
> > > 
> > > ---
> > >  libavcodec/mediacodecdec_h264.c | 61 
> > > +
> > >  1 file changed, 37 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/libavcodec/mediacodecdec_h264.c 
> > > b/libavcodec/mediacodecdec_h264.c
> > > index 2d1d525..7f764e9 100644
> > > --- a/libavcodec/mediacodecdec_h264.c
> > > +++ b/libavcodec/mediacodecdec_h264.c
> > > @@ -23,6 +23,7 @@
> > >  #include 
> > >  #include 
> > >  
> > > +#include "libavutil/avassert.h"
> > >  #include "libavutil/common.h"
> > >  #include "libavutil/fifo.h"
> > >  #include "libavutil/opt.h"
> > > @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext {
> > >  
> > >  MediaCodecDecContext ctx;
> > >  
> > > -AVBitStreamFilterContext *bsf;
> > > +AVBSFContext *bsf;
> > >  
> > >  AVFifoBuffer *fifo;
> > >  
> > > -AVPacket input_ref;
> > >  AVPacket filtered_pkt;
> > > -uint8_t *filtered_data;
> > >  
> > >  } MediaCodecH264DecContext;
> > >  
> > > @@ -156,8 +155,9 @@ static av_cold int 
> > > mediacodec_decode_close(AVCodecContext *avctx)
> > >  ff_mediacodec_dec_close(avctx, &s->ctx);
> > >  
> > >  av_fifo_free(s->fifo);
> > > +av_bsf_free(&s->bsf);
> > >  
> > > -av_bitstream_filter_close(s->bsf);
> > > +av_packet_unref(&s->filtered_pkt);
> > >  
> > >  return 0;
> > >  }
> > > @@ -211,12 +211,23 @@ static av_cold int 
> > > mediacodec_decode_init(AVCodecContext *avctx)
> > >  goto done;
> > >  }
> > >  
> > > -s->bsf = av_bitstream_filter_init("h264_mp4toannexb");
> > > -if (!s->bsf) {
> > > -ret = AVERROR(ENOMEM);
> > > +const AVBitStreamFilter *bsf = 
> > > av_bsf_get_by_name("h264_mp4toannexb");
> > > +if(!bsf) {
> > > +ret = AVERROR_BSF_NOT_FOUND;
> > >  goto done;
> > >  }
> > >  
> > > +if ((ret = av_bsf_alloc(bsf, &s->bsf))) {
> > > +goto done;
> > > +}
> > > +
> > > +if (((ret = avcodec_parameters_from_context(s->bsf->par_in, avctx)) 
> > > < 0) ||
> > > +((ret = av_bsf_init(s->bsf)) < 0)) {
> > > +  goto done;
> > > +}
> > > +
> > > +av_init_packet(&s->filtered_pkt);
> > > +
> > >  done:
> > >  if (format) {
> > >  ff_AMediaFormat_delete(format);
> > > @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext 
> > > *avctx, void *data,
> > >  while (!*got_frame) {
> > >  /* prepare the input data -- convert to Annex B if needed */
> > >  if (s->filtered_pkt.size <= 0) {
> > > -int size;
> > > +AVPacket input_pkt = { 0 };
> > > +
> > > +av_packet_unref(&s->filtered_pkt);
> > >  
> > >  /* no more data */
> > >  if (av_fifo_size(s->fifo) < sizeof(AVPacket)) {
> > > @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext 
> > > *avctx, void *data,
> > >  ff_mediacodec_dec_decode(avctx, &s->ctx, frame, 
> > > got_frame, avpkt);
> > >  }
> > >  
> > > -if (s->filtered_data != s->input_ref.data)
> > > -av_freep(&s->filtered_data);
> > > -s->filtered_data = NULL;
> > > -av_packet_unref(&s->input_ref);
> > > +av_fifo_generic_read(s->fifo, &input_pkt, sizeof(input_pkt), 
> > > NULL);
> > > +
> > > +ret = av_bsf_send_packet(s->bsf, &input_pkt);
> > > +if (ret < 0) {
> > > +return ret;
> > > +}
> > > +
> > > +ret = av_bsf_receive_packet(s->bsf, &s->filtered_pkt);
> > > +  

Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder

2016-05-31 Thread Matthieu Bouron
On Tue, May 31, 2016 at 03:51:20PM +0200, Matthieu Bouron wrote:
> On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> > On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron
> >  wrote:
> > > From: Matthieu Bouron 
> > >
> > > Codec width/height restrictions seem hardcoded at the OMX level and
> > > seem arbitrary. Bypassing those restrictions allows a device to decode
> > > streams at higher resolutions.
> > >
> > > For example it allows a Nexus 5 to decode h264 streams with a resolution
> > > higher than 1920x1080.
> > 
> > 
> > What happens if the resolution actually exceeds the devices capabilities?
> 
> The patch has been tested on various devices and it has been working so
> far. When the resolution actually exceeds the device capabilities the
> codec just fails to configure itself.
> 
> However I did not try to craft samples with really high resolutions (higher
> than ~4K) to test the patch against.
> 
> I will double check what is happening with both SW output and surface
> output.

I tested on a bunch of devices with different chipsets and they all fail at
the configuration step.

[...]


Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder

2016-05-31 Thread Matthieu Bouron
On Tue, May 31, 2016 at 03:35:49PM +0200, Hendrik Leppkes wrote:
> On Tue, May 31, 2016 at 3:00 PM, Matthieu Bouron
>  wrote:
> > From: Matthieu Bouron 
> >
> > Codec width/height restrictions seem hardcoded at the OMX level and
> > seem arbitrary. Bypassing those restrictions allows a device to decode
> > streams at higher resolutions.
> >
> > For example it allows a Nexus 5 to decode h264 streams with a resolution
> > higher than 1920x1080.
> 
> 
> What happens if the resolution actually exceeds the devices capabilities?

The patch has been tested on various devices and it has been working so
far. When the resolution actually exceeds the device capabilities the
codec just fails to configure itself.

However I did not try to craft samples with really high resolutions (higher
than ~4K) to test the patch against.

I will double check what is happening with both SW output and surface
output.

Matthieu

[...]


[FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: bypass width/height restrictions when looking for a decoder

2016-05-31 Thread Matthieu Bouron
From: Matthieu Bouron 

Codec width/height restrictions seem hardcoded at the OMX level and
seem arbitrary. Bypassing those restrictions allows a device to decode
streams at higher resolutions.

For example it allows a Nexus 5 to decode h264 streams with a resolution
higher than 1920x1080.
---
 libavcodec/mediacodec_wrapper.c | 31 ++-
 libavcodec/mediacodec_wrapper.h |  2 +-
 libavcodec/mediacodecdec.c  |  2 +-
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
index c847a11..2e3fcef 100644
--- a/libavcodec/mediacodec_wrapper.c
+++ b/libavcodec/mediacodec_wrapper.c
@@ -33,7 +33,8 @@
 struct JNIAMediaCodecListFields {
 
 jclass mediaformat_class;
-jmethodID create_video_format_id;
+jmethodID mediaformat_init_id;
+jmethodID set_string_id;
 
 jclass mediacodec_list_class;
 jmethodID init_id;
@@ -51,7 +52,8 @@ struct JNIAMediaCodecListFields {
 
 static const struct FFJniField jfields_mapping[] = {
 { "android/media/MediaFormat", NULL, NULL, FF_JNI_CLASS, offsetof(struct 
JNIAMediaCodecListFields, mediaformat_class), 1 },
-{ "android/media/MediaFormat", "createVideoFormat", 
"(Ljava/lang/String;II)Landroid/media/MediaFormat;", FF_JNI_STATIC_METHOD, 
offsetof(struct JNIAMediaCodecListFields, create_video_format_id), 1 },
+{ "android/media/MediaFormat", "<init>", "()V", FF_JNI_METHOD, 
offsetof(struct JNIAMediaCodecListFields, mediaformat_init_id), 1},
+{ "android/media/MediaFormat", "setString", 
"(Ljava/lang/String;Ljava/lang/String;)V", FF_JNI_METHOD, offsetof(struct 
JNIAMediaCodecListFields, set_string_id), 1},
 
 { "android/media/MediaCodecList", NULL, NULL, FF_JNI_CLASS, 
offsetof(struct JNIAMediaCodecListFields, mediacodec_list_class), 1 },
 { "android/media/MediaCodecList", "<init>", "(I)V", FF_JNI_METHOD, 
offsetof(struct JNIAMediaCodecListFields, init_id), 0 },
@@ -87,7 +89,7 @@ static const struct FFJniField jfields_mapping[] = {
 ff_jni_detach_env(log_ctx);\
 } while (0)
 
-char *ff_AMediaCodecList_getCodecNameByType(const char *mime, int width, int 
height, void *log_ctx)
+char *ff_AMediaCodecList_getCodecNameByType(const char *mime, void *log_ctx)
 {
 int ret;
 char *name = NULL;
@@ -99,6 +101,7 @@ char *ff_AMediaCodecList_getCodecNameByType(const char 
*mime, int width, int hei
 
 jobject format = NULL;
 jobject codec = NULL;
+jstring key = NULL;
 jstring tmp = NULL;
 
 jobject info = NULL;
@@ -112,15 +115,29 @@ char *ff_AMediaCodecList_getCodecNameByType(const char 
*mime, int width, int hei
 }
 
 if (jfields.init_id && jfields.find_decoder_for_format_id) {
+key = ff_jni_utf_chars_to_jstring(env, "mime", log_ctx);
+if (!key) {
+goto done;
+}
+
 tmp = ff_jni_utf_chars_to_jstring(env, mime, log_ctx);
 if (!tmp) {
 goto done;
 }
 
-format = (*env)->CallStaticObjectMethod(env, 
jfields.mediaformat_class, jfields.create_video_format_id, tmp, width, height);
+format = (*env)->NewObject(env, jfields.mediaformat_class, 
jfields.mediaformat_init_id);
+if (ff_jni_exception_check(env, 1, log_ctx) < 0) {
+goto done;
+}
+
+(*env)->CallVoidMethod(env, format, jfields.set_string_id, key, tmp);
 if (ff_jni_exception_check(env, 1, log_ctx) < 0) {
 goto done;
 }
+
+(*env)->DeleteLocalRef(env, key);
+key = NULL;
+
 (*env)->DeleteLocalRef(env, tmp);
 tmp = NULL;
 
@@ -135,7 +152,7 @@ char *ff_AMediaCodecList_getCodecNameByType(const char 
*mime, int width, int hei
 }
 if (!tmp) {
 av_log(NULL, AV_LOG_ERROR, "Could not find decoder in media codec 
list "
-   "for format { mime=%s width=%d 
height=%d }\n", mime, width, height);
+   "for format { mime=%s }\n", mime);
 goto done;
 }
 
@@ -232,6 +249,10 @@ done:
 (*env)->DeleteLocalRef(env, codec);
 }
 
+if (key) {
+(*env)->DeleteLocalRef(env, key);
+}
+
 if (tmp) {
 (*env)->DeleteLocalRef(env, tmp);
 }
diff --git a/libavcodec/mediacodec_wrapper.h b/libavcodec/mediacodec_wrapper.h
index a804b61..36cd258 100644
--- a/libavcodec/mediacodec_wrapper.h
+++ b/libavcodec/mediacodec_wrapper.h
@@ -52,7 +52,7 @@
  *
  */
 
-char *ff_AMediaCodecList_getCodecNameByType(const char *mime, int width, int 
height, void *log_ctx);
+char *ff_AMediaCodecList_getCodecNameByType(const char *mime, void *log_ctx);
 
 struct FFAMediaFormat;
 typedef struct FFAMediaFormat FFAMediaFormat;
diff --git a/li

[FFmpeg-devel] [PATCH 1/2] lavc/mediacodec: do not delete a local reference twice in case of error

2016-05-31 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavcodec/mediacodec_wrapper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
index 053c164..c847a11 100644
--- a/libavcodec/mediacodec_wrapper.c
+++ b/libavcodec/mediacodec_wrapper.c
@@ -122,6 +122,7 @@ char *ff_AMediaCodecList_getCodecNameByType(const char 
*mime, int width, int hei
 goto done;
 }
 (*env)->DeleteLocalRef(env, tmp);
+tmp = NULL;
 
 codec = (*env)->NewObject(env, jfields.mediacodec_list_class, 
jfields.init_id, 0);
 if (ff_jni_exception_check(env, 1, log_ctx) < 0) {
-- 
2.8.3



Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API

2016-05-31 Thread Matthieu Bouron
On Sun, May 29, 2016 at 10:15:44AM +0200, Matthieu Bouron wrote:
> On Fri, May 27, 2016 at 10:13:20AM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > ---
> >  libavcodec/mediacodecdec_h264.c | 61 
> > +
> >  1 file changed, 37 insertions(+), 24 deletions(-)
> > 
> > diff --git a/libavcodec/mediacodecdec_h264.c 
> > b/libavcodec/mediacodecdec_h264.c
> > index 2d1d525..7f764e9 100644
> > --- a/libavcodec/mediacodecdec_h264.c
> > +++ b/libavcodec/mediacodecdec_h264.c
> > @@ -23,6 +23,7 @@
> >  #include 
> >  #include 
> >  
> > +#include "libavutil/avassert.h"
> >  #include "libavutil/common.h"
> >  #include "libavutil/fifo.h"
> >  #include "libavutil/opt.h"
> > @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext {
> >  
> >  MediaCodecDecContext ctx;
> >  
> > -AVBitStreamFilterContext *bsf;
> > +AVBSFContext *bsf;
> >  
> >  AVFifoBuffer *fifo;
> >  
> > -AVPacket input_ref;
> >  AVPacket filtered_pkt;
> > -uint8_t *filtered_data;
> >  
> >  } MediaCodecH264DecContext;
> >  
> > @@ -156,8 +155,9 @@ static av_cold int 
> > mediacodec_decode_close(AVCodecContext *avctx)
> >  ff_mediacodec_dec_close(avctx, &s->ctx);
> >  
> >  av_fifo_free(s->fifo);
> > +av_bsf_free(&s->bsf);
> >  
> > -av_bitstream_filter_close(s->bsf);
> > +av_packet_unref(&s->filtered_pkt);
> >  
> >  return 0;
> >  }
> > @@ -211,12 +211,23 @@ static av_cold int 
> > mediacodec_decode_init(AVCodecContext *avctx)
> >  goto done;
> >  }
> >  
> > -s->bsf = av_bitstream_filter_init("h264_mp4toannexb");
> > -if (!s->bsf) {
> > -ret = AVERROR(ENOMEM);
> > +const AVBitStreamFilter *bsf = av_bsf_get_by_name("h264_mp4toannexb");
> > +if(!bsf) {
> > +ret = AVERROR_BSF_NOT_FOUND;
> >  goto done;
> >  }
> >  
> > +if ((ret = av_bsf_alloc(bsf, &s->bsf))) {
> > +goto done;
> > +}
> > +
> > +if (((ret = avcodec_parameters_from_context(s->bsf->par_in, avctx)) < 
> > 0) ||
> > +((ret = av_bsf_init(s->bsf)) < 0)) {
> > +  goto done;
> > +}
> > +
> > +av_init_packet(&s->filtered_pkt);
> > +
> >  done:
> >  if (format) {
> >  ff_AMediaFormat_delete(format);
> > @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext 
> > *avctx, void *data,
> >  while (!*got_frame) {
> >  /* prepare the input data -- convert to Annex B if needed */
> >  if (s->filtered_pkt.size <= 0) {
> > -int size;
> > +AVPacket input_pkt = { 0 };
> > +
> > +av_packet_unref(&s->filtered_pkt);
> >  
> >  /* no more data */
> >  if (av_fifo_size(s->fifo) < sizeof(AVPacket)) {
> > @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext 
> > *avctx, void *data,
> >  ff_mediacodec_dec_decode(avctx, &s->ctx, frame, 
> > got_frame, avpkt);
> >  }
> >  
> > -if (s->filtered_data != s->input_ref.data)
> > -av_freep(&s->filtered_data);
> > -s->filtered_data = NULL;
> > -av_packet_unref(&s->input_ref);
> > +av_fifo_generic_read(s->fifo, &input_pkt, sizeof(input_pkt), 
> > NULL);
> > +
> > +ret = av_bsf_send_packet(s->bsf, &input_pkt);
> > +if (ret < 0) {
> > +return ret;
> > +}
> > +
> > +ret = av_bsf_receive_packet(s->bsf, &s->filtered_pkt);
> > +if (ret == AVERROR(EAGAIN)) {
> > +goto done;
> > +}
> > +
> > +/* h264_mp4toannexb is used here and does not require flushing 
> > */
> > +av_assert0(ret != AVERROR_EOF);
> >  
> > -av_fifo_generic_read(s->fifo, &s->input_ref, 
> > sizeof(s->input_ref), NULL);
> > -ret = av_bitstream_filter_filter(s->bsf, avctx, NULL,
> > - &s->filtered_data, &size,
> > - s

Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API

2016-05-29 Thread Matthieu Bouron
On Fri, May 27, 2016 at 10:13:20AM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron 
> 
> ---
>  libavcodec/mediacodecdec_h264.c | 61 
> +
>  1 file changed, 37 insertions(+), 24 deletions(-)
> 
> diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
> index 2d1d525..7f764e9 100644
> --- a/libavcodec/mediacodecdec_h264.c
> +++ b/libavcodec/mediacodecdec_h264.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  
> +#include "libavutil/avassert.h"
>  #include "libavutil/common.h"
>  #include "libavutil/fifo.h"
>  #include "libavutil/opt.h"
> @@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext {
>  
>  MediaCodecDecContext ctx;
>  
> -AVBitStreamFilterContext *bsf;
> +AVBSFContext *bsf;
>  
>  AVFifoBuffer *fifo;
>  
> -AVPacket input_ref;
>  AVPacket filtered_pkt;
> -uint8_t *filtered_data;
>  
>  } MediaCodecH264DecContext;
>  
> @@ -156,8 +155,9 @@ static av_cold int mediacodec_decode_close(AVCodecContext 
> *avctx)
>  ff_mediacodec_dec_close(avctx, &s->ctx);
>  
>  av_fifo_free(s->fifo);
> +av_bsf_free(&s->bsf);
>  
> -av_bitstream_filter_close(s->bsf);
> +av_packet_unref(&s->filtered_pkt);
>  
>  return 0;
>  }
> @@ -211,12 +211,23 @@ static av_cold int 
> mediacodec_decode_init(AVCodecContext *avctx)
>  goto done;
>  }
>  
> -s->bsf = av_bitstream_filter_init("h264_mp4toannexb");
> -if (!s->bsf) {
> -ret = AVERROR(ENOMEM);
> +const AVBitStreamFilter *bsf = av_bsf_get_by_name("h264_mp4toannexb");
> +if(!bsf) {
> +ret = AVERROR_BSF_NOT_FOUND;
>  goto done;
>  }
>  
> +if ((ret = av_bsf_alloc(bsf, &s->bsf))) {
> +goto done;
> +}
> +
> +if (((ret = avcodec_parameters_from_context(s->bsf->par_in, avctx)) < 0) 
> ||
> +((ret = av_bsf_init(s->bsf)) < 0)) {
> +  goto done;
> +}
> +
> +av_init_packet(&s->filtered_pkt);
> +
>  done:
>  if (format) {
>  ff_AMediaFormat_delete(format);
> @@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, 
> void *data,
>  while (!*got_frame) {
>  /* prepare the input data -- convert to Annex B if needed */
>  if (s->filtered_pkt.size <= 0) {
> -int size;
> +AVPacket input_pkt = { 0 };
> +
> +av_packet_unref(&s->filtered_pkt);
>  
>  /* no more data */
>  if (av_fifo_size(s->fifo) < sizeof(AVPacket)) {
> @@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext 
> *avctx, void *data,
>  ff_mediacodec_dec_decode(avctx, &s->ctx, frame, 
> got_frame, avpkt);
>  }
>  
> -if (s->filtered_data != s->input_ref.data)
> -av_freep(&s->filtered_data);
> -s->filtered_data = NULL;
> -av_packet_unref(&s->input_ref);
> +av_fifo_generic_read(s->fifo, &input_pkt, sizeof(input_pkt), 
> NULL);
> +
> +ret = av_bsf_send_packet(s->bsf, &input_pkt);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +ret = av_bsf_receive_packet(s->bsf, &s->filtered_pkt);
> +if (ret == AVERROR(EAGAIN)) {
> +goto done;
> +}
> +
> +/* h264_mp4toannexb is used here and does not require flushing */
> +av_assert0(ret != AVERROR_EOF);
>  
> -av_fifo_generic_read(s->fifo, &s->input_ref, 
> sizeof(s->input_ref), NULL);
> -ret = av_bitstream_filter_filter(s->bsf, avctx, NULL,
> - &s->filtered_data, &size,
> - s->input_ref.data, 
> s->input_ref.size, 0);
>  if (ret < 0) {
> -s->filtered_data = s->input_ref.data;
> -size = s->input_ref.size;
> +return ret;
>  }
> -s->filtered_pkt  = s->input_ref;
> -s->filtered_pkt.data = s->filtered_data;
> -s->filtered_pkt.size = size;
>  }
>  
>  ret = mediacodec_process_data(avctx, frame, got_frame, 
> &s->filtered_pkt);
> @@ -298,7 +313,7 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, 
> void *data,
>  

[FFmpeg-devel] [PATCH 1/2] lavc/mediacodecdec_h264: switch to new BSF API

2016-05-27 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavcodec/mediacodecdec_h264.c | 61 +
 1 file changed, 37 insertions(+), 24 deletions(-)

diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 2d1d525..7f764e9 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include "libavutil/avassert.h"
 #include "libavutil/common.h"
 #include "libavutil/fifo.h"
 #include "libavutil/opt.h"
@@ -41,13 +42,11 @@ typedef struct MediaCodecH264DecContext {
 
 MediaCodecDecContext ctx;
 
-AVBitStreamFilterContext *bsf;
+AVBSFContext *bsf;
 
 AVFifoBuffer *fifo;
 
-AVPacket input_ref;
 AVPacket filtered_pkt;
-uint8_t *filtered_data;
 
 } MediaCodecH264DecContext;
 
@@ -156,8 +155,9 @@ static av_cold int mediacodec_decode_close(AVCodecContext *avctx)
 ff_mediacodec_dec_close(avctx, &s->ctx);
 
 av_fifo_free(s->fifo);
+av_bsf_free(&s->bsf);
 
-av_bitstream_filter_close(s->bsf);
+av_packet_unref(&s->filtered_pkt);
 
 return 0;
 }
@@ -211,12 +211,23 @@ static av_cold int mediacodec_decode_init(AVCodecContext *avctx)
 goto done;
 }
 
-s->bsf = av_bitstream_filter_init("h264_mp4toannexb");
-if (!s->bsf) {
-ret = AVERROR(ENOMEM);
+const AVBitStreamFilter *bsf = av_bsf_get_by_name("h264_mp4toannexb");
+if(!bsf) {
+ret = AVERROR_BSF_NOT_FOUND;
 goto done;
 }
 
+if ((ret = av_bsf_alloc(bsf, &s->bsf))) {
+goto done;
+}
+
+if (((ret = avcodec_parameters_from_context(s->bsf->par_in, avctx)) < 0) ||
+((ret = av_bsf_init(s->bsf)) < 0)) {
+  goto done;
+}
+
+av_init_packet(&s->filtered_pkt);
+
 done:
 if (format) {
 ff_AMediaFormat_delete(format);
@@ -265,7 +276,9 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, void *data,
 while (!*got_frame) {
 /* prepare the input data -- convert to Annex B if needed */
 if (s->filtered_pkt.size <= 0) {
-int size;
+AVPacket input_pkt = { 0 };
+
+av_packet_unref(&s->filtered_pkt);
 
 /* no more data */
 if (av_fifo_size(s->fifo) < sizeof(AVPacket)) {
@@ -273,22 +286,24 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, void *data,
 ff_mediacodec_dec_decode(avctx, &s->ctx, frame, got_frame, avpkt);
 }
 
-if (s->filtered_data != s->input_ref.data)
-av_freep(&s->filtered_data);
-s->filtered_data = NULL;
-av_packet_unref(&s->input_ref);
+av_fifo_generic_read(s->fifo, &input_pkt, sizeof(input_pkt), NULL);
+
+ret = av_bsf_send_packet(s->bsf, &input_pkt);
+if (ret < 0) {
+return ret;
+}
+
+ret = av_bsf_receive_packet(s->bsf, &s->filtered_pkt);
+if (ret == AVERROR(EAGAIN)) {
+goto done;
+}
+
+/* h264_mp4toannexb is used here and does not require flushing */
+av_assert0(ret != AVERROR_EOF);
 
-av_fifo_generic_read(s->fifo, &s->input_ref, sizeof(s->input_ref), NULL);
-ret = av_bitstream_filter_filter(s->bsf, avctx, NULL,
- &s->filtered_data, &size,
- s->input_ref.data, s->input_ref.size, 0);
 if (ret < 0) {
-s->filtered_data = s->input_ref.data;
-size = s->input_ref.size;
+return ret;
 }
-s->filtered_pkt  = s->input_ref;
-s->filtered_pkt.data = s->filtered_data;
-s->filtered_pkt.size = size;
 }
 
 ret = mediacodec_process_data(avctx, frame, got_frame, &s->filtered_pkt);
@@ -298,7 +313,7 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, void *data,
 s->filtered_pkt.size -= ret;
 s->filtered_pkt.data += ret;
 }
-
+done:
 return avpkt->size;
 }
 
@@ -313,8 +328,6 @@ static void mediacodec_decode_flush(AVCodecContext *avctx)
 }
 av_fifo_reset(s->fifo);
 
-av_packet_unref(&s->input_ref);
-
 av_init_packet(&s->filtered_pkt);
 s->filtered_pkt.data = NULL;
 s->filtered_pkt.size = 0;
-- 
2.8.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
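
For readers unfamiliar with the send/receive flow that this patch switches
to, the following is a minimal toy model in plain C. It does not use
libavcodec: ToyFilter, toy_send(), toy_receive(), toy_filter_one() and
TOY_EAGAIN are hypothetical names made up for illustration, and they only
mimic the shape of the av_bsf_send_packet() / av_bsf_receive_packet() calls
seen in the diff above (send one input, then try to pull one output, and
treat "try again" as "feed more input first").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for AVERROR(EAGAIN); not an FFmpeg constant. */
#define TOY_EAGAIN (-11)

/* Toy one-slot filter: holds at most one pending input value. */
typedef struct ToyFilter {
    int has_input;
    int input;
} ToyFilter;

/* Queue one input; refuses with TOY_EAGAIN while the slot is occupied. */
static int toy_send(ToyFilter *f, int in)
{
    assert(f != NULL);
    if (f->has_input)
        return TOY_EAGAIN;
    f->input     = in;
    f->has_input = 1;
    return 0;
}

/* Fetch one output; TOY_EAGAIN means "send more input first". */
static int toy_receive(ToyFilter *f, int *out)
{
    assert(f != NULL && out != NULL);
    if (!f->has_input)
        return TOY_EAGAIN;
    *out         = f->input + 1; /* stand-in for the real conversion */
    f->has_input = 0;
    return 0;
}

/* Send one input and try to pull one output, as the decode loop does. */
static int toy_filter_one(ToyFilter *f, int in, int *out)
{
    int ret = toy_send(f, in);
    if (ret < 0)
        return ret;
    return toy_receive(f, out);
}
```

In this toy every input yields exactly one output, which loosely matches the
patch's assumption that h264_mp4toannexb needs no flushing; a filter that
buffers several inputs would need a loop around the receive step.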


[FFmpeg-devel] [PATCH 2/2] lavc/mediacodecdec_h264: rename input_ref to input_pkt

2016-05-27 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavcodec/mediacodecdec_h264.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libavcodec/mediacodecdec_h264.c b/libavcodec/mediacodecdec_h264.c
index 7f764e9..3a31798 100644
--- a/libavcodec/mediacodecdec_h264.c
+++ b/libavcodec/mediacodecdec_h264.c
@@ -257,19 +257,19 @@ static int mediacodec_decode_frame(AVCodecContext *avctx, void *data,
 
 /* buffer the input packet */
 if (avpkt->size) {
-AVPacket input_ref = { 0 };
+AVPacket input_pkt = { 0 };
 
-if (av_fifo_space(s->fifo) < sizeof(input_ref)) {
+if (av_fifo_space(s->fifo) < sizeof(input_pkt)) {
 ret = av_fifo_realloc2(s->fifo,
-   av_fifo_size(s->fifo) + sizeof(input_ref));
+   av_fifo_size(s->fifo) + sizeof(input_pkt));
 if (ret < 0)
 return ret;
 }
 
-ret = av_packet_ref(&input_ref, avpkt);
+ret = av_packet_ref(&input_pkt, avpkt);
 if (ret < 0)
 return ret;
-av_fifo_generic_write(s->fifo, &input_ref, sizeof(input_ref), NULL);
+av_fifo_generic_write(s->fifo, &input_pkt, sizeof(input_pkt), NULL);
 }
 
 /* process buffered data */
-- 
2.8.2



Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: add missing MediaCodec.Get{Input, Output}Buffer() checks

2016-05-25 Thread Matthieu Bouron
On Thu, May 19, 2016 at 11:46:22AM +0200, Matthieu Bouron wrote:
> On Tue, May 17, 2016 at 03:20:54PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > ---
> >  libavcodec/mediacodec_wrapper.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/libavcodec/mediacodec_wrapper.c 
> > b/libavcodec/mediacodec_wrapper.c
> > index 8ce3b32..5c047ea 100644
> > --- a/libavcodec/mediacodec_wrapper.c
> > +++ b/libavcodec/mediacodec_wrapper.c
> > @@ -1056,6 +1056,10 @@ FFAMediaCodec* 
> > ff_AMediaCodec_createCodecByName(const char *name)
> >  goto fail;
> >  }
> >  
> > +if (codec->jfields.get_input_buffer_id && 
> > codec->jfields.get_output_buffer_id) {
> > +codec->has_get_i_o_buffer = 1;
> > +}
> > +
> >  JNI_DETACH_ENV(attached, codec);
> >  
> >  return codec;
> > @@ -1178,6 +1182,10 @@ FFAMediaCodec* 
> > ff_AMediaCodec_createEncoderByType(const char *mime)
> >  goto fail;
> >  }
> >  
> > +if (codec->jfields.get_input_buffer_id && 
> > codec->jfields.get_output_buffer_id) {
> > +codec->has_get_i_o_buffer = 1;
> > +}
> > +
> >  JNI_DETACH_ENV(attached, NULL);
> >  
> >  return codec;
> 
> I will push both patches in one day if there are no objections.

Pushed.

[...]


Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodec: factorize static fields initialization

2016-05-25 Thread Matthieu Bouron
On Tue, May 17, 2016 at 04:44:57PM +0200, Matthieu Bouron wrote:
> On Tue, May 17, 2016 at 03:20:53PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > ---
> >  libavcodec/mediacodec_wrapper.c | 167 
> > ++--
> >  1 file changed, 57 insertions(+), 110 deletions(-)
> > 
> > diff --git a/libavcodec/mediacodec_wrapper.c 
> > b/libavcodec/mediacodec_wrapper.c
> > index 6b3f905..8ce3b32 100644
> > --- a/libavcodec/mediacodec_wrapper.c
> > +++ b/libavcodec/mediacodec_wrapper.c
> > @@ -958,83 +958,101 @@ struct FFAMediaCodec {
> >  int has_get_i_o_buffer;
> >  };
> >  
> > -FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
> > +static int ff_AMediaCodec_init_static_fields(FFAMediaCodec *codec)
> 
> ff_ prefix removed locally as this function is not meant to be exported.

Pushed.

[...]


Re: [FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: add missing MediaCodec.Get{Input, Output}Buffer() checks

2016-05-19 Thread Matthieu Bouron
On Tue, May 17, 2016 at 03:20:54PM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron 
> 
> ---
>  libavcodec/mediacodec_wrapper.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
> index 8ce3b32..5c047ea 100644
> --- a/libavcodec/mediacodec_wrapper.c
> +++ b/libavcodec/mediacodec_wrapper.c
> @@ -1056,6 +1056,10 @@ FFAMediaCodec* ff_AMediaCodec_createCodecByName(const 
> char *name)
>  goto fail;
>  }
>  
> +if (codec->jfields.get_input_buffer_id && 
> codec->jfields.get_output_buffer_id) {
> +codec->has_get_i_o_buffer = 1;
> +}
> +
>  JNI_DETACH_ENV(attached, codec);
>  
>  return codec;
> @@ -1178,6 +1182,10 @@ FFAMediaCodec* 
> ff_AMediaCodec_createEncoderByType(const char *mime)
>  goto fail;
>  }
>  
> +if (codec->jfields.get_input_buffer_id && 
> codec->jfields.get_output_buffer_id) {
> +codec->has_get_i_o_buffer = 1;
> +}
> +
>  JNI_DETACH_ENV(attached, NULL);
>  
>  return codec;

I will push both patches in one day if there are no objections.

Matthieu

[...]


Re: [FFmpeg-devel] [PATCH 1/2] lavc/mediacodec: factorize static fields initialization

2016-05-17 Thread Matthieu Bouron
On Tue, May 17, 2016 at 03:20:53PM +0200, Matthieu Bouron wrote:
> From: Matthieu Bouron 
> 
> ---
>  libavcodec/mediacodec_wrapper.c | 167 
> ++--
>  1 file changed, 57 insertions(+), 110 deletions(-)
> 
> diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
> index 6b3f905..8ce3b32 100644
> --- a/libavcodec/mediacodec_wrapper.c
> +++ b/libavcodec/mediacodec_wrapper.c
> @@ -958,83 +958,101 @@ struct FFAMediaCodec {
>  int has_get_i_o_buffer;
>  };
>  
> -FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
> +static int ff_AMediaCodec_init_static_fields(FFAMediaCodec *codec)

ff_ prefix removed locally as this function is not meant to be exported.
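
To illustrate the remark above: a helper that is only used inside one file
can be declared static and then needs no library-internal prefix. The sketch
below is a toy, not the actual mediacodec_wrapper.c code. ToyCodec,
toy_get_static_int(), init_static_fields() and toy_codec_create() are
hypothetical names; only the "assign ret, goto fail, return ret" shape is
taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context; the fields only loosely mirror FFAMediaCodec. */
typedef struct ToyCodec {
    int info_try_again_later;
    int buffer_flag_end_of_stream;
} ToyCodec;

/* Hypothetical field lookup: fails (returns -1) for negative ids. */
static int toy_get_static_int(int field_id, int *value)
{
    if (field_id < 0)
        return -1;
    *value = field_id * 10;
    return 0;
}

/* File-local helper: static, so it is not exported and needs no prefix. */
static int init_static_fields(ToyCodec *codec, int id_a, int id_b)
{
    int ret;

    assert(codec != NULL);

    if ((ret = toy_get_static_int(id_a, &codec->info_try_again_later)) < 0)
        goto fail;
    if ((ret = toy_get_static_int(id_b, &codec->buffer_flag_end_of_stream)) < 0)
        goto fail;

fail:
    /* shared exit path: a real version would release resources here */
    return ret;
}

/* One of possibly several constructors sharing the helper. */
static ToyCodec *toy_codec_create(int id_a, int id_b)
{
    ToyCodec *codec = calloc(1, sizeof(*codec));
    if (!codec)
        return NULL;
    if (init_static_fields(codec, id_a, id_b) < 0) {
        free(codec);
        return NULL;
    }
    return codec;
}
```

Returning the error code from the helper lets each caller decide how to clean
up, which seems to be the motivation for the factorization.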

[...]


[FFmpeg-devel] [PATCH 1/2] lavc/mediacodec: factorize static fields initialization

2016-05-17 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavcodec/mediacodec_wrapper.c | 167 ++--
 1 file changed, 57 insertions(+), 110 deletions(-)

diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
index 6b3f905..8ce3b32 100644
--- a/libavcodec/mediacodec_wrapper.c
+++ b/libavcodec/mediacodec_wrapper.c
@@ -958,83 +958,101 @@ struct FFAMediaCodec {
 int has_get_i_o_buffer;
 };
 
-FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
+static int ff_AMediaCodec_init_static_fields(FFAMediaCodec *codec)
 {
+int ret = 0;
 int attached = 0;
 JNIEnv *env = NULL;
-FFAMediaCodec *codec = NULL;
-jstring codec_name = NULL;
 
-codec = av_mallocz(sizeof(FFAMediaCodec));
-if (!codec) {
-return NULL;
-}
-codec->class = &amediacodec_class;
+JNI_ATTACH_ENV_OR_RETURN(env, &attached, codec, AVERROR_EXTERNAL);
 
-env = ff_jni_attach_env(&attached, codec);
-if (!env) {
-av_freep(&codec);
-return NULL;
+codec->INFO_TRY_AGAIN_LATER = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.info_try_again_later_id);
+if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) {
+goto fail;
 }
 
-if (ff_jni_init_jfields(env, &codec->jfields, jni_amediacodec_mapping, 1, codec) < 0) {
+codec->BUFFER_FLAG_CODEC_CONFIG = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_codec_config_id);
+if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) {
 goto fail;
 }
 
-codec_name = ff_jni_utf_chars_to_jstring(env, name, codec);
-if (!codec_name) {
+codec->BUFFER_FLAG_END_OF_STREAM = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_end_of_stream_id);
+if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) {
 goto fail;
 }
 
-codec->object = (*env)->CallStaticObjectMethod(env, codec->jfields.mediacodec_class, codec->jfields.create_by_codec_name_id, codec_name);
-if (ff_jni_exception_check(env, 1, codec) < 0) {
-goto fail;
+if (codec->jfields.buffer_flag_key_frame_id) {
+codec->BUFFER_FLAG_KEY_FRAME = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_key_frame_id);
+if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) {
+goto fail;
+}
 }
 
-codec->object = (*env)->NewGlobalRef(env, codec->object);
-if (!codec->object) {
+codec->CONFIGURE_FLAG_ENCODE = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.configure_flag_encode_id);
+if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) {
 goto fail;
 }
 
 codec->INFO_TRY_AGAIN_LATER = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.info_try_again_later_id);
-if (ff_jni_exception_check(env, 1, codec) < 0) {
+if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) {
 goto fail;
 }
 
-codec->BUFFER_FLAG_CODEC_CONFIG = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_codec_config_id);
-if (ff_jni_exception_check(env, 1, codec) < 0) {
+codec->INFO_OUTPUT_BUFFERS_CHANGED = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.info_output_buffers_changed_id);
+if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) {
 goto fail;
 }
 
-codec->BUFFER_FLAG_END_OF_STREAM = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_end_of_stream_id);
-if (ff_jni_exception_check(env, 1, codec) < 0) {
+codec->INFO_OUTPUT_FORMAT_CHANGED = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.info_output_format_changed_id);
+if ((ret = ff_jni_exception_check(env, 1, codec)) < 0) {
 goto fail;
 }
 
-if (codec->jfields.buffer_flag_key_frame_id) {
-codec->BUFFER_FLAG_KEY_FRAME = (*env)->GetStaticIntField(env, codec->jfields.mediacodec_class, codec->jfields.buffer_flag_key_frame_id);
-if (ff_jni_exception_check(env, 1, codec) < 0) {
-goto fail;
-}
+fail:
+JNI_DETACH_ENV(attached, NULL);
+
+return ret;
+}
+
+FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
+{
+int attached = 0;
+JNIEnv *env = NULL;
+FFAMediaCodec *codec = NULL;
+jstring codec_name = NULL;
+
+codec = av_mallocz(sizeof(FFAMediaCodec));
+if (!codec) {
+return NULL;
 }
+codec->class = &amediacodec_class;
 
-codec->CONFIGURE_FLAG_ENCODE = (*env)->GetStaticIntField(env, 
codec->jfields.mediacodec_class, codec

[FFmpeg-devel] [PATCH 2/2] lavc/mediacodec: add missing MediaCodec.Get{Input, Output}Buffer() checks

2016-05-17 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavcodec/mediacodec_wrapper.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/libavcodec/mediacodec_wrapper.c b/libavcodec/mediacodec_wrapper.c
index 8ce3b32..5c047ea 100644
--- a/libavcodec/mediacodec_wrapper.c
+++ b/libavcodec/mediacodec_wrapper.c
@@ -1056,6 +1056,10 @@ FFAMediaCodec* ff_AMediaCodec_createCodecByName(const char *name)
 goto fail;
 }
 
+if (codec->jfields.get_input_buffer_id && codec->jfields.get_output_buffer_id) {
+codec->has_get_i_o_buffer = 1;
+}
+
 JNI_DETACH_ENV(attached, codec);
 
 return codec;
@@ -1178,6 +1182,10 @@ FFAMediaCodec* ff_AMediaCodec_createEncoderByType(const char *mime)
 goto fail;
 }
 
+if (codec->jfields.get_input_buffer_id && codec->jfields.get_output_buffer_id) {
+codec->has_get_i_o_buffer = 1;
+}
+
 JNI_DETACH_ENV(attached, NULL);
 
 return codec;
-- 
2.8.2
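
The check added by this patch is a small capability test: the newer code
path is enabled only when both optional method ids were resolved. A
self-contained sketch of that idea follows. ToyFields, ToyWrapper and
toy_detect_io_buffer_api() are hypothetical names, and plain pointers stand
in for the JNI method ids; only the field names are borrowed from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for optional JNI method ids (NULL when missing). */
typedef struct ToyFields {
    const void *get_input_buffer_id;
    const void *get_output_buffer_id;
} ToyFields;

typedef struct ToyWrapper {
    ToyFields jfields;
    int has_get_i_o_buffer;
} ToyWrapper;

/* Enable the newer code path only when both optional methods resolved. */
static void toy_detect_io_buffer_api(ToyWrapper *codec)
{
    assert(codec != NULL);
    codec->has_get_i_o_buffer =
        codec->jfields.get_input_buffer_id &&
        codec->jfields.get_output_buffer_id;
}
```

Requiring both ids avoids a half-enabled state where input buffers would go
through one API and output buffers through the other.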



Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon

2016-05-13 Thread Matthieu Bouron
On Thu, May 12, 2016 at 3:50 PM, Benoit Fouet  wrote:

> Hi,
>
>
> On 12/05/2016 15:22, Matthieu Bouron wrote:
>
>> On Thu, May 12, 2016 at 10:01 AM, Benoit Fouet 
>> wrote:
>>
>> Hi,
>>>
>>> I mostly have nit remarks.
>>>
>>> On 11/05/2016 18:39, Matthieu Bouron wrote:
>>>
>>> From: Matthieu Bouron 
>>>>
>>>>
>>>> [...]
>>>
>>> diff --git a/libswresample/arm/resample.S b/libswresample/arm/resample.S
>>>
>>>> new file mode 100644
>>>> index 000..13462e3
>>>> --- /dev/null
>>>> +++ b/libswresample/arm/resample.S
>>>> @@ -0,0 +1,77 @@
>>>>
>>>> [...]
>>>>
>>>> +function ff_resample_common_apply_filter_x4_float_neon, export=1
>>>> +vmov.f32q0, #0.0
>>>>  @
>>>> accumulator
>>>> +1:  vld1.32 {q1}, [r1]!
>>>> @
>>>> src
>>>> +vld1.32 {q2}, [r2]!
>>>> @
>>>> filter
>>>> +vmla.f32q0, q1, q2
>>>>  @
>>>> src + {0..3} * filter + {0..3}
>>>>
>>>> nit: the comment could be "accu += src[0..3] . filter[0..3]"
>>> same for the other ones below
>>>
>>> [...]
>>>
>>> +subsr3, #4 @
>>>
>>>> filter_length -= 4
>>>> +bgt 1b
>>>>  @
>>>> loop until filter_length
>>>> +vpadd.f32   d0, d0, d1
>>>>  @
>>>> pair adding of the 4x32-bit accumulated values
>>>> +vpadd.f32   d0, d0, d0
>>>>  @
>>>> pair adding of the 4x32-bit accumulator values
>>>> +vst1.32 {d0[0]}, [r0]
>>>> @
>>>> write accumulator
>>>> +mov pc, lr
>>>> +endfunc
>>>> +
>>>> +function ff_resample_common_apply_filter_x8_float_neon, export=1
>>>> +vmov.f32q0, #0.0
>>>>  @
>>>> accumulator
>>>> +1:  vld1.32 {q1}, [r1]!
>>>> @
>>>> src1
>>>> +vld1.32 {q2}, [r2]!
>>>> @
>>>> filter1
>>>> +vld1.32 {q8}, [r1]!
>>>> @
>>>> src2
>>>> +vld1.32 {q9}, [r2]!
>>>> @
>>>> filter2
>>>> +vmla.f32q0, q1, q2
>>>>  @
>>>> src1 + {0..3} * filter1 + {0..3}
>>>> +vmla.f32q0, q8, q9
>>>>  @
>>>> src2 + {0..3} * filter2 + {0..3}
>>>>
>>>> instead of using src1 and src2, you may want to use src[0..3] and
>>> src[4..7]
>>> so, if I reuse the formulation I proposed above:
>>> accu += src[0..3] . filter[0..3]
>>> accu += src[4..7] . filter[4..7]
>>>
>>> Fixed locally (as well as the other case you mentioned) with:
>> -vmla.f32q0, q1, q2 @
>> src1 + {0..3} * filter1 + {0..3}
>> -vmla.f32q0, q8, q9 @
>> src2 + {0..3} * filter2 + {0..3}
>> +vmla.f32q0, q1, q2 @
>> accumulator += src1 + {0..3} * filter1 + {0..3}
>> +vmla.f32q0, q8, q9 @
>> accumulator += src2 + {4..7} * filter2 + {4..7}
>>
>> I prefer to use + {0..3} instead of [0..3] to make the comments consistent
>> with what has been done in swscale/arm.
>>
>>
> Fine for me (I chose the "[]" notation to be consistent with the "."
> notation also, in order to do as if it were a dot product between two
> vectors).
>
>
> +subsr3, #8 @
>>>
>>>> filter_length -= 4
>>>>
>>>> -= 8
>>>
>>> Fixed locally.
>>
>>
>> [...]
>>>
>>> diff --git a/libswresample/arm/resample_init.c
>>>
>>>> b/libswresample/arm/resample_init.c
>>>> new file mode 100644
>>>> index 000..c817d03
>>>> --- /dev/null
>>>> +++ b/libswresample/arm/resample_init.c
>>>>
>>>> [...]
>>>>
>>>> +static int ff

Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon

2016-05-12 Thread Matthieu Bouron
On Thu, May 12, 2016 at 10:01 AM, Benoit Fouet  wrote:

> Hi,
>
> I mostly have nit remarks.
>
> On 11/05/2016 18:39, Matthieu Bouron wrote:
>
>> From: Matthieu Bouron 
>>
>>
> [...]
>
> diff --git a/libswresample/arm/resample.S b/libswresample/arm/resample.S
>> new file mode 100644
>> index 000..13462e3
>> --- /dev/null
>> +++ b/libswresample/arm/resample.S
>> @@ -0,0 +1,77 @@
>>
>> [...]
>>
>> +function ff_resample_common_apply_filter_x4_float_neon, export=1
>> +vmov.f32q0, #0.0   @
>> accumulator
>> +1:  vld1.32 {q1}, [r1]!@
>> src
>> +vld1.32 {q2}, [r2]!@
>> filter
>> +vmla.f32q0, q1, q2 @
>> src + {0..3} * filter + {0..3}
>>
>
> nit: the comment could be "accu += src[0..3] . filter[0..3]"
> same for the other ones below
>
> [...]
>
> +subsr3, #4 @
>> filter_length -= 4
>> +bgt 1b @
>> loop until filter_length
>> +vpadd.f32   d0, d0, d1 @
>> pair adding of the 4x32-bit accumulated values
>> +vpadd.f32   d0, d0, d0 @
>> pair adding of the 4x32-bit accumulator values
>> +vst1.32 {d0[0]}, [r0]  @
>> write accumulator
>> +mov pc, lr
>> +endfunc
>> +
>> +function ff_resample_common_apply_filter_x8_float_neon, export=1
>> +vmov.f32q0, #0.0   @
>> accumulator
>> +1:  vld1.32 {q1}, [r1]!@
>> src1
>> +vld1.32 {q2}, [r2]!@
>> filter1
>> +vld1.32 {q8}, [r1]!@
>> src2
>> +vld1.32 {q9}, [r2]!@
>> filter2
>> +vmla.f32q0, q1, q2 @
>> src1 + {0..3} * filter1 + {0..3}
>> +vmla.f32q0, q8, q9 @
>> src2 + {0..3} * filter2 + {0..3}
>>
>
> instead of using src1 and src2, you may want to use src[0..3] and src[4..7]
> so, if I reuse the formulation I proposed above:
> accu += src[0..3] . filter[0..3]
> accu += src[4..7] . filter[4..7]
>

Fixed locally (as well as the other case you mentioned) with:
-    vmla.f32            q0, q1, q2                 @ src1 + {0..3} * filter1 + {0..3}
-    vmla.f32            q0, q8, q9                 @ src2 + {0..3} * filter2 + {0..3}
+    vmla.f32            q0, q1, q2                 @ accumulator += src1 + {0..3} * filter1 + {0..3}
+    vmla.f32            q0, q8, q9                 @ accumulator += src2 + {4..7} * filter2 + {4..7}

I prefer to use + {0..3} instead of [0..3] to make the comments consistent
with what has been done in swscale/arm.
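
For reference, here is a scalar C model of what the x8 float loop computes,
as far as it can be read from the assembly quoted above: two 4-lane
multiply-accumulates per iteration, followed by a horizontal add of the four
lanes. The function name apply_filter_x8_float_ref() is hypothetical and the
code is a sketch for checking the comments, not part of the patch.

```c
#include <assert.h>

/*
 * Scalar model of the x8 float loop. filter_length is assumed to be a
 * positive multiple of 8, as the NEON loop consumes 8 taps per iteration.
 */
static float apply_filter_x8_float_ref(const float *src, const float *filter,
                                       int filter_length)
{
    float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    int i, lane;

    assert(filter_length > 0 && filter_length % 8 == 0);

    for (i = 0; i < filter_length; i += 8) {
        for (lane = 0; lane < 4; lane++) {
            /* accumulator += src + {0..3} * filter + {0..3} */
            acc[lane] += src[i + lane] * filter[i + lane];
            /* accumulator += src + {4..7} * filter + {4..7} */
            acc[lane] += src[i + 4 + lane] * filter[i + 4 + lane];
        }
    }

    /* models the two vpadd.f32 steps: (a0 + a1) + (a2 + a3) */
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

Because the four lanes are summed separately and only combined at the end,
the result can differ in the last bits from a strictly sequential C loop.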


>
> +subsr3, #8 @
>> filter_length -= 4
>>
>
> -= 8
>

Fixed locally.


>
> [...]
>
> diff --git a/libswresample/arm/resample_init.c
>> b/libswresample/arm/resample_init.c
>> new file mode 100644
>> index 000..c817d03
>> --- /dev/null
>> +++ b/libswresample/arm/resample_init.c
>>
>> [...]
>>
>> +static int ff_resample_common_##TYPE##_neon(ResampleContext *c, void
>> *dest, const void *source,   \
>> +int n, int update_ctx)
>>   \
>> +{
>>  \
>> +DELEM *dst = dest;
>>   \
>> +const DELEM *src = source;
>>   \
>> +int dst_index;
>>   \
>> +int index= c->index;
>>   \
>> +int frac= c->frac;
>>   \
>> +int sample_index = index >> c->phase_shift;
>>  \
>> +int x4_aligned_filter_length = c->filter_length & ~3;
>>  \
>> +int x8_aligned_filter_length = c->filter_length & ~7;
>>  \
>> +
>>   \
>> +index &= c-&g

Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon

2016-05-11 Thread Matthieu Bouron
On May 11, 2016 6:39 PM, "Matthieu Bouron" wrote:
>
> From: Matthieu Bouron 
>
> ---
>
> Hello,
>
> Here are some benchmarks of the attached patch on an rpi2.
>
> ./ffmpeg -f lavfi -i
sine=440,aformat=sample_fmts=fltp,asetnsamples=4096,abench=start,aresample=48000,abench=stop
-t 1000 -f null -
>
> With patch:avg=0.001159 speed=44,1x
> Without patch: avg=0.001297 speed=40,8x
>
> ./ffmpeg -f lavfi -i
sine=440,aformat=sample_fmts=s16p,asetnsamples=4096,abench=start,aresample=48000,abench=stop
-t 1000 -f null -
>
> With patch:avg=0.001374 speed=45,6x
> Without patch: avg=0.000782 speed=64,6x

Without patch: avg=0.001374 speed=45,6x
With patch: avg=0.000782 speed=64,6x

[...]


Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon

2016-05-11 Thread Matthieu Bouron
On Wed, May 11, 2016 at 9:04 PM, Reimar Döffinger 
wrote:

>
>
> On 11.05.2016, at 20:37, Michael Niedermayer 
> wrote:
>
> > On Wed, May 11, 2016 at 06:39:20PM +0200, Matthieu Bouron wrote:
> >> From: Matthieu Bouron 
> >>
> >> ---
> >>
> >> Hello,
> >>
> >> Here are some benchmarks of the attached patch on an rpi2.
> >>
> >> ./ffmpeg -f lavfi -i
> sine=440,aformat=sample_fmts=fltp,asetnsamples=4096,abench=start,aresample=48000,abench=stop
> -t 1000 -f null -
> >>
> >> With patch:avg=0.001159 speed=44,1x
> >> Without patch: avg=0.001297 speed=40,8x
> >>
> >> ./ffmpeg -f lavfi -i
> sine=440,aformat=sample_fmts=s16p,asetnsamples=4096,abench=start,aresample=48000,abench=stop
> -t 1000 -f null -
> >>
> >
> >> With patch:avg=0.001374 speed=45,6x
> >> Without patch: avg=0.000782 speed=64,6x
> >
> > so it's slower? or am I misreading this?
>

>
> Yes, that seems weird.
> Also, what are common filter lengths?
>

Sorry, I inverted the two results; the NEON version is actually faster:

With*out* patch: avg=0.001374 speed=45,6x
With patch: avg=0.000782 speed=64,6x
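
To put the corrected numbers in perspective, the ratio of the two abench
averages can be computed with a few lines of C. The helper name speedup() is
hypothetical; the inputs are the avg= figures quoted in this thread.

```c
#include <assert.h>

/* Speedup = average time without the patch / average time with it. */
static double speedup(double avg_without, double avg_with)
{
    assert(avg_with > 0.0);
    return avg_without / avg_with;
}
```

With the s16p figures, speedup(0.001374, 0.000782) is about 1.76, and with
the fltp figures, speedup(0.001297, 0.001159) is about 1.12. The overall
speed= figures improve by less (64,6x vs 45,6x and 44,1x vs 40,8x).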



> Because for a length of 4 or 8 or 16 I'd think this would be much better
> fully unrolled.
> And for longer ones at least partially unrolled.
>


The common filter length seems to be 32, but it may vary.
As for the small performance gain on the float version, it seems to be
due to switching between VFP and NEON instructions (I'm not 100% sure).
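
As a side note on filter lengths: the init code in the patch computes
filter_length & ~3 and filter_length & ~7. How the patch dispatches on those
values is not shown in this part of the thread, so the split below is only
an assumption about one way they could be used. FilterSplit and
split_filter_length() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical breakdown of a filter length into x8, x4 and scalar parts. */
typedef struct FilterSplit {
    int x8_len;   /* taps handled 8 at a time */
    int x4_len;   /* taps handled 4 at a time */
    int tail_len; /* taps left for a scalar loop */
} FilterSplit;

static FilterSplit split_filter_length(int filter_length)
{
    FilterSplit s;

    assert(filter_length >= 0);
    s.x8_len   = filter_length & ~7;                   /* multiple of 8 */
    s.x4_len   = (filter_length & ~3) - s.x8_len;      /* 0 or 4 */
    s.tail_len = filter_length - (filter_length & ~3); /* 0..3 */
    return s;
}
```

Under this assumption a filter length of 32 would be covered entirely by the
x8 path, with nothing left for the x4 or scalar parts.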

Matthieu

[...]


[FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon

2016-05-11 Thread Matthieu Bouron
From: Matthieu Bouron 

---

Hello,

Here are some benchmarks of the attached patch on an rpi2.

./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=fltp,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -

With patch:    avg=0.001159 speed=44,1x
Without patch: avg=0.001297 speed=40,8x

./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=s16p,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -

With patch:    avg=0.001374 speed=45,6x
Without patch: avg=0.000782 speed=64,6x

Matthieu

---
 libswresample/arm/Makefile|   7 ++-
 libswresample/arm/resample.S  |  77 
 libswresample/arm/resample_init.c | 120 ++
 libswresample/resample.h  |   1 +
 libswresample/resample_dsp.c  |   1 +
 5 files changed, 204 insertions(+), 2 deletions(-)
 create mode 100644 libswresample/arm/resample.S
 create mode 100644 libswresample/arm/resample_init.c

diff --git a/libswresample/arm/Makefile b/libswresample/arm/Makefile
index 60f3f6d..53ab462 100644
--- a/libswresample/arm/Makefile
+++ b/libswresample/arm/Makefile
@@ -1,5 +1,8 @@
-OBJS  += arm/audio_convert_init.o
+OBJS  += arm/audio_convert_init.o \
+ arm/resample_init.o
+
 
 OBJS-$(CONFIG_NEON_CLOBBER_TEST) += arm/neontest.o
 
-NEON-OBJS += arm/audio_convert_neon.o
+NEON-OBJS += arm/audio_convert_neon.o \
+ arm/resample.o
diff --git a/libswresample/arm/resample.S b/libswresample/arm/resample.S
new file mode 100644
index 000..13462e3
--- /dev/null
+++ b/libswresample/arm/resample.S
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2016 Matthieu Bouron 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/arm/asm.S"
+
+function ff_resample_common_apply_filter_x4_float_neon, export=1
+    vmov.f32            q0, #0.0                   @ accumulator
+1:  vld1.32             {q1}, [r1]!                @ src
+    vld1.32             {q2}, [r2]!                @ filter
+    vmla.f32            q0, q1, q2                 @ src + {0..3} * filter + {0..3}
+    subs                r3, #4                     @ filter_length -= 4
+    bgt                 1b                         @ loop until filter_length
+    vpadd.f32           d0, d0, d1                 @ pair adding of the 4x32-bit accumulated values
+    vpadd.f32           d0, d0, d0                 @ pair adding of the 4x32-bit accumulator values
+    vst1.32             {d0[0]}, [r0]              @ write accumulator
+    mov                 pc, lr
+endfunc
+
+function ff_resample_common_apply_filter_x8_float_neon, export=1
+    vmov.f32            q0, #0.0                   @ accumulator
+1:  vld1.32             {q1}, [r1]!                @ src1
+    vld1.32             {q2}, [r2]!                @ filter1
+    vld1.32             {q8}, [r1]!                @ src2
+    vld1.32             {q9}, [r2]!                @ filter2
+    vmla.f32            q0, q1, q2                 @ src1 + {0..3} * filter1 + {0..3}
+    vmla.f32            q0, q8, q9                 @ src2 + {0..3} * filter2 + {0..3}
+    subs                r3, #8                     @ filter_length -= 4
+    bgt                 1b                         @ loop until filter_length
+    vpadd.f32           d0, d0, d1                 @ pair adding of the 4x32-bit accumulated values
+    vpadd.f32           d0, d0, d0                 @ pair adding of the 4x32-bit accumulator values
+    vst1.32             {d0[0]}, [r0]              @ write accumulator
+    mov                 pc, lr
+endfunc
+
+function ff_resample_common_apply_filter_x4_s16_neon, export=1
+    vmov.s32            q0, #0                     @ accumulator
+1:  vld1.16             {d2}, [r1]!                @ src
+vld1.16

[FFmpeg-devel] [PATCH 1/3] lavfi/framepool: rename FFVideoFramePool to FFFramePool

2016-05-10 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavfilter/avfilter.c  |  2 +-
 libavfilter/avfilter.h  |  4 ++--
 libavfilter/framepool.c | 24 
 libavfilter/framepool.h | 32 
 libavfilter/video.c | 20 ++--
 5 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/libavfilter/avfilter.c b/libavfilter/avfilter.c
index 21f8d9e..2128f69 100644
--- a/libavfilter/avfilter.c
+++ b/libavfilter/avfilter.c
@@ -170,7 +170,7 @@ void avfilter_link_free(AVFilterLink **link)
 return;
 
 av_frame_free(&(*link)->partial_buf);
-ff_video_frame_pool_uninit((FFVideoFramePool**)&(*link)->video_frame_pool);
+ff_frame_pool_uninit((FFFramePool**)&(*link)->frame_pool);
 
 av_freep(link);
 }
diff --git a/libavfilter/avfilter.h b/libavfilter/avfilter.h
index 79227a7..b8585a9 100644
--- a/libavfilter/avfilter.h
+++ b/libavfilter/avfilter.h
@@ -533,9 +533,9 @@ struct AVFilterLink {
 int64_t frame_count;
 
 /**
- * A pointer to a FFVideoFramePool struct.
+ * A pointer to a FFFramePool struct.
  */
-void *video_frame_pool;
+void *frame_pool;
 
 /**
  * True if a frame is currently wanted on the input of this filter.
diff --git a/libavfilter/framepool.c b/libavfilter/framepool.c
index 6df574e..36c6e8f 100644
--- a/libavfilter/framepool.c
+++ b/libavfilter/framepool.c
@@ -26,7 +26,7 @@
 #include "libavutil/mem.h"
 #include "libavutil/pixfmt.h"
 
-struct FFVideoFramePool {
+struct FFFramePool {
 
 int width;
 int height;
@@ -37,20 +37,20 @@ struct FFVideoFramePool {
 
 };
 
-FFVideoFramePool *ff_video_frame_pool_init(AVBufferRef* (*alloc)(int size),
-   int width,
-   int height,
-   enum AVPixelFormat format,
-   int align)
+FFFramePool *ff_frame_pool_video_init(AVBufferRef* (*alloc)(int size),
+  int width,
+  int height,
+  enum AVPixelFormat format,
+  int align)
 {
 int i, ret;
-FFVideoFramePool *pool;
+FFFramePool *pool;
 const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(format);
 
 if (!desc)
 return NULL;
 
-pool = av_mallocz(sizeof(FFVideoFramePool));
+pool = av_mallocz(sizeof(FFFramePool));
 if (!pool)
 return NULL;
 
@@ -100,11 +100,11 @@ FFVideoFramePool *ff_video_frame_pool_init(AVBufferRef* 
(*alloc)(int size),
 return pool;
 
 fail:
-ff_video_frame_pool_uninit(&pool);
+ff_frame_pool_uninit(&pool);
 return NULL;
 }
 
-int ff_video_frame_pool_get_config(FFVideoFramePool *pool,
+int ff_frame_pool_get_video_config(FFFramePool *pool,
int *width,
int *height,
enum AVPixelFormat *format,
@@ -122,7 +122,7 @@ int ff_video_frame_pool_get_config(FFVideoFramePool *pool,
 }
 
 
-AVFrame *ff_video_frame_pool_get(FFVideoFramePool *pool)
+AVFrame *ff_frame_pool_get(FFFramePool *pool)
 {
 int i;
 AVFrame *frame;
@@ -174,7 +174,7 @@ fail:
 return NULL;
 }
 
-void ff_video_frame_pool_uninit(FFVideoFramePool **pool)
+void ff_frame_pool_uninit(FFFramePool **pool)
 {
 int i;
 
diff --git a/libavfilter/framepool.h b/libavfilter/framepool.h
index 2a6c9e8..4824824 100644
--- a/libavfilter/framepool.h
+++ b/libavfilter/framepool.h
@@ -25,11 +25,11 @@
 #include "libavutil/frame.h"
 
 /**
- * Video frame pool. This structure is opaque and not meant to be accessed
- * directly. It is allocated with ff_video_frame_pool_init() and freed with
- * ff_video_frame_pool_uninit().
+ * Frame pool. This structure is opaque and not meant to be accessed
+ * directly. It is allocated with ff_frame_pool_init() and freed with
+ * ff_frame_pool_uninit().
  */
-typedef struct FFVideoFramePool FFVideoFramePool;
+typedef struct FFFramePool FFFramePool;
 
 /**
  * Allocate and initialize a video frame pool.
@@ -41,21 +41,21 @@ typedef struct FFVideoFramePool FFVideoFramePool;
  * @param height height of each frame in this pool
  * @param format format of each frame in this pool
  * @param align buffers alignement of each frame in this pool
- * @return newly created video frame pool on success, NULL on error.
+ * @return newly created frame pool on success, NULL on error.
  */
-FFVideoFramePool *ff_video_frame_pool_init(AVBufferRef* (*alloc)(int size),
-   int width,
-   int height,
-   enum AVPixelFormat format,
-   int align);
+FFFramePool *ff_frame_pool_video_init(AVBufferRef* (*alloc)(int size),
+  i

[FFmpeg-devel] [PATCH 2/3] lavfi/framepool: add audio support

2016-05-10 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavfilter/framepool.c | 106 
 libavfilter/framepool.h |  36 +++-
 2 files changed, 141 insertions(+), 1 deletion(-)

diff --git a/libavfilter/framepool.c b/libavfilter/framepool.c
index 36c6e8f..4a63fe9 100644
--- a/libavfilter/framepool.c
+++ b/libavfilter/framepool.c
@@ -20,6 +20,7 @@
 
 #include "framepool.h"
 #include "libavutil/avassert.h"
+#include "libavutil/avutil.h"
 #include "libavutil/buffer.h"
 #include "libavutil/frame.h"
 #include "libavutil/imgutils.h"
@@ -28,8 +29,15 @@
 
 struct FFFramePool {
 
+int type;
+
 int width;
 int height;
+
+int planes;
+int channels;
+int nb_samples;
+
 int format;
 int align;
 int linesize[4];
@@ -54,6 +62,7 @@ FFFramePool *ff_frame_pool_video_init(AVBufferRef* (*alloc)(int size),
 if (!pool)
 return NULL;
 
+pool->type = AVMEDIA_TYPE_VIDEO;
 pool->width = width;
 pool->height = height;
 pool->format = format;
@@ -104,6 +113,49 @@ fail:
 return NULL;
 }
 
+FFFramePool *ff_frame_pool_audio_init(AVBufferRef* (*alloc)(int size),
+  int channels,
+  int nb_samples,
+  enum AVSampleFormat format,
+  int align)
+{
+int ret, planar;
+FFFramePool *pool;
+
+pool = av_mallocz(sizeof(FFFramePool));
+if (!pool)
+return NULL;
+
+planar = av_sample_fmt_is_planar(format);
+
+pool->type = AVMEDIA_TYPE_AUDIO;
+pool->planes = planar ? channels : 1;
+pool->channels = channels;
+pool->nb_samples = nb_samples;
+pool->format = format;
+pool->align = align;
+
+ret = av_samples_get_buffer_size(&pool->linesize[0], channels,
+ nb_samples, format, 0);
+if (ret < 0) {
+goto fail;
+}
+
+pool->pools[0] = av_buffer_pool_init(pool->linesize[0], NULL);
+if (!pool->pools[0]) {
+ret = AVERROR(ENOMEM);
+goto fail;
+}
+
+return pool;
+
+fail:
+ff_frame_pool_uninit(&pool);
+return NULL;
+}
+
+
+
 int ff_frame_pool_get_video_config(FFFramePool *pool,
int *width,
int *height,
@@ -121,6 +173,22 @@ int ff_frame_pool_get_video_config(FFFramePool *pool,
 return 0;
 }
 
+int ff_frame_pool_get_audio_config(FFFramePool *pool,
+   int *channels,
+   int *nb_samples,
+   enum AVSampleFormat *format,
+   int *align)
+{
+if (!pool)
+return AVERROR(EINVAL);
+
+*channels = pool->channels;
+*nb_samples = pool->nb_samples;
+*format = pool->format;
+*align = pool->align;
+
+return 0;
+}
 
 AVFrame *ff_frame_pool_get(FFFramePool *pool)
 {
@@ -133,6 +201,8 @@ AVFrame *ff_frame_pool_get(FFFramePool *pool)
 return NULL;
 }
 
+if (pool->type == AVMEDIA_TYPE_VIDEO) {
+
 desc = av_pix_fmt_desc_get(pool->format);
 if (!desc) {
 goto fail;
@@ -167,6 +237,42 @@ AVFrame *ff_frame_pool_get(FFFramePool *pool)
 }
 
 frame->extended_data = frame->data;
+} else if (pool->type == AVMEDIA_TYPE_AUDIO) {
+frame->nb_samples = pool->nb_samples;
+av_frame_set_channels(frame, pool->channels);
+frame->format = pool->format;
+frame->linesize[0] = pool->linesize[0];
+
+if (pool->planes > AV_NUM_DATA_POINTERS) {
+frame->extended_data = av_mallocz_array(pool->planes,
+sizeof(*frame->extended_data));
+frame->nb_extended_buf = pool->planes - AV_NUM_DATA_POINTERS;
+frame->extended_buf = av_mallocz_array(frame->nb_extended_buf,
+sizeof(*frame->extended_buf));
+if (!frame->extended_data || !frame->extended_buf) {
+goto fail;
+}
+} else {
+frame->extended_data = frame->data;
+av_assert0(frame->nb_extended_buf == 0);
+}
+
+for (i = 0; i < FFMIN(pool->planes, AV_NUM_DATA_POINTERS); i++) {
+frame->buf[i] = av_buffer_pool_get(pool->pools[0]);
+if (!frame->buf[i])
+goto fail;
+frame->extended_data[i] = frame->data[i] = frame->buf[i]->data;
+}
+for (i = 0; i < frame->nb_extended_buf; i++) {
+frame->extended_buf[i] = av_buffer_pool_get(pool->pools[0]);
+if (!frame->extended_buf[i])
+goto fail;
+frame->ex
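
Note on the plane bookkeeping in the audio path above: the patch sets planes to the channel count for planar formats and to 1 otherwise, and only allocates extended_buf when planes exceeds AV_NUM_DATA_POINTERS. The following is a minimal sketch of that arithmetic, not code from the patch. PlaneLayout and audio_plane_layout() are hypothetical names, and NUM_DATA_POINTERS is redefined locally (assumed to be 8, as in libavutil) so the sketch builds without the FFmpeg headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match FFmpeg's AV_NUM_DATA_POINTERS; redefined here so the
 * sketch builds without the FFmpeg headers. */
#define NUM_DATA_POINTERS 8

/* Hypothetical helper type, not part of the patch. */
typedef struct PlaneLayout {
    int planes;          /* buffers needed per frame */
    int in_frame_buf;    /* buffers that fit in frame->buf[] */
    int nb_extended_buf; /* buffers that spill into frame->extended_buf[] */
} PlaneLayout;

/* Mirrors "pool->planes = planar ? channels : 1" and the split between
 * buf[] and extended_buf[] done in the audio branch of the patch. */
static PlaneLayout audio_plane_layout(int channels, bool planar)
{
    PlaneLayout l;

    assert(channels > 0);
    l.planes = planar ? channels : 1;
    l.in_frame_buf = l.planes < NUM_DATA_POINTERS ? l.planes
                                                  : NUM_DATA_POINTERS;
    l.nb_extended_buf = l.planes - l.in_frame_buf;
    return l;
}
```

With 10 planar channels this yields 8 buffers in buf[] and 2 in extended_buf[]; any interleaved format needs a single buffer regardless of the channel count.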

[FFmpeg-devel] [PATCH 3/3] lavfi: use an audio frame pool for each link of the filtergraph

2016-05-10 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libavfilter/audio.c | 51 +--
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/libavfilter/audio.c b/libavfilter/audio.c
index 51fef03..dbc92d6 100644
--- a/libavfilter/audio.c
+++ b/libavfilter/audio.c
@@ -28,6 +28,9 @@
 #include "avfilter.h"
 #include "internal.h"
 
+#define BUFFER_ALIGN 0
+
+
 AVFrame *ff_null_get_audio_buffer(AVFilterLink *link, int nb_samples)
 {
 return ff_get_audio_buffer(link->dst->outputs[0], nb_samples);
@@ -35,29 +38,49 @@ AVFrame *ff_null_get_audio_buffer(AVFilterLink *link, int nb_samples)
 
 AVFrame *ff_default_get_audio_buffer(AVFilterLink *link, int nb_samples)
 {
-AVFrame *frame = av_frame_alloc();
+AVFrame *frame = NULL;
 int channels = link->channels;
-int ret;
 
 av_assert0(channels == av_get_channel_layout_nb_channels(link->channel_layout) || !av_get_channel_layout_nb_channels(link->channel_layout));
 
-if (!frame)
-return NULL;
+if (!link->frame_pool) {
+link->frame_pool = ff_frame_pool_audio_init(av_buffer_allocz, channels,
+nb_samples, link->format, BUFFER_ALIGN);
+if (!link->frame_pool)
+return NULL;
+} else {
+int pool_channels = 0;
+int pool_nb_samples = 0;
+int pool_align = 0;
+enum AVSampleFormat pool_format = AV_SAMPLE_FMT_NONE;
 
-frame->nb_samples = nb_samples;
-frame->format = link->format;
-av_frame_set_channels(frame, link->channels);
-frame->channel_layout = link->channel_layout;
-frame->sample_rate= link->sample_rate;
-ret = av_frame_get_buffer(frame, 0);
-if (ret < 0) {
-av_frame_free(&frame);
+if (ff_frame_pool_get_audio_config(link->frame_pool,
+   &pool_channels, &pool_nb_samples,
+   &pool_format, &pool_align) < 0) {
+return NULL;
+}
+
+if (pool_channels != channels || pool_nb_samples < nb_samples ||
+pool_format != link->format || pool_align != BUFFER_ALIGN) {
+
+ff_frame_pool_uninit((FFFramePool **)&link->frame_pool);
+link->frame_pool = ff_frame_pool_audio_init(av_buffer_allocz, channels,
+nb_samples, link->format, BUFFER_ALIGN);
+if (!link->frame_pool)
+return NULL;
+}
+}
+
+frame = ff_frame_pool_get(link->frame_pool);
+if (!frame) {
 return NULL;
 }
 
-av_samples_set_silence(frame->extended_data, 0, nb_samples, channels,
-   link->format);
+frame->nb_samples = nb_samples;
+frame->channel_layout = link->channel_layout;
+frame->sample_rate = link->sample_rate;
 
+av_samples_set_silence(frame->extended_data, 0, nb_samples, channels, link->format);
 
 return frame;
 }
-- 
2.8.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
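
Note on the pool reuse check in ff_default_get_audio_buffer() above: the existing pool is kept only if its channel count, format and alignment match the request and it holds at least as many samples as requested. Otherwise it is freed and re-created. The following is a minimal sketch of that decision alone. AudioPoolConfig and audio_pool_needs_reinit() are hypothetical names, and a plain int stands in for enum AVSampleFormat so the sketch has no FFmpeg dependency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the values returned by
 * ff_frame_pool_get_audio_config(). */
typedef struct AudioPoolConfig {
    int channels;
    int nb_samples;
    int format; /* stands in for enum AVSampleFormat */
    int align;
} AudioPoolConfig;

/* Returns true when the pool cannot serve the request and has to be
 * re-created. A pool with more samples than requested is reused. */
static bool audio_pool_needs_reinit(const AudioPoolConfig *pool,
                                    const AudioPoolConfig *req)
{
    assert(pool && req);
    return pool->channels   != req->channels   ||
           pool->nb_samples <  req->nb_samples ||
           pool->format     != req->format     ||
           pool->align      != req->align;
}
```

The asymmetric comparison on nb_samples is what lets a link that requests varying frame sizes settle on one pool sized for the largest request seen so far.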


[FFmpeg-devel] (no subject)

2016-05-10 Thread Matthieu Bouron
Hello,

The following patchset adds an audio frame pool for each link of the
filtergraph. It extends the FFVideoFramePool API to support audio (and renames
it to FFFramePool).

The performance gain on a rpi2 is very small: malloc+free goes from 2.50% to
1.84% of CPU time with the following command line:

perf record ./ffmpeg_g -f lavfi -i sine=440,asetnsamples=4096,aformat=sample_fmts=s16,aresample=48000 -t 500 -f null -

For reference, most of the time (81%) is spent in request_frame and in the
resample function.

Given the small performance gain, I'm not sure it's really useful. However, it
could be a good idea to have this API so that it can be used in both libavcodec
(replacing what is being done in lavc/utils.c) and libavfilter, for audio and
video, if it is made public or semi-public later on.

Best regards,
Matthieu


Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support

2016-04-14 Thread Matthieu Bouron
On Thu, Apr 7, 2016 at 4:18 PM, wm4  wrote:

> On Fri, 18 Mar 2016 17:50:39 +0100
> Matthieu Bouron  wrote:
>
> > From: Matthieu Bouron 
> >
> > ---
> >
> > Hello,
>
> Can't say much about this, so just some minor confused comments.
>

Thanks for your comments and sorry for the late reply.


>
> >
> > The following patch add hwaccel support to the mediacodec (h264) decoder
> by allowing
> > the user to render the output frames directly on a surface.
> >
> > In order to do so the user needs to initialize the hwaccel through the
> use of
> > av_mediacodec_alloc_context and av_mediacodec_default_init functions.
> The later
> > takes a reference to an android/view/Surface as parameter.
> >
> > If the hwaccel successfully initialize, the decoder output frames pix
> fmt will be
> > AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrate how to
> render
> > the frames on the surface:
> >
> > AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3];
> > av_mediacodec_release_buffer(buffer, 1);
> >
> > The last argument of av_mediacodec_release_buffer enable rendering of the
> > buffer on the surface (or not if set to 0).
> >
>
> I don't understand this (at all), but unreferencing the AVFrame should
> unref the underlying surface.
>

In this case, the underlying surface will remain (it is owned by the codec
itself) but the output buffer (that should be rendered to the surface)
will be discarded.


>
> > Regarding the internal changes in the mediacodec decoder:
> >
> > MediaCodec.flush() discards both input and output buffers meaning that if
> > MediaCodec.flush() is called all output buffers the user has a reference
> on are
> > now invalid (and cannot be used).
> > This behaviour does not fit well in the avcodec API.
> >
> > When the decoder is configured to output software buffers, there is no
> issue as
> > the buffers are copied.
> >
> > Now when the decoder is configured to output to a surface, the user
> might not
> > want to render all the frames as fast as the decoder can go and might
> want to
> > control *when* the frame are rendered, so we need to make sure that the
> > MediaCodec.flush() call is delayed until all the frames the user retains
> has
> > been released or rendered.
> >
> > Delaying the call to MediaCodec.flush() means buffering any inputs that
> come
> > the decoder until the user has released/renderer the frame he retains.
> >
> > This is a limitation of this hwaccel implementation, if the user retains
> a
> > frame (a), then issue a flush command to the decoder, the packets he
> feeds to
> > the decoder at that point will be queued in the internal decoder packet
> queue
> > (until he releases the frame (a)). This scenario leads to a memory usage
> > increase to say the least.
> >
> > Currently there is no limitation on the size of the internal decoder
> packet
> > queue but this is something that can be added easily. Then, if the queue
> is
> > full, what would be the behaviour of the decoder ? Can it block ? Or
> should it
> > returns something like AVERROR(EAGAIN) ?
>
> The current API can't do anything like this. It has to output 0 or 1
> frame per input packet. (If it outputs nothing, the frame is either
> discarded or queued internally. The queue can be emptied only when
> draining the decoder at the end of the stream.)
>
> So it looks like all you can do is blocking. (Which could lead to a
> deadlock in the API user, depending of how the user's code works?)
>

Yes, if I block at some point, it can lead to a deadlock if the user never
releases all the frames. I'm considering buffering a few input packets
before blocking.


>
> >
> > About the other internal decoder changes I introduced:
> >
> > The MediaCodecDecContext is now refcounted (using the lavu/atomic api)
> since
> > the (hwaccel) frames can be retained by the user, we need to delay the
> > destruction of the codec until the user has released all the frames he
> has a
> > reference on.
> > The reference counter of the MediaCodecDecContext is incremented each
> time an
> > (hwaccel) frame is outputted by the decoder and decremented each time a
> > (hwaccel) frame is released.
> >
> > Also, when the decoder is configured to output to a surface the pts that
> are
> > given to the MediaCodec API are now rescaled based on the codec_timebase
> as
> > those timestamps values are propagated to the frames rendered on the
> surface
> > since Andr

Re: [FFmpeg-devel] [PATCH] swscale/arm: add yuv2planeX_8_neon

2016-04-11 Thread Matthieu Bouron
On Mon, Apr 11, 2016 at 4:18 PM, Matthieu Bouron 
wrote:

>
>
> On Mon, Apr 11, 2016 at 9:58 AM, Benoit Fouet 
> wrote:
>
>> Hi,
>>
>> (again, thanks to both of you for documenting all this assembly /NEON
>> code)
>>
>> On 09/04/2016 10:22, Matthieu Bouron wrote:
>>
>>> From: Matthieu Bouron 
>>>
>>> ---
>>>
>>> Hello,
>>>
>>> The following patch add yuv2planeX_8_neon function for the arm
>>> platform.  It is
>>> currently restricted to 8-bit per component sources until I fix fate
>>> issues
>>> with 10-bit sources (the dnxhd-*-10bit tests fail but I haven't figured
>>> out yet
>>> where it comes from).
>>>
>>> Matthieu
>>>
>>> ---
>>>   libswscale/arm/Makefile  |  1 +
>>>   libswscale/arm/output.S  | 78
>>> 
>>>   libswscale/arm/swscale.c |  7 +
>>>   libswscale/utils.c   |  3 +-
>>>   4 files changed, 88 insertions(+), 1 deletion(-)
>>>   create mode 100644 libswscale/arm/output.S
>>>
>>> [...]
>>>
>>> diff --git a/libswscale/arm/output.S b/libswscale/arm/output.S
>>> new file mode 100644
>>> index 000..4437447
>>> --- /dev/null
>>> +++ b/libswscale/arm/output.S
>>> @@ -0,0 +1,78 @@
>>>
>>
>> [...]
>>
>>
>> +function ff_yuv2planeX_8_neon, export=1
>>> +push {r4-r12, lr}
>>> +vpush {q4-q7}
>>> +ldr r4, [sp, #104]
>>>  @ dstW
>>> +ldr r5, [sp, #108]
>>>  @ dither
>>> +ldr r6, [sp, #112]
>>>  @ offset
>>> +vld1.8  {d0}, [r5]
>>>  @ load 8x8-bit dither values
>>> +tst r6, #0
>>>  @ check offsetting which can be 0 or 3 only
>>> +beq 1f
>>> +vext.u8 d0, d0, d0, #3
>>>  @ honor offseting which can be 3 only
>>> +1:  vmovl.u8q0, d0
>>>  @ extend dither to 16-bit
>>> +vshll.u16   q1, d0, #12
>>> @ extend dither to 32-bit with left shift by 12 (part 1)
>>> +vshll.u16   q2, d1, #12
>>> @ extend dither to 32-bit with left shift by 12 (part 2)
>>> +mov r7, #0
>>>  @ i = 0
>>> +2:  vmov.u8 q3, q1
>>>  @ initialize accumulator with dithering values (part 1)
>>> +vmov.u8 q4, q2
>>>  @ initialize accumulator with dithering values (part 2)
>>> +mov r8, r1
>>>  @ tmpFilterSize = filterSize
>>> +mov r9, r2
>>>  @ srcp
>>> +mov r10, r0
>>> @ filterp
>>> +3:  ldr r11, [r9], #4
>>> @ get pointer @ src[j]
>>> +ldr r12, [r9], #4
>>> @ get pointer @ src[j+1]
>>> +add r11, r11, r7, lsl #1
>>>  @ &src[j][i]
>>> +add r12, r12, r7, lsl #1
>>>  @ &src[j+1][i]
>>> +vld1.16 {q5}, [r11]
>>> @ read 8x16-bit @ src[j  ][i + {0..7}]: A,B,C,D,E,F,G,H
>>> +vld1.16 {q6}, [r12]
>>> @ read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P
>>> +ldr r11, [r10], #4
>>>  @ read 2x16-bit coeffs (X, Y) at (filter[j], filter[j+1])
>>> +vmov.16 q7, q5
>>>  @ copy 8x16-bit @ src[j  ][i + {0..7}] for following inplace zip
>>> instruction
>>> +vmov.16 q8, q6
>>>  @ copy 8x16-bit @ src[j+1][i + {0..7}] for following inplace zip
>>> instruction
>>> +vzip.16 q7, q8
>>>  @ A,I,B,J,C,K,D,L,E,M,F,N,G,O,H,L
>>>
>>
>> nit: O,H,P
>
>
> Fixed.
>
> Patch updated fixing fate issues with 10-bit sources (the code was not
> honoring offsetting: tst r6, #0 has been replaced with cmp r6, #0).
> If there is no objection, I will push the patch in the next hours.
>

Patch applied.

Matthieu
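
Note on the tst/cmp fix mentioned above: on ARM, "tst rN, #imm" sets the Z flag from rN AND imm, while "cmp rN, #imm" sets it from rN minus imm. With an immediate of 0 the AND is always zero, so the following "beq 1f" is always taken and the vext.u8 rotation of the dither values never runs. The following is a minimal sketch that models only the Z flag for these two instructions. The function names are hypothetical and this is not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Z flag after "tst rN, #imm": set when (rN & imm) == 0. */
static bool z_after_tst(uint32_t reg, uint32_t imm)
{
    return (reg & imm) == 0;
}

/* Z flag after "cmp rN, #imm": set when rN - imm == 0. */
static bool z_after_cmp(uint32_t reg, uint32_t imm)
{
    return reg - imm == 0;
}

/* "beq 1f" skips the dither rotation whenever Z is set, so the rotation
 * runs only when Z is clear. */
static bool rotation_runs_with_tst(uint32_t offset)
{
    return !z_after_tst(offset, 0);
}

static bool rotation_runs_with_cmp(uint32_t offset)
{
    return !z_after_cmp(offset, 0);
}
```

For offset = 3 the tst variant still skips the rotation while the cmp variant performs it, which is consistent with the "not honoring offsetting" description of the failure.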


Re: [FFmpeg-devel] [PATCH] swscale/arm: add yuv2planeX_8_neon

2016-04-11 Thread Matthieu Bouron
On Mon, Apr 11, 2016 at 9:58 AM, Benoit Fouet  wrote:

> Hi,
>
> (again, thanks to both of you for documenting all this assembly /NEON code)
>
> On 09/04/2016 10:22, Matthieu Bouron wrote:
>
>> From: Matthieu Bouron 
>>
>> ---
>>
>> Hello,
>>
>> The following patch add yuv2planeX_8_neon function for the arm platform.
>> It is
>> currently restricted to 8-bit per component sources until I fix fate
>> issues
>> with 10-bit sources (the dnxhd-*-10bit tests fail but I haven't figured
>> out yet
>> where it comes from).
>>
>> Matthieu
>>
>> ---
>>   libswscale/arm/Makefile  |  1 +
>>   libswscale/arm/output.S  | 78
>> 
>>   libswscale/arm/swscale.c |  7 +
>>   libswscale/utils.c   |  3 +-
>>   4 files changed, 88 insertions(+), 1 deletion(-)
>>   create mode 100644 libswscale/arm/output.S
>>
>> [...]
>>
>> diff --git a/libswscale/arm/output.S b/libswscale/arm/output.S
>> new file mode 100644
>> index 000..4437447
>> --- /dev/null
>> +++ b/libswscale/arm/output.S
>> @@ -0,0 +1,78 @@
>>
>
> [...]
>
>
> +function ff_yuv2planeX_8_neon, export=1
>> +push {r4-r12, lr}
>> +vpush {q4-q7}
>> +ldr r4, [sp, #104] @
>> dstW
>> +ldr r5, [sp, #108] @
>> dither
>> +ldr r6, [sp, #112] @
>> offset
>> +vld1.8  {d0}, [r5] @
>> load 8x8-bit dither values
>> +tst r6, #0 @
>> check offsetting which can be 0 or 3 only
>> +beq 1f
>> +vext.u8 d0, d0, d0, #3 @
>> honor offseting which can be 3 only
>> +1:  vmovl.u8q0, d0 @
>> extend dither to 16-bit
>> +vshll.u16   q1, d0, #12@
>> extend dither to 32-bit with left shift by 12 (part 1)
>> +vshll.u16   q2, d1, #12@
>> extend dither to 32-bit with left shift by 12 (part 2)
>> +mov r7, #0 @
>> i = 0
>> +2:  vmov.u8 q3, q1 @
>> initialize accumulator with dithering values (part 1)
>> +vmov.u8 q4, q2 @
>> initialize accumulator with dithering values (part 2)
>> +mov r8, r1 @
>> tmpFilterSize = filterSize
>> +mov r9, r2 @
>> srcp
>> +mov r10, r0@
>> filterp
>> +3:  ldr r11, [r9], #4  @
>> get pointer @ src[j]
>> +ldr r12, [r9], #4  @
>> get pointer @ src[j+1]
>> +add r11, r11, r7, lsl #1   @
>> &src[j][i]
>> +add r12, r12, r7, lsl #1   @
>> &src[j+1][i]
>> +vld1.16 {q5}, [r11]@
>> read 8x16-bit @ src[j  ][i + {0..7}]: A,B,C,D,E,F,G,H
>> +vld1.16 {q6}, [r12]@
>> read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P
>> +ldr r11, [r10], #4 @
>> read 2x16-bit coeffs (X, Y) at (filter[j], filter[j+1])
>> +vmov.16 q7, q5 @
>> copy 8x16-bit @ src[j  ][i + {0..7}] for following inplace zip instruction
>> +vmov.16 q8, q6     @
>> copy 8x16-bit @ src[j+1][i + {0..7}] for following inplace zip instruction
>> +vzip.16 q7, q8 @
>> A,I,B,J,C,K,D,L,E,M,F,N,G,O,H,L
>>
>
> nit: O,H,P


Fixed.

Patch updated, fixing the fate issues with 10-bit sources (the code was not
honoring the offset: tst r6, #0 has been replaced with cmp r6, #0).
If there are no objections, I will push the patch in the next few hours.

Thanks for the review,
Matthieu
From 95186d8459c1cb1615299edd6756292140f7fb68 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Fri, 8 Apr 2016 15:32:24 

[FFmpeg-devel] [PATCH] swscale/arm: add yuv2planeX_8_neon

2016-04-09 Thread Matthieu Bouron
From: Matthieu Bouron 

---

Hello,

The following patch adds the yuv2planeX_8_neon function for the arm platform.
It is currently restricted to 8-bit per component sources until I fix the fate
issues with 10-bit sources (the dnxhd-*-10bit tests fail, but I haven't figured
out yet where the failure comes from).

Matthieu

---
 libswscale/arm/Makefile  |  1 +
 libswscale/arm/output.S  | 78 
 libswscale/arm/swscale.c |  7 +
 libswscale/utils.c   |  3 +-
 4 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 libswscale/arm/output.S

diff --git a/libswscale/arm/Makefile b/libswscale/arm/Makefile
index b8b0134..792da6b 100644
--- a/libswscale/arm/Makefile
+++ b/libswscale/arm/Makefile
@@ -4,4 +4,5 @@ OBJS+= arm/swscale.o\
 NEON-OBJS   += arm/rgb2yuv_neon_32.o
 NEON-OBJS   += arm/rgb2yuv_neon_16.o
 NEON-OBJS   += arm/hscale.o \
+   arm/output.o \
arm/yuv2rgb_neon.o   \
diff --git a/libswscale/arm/output.S b/libswscale/arm/output.S
new file mode 100644
index 000..4437447
--- /dev/null
+++ b/libswscale/arm/output.S
@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2016 Clément Bœsch 
+ * Copyright (c) 2016 Matthieu Bouron 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/arm/asm.S"
+
+function ff_yuv2planeX_8_neon, export=1
+push {r4-r12, lr}
+vpush {q4-q7}
+ldr r4, [sp, #104] @ dstW
+ldr r5, [sp, #108] @ dither
+ldr r6, [sp, #112] @ offset
+vld1.8  {d0}, [r5] @ load 8x8-bit dither values
+tst r6, #0 @ check offsetting which can be 0 or 3 only
+beq 1f
+vext.u8 d0, d0, d0, #3 @ honor offseting which can be 3 only
+1:  vmovl.u8    q0, d0 @ extend dither to 16-bit
+vshll.u16   q1, d0, #12 @ extend dither to 32-bit with left shift by 12 (part 1)
+vshll.u16   q2, d1, #12 @ extend dither to 32-bit with left shift by 12 (part 2)
+mov r7, #0 @ i = 0
+2:  vmov.u8 q3, q1 @ initialize accumulator with dithering values (part 1)
+vmov.u8 q4, q2 @ initialize accumulator with dithering values (part 2)
+mov r8, r1 @ tmpFilterSize = filterSize
+mov r9, r2 @ srcp
+mov r10, r0 @ filterp
+3:  ldr r11, [r9], #4 @ get pointer @ src[j]
+ldr r12, [r9], #4 @ get pointer @ src[j+1]
+add r11, r11, r7, lsl #1 @ &src[j][i]
+add r12, r12, r7, lsl #1 @ &src[j+1][i]
+vld1.16 {q5}, [r11] @ read 8x16-bit @ src[j  ][i + {0..7}]: A,B,C,D,E,F,G,H
+vld1.16 {q6}, [r12] @ read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P
+ldr r11, [r10], #4 @ read 2x16-bit coeffs (X, Y) at (filter[j], filter[j+1])
+vmov.16 q7, q5 @ copy 8x16-bit @ src[j  ][i + {0..7}] for following inplace zip instruction
+vmov.16 q8, q6 @ copy 8x16-bit @ src[j+1][i + {0..7}] for following inplace zip instruction
+vzip.16 q7, q8 @ A,I,B,J,C,K,D,L,E,M,F,N,G,O,H,L
+vdup.32 q15, r11
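
Note on what the NEON routine above computes: going by the comments, each output pixel starts from a dither value shifted left by 12, accumulates src[j][i] * filter[j] over the filter taps, and the dither table is rotated by the offset. The following is a scalar sketch of that computation. The final right shift by 19 and the clip to 8 bits are not visible in the truncated patch; they are an assumption based on the scalar C fallback in libswscale. yuv2planeX_8_ref() and clip_uint8() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the 0..255 range of an 8-bit output sample. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model of the vertical scaler: one output row of dstW pixels is
 * the dithered, filtered sum of filterSize input rows. */
static void yuv2planeX_8_ref(const int16_t *filter, int filterSize,
                             const int16_t **src, uint8_t *dest, int dstW,
                             const uint8_t *dither, int offset)
{
    assert(filterSize > 0 && dstW >= 0);
    for (int i = 0; i < dstW; i++) {
        /* the dither table has 8 entries; offset rotates it */
        int val = dither[(i + offset) & 7] << 12;

        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];

        /* assumed final scaling: drop 19 fractional bits, then clip */
        dest[i] = clip_uint8(val >> 19);
    }
}
```

In this model the "(i + offset) & 7" indexing corresponds to the vext.u8 d0, d0, d0, #3 rotation in the assembly, which moves dither[3] into lane 0 when the offset is 3.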

Re: [FFmpeg-devel] [PATCH] swscale/arm: add ff_hscale_8_to_15_neon

2016-04-08 Thread Matthieu Bouron
On Fri, Apr 8, 2016 at 10:27 PM, Michael Niedermayer  wrote:

> On Fri, Apr 08, 2016 at 12:24:13PM +0200, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> >
> > ---
> >  libswscale/arm/Makefile   |  6 ++--
> >  libswscale/arm/hscale.S   | 70
> +++
> >  libswscale/arm/swscale.c  | 37 +++
> >  libswscale/swscale.c  |  2 ++
> >  libswscale/swscale_internal.h |  1 +
> >  5 files changed, 114 insertions(+), 2 deletions(-)
> >  create mode 100644 libswscale/arm/hscale.S
> >  create mode 100644 libswscale/arm/swscale.c
>
> tested, works (fate)
>


Applied with minor changes in the comments.

Thanks,
Matthieu

[...]


[FFmpeg-devel] [PATCH] swscale/arm: add ff_hscale_8_to_15_neon

2016-04-08 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/Makefile   |  6 ++--
 libswscale/arm/hscale.S   | 70 +++
 libswscale/arm/swscale.c  | 37 +++
 libswscale/swscale.c  |  2 ++
 libswscale/swscale_internal.h |  1 +
 5 files changed, 114 insertions(+), 2 deletions(-)
 create mode 100644 libswscale/arm/hscale.S
 create mode 100644 libswscale/arm/swscale.c

diff --git a/libswscale/arm/Makefile b/libswscale/arm/Makefile
index 9ccec3b..b8b0134 100644
--- a/libswscale/arm/Makefile
+++ b/libswscale/arm/Makefile
@@ -1,5 +1,7 @@
-OBJS+= arm/swscale_unscaled.o
+OBJS+= arm/swscale.o\
+   arm/swscale_unscaled.o   \
 
 NEON-OBJS   += arm/rgb2yuv_neon_32.o
 NEON-OBJS   += arm/rgb2yuv_neon_16.o
-NEON-OBJS   += arm/yuv2rgb_neon.o
+NEON-OBJS   += arm/hscale.o \
+   arm/yuv2rgb_neon.o   \
diff --git a/libswscale/arm/hscale.S b/libswscale/arm/hscale.S
new file mode 100644
index 000..d559b3d
--- /dev/null
+++ b/libswscale/arm/hscale.S
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2016 Clément Bœsch 
+ * Copyright (c) 2016 Matthieu Bouron 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/arm/asm.S"
+
+function ff_hscale_8_to_15_neon, export=1
+push    {r4-r12, lr}
+vpush   {q4-q7}
+ldr r4, [sp, #104] @ filter
+ldr r5, [sp, #108] @ filterPos
+ldr r6, [sp, #112] @ filterSize
+add r10, r4, r6, lsl #1 @ filter2 = filter + filterSize * 2
+1:  ldr r8, [r5], #4 @ filterPos[0]
+ldr r9, [r5], #4 @ filterPos[1]
+vmov.s32    q4, #0 @ val accumulator
+vmov.s32    q5, #0 @ val accumulator
+mov r7, r6 @ filterSize counter
+mov r0, r3 @ srcp
+2:  add r11, r0, r8 @ srcp + filterPos[0]
+add r12, r0, r9 @ srcp + filterPos[1]
+vld1.8  d0, [r11] @ srcp[filterPos[0] + {0..7}]
+vld1.8  d2, [r12] @ srcp[filterPos[1] + {0..7}]
+vld1.16 {q2}, [r4]! @ load 8x16-bit filter values
+vld1.16 {q3}, [r10]! @ load 8x16-bit filter values
+vmovl.u8    q0, d0 @ unpack src values to 16-bit
+vmovl.u8    q1, d2 @ unpack src values to 16-bit
+vmull.s16   q8, d0, d4 @ srcp[filterPos[0] + {0..7}] * filter[{0..7}] (part 1)
+vmull.s16   q9, d1, d5 @ srcp[filterPos[0] + {0..7}] * filter[{0..7}] (part 2)
+vmull.s16   q10, d2, d6 @ srcp[filterPos[1] + {0..7}] * filter[{0..7}] (part 1)
+vmull.s16   q11, d3, d7 @ srcp[filterPos[1] + {0..7}] * filter[{0..7}] (part 2)
+vpadd.s32   d16, d16, d17 @ horizontal pair adding of the 8x32-bit multiplied values into 4x32-bit (part 1)
+vpadd.s32   d17, d18, d19 @ horizontal pair adding of the 8x32-bit multiplied values into 4x32-bit (part 2)
+vpadd.s32   d20, d20, d21 @ horizontal pair adding of the 8x32-bit multiplied values into 4x32-bit (part 1)
+vpadd.s32   d21, d22, d23 @ horizontal pair adding of the 8x32-bit multiplied values into 4x32-bit (part 2)
+vadd.s32
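
Note on what the NEON routine above computes: going by the comments, each output sample is the dot product of filterSize source bytes starting at filterPos[i] with that sample's filterSize coefficients, and the assembly handles two output samples per outer iteration. The following is a scalar sketch of that computation. The final right shift by 7 and the clamp to 15 bits are not visible in the truncated patch; they are an assumption based on the scalar C fallback in libswscale. hscale_8_to_15_ref() is a hypothetical name, and the context argument of the real function is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the horizontal scaler: 8-bit input, 15-bit output. */
static void hscale_8_to_15_ref(int16_t *dst, int dstW, const uint8_t *src,
                               const int16_t *filter,
                               const int32_t *filterPos, int filterSize)
{
    assert(filterSize > 0 && dstW >= 0);
    for (int i = 0; i < dstW; i++) {
        int val = 0;

        /* coefficients for output i start at filter[filterSize * i] */
        for (int j = 0; j < filterSize; j++)
            val += src[filterPos[i] + j] * filter[filterSize * i + j];

        /* assumed final scaling: drop 7 fractional bits, clamp to 15 bits */
        val >>= 7;
        dst[i] = (int16_t)(val < 32767 ? val : 32767);
    }
}
```

The "filter2 = filter + filterSize * 2" comment in the assembly matches this layout: r10 points at the coefficients of the next output sample, filterSize 16-bit values (filterSize * 2 bytes) past r4.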

Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support

2016-04-07 Thread Matthieu Bouron
On Wed, Mar 23, 2016 at 6:16 PM, Matthieu Bouron 
wrote:

>
>
> On Tue, Mar 22, 2016 at 10:04 AM, Matthieu Bouron <
> matthieu.bou...@gmail.com> wrote:
>
>>
>>
>> On Fri, Mar 18, 2016 at 5:50 PM, Matthieu Bouron <
>> matthieu.bou...@gmail.com> wrote:
>>
>>> From: Matthieu Bouron 
>>>
>>> ---
>>>
>>> Hello,
>>>
>>> The following patch add hwaccel support to the mediacodec (h264) decoder
>>> by allowing
>>> the user to render the output frames directly on a surface.
>>>
>>> In order to do so the user needs to initialize the hwaccel through the
>>> use of
>>> av_mediacodec_alloc_context and av_mediacodec_default_init functions.
>>> The later
>>> takes a reference to an android/view/Surface as parameter.
>>>
>>> If the hwaccel successfully initialize, the decoder output frames pix
>>> fmt will be
>>> AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrate how to
>>> render
>>> the frames on the surface:
>>>
>>> AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3];
>>> av_mediacodec_release_buffer(buffer, 1);
>>>
>>> The last argument of av_mediacodec_release_buffer enable rendering of the
>>> buffer on the surface (or not if set to 0).
>>>
>>> Regarding the internal changes in the mediacodec decoder:
>>>
>>> MediaCodec.flush() discards both input and output buffers meaning that if
>>> MediaCodec.flush() is called all output buffers the user has a reference
>>> on are
>>> now invalid (and cannot be used).
>>> This behaviour does not fit well in the avcodec API.
>>>
>>> When the decoder is configured to output software buffers, there is no
>>> issue as
>>> the buffers are copied.
>>>
>>> Now when the decoder is configured to output to a surface, the user
>>> might not
>>> want to render all the frames as fast as the decoder can go and might
>>> want to
>>> control *when* the frame are rendered, so we need to make sure that the
>>> MediaCodec.flush() call is delayed until all the frames the user retains
>>> has
>>> been released or rendered.
>>>
>>> Delaying the call to MediaCodec.flush() means buffering any inputs that
>>> come
>>> the decoder until the user has released/renderer the frame he retains.
>>>
>>> This is a limitation of this hwaccel implementation, if the user retains
>>> a
>>> frame (a), then issue a flush command to the decoder, the packets he
>>> feeds to
>>> the decoder at that point will be queued in the internal decoder packet
>>> queue
>>> (until he releases the frame (a)). This scenario leads to a memory usage
>>> increase to say the least.
>>>
>>> Currently there is no limitation on the size of the internal decoder
>>> packet
>>> queue but this is something that can be added easily. Then, if the queue
>>> is
>>> full, what would be the behaviour of the decoder ? Can it block ? Or
>>> should it
>>> returns something like AVERROR(EAGAIN) ?
>>>
>>> About the other internal decoder changes I introduced:
>>>
>>> The MediaCodecDecContext is now refcounted (using the lavu/atomic api)
>>> since
>>> the (hwaccel) frames can be retained by the user, we need to delay the
>>> destruction of the codec until the user has released all the frames he
>>> has a
>>> reference on.
>>> The reference counter of the MediaCodecDecContext is incremented each
>>> time an
>>> (hwaccel) frame is outputted by the decoder and decremented each time a
>>> (hwaccel) frame is released.
>>>
>>> Also, when the decoder is configured to output to a surface the pts that
>>> are
>>> given to the MediaCodec API are now rescaled based on the codec_timebase
>>> as
>>> those timestamps values are propagated to the frames rendered on the
>>> surface
>>> since Android M. Not sure if it's really useful though.
>>>
>>> On the performance side:
>>>
>>> On a nexus 5, decoding an h264 stream (main profile) 1080p@60fps:
>>>   - software output + rgba conversion goes at 59~60fps
>>>   - surface output + render on a surface goes at 100~110fps
>>>
>>>
>> [...]
>>
>> Patch updated with the following differences:
>>   * the public mediacodec api is now always built (not only when
>>     mediacodec is available) (and the build when mediacodec is not
>>     available has been fixed)
>>   * the documentation of av_mediacodec_release_buffer has been improved a
>>     bit
>>
>
> Patch updated with the following differences:
>   MediaCodecBuffer->released type is now a volatile int (instead of an int*)
>   MediaCodecContext->refcount type is now a volatile int (instead of an int*)
>

Ping.
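
For readers following the refcounting scheme described in the quoted text
above, here is a minimal sketch of the idea. It uses C11 <stdatomic.h>
rather than the lavu/atomic api mentioned in the mail, and the DecCtx type,
the dec_ctx_* names and the "destroyed" flag are all hypothetical; it is
not the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the decoder context; this is NOT the real
 * MediaCodecDecContext layout. */
typedef struct DecCtx {
    atomic_int refcount; /* one reference is held by the decoder itself */
    int *destroyed;      /* demo only: set to 1 when the context is freed */
} DecCtx;

DecCtx *dec_ctx_create(int *destroyed_flag)
{
    DecCtx *ctx = malloc(sizeof(*ctx));
    if (!ctx)
        return NULL;
    atomic_init(&ctx->refcount, 1);
    ctx->destroyed = destroyed_flag;
    *destroyed_flag = 0;
    return ctx;
}

/* Called each time a (hwaccel) frame is output: the frame keeps the
 * codec alive while the user retains it. */
void dec_ctx_ref(DecCtx *ctx)
{
    atomic_fetch_add(&ctx->refcount, 1);
}

/* Called each time a frame is released, and once when the decoder is
 * closed. Whoever drops the last reference destroys the context. */
void dec_ctx_unref(DecCtx *ctx)
{
    int old = atomic_fetch_sub(&ctx->refcount, 1);
    assert(old > 0);
    if (old == 1) {
        *ctx->destroyed = 1;
        free(ctx);
    }
}
```

With this shape, closing the decoder while two frames are still retained
only drops the decoder's own reference; the context is destroyed when the
second frame is released.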
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part

2016-04-01 Thread Matthieu Bouron
On Fri, Apr 1, 2016 at 4:15 PM, Matthieu Bouron 
wrote:

>
>
> On Mon, Mar 28, 2016 at 9:12 PM, Matthieu Bouron <
> matthieu.bou...@gmail.com> wrote:
>
>>
>>
>> On Sun, Mar 27, 2016 at 5:58 PM, Matthieu Bouron <
>> matthieu.bou...@gmail.com> wrote:
>>
>>>
>>>
>>> On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron <
>>> matthieu.bou...@gmail.com> wrote:
>>>
>>>> The following patchset aims to make bitexact the yuv->rgba armv7 neon
>>>> code path
>>>> with the aarch64 one. It also aims to make the two code bases as close
>>>> as
>>>> possible.
>>>>
>>>> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>>>>
>>>> The current 32bit code path which is unused is removed.
>>>>
>>>> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>>>>
>>>> The code processes only one line at a time for the yuv420p, nv12 and
>>>> nv21 formats with no regression in performance observed on a rpi2 (I've
>>>> even observed a slight increase of performance for the nv12 and nv21
>>>> formats).
>>>>
>>>> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
>>>>
>>>> The last patch of the series makes the code bitexact with the aarch64
>>>> version. The increase of precision (which introduces a performance loss)
>>>> is compensated by a refactor/optimisation that saves quite a few mov,
>>>> vdup and vqdmulh.
>>>>
>>>> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>>>>
>>>> without patchset :
>>>> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>>>>
>>>> with patchset:
>>>> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>>>
>>>
>>> I've managed to run the code on a beagle bone black board, here are the
>>> results:
>>>
>>> nv12->bgra
>>> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
>>> with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 avg:0.013659 max:0.034427 min:0.013411
>>> with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
>>>
>>> yuv420p->bgra
>>> without patchset: [bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866 min:0.012945
>>> with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 avg:0.015358 max:0.036186 min:0.015134
>>> with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 avg:0.014784 max:0.035487 min:0.014568
>>>
>>> So it looks like processing one line at a time has a negative effect on
>>> performance on this board (as opposed to the rpi2). I'll try to keep the
>>> two-line processing code and post some results (so we can decide which
>>> version to choose).
>>>
>>
>> I've managed to update the patchset to keep processing two lines at a time
>> for the nv12, nv21 and yuv420p formats, here are the results:
>>
>> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>>
>> Beagle bone black:
>> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
>> with patchset v1: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
>> with patchset v2: [bench @ 0x10f92d0] t:0.011239 avg:0.011408 max:0.032124 min:0.011202
>>
>> Nexus5:
>> without patchset: avg: ~2,869ms
>> with patchset v1: avg: ~3,008ms
>> with patchset v2: avg: ~2,702ms
>>
>> RPI2:
>> without patchset: [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>> with patchset v1: [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>> with patchset v2: [bench @ 0xc1b6a0] t:0.020999 avg:0.021203 max:0.052184 min:0.020768
>>
>> Given these results, I will drop the current patchset and submit another
>> one (which keeps processing two lines at a time).
>>
>
> I will push the updated patchset (which takes into account Benoit's
> comments) in about an hour.
>

Pushed.
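
To put the avg figures quoted above in perspective, the relative change
between two runs can be computed with a few lines of C. This is only a
sketch: the BenchAvg type, the bench_avgs table and rel_change_percent()
are names made up for the illustration, and the Nexus5 values are assumed
to use a decimal comma (2,869 read as 2.869).

```c
#include <assert.h>

/* avg values copied from the results quoted above (illustrative table). */
typedef struct BenchAvg {
    const char *board;
    double without; /* without patchset */
    double v1;      /* with patchset v1 */
    double v2;      /* with patchset v2 */
} BenchAvg;

const BenchAvg bench_avgs[] = {
    { "Beagle bone black", 0.011743, 0.012751, 0.011408 },
    { "Nexus5",            2.869,    3.008,    2.702    }, /* assumes decimal comma */
    { "RPI2",              0.020813, 0.019075, 0.021203 },
};

/* Relative change of `after` with respect to `before`, in percent.
 * A negative value means the average time went down. */
double rel_change_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Applied to the table, v2 comes out about 3% faster than the unpatched code
on the Beagle bone black and about 6% faster on the Nexus5, but about 2%
slower on the RPI2, where v1 was about 8% faster.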


Re: [FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part

2016-04-01 Thread Matthieu Bouron
On Mon, Mar 28, 2016 at 9:12 PM, Matthieu Bouron 
wrote:

>
>
> On Sun, Mar 27, 2016 at 5:58 PM, Matthieu Bouron <
> matthieu.bou...@gmail.com> wrote:
>
>>
>>
>> On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron <
>> matthieu.bou...@gmail.com> wrote:
>>
>>> The following patchset aims to make bitexact the yuv->rgba armv7 neon
>>> code path
>>> with the aarch64 one. It also aims to make the two code bases as close as
>>> possible.
>>>
>>> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>>>
>>> The current 32bit code path which is unused is removed.
>>>
>>> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>>>
>>> The code processes only one line at a time for the yuv420p, nv12 and
>>> nv21 formats with no regression in performance observed on a rpi2 (I've
>>> even observed a slight increase of performance for the nv12 and nv21
>>> formats).
>>>
>>> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
>>>
>>> The last patch of the series makes the code bitexact with the aarch64
>>> version. The increase of precision (which introduces a performance loss)
>>> is compensated by a refactor/optimisation that saves quite a few mov,
>>> vdup and vqdmulh.
>>>
>>> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>>>
>>> without patchset :
>>> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>>>
>>> with patchset:
>>> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>>
>>
>> I've managed to run the code on a beagle bone black board, here are the
>> results:
>>
>> nv12->bgra
>> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
>> with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 avg:0.013659 max:0.034427 min:0.013411
>> with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
>>
>> yuv420p->bgra
>> without patchset: [bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866 min:0.012945
>> with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 avg:0.015358 max:0.036186 min:0.015134
>> with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 avg:0.014784 max:0.035487 min:0.014568
>>
>> So it looks like processing one line at a time has a negative effect on
>> performance on this board (as opposed to the rpi2). I'll try to keep the
>> two-line processing code and post some results (so we can decide which
>> version to choose).
>>
>
> I've managed to update the patchset to keep processing two lines at a time
> for the nv12, nv21 and yuv420p formats, here are the results:
>
> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>
> Beagle bone black:
> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
> with patchset v1: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
> with patchset v2: [bench @ 0x10f92d0] t:0.011239 avg:0.011408 max:0.032124 min:0.011202
>
> Nexus5:
> without patchset: avg: ~2,869ms
> with patchset v1: avg: ~3,008ms
> with patchset v2: avg: ~2,702ms
>
> RPI2:
> without patchset: [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
> with patchset v1: [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
> with patchset v2: [bench @ 0xc1b6a0] t:0.020999 avg:0.021203 max:0.052184 min:0.020768
>
> Given these results, I will drop the current patchset and submit another
> one (which keeps processing two lines at a time).
>

I will push the updated patchset (which takes into account Benoit's
comments) in about an hour.

Matthieu


Re: [FFmpeg-devel] [PATCH v2 8/9] swscale/arm/yuv2rgb: save a few instructions by processing the luma line interleaved

2016-03-31 Thread Matthieu Bouron
On Thu, Mar 31, 2016 at 11:17 AM, Benoit Fouet  wrote:

> Hi,
>
> On 28/03/2016 21:19, Matthieu Bouron wrote:
>
>> ---
>>   libswscale/arm/yuv2rgb_neon.S | 88
>> +--
>>   1 file changed, 34 insertions(+), 54 deletions(-)
>>
>> diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
>> index 124d7d3..6b911c8 100644
>> --- a/libswscale/arm/yuv2rgb_neon.S
>> +++ b/libswscale/arm/yuv2rgb_neon.S
>>
>> [...]
>>
>> @@ -94,25 +67,29 @@
>>   .ifc \ofmt,bgra
>>   compute_rgba d8, d7, d6, d9, d12, d11, d10, d13
>>   .endif
>> +
>> +vzip.8  d6, d10
>> +vzip.8  d7, d11
>> +vzip.8  d8, d12
>> +vzip.8  d9, d13
>>
>
> Adding a comment to explain the resulting interleaving would be nice


Added locally:

+vzip.8  d6, d10   @ d6 = R1R2R3R4R5R6R7R8 d10 = R9R10R11R12R13R14R15R16
+vzip.8  d7, d11   @ d7 = G1G2G3G4G5G6G7G8 d11 = G9G10G11G12G13G14G15G16
+vzip.8  d8, d12   @ d8 = B1B2B3B4B5B6B7B8 d12 = B9B10B11B12B13B14B15B16
+vzip.8  d9, d13   @ d9 = A1A2A3A4A5A6A7A8 d13 = A9A10A11A12A13A14A15A16
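
As an aside for readers less familiar with NEON, the effect of one of
these vzip.8 instructions can be modelled in scalar C. The sketch below
assumes (this is an inference from the vld2.8 luma load, not something
stated in the patch) that before the zip d6 holds the red values of pixels
1, 3, ..., 15 and d10 those of pixels 2, 4, ..., 16; vzip8_model() is a
made-up helper name.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of "vzip.8 dA, dB" on two 64-bit (8-byte) registers:
 * the lanes of a and b are interleaved; the low half of the interleaved
 * result is written back to a and the high half to b. */
void vzip8_model(uint8_t a[8], uint8_t b[8])
{
    uint8_t zipped[16];

    for (int i = 0; i < 8; i++) {
        zipped[2 * i]     = a[i]; /* lanes coming from the first register  */
        zipped[2 * i + 1] = b[i]; /* lanes coming from the second register */
    }
    memcpy(a, zipped,     8);
    memcpy(b, zipped + 8, 8);
}
```

Under that assumption, zipping {R1, R3, ..., R15} with {R2, R4, ..., R16}
leaves R1..R8 in the first register and R9..R16 in the second, which is
the layout the comments describe.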


>
>
>>   vst4.8  {q3, q4}, [\dst,:128]!
>>   vst4.8  {q5, q6}, [\dst,:128]!
>> -
>>   .endm
>>   .macro process_1l ofmt
>> -compute_premult d28, d29, d30, d31
>> -vld1.8  {q7}, [r4]!
>> -compute r2, d14, d15, \ofmt
>> +compute_premult
>> +vld2.8  {d14, d15}, [r4]!
>> +compute r2, \ofmt
>>   .endm
>>   .macro process_2l ofmt
>> -compute_premult d28, d29, d30, d31
>> +compute_premult
>> -vld1.8  {q7}, [r4]!   @ first line of luma
>> -compute r2, d14, d15, \ofmt
>> +vld2.8  {d14, d15}, [r4]!   @ q7 = Y (interleaved)
>> +compute r2, \ofmt
>> -vld1.8  {q7}, [r12]!   @ second line of luma
>> -compute r11, d14, d15, \ofmt
>> +vld2.8  {d14, d15}, [r12]!   @ q7 = Y (interleaved)
>> +compute r11, \ofmt
>>   .endm
>>
>>
>
> What about adding a level of macro here? Something like:
> .macro process_1l_internal ofmt src_addr res
> compute_premult
> vld2.8 {d14, d15}, [\src_addr]!
> compute \res, \ofmt
> .endm
>
> (again, the naming could be changed, according to your own taste :-) )
>
> This way, we would get:
> .macro process_1l ofmt
> process_1l_internal \ofmt, r4, r2
> .endm
>
> .macro process_2l ofmt
> process_1l_internal \ofmt, r4,  r2
> process_1l_internal \ofmt, r12, r11
> .endm


Added locally:
process_1l_16px_internal added to the macro-ify patch and then renamed to
process_1l_internal in a later patch.

Thanks,
Matthieu


Re: [FFmpeg-devel] [PATCH v2 6/9] swscale/arm/yuv2rgb: macro-ify

2016-03-31 Thread Matthieu Bouron
On Thu, Mar 31, 2016 at 10:48 AM, Benoit Fouet  wrote:

> Hi,
>
> (sorry for the first mail, fuzzy fingers...)
>
> On 28/03/2016 21:19, Matthieu Bouron wrote:
>
>> ---
>>   libswscale/arm/yuv2rgb_neon.S | 137
>> ++
>>   1 file changed, 60 insertions(+), 77 deletions(-)
>>
>> diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
>> index ef7b0a6..e1b68c1 100644
>> --- a/libswscale/arm/yuv2rgb_neon.S
>> +++ b/libswscale/arm/yuv2rgb_neon.S
>> @@ -64,7 +64,7 @@
>>   vmov.u8 \a2, #255
>>   .endm
>>   -.macro compute_16px dst y0 y1 ofmt
>> +.macro compute dst y0 y1 ofmt
>>   vmovl.u8q14, \y0
>>  @ 8px of y
>>   vmovl.u8q15, \y1
>>  @ 8px of y
>>   @@ -99,23 +99,23 @@
>> .endm
>>   -.macro process_1l_16px ofmt
>> +.macro process_1l ofmt
>>   compute_premult d28, d29, d30, d31
>>   vld1.8  {q7}, [r4]!
>> -compute_16pxr2, d14, d15, \ofmt
>> +compute r2, d14, d15, \ofmt
>>   .endm
>>   -.macro process_2l_16px ofmt
>> +.macro process_2l ofmt
>>   compute_premult d28, d29, d30, d31
>> vld1.8  {q7}, [r4]!
>>   @ first line of luma
>> -compute_16pxr2, d14, d15, \ofmt
>> +compute r2, d14, d15, \ofmt
>> vld1.8  {q7}, [r12]!
>>  @ second line of luma
>> -compute_16pxr11, d14, d15, \ofmt
>> +compute r11, d14, d15, \ofmt
>>   .endm
>>
>>
>
> This renaming could be split
>

Split locally.


>
> [...]
>
>
> @@ -232,68 +204,79 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
>>   vld1.8  d3, [r10]!
>>  @ d3: chroma blue line
>>   vsubl.u8q14, d2, d10
>>  @ q14 = U - 128
>>   vsubl.u8q15, d3, d10
>>  @ q15 = V - 128
>> +.endm
>>   -process_2l_16px \ofmt
>> -.endif
>> -
>> -.ifc \ifmt,yuv422p
>> +.macro load_chroma_yuv422p
>>   pld [r10, #64*3]
>> vld1.8  d2, [r6]!
>>   @ d2: chroma red line
>>   vld1.8  d3, [r10]!
>>  @ d3: chroma blue line
>>   vsubl.u8q14, d2, d10
>>  @ q14 = U - 128
>>   vsubl.u8q15, d3, d10
>>  @ q15 = V - 128
>> +.endm
>>   -process_1l_16px \ofmt
>> -.endif
>> -
>> -subsr8, r8, #16@
>> width -= 16
>> -bgt 2b
>> -
>> -add r2, r2, r3 @
>> dst   += padding
>> -add r4, r4, r5 @
>> srcY  += paddingY
>> -
>> -.ifc \ifmt,nv12
>> +.macro increment_nv12
>>
>
> How about increment_and test_nv12? Same for the other ones.
> (I'm not happy with the name I found, but am trying to come up with a
> solution to have a more explicit naming)


Renamed to increment_and_test_* locally.

Thanks,
Matthieu

[...]


Re: [FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats

2016-03-30 Thread Matthieu Bouron
On Wed, Mar 30, 2016 at 11:36:34PM +0200, Benoit Fouet wrote:
> Hi,

Hi Benoit,

> 
> Le 26/03/2016 13:05, Matthieu Bouron a écrit :
> >On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer  >>>wrote:
> >>>On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> >>>> >From: Matthieu Bouron
> >>>> >
> >>>> >---
> >>>> >  libswscale/arm/yuv2rgb_neon.S | 89
> >>>---
> >>>> >  1 file changed, 24 insertions(+), 65 deletions(-)
> >>>
> >>>breaks build
> >>>
> >>>  make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/
> >>>--cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon
> >>>-mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux
> >>>--enable-cross-compile && make -j12
> >>>
> >>>CC  libavutil/arm/float_dsp_init_arm.o
> >>>src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
> >>>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
> >>>instruction should be in IT block -- `subeq r6,r6,r0'
> >>>src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
> >>>instruction should be in IT block -- `addne r6,r7'
> >>>
> >[...]
> >
> >Patch updated with the relevant it instructions added. It still does build
> >on my rpi2 setup but is not tested on the same setup as yours.
> >Can you confirm it builds/works on your setup ?
> >
> >If it works, i will send an updated version of the next patch (07/10) to
> >resolve the conflicts.
> >
> >Matthieu
> >
> >0006-swscale-arm-yuv2rgb-only-process-one-line-at-a-time-.patch
> >
> >
> > From 7b3a405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
> >From: Matthieu Bouron
> >Date: Wed, 23 Mar 2016 11:26:13 +
> >Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
> >  for the yuv420p and nv{12,21} formats
> >
> >---
> >  libswscale/arm/yuv2rgb_neon.S | 92 
> > +--
> >  1 file changed, 27 insertions(+), 65 deletions(-)
> >
> >diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> >index ef7b0a6..6aeccae 100644
> >--- a/libswscale/arm/yuv2rgb_neon.S
> >+++ b/libswscale/arm/yuv2rgb_neon.S
> >@@ -105,16 +105,6 @@
> >  compute_16pxr2, d14, d15, \ofmt
> >  .endm
> >-.macro process_2l_16px ofmt
> >-compute_premult d28, d29, d30, d31
> >-
> >-vld1.8  {q7}, [r4]!@ 
> >first line of luma
> >-compute_16pxr2, d14, d15, \ofmt
> >-
> >-vld1.8  {q7}, [r12]!   @ 
> >second line of luma
> >-compute_16pxr11, d14, d15, \ofmt
> >-.endm
> >-
> >  .macro load_args_nvx
> >  push{r4-r12, lr}
> >  vpush   {q4-q7}
> >@@ -127,13 +117,9 @@
> >  ldr r10,[sp, #128] @ 
> > r10 = y_coeff
> >  vdup.16 d0, r10@ 
> > d0  = y_coeff
> >  vld1.16 {d1}, [r8] @ 
> > d1  = *table
> >-add r11, r2, r3@ 
> >r11 = dst + linesize (dst2)
> >-add r12, r4, r5@ 
> >r12 = srcY + linesizeY (srcY2)
> 
> Nit: this lets r11 and r12 unused by the NV conversions. It should be
> possible not to push/pop them
> If not (which I would certainly understand), what would you think about
> moving the registers save out of the 'load_args_*' macro?
> It seems weird to have all the push/vpush that are not factored, and the
> pop/vpop that is done in only one place, at the end of each function.

Thanks for the review. Unfortunately, I dropped this part of the patch set:
processing only one line at a time proved to be slower on devices other
than the rpi2. (I will keep your remark in mind if I ever switch back to
processing only one line at a time for all formats).
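
One way to see why two-line processing can pay off for the 4:2:0 formats
is to count how often the chroma-dependent work has to be done. The sketch
below is only a cost model under the assumption that every pass over a
luma line redoes the chroma premultiply for that line; the function name
is hypothetical, and the count says nothing about memory access or
instruction scheduling, which matter just as much on real devices.

```c
#include <assert.h>

/* Number of per-chroma-sample premultiply steps needed to convert a
 * 4:2:0 image, where one chroma line is shared by two luma lines.
 * lines_per_pass is 1 (chroma redone for each luma line) or 2 (chroma
 * done once for the two luma lines that share it). */
long chroma_premult_steps(int width, int height, int lines_per_pass)
{
    assert(lines_per_pass == 1 || lines_per_pass == 2);
    assert(width % 2 == 0 && height % 2 == 0);

    long chroma_samples_per_line = width / 2;
    long passes = height / lines_per_pass;

    return passes * chroma_samples_per_line;
}
```

For a 1920x1080 frame this model gives twice as many chroma steps with one
line per pass as with two lines per pass.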

The v2 patch set was posted in reply to the following thread:
https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/192272.html

Would you mind taking a look at it?

Matthieu

[...]


Re: [FFmpeg-devel] [PATCH 02/10] swscale/arm/yuv2rgb: fix comments and factorize lsl in load_args_yuv422p

2016-03-30 Thread Matthieu Bouron
On Wed, Mar 30, 2016 at 11:34:59PM +0200, Benoit Fouet wrote:
> Hi,
> 
> Le 25/03/2016 23:45, Matthieu Bouron a écrit :
> >From: Matthieu Bouron
> >
> >---
> >  libswscale/arm/yuv2rgb_neon.S | 9 -
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> >
> >diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
> >index f40327b..aac0773 100644
> >--- a/libswscale/arm/yuv2rgb_neon.S
> >+++ b/libswscale/arm/yuv2rgb_neon.S
> >@@ -172,11 +172,10 @@
> >  vdup.16 d0, r10@ 
> > d0  = y_coeff
> >  vld1.16 {d1}, [r8] @ 
> > d1  = *table
> >  add r11, r2, r3@ 
> > r11 = dst + linesize (dst2)
> >-lsl r8, r0, #2
> >-sub r3, r3, r8 @ r3 
> >= linesize  * 2 - width * 4 (padding)
> >-sub r5, r5, r0 @ r5 
> >= linesizeY * 2 - width (paddingY)
> >-sub r7, r7, r0, lsr #1 @ r7 
> >= linesizeU - width / 2 (paddingU)
> >-sub r12,r12,r0, lsr #1 @ 
> >r12 = linesizeV- width / 2 (paddingV)
> >+sub r3, r3, r0, lsl #2 @ r3 
> > = linesize  - width * 4 (padding)
> >+sub r5, r5, r0 @ r5 
> > = linesizeY - width (paddingY)
> >+sub r7, r7, r0, lsr #1 @ r7 
> > = linesizeU - width / 2 (paddingU)
> >+sub r12,r12,r0, lsr #1 @ 
> >r12 = linesizeV - width / 2 (paddingV)
> >  ldr r10,[sp, #120] @ 
> > r10 = srcV
> >  .endm
> 
> nit: it would be cool to split: one for the comments and the other one for
> the lsl factorization.

Split locally in the v2 patch set.
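
For reference, the paddings computed by the new sub instructions in
load_args_yuv422p can be written out in C. This is a sketch only: the
Yuv422pPadding type and yuv422p_padding() are hypothetical names, and the
mapping of each field to a register follows the comments in the quoted
hunk.

```c
#include <assert.h>

/* Hypothetical container for the per-row paddings of the yuv422p path. */
typedef struct Yuv422pPadding {
    int dst; /* linesize  - width * 4 (4 bytes per output pixel) */
    int y;   /* linesizeY - width                                */
    int u;   /* linesizeU - width / 2                            */
    int v;   /* linesizeV - width / 2                            */
} Yuv422pPadding;

Yuv422pPadding yuv422p_padding(int width, int linesize, int linesizeY,
                               int linesizeU, int linesizeV)
{
    Yuv422pPadding p;

    assert(width > 0);
    p.dst = linesize  - (width << 2); /* sub r3, r3, r0, lsl #2   */
    p.y   = linesizeY - width;        /* sub r5, r5, r0           */
    p.u   = linesizeU - (width >> 1); /* sub r7, r7, r0, lsr #1   */
    p.v   = linesizeV - (width >> 1); /* sub r12, r12, r0, lsr #1 */
    return p;
}
```

Folding the shift into the sub as a shifted operand gives the same value
as the separate lsl into r8 followed by a plain sub, with one instruction
less.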

Thanks,
Matthieu

[...]


[FFmpeg-devel] [PATCH v2 6/9] swscale/arm/yuv2rgb: macro-ify

2016-03-28 Thread Matthieu Bouron
---
 libswscale/arm/yuv2rgb_neon.S | 137 ++
 1 file changed, 60 insertions(+), 77 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index ef7b0a6..e1b68c1 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -64,7 +64,7 @@
 vmov.u8 \a2, #255
 .endm
 
-.macro compute_16px dst y0 y1 ofmt
+.macro compute dst y0 y1 ofmt
 vmovl.u8 q14, \y0   @ 8px of y
 vmovl.u8 q15, \y1   @ 8px of y
 
@@ -99,23 +99,23 @@
 
 .endm
 
-.macro process_1l_16px ofmt
+.macro process_1l ofmt
 compute_premult d28, d29, d30, d31
 vld1.8  {q7}, [r4]!
-compute_16px r2, d14, d15, \ofmt
+compute r2, d14, d15, \ofmt
 .endm
 
-.macro process_2l_16px ofmt
+.macro process_2l ofmt
 compute_premult d28, d29, d30, d31
 
 vld1.8  {q7}, [r4]!   @ first line of luma
-compute_16px r2, d14, d15, \ofmt
+compute r2, d14, d15, \ofmt
 
 vld1.8  {q7}, [r12]!   @ second line of luma
-compute_16px r11, d14, d15, \ofmt
+compute r11, d14, d15, \ofmt
 .endm
 
-.macro load_args_nvx
+.macro load_args_nv12
 push {r4-r12, lr}
 vpush   {q4-q7}
 ldr r4, [sp, #104] @ r4  = srcY
@@ -136,6 +136,10 @@
 sub r7, r7, r0 @ r7 = linesizeC - width (paddingC)
 .endm
 
+.macro load_args_nv21
+load_args_nv12
+.endm
+
 .macro load_args_yuv420p
 push {r4-r12, lr}
 vpush   {q4-q7}
@@ -176,55 +180,23 @@
 ldr r10,[sp, #120] @ r10 = srcV
 .endm
 
-.macro declare_func ifmt ofmt
-function ff_\ifmt\()_to_\ofmt\()_neon, export=1
-
-.ifc \ifmt,nv12
-load_args_nvx
-.endif
-
-.ifc \ifmt,nv21
-load_args_nvx
-.endif
-
-.ifc \ifmt,yuv420p
-load_args_yuv420p
-.endif
-
-
-.ifc \ifmt,yuv422p
-load_args_yuv422p
-.endif
-
-1:
-mov r8, r0 @ r8 = width
-2:
-pld [r6, #64*3]
-pld [r4, #64*3]
-
-vmov.i8 d10, #128
-
-.ifc \ifmt,nv12
+.macro load_chroma_nv12
 pld [r12, #64*3]
 
 vld2.8  {d2, d3}, [r6]!   @ q1: interleaved chroma line
 vsubl.u8 q14, d2, d10   @ q14 = U - 128
 vsubl.u8 q15, d3, d10   @ q15 = V - 128
+.endm
 
-process_2l_16px \ofmt
-.endif
-
-.ifc \ifmt,nv21
+.macro load_chroma_nv21
 pld [r12, #64*3]
 
 vld2.8  {d2, d3}, [r6]!   @ q1: interleaved chroma line
 vsubl.u8 q14, d3, d10   @ q14 = U - 128
 vsubl.u8 q15, d2, d10   @ q15 = V - 128
+.endm
 
-process_2l_16px \ofmt
-.endif
-
-.ifc \ifmt,yuv420p
+.macro load_chroma_yuv420p
 pld [r10, #64*3]
 pld [r12, #64*3]
 
@@ -232,68 +204,79 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
 vld1.8  d3, [r10]! @ d3: chroma blue line
 vsubl.u8 q14, d2, d10   @ q14 = U - 128
 vsubl.u8 q15, d3, d10   @ q15 = V - 128
+.endm
 
-process_2l_16px \ofmt
-.endif
-
-.ifc \ifmt,yuv422p
+.macro load_chroma_yuv422p
 pld [r10, #64*3]
 
 vld1.8  d2, [r6]!  @ d2: chroma red line
 vld1.8  d3, [r10]! @ d3: chroma blue line
 vsubl.u8 q14, d2, d10   @ q14 = U - 128
 vsubl.u8 q15, d3, d10   @ q15 = V - 128
+.endm
 
-process_1l_16px \ofmt
-.endif
-
-subs r8, r8, #16 @ width -= 16
-bgt 2b
-
-add r2, r2, r3 @ dst   += padding
-add r4, r4, r5 @ srcY  += paddingY
-
-.ifc \ifmt,nv12
+.macro increment_nv12
 add r11, r11, r3   @ dst2  += padding
 add r12, r12, r5   @ srcY2 += paddingY
-
 add r6, r6, r7 @ srcC  += paddingC
-
 subs r1, r1, #2 @ height -= 2
-.endif
-
-.ifc \ifmt,nv21
-add r11, r11, r3   @

[FFmpeg-devel] [PATCH v2 2/9] swscale/arm/yuv2rgb: fix comments and factorize lsl in load_args_yuv422p

2016-03-28 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index f40327b..aac0773 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -172,11 +172,10 @@
 vdup.16 d0, r10 @ d0  = y_coeff
 vld1.16 {d1}, [r8] @ d1  = *table
 add r11, r2, r3 @ r11 = dst + linesize (dst2)
-lsl r8, r0, #2
-sub r3, r3, r8 @ r3 = linesize  * 2 - width * 4 (padding)
-sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
-sub r7, r7, r0, lsr #1 @ r7 = linesizeU - width / 2 (paddingU)
-sub r12,r12,r0, lsr #1 @ r12 = linesizeV- width / 2 (paddingV)
+sub r3, r3, r0, lsl #2 @ r3  = linesize  - width * 4 (padding)
+sub r5, r5, r0 @ r5  = linesizeY - width (paddingY)
+sub r7, r7, r0, lsr #1 @ r7  = linesizeU - width / 2 (paddingU)
+sub r12,r12,r0, lsr #1 @ r12 = linesizeV - width / 2 (paddingV)
 ldr r10,[sp, #120] @ r10 = srcV
 .endm
 
-- 
2.7.4



[FFmpeg-devel] [PATCH v2 7/9] swscale/arm/yuv2rgb: re-order compute_rgba macro arguments

2016-03-28 Thread Matthieu Bouron
---
 libswscale/arm/yuv2rgb_neon.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index e1b68c1..124d7d3 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -56,8 +56,8 @@
 vqrshrun.s16 \dst_comp2, q2, #6
 .endm
 
-.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
-compute_color   \r1, \r2, q8,  q9
+.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
+compute_color   \r1, \r2, q8, q9
 compute_color   \g1, \g2, q10, q11
 compute_color   \b1, \b2, q12, q13
 vmov.u8 \a1, #255
@@ -80,19 +80,19 @@
 
 
 .ifc \ofmt,argb
-compute_rgba d7, d11, d8, d12, d9, d13, d6, d10
+compute_rgba d7, d8, d9, d6, d11, d12, d13, d10
 .endif
 
 .ifc \ofmt,rgba
-compute_rgba d6, d10, d7, d11, d8, d12, d9, d13
+compute_rgba d6, d7, d8, d9, d10, d11, d12, d13
 .endif
 
 .ifc \ofmt,abgr
-compute_rgba d9, d13, d8, d12, d7, d11, d6, d10
+compute_rgba d9, d8, d7, d6, d13, d12, d11, d10
 .endif
 
 .ifc \ofmt,bgra
-compute_rgba d8, d12, d7, d11, d6, d10, d9, d13
+compute_rgba d8, d7, d6, d9, d12, d11, d10, d13
 .endif
 vst4.8  {q3, q4}, [\dst,:128]!
 vst4.8  {q5, q6}, [\dst,:128]!
-- 
2.7.4



[FFmpeg-devel] [PATCH v2 4/9] swscale/arm/yuv2rgb: factorize lsl in load_args_yuv420p

2016-03-28 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 22864ec..4601a79 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -152,8 +152,7 @@
 add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
 lsl r3, r3, #1
 lsl r5, r5, #1
-lsl r8, r0, #2
-sub r3, r3, r8 @ r3 = linesize  * 2 - width * 4 (padding)
+sub r3, r3, r0, lsl #2 @ r3 = linesize  * 2 - width * 4 (padding)
 sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
 ldr r10,[sp, #120] @ r10 = srcV
 .endm
-- 
2.7.4



[FFmpeg-devel] [PATCH v2 5/9] swscale/arm/yuv2rgb: factorize lsl in load_args_nvx

2016-03-28 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 4601a79..ef7b0a6 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -131,8 +131,7 @@
 add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
 lsl r3, r3, #1
 lsl r5, r5, #1
-lsl r8, r0, #2
-sub r3, r3, r8 @ r3 = linesize  * 2 - width * 4 (padding)
+sub r3, r3, r0, lsl #2 @ r3 = linesize  * 2 - width * 4 (padding)
 sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
 sub r7, r7, r0 @ r7 = linesizeC - width (paddingC)
 .endm
-- 
2.7.4



[FFmpeg-devel] [PATCH v2 1/9] swscale/arm/yuv2rgb: remove 32bit code path

2016-03-28 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/swscale_unscaled.c |  72 --
 libswscale/arm/yuv2rgb_neon.S | 156 --
 2 files changed, 66 insertions(+), 162 deletions(-)

diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c
index 8aa933c..149208c 100644
--- a/libswscale/arm/swscale_unscaled.c
+++ b/libswscale/arm/swscale_unscaled.c
@@ -61,14 +61,14 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[
 return 0;
 }
 
-#define YUV_TO_RGB_TABLE(precision) \
-c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), \
-c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), \
-c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), \
-c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), \
-
-#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt, precision) \
-int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, \
+#define YUV_TO_RGB_TABLE \
+c->yuv2rgb_v2r_coeff / (1 << 7), \
+c->yuv2rgb_u2g_coeff / (1 << 7), \
+c->yuv2rgb_v2g_coeff / (1 << 7), \
+c->yuv2rgb_u2b_coeff / (1 << 7), \
+
+#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \
+int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
  uint8_t *dst, int linesize, \
  const uint8_t *srcY, int linesizeY, \
  const uint8_t *srcU, int linesizeU, \
@@ -77,37 +77,34 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h,
  int y_offset, \
  int y_coeff); \
 \
-static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, const uint8_t *src[], \
+static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], \
 int srcStride[], int srcSliceY, int srcSliceH, \
 uint8_t *dst[], int dstStride[]) { \
-const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; \
+const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \
 \
-ff_##ifmt##_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, \
+ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
  dst[0] + srcSliceY * dstStride[0], dstStride[0], \
  src[0], srcStride[0], \
  src[1], srcStride[1], \
  src[2], srcStride[2], \
  yuv2rgb_table, \
  c->yuv2rgb_y_offset >> 9, \
- c->yuv2rgb_y_coeff / ((precision) == 16 ? 1 << 7 : 1)); \
+ c->yuv2rgb_y_coeff / (1 << 7)); \
 \
 return 0; \
 } \
 
-#define DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuvx, precision) \
-DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, argb, precision) \
-DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, rgba, precision) \
-DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, abgr, precision) \
-DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, b

[FFmpeg-devel] [PATCH v2 3/9] swscale/arm/yuv2rgb: remove unused store of dst + linesize in load_args_yuv422p

2016-03-28 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index aac0773..22864ec 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -171,7 +171,6 @@
 ldr r10,[sp, #136] @ r10 = y_coeff
 vdup.16 d0, r10 @ d0  = y_coeff
 vld1.16 {d1}, [r8] @ d1  = *table
-add r11, r2, r3 @ r11 = dst + linesize (dst2)
 sub r3, r3, r0, lsl #2 @ r3  = linesize  - width * 4 (padding)
 sub r5, r5, r0 @ r5  = linesizeY - width (paddingY)
 sub r7, r7, r0, lsr #1 @ r7  = linesizeU - width / 2 (paddingU)
-- 
2.7.4



Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: fix zero stride for OMX.allwinner.video.decoder.avc

2016-03-28 Thread Matthieu Bouron
On Mon, Mar 28, 2016 at 07:51:24PM +0300, Kirill Gavrilov wrote:
> ---
>  libavcodec/mediacodecdec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavcodec/mediacodecdec.c b/libavcodec/mediacodecdec.c
> index 5c1368f..c21ceba 100644
> --- a/libavcodec/mediacodecdec.c
> +++ b/libavcodec/mediacodecdec.c
> @@ -247,7 +247,7 @@ static int mediacodec_dec_parse_format(AVCodecContext *avctx, MediaCodecDecConte
>  av_freep(&format);
>  return AVERROR_EXTERNAL;
>  }
> -s->stride = value >= 0 ? value : s->width;
> +s->stride = value > 0 ? value : s->width;
>  
>  if (!ff_AMediaFormat_getInt32(s->format, "slice-height", &value)) {
>  format = ff_AMediaFormat_toString(s->format);
> -- 
> 2.6.1.windows.1

Applied, thanks.
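
For the record, the behaviour change in the quoted one-character fix can
be summarised with a small C sketch. pick_stride() is a hypothetical
helper written for this illustration, not a function from
mediacodecdec.c; it only mirrors the ternary expression in the diff.

```c
/* Choose the row stride for the decoded buffer.
 * `reported` is the stride value read from the codec's output format;
 * a decoder may report 0, which is not usable as a stride, so anything
 * that is not strictly positive falls back to the frame width. */
int pick_stride(int reported, int width)
{
    return reported > 0 ? reported : width;
}
```

With the previous `>= 0` test a reported stride of 0 was kept as-is; with
`> 0` it falls back to the width, as negative values already did.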

Matthieu


[FFmpeg-devel] [PATCH v2 9/9] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part

2016-03-28 Thread Matthieu Bouron
---
 libswscale/arm/swscale_unscaled.c | 18 +-
 libswscale/arm/yuv2rgb_neon.S | 40 +--
 2 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c
index 149208c..e1597ab 100644
--- a/libswscale/arm/swscale_unscaled.c
+++ b/libswscale/arm/swscale_unscaled.c
@@ -62,10 +62,10 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[
 }
 
 #define YUV_TO_RGB_TABLE \
-c->yuv2rgb_v2r_coeff / (1 << 7), \
-c->yuv2rgb_u2g_coeff / (1 << 7), \
-c->yuv2rgb_v2g_coeff / (1 << 7), \
-c->yuv2rgb_u2b_coeff / (1 << 7), \
+c->yuv2rgb_v2r_coeff, \
+c->yuv2rgb_u2g_coeff, \
+c->yuv2rgb_v2g_coeff, \
+c->yuv2rgb_u2b_coeff, \
 
 #define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \
 int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
@@ -88,8 +88,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
  src[1], srcStride[1], \
  src[2], srcStride[2], \
  yuv2rgb_table, \
- c->yuv2rgb_y_offset >> 9, \
- c->yuv2rgb_y_coeff / (1 << 7)); \
+ c->yuv2rgb_y_offset >> 6, \
+ c->yuv2rgb_y_coeff); \
 \
 return 0; \
 } \
@@ -117,12 +117,12 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
 uint8_t *dst[], int dstStride[]) { \
 const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \
 \
-ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
+ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
  dst[0] + srcSliceY * dstStride[0], dstStride[0], \
  src[0], srcStride[0], src[1], srcStride[1], \
  yuv2rgb_table, \
- c->yuv2rgb_y_offset >> 9, \
- c->yuv2rgb_y_coeff / (1 << 7)); \
+ c->yuv2rgb_y_offset >> 6, \
+ c->yuv2rgb_y_coeff); \
 \
 return 0; \
 } \
diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 6b911c8..741928d 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -23,17 +23,20 @@
 
 
 .macro compute_premult
-vmul.s16 q8, q15, d1[0] @ q8  = V * v2r
-vmul.s16 q9, q14, d1[1] @ q9  = U * u2g
-vmla.s16 q9, q15, d1[2] @ q9  = U * u2g + V * v2g
-vmul.s16 q10,q14, d1[3] @ q10 = U * u2b
+vsub.u16 q14,q11 @ q14 = U * (1 << 3) - 128 * (1 << 3)
+vsub.u16 q15,q11 @ q15 = V * (1 << 3) - 128 * (1 << 3)
+vqdmulh.s16 q8, q15, d1[0]  

[FFmpeg-devel] [PATCH v2 8/9] swscale/arm/yuv2rgb: save a few instructions by processing the luma line interleaved

2016-03-28 Thread Matthieu Bouron
---
 libswscale/arm/yuv2rgb_neon.S | 88 +--
 1 file changed, 34 insertions(+), 54 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 124d7d3..6b911c8 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -22,62 +22,35 @@
 #include "libavutil/arm/asm.S"
 
 
-.macro compute_premult half_u1, half_u2, half_v1, half_v2
-vmov d2, \half_u1 @ copy left q14 to left q1
-vmov d3, \half_u1 @ copy left q14 to right q1
-vmov d4, \half_u2 @ copy right q14 to left q2
-vmov d5, \half_u2 @ copy right q14 to right q2
-
-vmov d6, \half_v1 @ copy left q15 to left q3
-vmov d7, \half_v1 @ copy left q15 to right q3
-vmov d8, \half_v2 @ copy right q15 to left q4
-vmov d9, \half_v2 @ copy right q15 to right q4
-
-vzip.16 d2, d3 @ U1U1U2U2U3U3U4U4
-vzip.16 d4, d5 @ U5U5U6U6U7U7U8U8
-
-vzip.16 d6, d7 @ V1V1V2V2V3V3V4V4
-vzip.16 d8, d9 @ V5V5V6V6V7V7V8V8
-
-vmul.s16 q8,  q3, d1[0] @  V * v2r (left,  red)
-vmul.s16 q9,  q4, d1[0] @  V * v2r (right, red)
-vmul.s16 q10, q1, d1[1] @  U * u2g
-vmul.s16 q11, q2, d1[1] @  U * u2g
-vmla.s16 q10, q3, d1[2] @  U * u2g + V * v2g   (left,  green)
-vmla.s16 q11, q4, d1[2] @  U * u2g + V * v2g   (right, green)
-vmul.s16 q12, q1, d1[3] @  U * u2b (left,  blue)
-vmul.s16 q13, q2, d1[3] @  U * u2b (right, blue)
+.macro compute_premult
+vmul.s16 q8, q15, d1[0] @ q8  = V * v2r
+vmul.s16 q9, q14, d1[1] @ q9  = U * u2g
+vmla.s16 q9, q15, d1[2] @ q9  = U * u2g + V * v2g
+vmul.s16 q10,q14, d1[3] @ q10 = U * u2b
 .endm
 
-.macro compute_color dst_comp1 dst_comp2 pre1 pre2
-vadd.s16 q1, q14, \pre1
-vadd.s16 q2, q15, \pre2
+.macro compute_color dst_comp1 dst_comp2 pre
+vadd.s16 q1, q14, \pre
+vadd.s16 q2, q15, \pre
 vqrshrun.s16 \dst_comp1, q1, #6
 vqrshrun.s16 \dst_comp2, q2, #6
 .endm
 
 .macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
-compute_color   \r1, \r2, q8, q9
-compute_color   \g1, \g2, q10, q11
-compute_color   \b1, \b2, q12, q13
+compute_color   \r1, \r2, q8
+compute_color   \g1, \g2, q9
+compute_color   \b1, \b2, q10
 vmov.u8 \a1, #255
 vmov.u8 \a2, #255
 .endm
 
-.macro compute dst y0 y1 ofmt
-vmovl.u8 q14, \y0 @ 8px of y
-vmovl.u8 q15, \y1 @ 8px of y
-
-vdup.16 q5, r9 @ q5  = y_offset
-vmov d14, d0 @ q7  = y_coeff
-vmov d15, d0 @ q7  = y_coeff
-
-vsub.s16 q14, q5
-vsub.s16 q15, q5
-
-vmul.s16 q14, q7 @ q14 = (srcY - y_offset) * y_coeff (left)
-vmul.s16 q15, q7 @ q15 = (srcY - y_offset) * y_coeff (right)
-
+.macro compute dst ofmt
+vmovl.u8 q14, d14 @ q14 = Y
+vmovl.u8 q15, d15 @ q15 = Y
+vsub.s16 q14, q12 @ q14 = (srcY - y_offset)
+vsub.s16 q15, q12 @ q15 = (srcY - y_offset)
+vmul.s16 q14, q13 @ q14 = (srcY - y_offset) * y_coeff (left)
+vmul.s16 q15, q13 @ q15 = (srcY - y_offset) * y_coeff (right)
 
 .ifc \ofmt,argb
 compute_rgba  

Re: [FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part

2016-03-28 Thread Matthieu Bouron
On Sun, Mar 27, 2016 at 5:58 PM, Matthieu Bouron 
wrote:

>
>
> On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron <
> matthieu.bou...@gmail.com> wrote:
>
>> The following patchset aims to make bitexact the yuv->rgba armv7 neon
>> code path
>> with the aarch64 one. It also aims to make the two code bases as close as
>> possible.
>>
>> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>>
>> The current 32bit code path which is unused is removed.
>>
>> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>>
>> The code processes only one line at a time for the yuv420p,nv12 and nv21
>> formats
>> with no regression in performance observed on a rpi2 (I've even observed a
>> slight increase of performance for the nv12 and nv21 formats).
>>
>> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
>>
>> The last patch of the series makes the code bitexact with the aarch64
>> version.
>> The increase of precision (which introduces a performance loss) is
>> compensated
>> by a refactor/optimisation that saves quite a few mov,vdup and vqdmulh.
>>
>> ./ffmpeg_g -nostats -f lavfi -i
>> testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f
>> null -
>>
>> without patchset :
>> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>>
>> with patchset:
>> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>
>
> I've managed to run the code on a beagle bone black board, here are the
> results:
>
> nv12->bgra
> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600
> min:0.011513
> with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 avg:0.013659
> max:0.034427 min:0.013411
> with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751
> max:0.034288 min:0.012523
>
> yuv420p->bgra
> without patchset: [bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866
> min:0.012945
> with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 avg:0.015358
> max:0.036186 min:0.015134
> with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 avg:0.014784
> max:0.035487 min:0.014568
>
> So it looks like processing one line at a time has a negative effect on
> performance on this board (as opposed to the rpi2). I'll try to keep the
> two-line processing code and post some results (so we can decide which
> version to choose).
>

I've managed to update the patchset to keep processing two lines at a time
for the nv12, nv21 and yuv420p formats. Here are the results:

./ffmpeg_g -nostats -f lavfi -i
testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f
null -

Beagle bone black:
without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600
min:0.011513
with patchset v1: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288
min:0.012523
with patchset v2: [bench @ 0x10f92d0] t:0.011239 avg:0.011408 max:0.032124
min:0.011202

Nexus5:
without patchset: avg: ~2,869ms
with patchset v1: avg: ~3,008ms
with patchset v2: avg: ~2,702ms

RPI2:
without patchset: [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399
min:0.020605
with patchset v1:  [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472
min:0.01884
with patchset v2: [bench @ 0xc1b6a0] t:0.020999 avg:0.021203 max:0.052184
min:0.020768

Given the results above, I will drop the current patchset and
submit another one (which keeps processing two lines at a time).
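
To make the comparison above easier to read, here is a minimal sketch that
turns two "avg" values from the bench filter into a relative change. The
function name is hypothetical and only for illustration; it is not part of
FFmpeg. A negative result means the patched code is faster.

```c
/* Hypothetical helper (not part of FFmpeg): relative change, in percent,
 * of the patched average time `after` with respect to the unpatched
 * average time `before`. Negative means the patched code is faster. */
static double relative_change_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Applied to the averages listed above, patchset v2 comes out at roughly -2.9%
on the Beagle bone black (0.011743 to 0.011408), roughly -5.8% on the Nexus5,
and roughly +1.9% on the RPI2 (0.020813 to 0.021203).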

Matthieu


Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: fix zero stride for OMX.allwinner.video.decoder.avc

2016-03-28 Thread Matthieu Bouron
On Sun, Mar 27, 2016 at 11:15 PM, Kirill Gavrilov 
wrote:

> Hi,
>

Hi,


>
> on my device ("OMX.allwinner.video.decoder.avc") returned stride property
> is always 0.
> I have found that stride is overridden for "OMX.SEC.avc.dec" and prepared
> the similar patch.
> But probably it is better to change comparison at the line above to "value
> > 0"?
> >s->stride = value >= 0 ? value : s->width
>

I think it would be better to change the comparison line. Can you send the
relevant patch?
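
For reference, a minimal sketch of the fallback being discussed, written as a
standalone function. The helper name is hypothetical; in mediacodecdec.c the
logic is the single assignment to s->stride quoted above.

```c
/* Hypothetical helper (not part of libavcodec): pick the row stride from
 * the value reported by the decoder, falling back to the frame width when
 * the reported value is zero or negative, as on the device described above. */
static int pick_stride(int reported_stride, int width)
{
    return reported_stride > 0 ? reported_stride : width;
}
```

With "value >= 0" a reported stride of 0 would be kept as is; with
"value > 0" it falls back to the width.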

Matthieu


Re: [FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part

2016-03-27 Thread Matthieu Bouron
On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron  wrote:

> The following patchset aims to make bitexact the yuv->rgba armv7 neon code
> path
> with the aarch64 one. It also aims to make the two code bases as close as
> possible.
>
> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>
> The current 32bit code path which is unused is removed.
>
> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>
> The code processes only one line at a time for the yuv420p,nv12 and nv21
> formats
> with no regression in performance observed on a rpi2 (I've even observed a
> slight increase of performance for the nv12 and nv21 formats).
>
> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
>
> The last patch of the series makes the code bitexact with the aarch64
> version.
> The increase of precision (which introduces a performance loss) is
> compensated
> by a refactor/optimisation that saves quite a few mov,vdup and vqdmulh.
>
> ./ffmpeg_g -nostats -f lavfi -i
> testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f
> null -
>
> without patchset :
> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>
> with patchset:
> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884


I've managed to run the code on a beagle bone black board, here are the
results:

nv12->bgra
without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600
min:0.011513
with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 avg:0.013659
max:0.034427 min:0.013411
with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751
max:0.034288 min:0.012523

yuv420p->bgra
without patchset: [bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866
min:0.012945
with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 avg:0.015358
max:0.036186 min:0.015134
with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 avg:0.014784
max:0.035487 min:0.014568

So it looks like processing one line at a time has a negative effect on
performance on this board (as opposed to the rpi2). I'll try to keep the
two-line processing code and post some results (so we can decide which
version to choose).

Matthieu


Re: [FFmpeg-devel] [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part

2016-03-26 Thread Matthieu Bouron
On Fri, Mar 25, 2016 at 11:46 PM, Matthieu Bouron  wrote:

> From: Matthieu Bouron 
>
> ---
>  libswscale/arm/swscale_unscaled.c | 16 +++
>  libswscale/arm/yuv2rgb_neon.S | 89
> +--
>  2 files changed, 47 insertions(+), 58 deletions(-)
>
>
Patch updated (resolve a conflict with the updated version of patch 06/10).
From 24b2371eb5ea859b2a68ef1ee3cf9a0098d9375a Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Wed, 23 Mar 2016 16:51:20 +
Subject: [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
 aarch64 counter part

---
 libswscale/arm/swscale_unscaled.c | 16 +++
 libswscale/arm/yuv2rgb_neon.S | 89 +--
 2 files changed, 47 insertions(+), 58 deletions(-)

diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c
index 149208c..1986d65 100644
--- a/libswscale/arm/swscale_unscaled.c
+++ b/libswscale/arm/swscale_unscaled.c
@@ -62,10 +62,10 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[
 }
 
 #define YUV_TO_RGB_TABLE\
-c->yuv2rgb_v2r_coeff / (1 << 7),\
-c->yuv2rgb_u2g_coeff / (1 << 7),\
-c->yuv2rgb_v2g_coeff / (1 << 7),\
-c->yuv2rgb_u2b_coeff / (1 << 7),\
+c->yuv2rgb_v2r_coeff,   \
+c->yuv2rgb_u2g_coeff,   \
+c->yuv2rgb_v2g_coeff,   \
+c->yuv2rgb_u2b_coeff,   \
 
 #define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt)   \
 int ff_##ifmt##_to_##ofmt##_neon(int w, int h,  \
@@ -88,8 +88,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
  src[1], srcStride[1],  \
  src[2], srcStride[2],  \
  yuv2rgb_table, \
- c->yuv2rgb_y_offset >> 9,  \
- c->yuv2rgb_y_coeff / (1 << 7));\
+ c->yuv2rgb_y_offset >> 6,  \
+ c->yuv2rgb_y_coeff);   \
 \
 return 0;   \
 }   \
@@ -121,8 +121,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
  dst[0] + srcSliceY * dstStride[0], dstStride[0],   \
  src[0], srcStride[0], src[1], srcStride[1],\
  yuv2rgb_table, \
- c->yuv2rgb_y_offset >> 9,  \
- c->yuv2rgb_y_coeff / (1 << 7));\
+ c->yuv2rgb_y_offset >> 6,  \
+ c->yuv2rgb_y_coeff);   \
 \
 return 0;   \
 }   \
diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 4a5ce11..bd994e3 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -68,14 +68,14 @@
 
 .macro load_chroma_nv12
 vld2.8  {d2, d3}, [r6]!@ q1: interleaved chroma line
-vsubl.u8q14, d2, d10   @ q14 = U - 128
-vsubl.u8q15, d3, d10   @ q15 = V - 128
+vshll.u8q14, d2, #3@ q14 = U * (1 << 3)
+vshll.u8   

Re: [FFmpeg-devel] [PATCH 09/10] swscale/arm/yuv2rgb: re-order arguments of the compute_rgba macro

2016-03-26 Thread Matthieu Bouron
On Fri, Mar 25, 2016 at 11:46 PM, Matthieu Bouron  wrote:

> From: Matthieu Bouron 
>
> ---
>  libswscale/arm/yuv2rgb_neon.S | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>

Patch updated (resolve a conflict with the updated version of patch 06/10).
From 41b0ff49706d82ef964faa75888e95d86f69df34 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Fri, 25 Mar 2016 15:38:37 +
Subject: [PATCH 09/10] swscale/arm/yuv2rgb: re-order arguments of the
 compute_rgba macro

---
 libswscale/arm/yuv2rgb_neon.S | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 6a15778..4a5ce11 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -123,7 +123,7 @@
 vqrshrun.s16 \dst_comp2, q2, #6
 .endm
 
-.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
+.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
 compute_color   \r1, \r2, q8,  q9
 compute_color   \g1, \g2, q10, q11
 compute_color   \b1, \b2, q12, q13
@@ -178,19 +178,19 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
 vmul.s16 q15, q7 @ q15 = (srcY - y_offset) * y_coeff (right)
 
 .ifc \ofmt,argb
-compute_rgba d7, d11, d8, d12, d9, d13, d6, d10
+compute_rgba d7, d8, d9, d6, d11, d12, d13, d10
 .endif
 
 .ifc \ofmt,rgba
-compute_rgba d6, d10, d7, d11, d8, d12, d9, d13
+compute_rgba d6, d7, d8, d9, d10, d11, d12, d13
 .endif
 
 .ifc \ofmt,abgr
-compute_rgba d9, d13, d8, d12, d7, d11, d6, d10
+compute_rgba d9, d8, d7, d6, d13, d12, d11, d10
 .endif
 
 .ifc \ofmt,bgra
-compute_rgba d8, d12, d7, d11, d6, d10, d9, d13
+compute_rgba d8, d7, d6, d9, d12, d11, d10, d13
 .endif
 
 vst4.8  {q3, q4}, [r2,:128]!
-- 
2.7.4



Re: [FFmpeg-devel] [PATCH 08/10] swscale/arm/yuv2rgb: re-organize the code like its aarch64 counter part

2016-03-26 Thread Matthieu Bouron
On Fri, Mar 25, 2016 at 11:46 PM, Matthieu Bouron  wrote:

> From: Matthieu Bouron 
>
> ---
>  libswscale/arm/yuv2rgb_neon.S | 154
> +++---
>  1 file changed, 69 insertions(+), 85 deletions(-)
>

Patch updated (resolve a conflict with the updated version of patch 06/10).
From d06a5437f9042e0b350556e9642d52866284e7a8 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Wed, 23 Mar 2016 14:10:45 +
Subject: [PATCH 08/10] swscale/arm/yuv2rgb: re-organize the code like its
 aarch64 counter part

---
 libswscale/arm/yuv2rgb_neon.S | 154 +++---
 1 file changed, 69 insertions(+), 85 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 6279637..6a15778 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -21,90 +21,6 @@
 
 #include "libavutil/arm/asm.S"
 
-
-.macro compute_premult half_u1, half_u2, half_v1, half_v2
-vmovd2, \half_u1   @ copy left q14 to left q1
-vmovd3, \half_u1   @ copy left q14 to right q1
-vmovd4, \half_u2   @ copy right q14 to left q2
-vmovd5, \half_u2   @ copy right q14 to right q2
-
-vmovd6, \half_v1   @ copy left q15 to left q3
-vmovd7, \half_v1   @ copy left q15 to right q3
-vmovd8, \half_v2   @ copy right q15 to left q4
-vmovd9, \half_v2   @ copy right q15 to right q4
-
-vzip.16 d2, d3 @ U1U1U2U2U3U3U4U4
-vzip.16 d4, d5 @ U5U5U6U6U7U7U8U8
-
-vzip.16 d6, d7 @ V1V1V2V2V3V3V4V4
-vzip.16 d8, d9 @ V5V5V6V6V7V7V8V8
-
-vmul.s16q8,  q3, d1[0] @  V * v2r (left,  red)
-vmul.s16q9,  q4, d1[0] @  V * v2r (right, red)
-vmul.s16q10, q1, d1[1] @  U * u2g
-vmul.s16q11, q2, d1[1] @  U * u2g
-vmla.s16q10, q3, d1[2] @  U * u2g + V * v2g   (left,  green)
-vmla.s16q11, q4, d1[2] @  U * u2g + V * v2g   (right, green)
-vmul.s16q12, q1, d1[3] @  U * u2b (left,  blue)
-vmul.s16q13, q2, d1[3] @  U * u2b (right, blue)
-.endm
-
-.macro compute_color dst_comp1 dst_comp2 pre1 pre2
-vadd.s16q1, q14, \pre1
-vadd.s16q2, q15, \pre2
-vqrshrun.s16\dst_comp1, q1, #6
-vqrshrun.s16\dst_comp2, q2, #6
-.endm
-
-.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
-compute_color   \r1, \r2, q8,  q9
-compute_color   \g1, \g2, q10, q11
-compute_color   \b1, \b2, q12, q13
-vmov.u8 \a1, #255
-vmov.u8 \a2, #255
-.endm
-
-.macro compute_16px dst y0 y1 ofmt
-vmovl.u8q14, \y0   @ 8px of y
-vmovl.u8q15, \y1   @ 8px of y
-
-vdup.16 q5, r9 @ q5  = y_offset
-vmovd14, d0@ q7  = y_coeff
-vmovd15, d0@ q7  = y_coeff
-
-vsub.s16q14, q5
-vsub.s16q15, q5
-
-vmul.s16q14, q7@ q14 = (srcY - y_offset) * y_coeff (left)
-vmul.s16q15, q7@ q15 = (srcY - y_offset) * y_coeff (right)
-
-
-.ifc \ofmt,argb
-compute_rgbad7, d11, d8, d12, d9, d13, d6, d10
-.endif
-
-.ifc \ofmt,rgba
-compute_rgbad6, d10, d7, d11, d8, d12, d9, d13
-.endif
-
-.ifc \ofmt,abgr
-compute_rgbad9, d13, d8, d12, d7, d11, d6, d10
-.endif
-
-.ifc \ofmt,bgra
-compute_rgbad8, d12, d7, d11, d6, d10, d9, d13
-.endif
-vst4.8  {q3, q4}, [\dst,:128]!
-vst4.8  {q5, q6}, [\dst,:128]!
-
-.endm
-
-.macro process_1l_16px ofmt
-compute_premult d28, d29, d30, d31
-vld1.8  {q7}, [r4]!
-compute_16pxr2, d14, d15, \ofmt
-.endm
-
 .macro load_args_nv12
 push{r4-r12, lr}
 vpush   {q4-q7}
@@ -200,6 +116,21 @@
 

Re: [FFmpeg-devel] [PATCH 07/10] swscale/arm/yuv2rgb: macro-ify

2016-03-26 Thread Matthieu Bouron
On Fri, Mar 25, 2016 at 11:46 PM, Matthieu Bouron  wrote:

> From: Matthieu Bouron 
>
> ---
>  libswscale/arm/yuv2rgb_neon.S | 115
> ++
>  1 file changed, 39 insertions(+), 76 deletions(-)
>

[...]

Patch updated (resolve a conflict with the updated version of patch 06/10).
From f8a7db56aba4b38089698c2f87583b071d03bf29 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Wed, 23 Mar 2016 13:51:10 +
Subject: [PATCH 07/10] swscale/arm/yuv2rgb: macro-ify

---
 libswscale/arm/yuv2rgb_neon.S | 116 ++
 1 file changed, 39 insertions(+), 77 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 6aeccae..6279637 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -105,7 +105,7 @@
 compute_16pxr2, d14, d15, \ofmt
 .endm
 
-.macro load_args_nvx
+.macro load_args_nv12
 push{r4-r12, lr}
 vpush   {q4-q7}
 ldr r4, [sp, #104] @ r4  = srcY
@@ -122,6 +122,10 @@
 sub r7, r7, r0 @ r7 = linesizeC - width (paddingC)
 .endm
 
+.macro load_args_nv21
+load_args_nv12
+.endm
+
 .macro load_args_yuv420p
 push{r4-r12, lr}
 vpush   {q4-q7}
@@ -146,116 +150,74 @@
 load_args_yuv420p
 .endm
 
-.macro declare_func ifmt ofmt
-function ff_\ifmt\()_to_\ofmt\()_neon, export=1
-
-.ifc \ifmt,nv12
-load_args_nvx
-.endif
-
-.ifc \ifmt,nv21
-load_args_nvx
-.endif
-
-.ifc \ifmt,yuv420p
-load_args_yuv420p
-.endif
-
-
-.ifc \ifmt,yuv422p
-load_args_yuv422p
-.endif
-
-1:
-mov r8, r0 @ r8 = width
-2:
-pld [r6, #64*3]
-pld [r4, #64*3]
-
-vmov.i8 d10, #128
-
-.ifc \ifmt,nv12
+.macro load_chroma_nv12
 vld2.8  {d2, d3}, [r6]!@ q1: interleaved chroma line
 vsubl.u8q14, d2, d10   @ q14 = U - 128
 vsubl.u8q15, d3, d10   @ q15 = V - 128
+.endm
 
-process_1l_16px \ofmt
-.endif
-
-.ifc \ifmt,nv21
+.macro load_chroma_nv21
 vld2.8  {d2, d3}, [r6]!@ q1: interleaved chroma line
 vsubl.u8q14, d3, d10   @ q14 = U - 128
 vsubl.u8q15, d2, d10   @ q15 = V - 128
+.endm
 
-process_1l_16px \ofmt
-.endif
-
-.ifc \ifmt,yuv420p
-pld [r10, #64*3]
-
-vld1.8  d2, [r6]!  @ d2: chroma red line
-vld1.8  d3, [r10]! @ d3: chroma blue line
-vsubl.u8q14, d2, d10   @ q14 = U - 128
-vsubl.u8q15, d3, d10   @ q15 = V - 128
-
-process_1l_16px \ofmt
-.endif
-
-.ifc \ifmt,yuv422p
+.macro load_chroma_yuv420p
 pld [r10, #64*3]
 
 vld1.8  d2, [r6]!  @ d2: chroma red line
 vld1.8  d3, [r10]! @ d3: chroma blue line
 vsubl.u8q14, d2, d10   @ q14 = U - 128
 vsubl.u8q15, d3, d10   @ q15 = V - 128
+.endm
 
-process_1l_16px \ofmt
-.endif
-
-subsr8, r8, #16@ width -= 16
-bgt 2b
-
-add r2, r2, r3 @ dst   += padding
-add r4, r4, r5 @ srcY  += paddingY
-
-.ifc \ifmt,nv12
-tst r1, #1
-ite eq
-subeq   r6, r6, r0 @ if (height % 2 == 0) paddingU -= width
-addne   r6, r7 @ else paddingU += linesizeU - width
-
-subsr1, r1, #1 @ height -= 1
-.endif
+.macro load_chroma_yuv422p
+load_chroma_yuv420p
+.endm
 
-.ifc \ifmt,nv21
+.macro increment_nv12
 tst r1, #1
 ite eq
 subeq   r6, r6, r0 @ if (height % 2 == 0) paddingU -= width
 addne   r6, r7 @ else paddingU += linesizeU - width
+.endm
 
-subsr1, r1, #1 @ height -= 1
-.endif
+.macro increment_nv21
+increment_nv12
+.endm
 
-.ifc \ifmt,yuv420p
+.macro increment_yuv420p
 tst r1, #1
 itete   eq
 subeq

Re: [FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats

2016-03-26 Thread Matthieu Bouron
On Sat, Mar 26, 2016 at 2:09 AM, Michael Niedermayer  wrote:

> On Fri, Mar 25, 2016 at 11:46:01PM +0100, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> >
> > ---
> >  libswscale/arm/yuv2rgb_neon.S | 89
> ---
> >  1 file changed, 24 insertions(+), 65 deletions(-)
>
> breaks build
>
>  make distclean ; ../configure --cross-prefix=/usr/arm-linux-gnueabi/bin/
> --cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon
> -mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux
> --enable-cross-compile && make -j12
>
> CC  libavutil/arm/float_dsp_init_arm.o
> src/libswscale/arm/yuv2rgb_neon.S: Assembler messages:
> src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
> instruction should be in IT block -- `subeq r6,r6,r0'
> src/libswscale/arm/yuv2rgb_neon.S:269: Error: thumb conditional
> instruction should be in IT block -- `addne r6,r7'
>

[...]

Patch updated with the relevant IT instructions added. It still builds
on my rpi2 setup, but it has not been tested on a setup like yours.
Can you confirm it builds/works on your setup?

If it works, I will send an updated version of the next patch (07/10) to
resolve the conflicts.
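
For context, the two conditional instructions named in the assembler errors
(subeq r6,r6,r0 and addne r6,r7) update the chroma pointer after each luma
line. Going by the comments in the patch, a rough C model of that update
could look like the sketch below. The function name and its return-an-offset
shape are assumptions made for illustration, not code from the patch.

```c
#include <stddef.h>

/* Hypothetical scalar model of the nv12/nv21 chroma pointer update.
 * Returns the offset to add to the chroma pointer once a luma line has
 * been converted (the pointer has already advanced by `width` bytes).
 *   height_left even: rewind by `width`, so the next luma line reuses
 *                     the same chroma line   (subeq r6, r6, r0)
 *   height_left odd:  skip the row padding to reach the next chroma
 *                     line                   (addne r6, r7)      */
static ptrdiff_t chroma_step(int height_left, int width, ptrdiff_t padding_c)
{
    if ((height_left & 1) == 0)
        return -(ptrdiff_t)width;
    return padding_c;
}
```

In Thumb-2 mode the subeq/addne pair has to be preceded by an IT block
(here "ite eq"), which is what the updated patch adds.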

Matthieu
From 7b3a405b2b483fb16f549b69ce6f21d8a946 Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Wed, 23 Mar 2016 11:26:13 +
Subject: [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
 for the yuv420p and nv{12,21} formats

---
 libswscale/arm/yuv2rgb_neon.S | 92 +--
 1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index ef7b0a6..6aeccae 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -105,16 +105,6 @@
 compute_16pxr2, d14, d15, \ofmt
 .endm
 
-.macro process_2l_16px ofmt
-compute_premult d28, d29, d30, d31
-
-vld1.8  {q7}, [r4]!@ first line of luma
-compute_16pxr2, d14, d15, \ofmt
-
-vld1.8  {q7}, [r12]!   @ second line of luma
-compute_16pxr11, d14, d15, \ofmt
-.endm
-
 .macro load_args_nvx
 push{r4-r12, lr}
 vpush   {q4-q7}
@@ -127,13 +117,9 @@
 ldr r10,[sp, #128] @ r10 = y_coeff
 vdup.16 d0, r10@ d0  = y_coeff
 vld1.16 {d1}, [r8] @ d1  = *table
-add r11, r2, r3@ r11 = dst + linesize (dst2)
-add r12, r4, r5@ r12 = srcY + linesizeY (srcY2)
-lsl r3, r3, #1
-lsl r5, r5, #1
-sub r3, r3, r0, lsl #2 @ r3 = linesize  * 2 - width * 4 (padding)
-sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
-sub r7, r7, r0 @ r7 = linesizeC - width (paddingC)
+sub r3, r3, r0, lsl #2 @ r3 = linesize  - width * 4 (padding)
+sub r5, r5, r0 @ r5 = linesizeY - width (paddingY)
+sub r7, r7, r0 @ r7 = linesizeC - width (paddingC)
 .endm
 
 .macro load_args_yuv420p
@@ -142,26 +128,6 @@
 ldr r4, [sp, #104] @ r4  = srcY
 ldr r5, [sp, #108] @ r5  = linesizeY
 ldr r6, [sp, #112] @ r6  = srcU
-ldr r8, [sp, #128] @ r8  = table
-ldr r9, [sp, #132] @ r9  = y_offset
-ldr r10,[sp, #136] @ r10 = y_coeff
-vdup.16 d0, r10@ d0  = y_coeff
-vld1.16 {d1}, [r8] @ d1  = *table
-add r11, r2, r3@ r11 = dst + linesize (dst2)
-add r12, r4, r5@ r12 = srcY + linesizeY (srcY2)
-lsl r3, r3, #1
-lsl r5, r5, #1
-sub r3, r3, r0, lsl #2 @ r3 = linesize  * 2 - width * 4 (padding)
-sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
-ldr r10,[sp, #120]  

[FFmpeg-devel] [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time for the yuv420p and nv{12, 21} formats

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 89 ---
 1 file changed, 24 insertions(+), 65 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index ef7b0a6..8abb986 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -105,16 +105,6 @@
 compute_16pxr2, d14, d15, \ofmt
 .endm
 
-.macro process_2l_16px ofmt
-compute_premult d28, d29, d30, d31
-
-vld1.8  {q7}, [r4]!@ first 
line of luma
-compute_16pxr2, d14, d15, \ofmt
-
-vld1.8  {q7}, [r12]!   @ 
second line of luma
-compute_16pxr11, d14, d15, \ofmt
-.endm
-
 .macro load_args_nvx
 push{r4-r12, lr}
 vpush   {q4-q7}
@@ -127,13 +117,9 @@
 ldr r10,[sp, #128] @ r10 = 
y_coeff
 vdup.16 d0, r10@ d0  = 
y_coeff
 vld1.16 {d1}, [r8] @ d1  = 
*table
-add r11, r2, r3@ r11 = 
dst + linesize (dst2)
-add r12, r4, r5@ r12 = 
srcY + linesizeY (srcY2)
-lsl r3, r3, #1
-lsl r5, r5, #1
-sub r3, r3, r0, lsl #2 @ r3 = 
linesize  * 2 - width * 4 (padding)
-sub r5, r5, r0 @ r5 = 
linesizeY * 2 - width (paddingY)
-sub r7, r7, r0 @ r7 = 
linesizeC - width (paddingC)
+sub r3, r3, r0, lsl #2 @ r3 = 
linesize  - width * 4 (padding)
+sub r5, r5, r0 @ r5 = 
linesizeY - width (paddingY)
+sub r7, r7, r0 @ r7 = 
linesizeC - width (paddingC)
 .endm
 
 .macro load_args_yuv420p
@@ -142,26 +128,6 @@
 ldr r4, [sp, #104] @ r4  = 
srcY
 ldr r5, [sp, #108] @ r5  = 
linesizeY
 ldr r6, [sp, #112] @ r6  = 
srcU
-ldr r8, [sp, #128] @ r8  = 
table
-ldr r9, [sp, #132] @ r9  = 
y_offset
-ldr r10,[sp, #136] @ r10 = 
y_coeff
-vdup.16 d0, r10@ d0  = 
y_coeff
-vld1.16 {d1}, [r8] @ d1  = 
*table
-add r11, r2, r3@ r11 = 
dst + linesize (dst2)
-add r12, r4, r5@ r12 = 
srcY + linesizeY (srcY2)
-lsl r3, r3, #1
-lsl r5, r5, #1
-sub r3, r3, r0, lsl #2 @ r3 = 
linesize  * 2 - width * 4 (padding)
-sub r5, r5, r0 @ r5 = 
linesizeY * 2 - width (paddingY)
-ldr r10,[sp, #120] @ r10 = 
srcV
-.endm
-
-.macro load_args_yuv422p
-push{r4-r12, lr}
-vpush   {q4-q7}
-ldr r4, [sp, #104] @ r4  = 
srcY
-ldr r5, [sp, #108] @ r5  = 
linesizeY
-ldr r6, [sp, #112] @ r6  = 
srcU
 ldr r7, [sp, #116] @ r7  = 
linesizeU
 ldr r12,[sp, #124] @ r12 = 
linesizeV
 ldr r8, [sp, #128] @ r8  = 
table
@@ -176,6 +142,10 @@
 ldr r10,[sp, #120] @ r10 = 
srcV
 .endm
 
+.macro load_args_yuv422p
+load_args_yuv420p
+.endm
+
 .macro declare_func ifmt ofmt
 function ff_\ifmt\()_to_\ofmt\()_neon, export=1
 
@@ -205,35 +175,30 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
 vmov.i8 d10, #128
 
 .ifc \ifmt,nv12
-pld [r12, #64*3]
-
 vld2.8  {d2, d3}, [r6]!@ q1: 
interleaved chroma line
 vsubl.u8q14, d2, d10   @ q14 = 
U - 128
 vsubl.u8q15, d3, d10   @ q15 = 
V - 128
 
-process_2l_16px \ofmt
+process_1l_16px \ofmt
 .endif
 
 .ifc \ifmt,nv21
-pld [r12, #64*3]
-
 vld2.8  {d2, d3}, [r6]!@ q1

[FFmpeg-devel] [PATCH 04/10] swscale/arm/yuv2rgb: factorize lsl in load_args_yuv420p

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 22864ec..4601a79 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -152,8 +152,7 @@
 add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
 lsl r3, r3, #1
 lsl r5, r5, #1
-lsl r8, r0, #2
-sub r3, r3, r8 @ r3 = linesize  * 2 - width * 4 (padding)
+sub r3, r3, r0, lsl #2 @ r3 = linesize  * 2 - width * 4 (padding)
 sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
 ldr r10,[sp, #120] @ r10 = srcV
 .endm
-- 
2.7.4
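
As a note on what the folded instruction computes: "sub r3, r3, r0, lsl #2"
subtracts width * 4 from the (already doubled) destination linesize in a
single step, using the shifted-register operand instead of a temporary in
r8. A minimal C sketch of the same arithmetic follows; the function name is
hypothetical and used only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: destination padding when two output lines are
 * written per iteration, matching the comment in the patch:
 *   padding = linesize * 2 - width * 4
 * (4 bytes per output pixel; `width << 2` mirrors the `lsl #2` operand). */
static ptrdiff_t dst_padding_two_lines(ptrdiff_t linesize, int width)
{
    return linesize * 2 - ((ptrdiff_t)width << 2);
}
```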



[FFmpeg-devel] [PATCH 03/10] swscale/arm/yuv2rgb: remove unused store of dst + linesize in load_args_yuv422p

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index aac0773..22864ec 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -171,7 +171,6 @@
 ldr r10,[sp, #136] @ r10 = y_coeff
 vdup.16 d0, r10 @ d0  = y_coeff
 vld1.16 {d1}, [r8] @ d1  = *table
-add r11, r2, r3 @ r11 = dst + linesize (dst2)
 sub r3, r3, r0, lsl #2 @ r3  = linesize  - width * 4 (padding)
 sub r5, r5, r0 @ r5  = linesizeY - width (paddingY)
 sub r7, r7, r0, lsr #1 @ r7  = linesizeU - width / 2 (paddingU)
-- 
2.7.4



[FFmpeg-devel] [PATCH 08/10] swscale/arm/yuv2rgb: re-organize the code like its aarch64 counter part

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 154 +++---
 1 file changed, 69 insertions(+), 85 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index f77f534..03d15cb 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -21,90 +21,6 @@
 
 #include "libavutil/arm/asm.S"
 
-
-.macro compute_premult half_u1, half_u2, half_v1, half_v2
-vmov d2, \half_u1 @ copy left q14 to left q1
-vmov d3, \half_u1 @ copy left q14 to right q1
-vmov d4, \half_u2 @ copy right q14 to left q2
-vmov d5, \half_u2 @ copy right q14 to right q2
-
-vmov d6, \half_v1 @ copy left q15 to left q3
-vmov d7, \half_v1 @ copy left q15 to right q3
-vmov d8, \half_v2 @ copy right q15 to left q4
-vmov d9, \half_v2 @ copy right q15 to right q4
-
-vzip.16 d2, d3 @ U1U1U2U2U3U3U4U4
-vzip.16 d4, d5 @ U5U5U6U6U7U7U8U8
-
-vzip.16 d6, d7 @ V1V1V2V2V3V3V4V4
-vzip.16 d8, d9 @ V5V5V6V6V7V7V8V8
-
-vmul.s16 q8,  q3, d1[0] @  V * v2r (left,  red)
-vmul.s16 q9,  q4, d1[0] @  V * v2r (right, red)
-vmul.s16 q10, q1, d1[1] @  U * u2g
-vmul.s16 q11, q2, d1[1] @  U * u2g
-vmla.s16 q10, q3, d1[2] @  U * u2g + V * v2g   (left,  green)
-vmla.s16 q11, q4, d1[2] @  U * u2g + V * v2g   (right, green)
-vmul.s16 q12, q1, d1[3] @  U * u2b (left,  blue)
-vmul.s16 q13, q2, d1[3] @  U * u2b (right, blue)
-.endm
-
-.macro compute_color dst_comp1 dst_comp2 pre1 pre2
-vadd.s16 q1, q14, \pre1
-vadd.s16 q2, q15, \pre2
-vqrshrun.s16 \dst_comp1, q1, #6
-vqrshrun.s16 \dst_comp2, q2, #6
-.endm
-
-.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
-compute_color   \r1, \r2, q8,  q9
-compute_color   \g1, \g2, q10, q11
-compute_color   \b1, \b2, q12, q13
-vmov.u8 \a1, #255
-vmov.u8 \a2, #255
-.endm
-
-.macro compute_16px dst y0 y1 ofmt
-vmovl.u8 q14, \y0 @ 8px of y
-vmovl.u8 q15, \y1 @ 8px of y
-
-vdup.16 q5, r9 @ q5  = y_offset
-vmov d14, d0 @ q7  = y_coeff
-vmov d15, d0 @ q7  = y_coeff
-
-vsub.s16 q14, q5
-vsub.s16 q15, q5
-
-vmul.s16 q14, q7 @ q14 = (srcY - y_offset) * y_coeff (left)
-vmul.s16 q15, q7 @ q15 = (srcY - y_offset) * y_coeff (right)
-
-
-.ifc \ofmt,argb
-compute_rgba d7, d11, d8, d12, d9, d13, d6, d10
-.endif
-
-.ifc \ofmt,rgba
-compute_rgba d6, d10, d7, d11, d8, d12, d9, d13
-.endif
-
-.ifc \ofmt,abgr
-compute_rgba d9, d13, d8, d12, d7, d11, d6, d10
-.endif
-
-.ifc \ofmt,bgra
-compute_rgba d8, d12, d7, d11, d6, d10, d9, d13
-.endif
-vst4.8  {q3, q4}, [\dst,:128]!
-vst4.8  {q5, q6}, [\dst,:128]!
-
-.endm
-
-.macro process_1l_16px ofmt
-compute_premult d28, d29, d30, d31
-vld1.8  {q7}, [r4]!
-compute_16px r2, d14, d15, \ofmt
-.endm
-
 .macro load_args_nv12
 push {r4-r12, lr}
 vpush   {q4-q7}
@@ -198,6 +114,21 @@
 add r10,r10,r12 @ srcV  += paddingV
 .endm
 
+.macro compute_color dst_comp1 dst_comp2 pre1 pre2
+vadd.s16 q1, q14, \pre1
+vadd.s16 q2, q15, \pre2
+vqrshrun.s16 \dst_comp1, q1, #6
+vqrshrun.s16 \dst_comp2, q2, #6
+.endm
+
+.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
+compute_color   \r1, \r2, q8,  q9
+compute_color   \g1, \g2, q10, q11
+compute_color   \b1, \b2, q12, q13
+ 

[FFmpeg-devel] [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/swscale_unscaled.c |  72 --
 libswscale/arm/yuv2rgb_neon.S | 156 --
 2 files changed, 66 insertions(+), 162 deletions(-)

diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c
index 8aa933c..149208c 100644
--- a/libswscale/arm/swscale_unscaled.c
+++ b/libswscale/arm/swscale_unscaled.c
@@ -61,14 +61,14 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[
 return 0;
 }
 
-#define YUV_TO_RGB_TABLE(precision) \
-c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), \
-c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), \
-c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), \
-c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), \
-
-#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt, precision) \
-int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, \
+#define YUV_TO_RGB_TABLE \
+c->yuv2rgb_v2r_coeff / (1 << 7), \
+c->yuv2rgb_u2g_coeff / (1 << 7), \
+c->yuv2rgb_v2g_coeff / (1 << 7), \
+c->yuv2rgb_u2b_coeff / (1 << 7), \
+
+#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \
+int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
  uint8_t *dst, int linesize, \
  const uint8_t *srcY, int linesizeY, \
  const uint8_t *srcU, int linesizeU, \
@@ -77,37 +77,34 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h,
  int y_offset, \
  int y_coeff); \
 \
-static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, const uint8_t *src[], \
+static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], \
 int srcStride[], int srcSliceY, int srcSliceH, \
 uint8_t *dst[], int dstStride[]) { \
-const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; \
+const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \
 \
-ff_##ifmt##_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, \
+ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
  dst[0] + srcSliceY * dstStride[0], dstStride[0], \
  src[0], srcStride[0], \
  src[1], srcStride[1], \
  src[2], srcStride[2], \
  yuv2rgb_table, \
  c->yuv2rgb_y_offset >> 9, \
- c->yuv2rgb_y_coeff / ((precision) == 16 ? 1 << 7 : 1)); \
+ c->yuv2rgb_y_coeff / (1 << 7)); \
 \
 return 0; \
 } \
 
-#define DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuvx, precision) \
-DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, argb, precision) \
-DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, rgba, precision) \
-DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, abgr, precision) \
-DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, b

[FFmpeg-devel] [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/swscale_unscaled.c | 16 +++
 libswscale/arm/yuv2rgb_neon.S | 89 +--
 2 files changed, 47 insertions(+), 58 deletions(-)

diff --git a/libswscale/arm/swscale_unscaled.c b/libswscale/arm/swscale_unscaled.c
index 149208c..1986d65 100644
--- a/libswscale/arm/swscale_unscaled.c
+++ b/libswscale/arm/swscale_unscaled.c
@@ -62,10 +62,10 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext *context, const uint8_t *src[
 }
 
 #define YUV_TO_RGB_TABLE \
-c->yuv2rgb_v2r_coeff / (1 << 7), \
-c->yuv2rgb_u2g_coeff / (1 << 7), \
-c->yuv2rgb_v2g_coeff / (1 << 7), \
-c->yuv2rgb_u2b_coeff / (1 << 7), \
+c->yuv2rgb_v2r_coeff, \
+c->yuv2rgb_u2g_coeff, \
+c->yuv2rgb_v2g_coeff, \
+c->yuv2rgb_u2b_coeff, \
 
 #define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt) \
 int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
@@ -88,8 +88,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
  src[1], srcStride[1], \
  src[2], srcStride[2], \
  yuv2rgb_table, \
- c->yuv2rgb_y_offset >> 9, \
- c->yuv2rgb_y_coeff / (1 << 7)); \
+ c->yuv2rgb_y_offset >> 6, \
+ c->yuv2rgb_y_coeff); \
 \
 return 0; \
 } \
@@ -121,8 +121,8 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
  dst[0] + srcSliceY * dstStride[0], dstStride[0], \
  src[0], srcStride[0], src[1], srcStride[1], \
  yuv2rgb_table, \
- c->yuv2rgb_y_offset >> 9, \
- c->yuv2rgb_y_coeff / (1 << 7)); \
+ c->yuv2rgb_y_offset >> 6, \
+ c->yuv2rgb_y_coeff); \
 \
 return 0; \
 } \
diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index fe5dd04..9345bae 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -68,14 +68,14 @@
 
 .macro load_chroma_nv12
 vld2.8  {d2, d3}, [r6]! @ q1: interleaved chroma line
-vsubl.u8 q14, d2, d10 @ q14 = U - 128
-vsubl.u8 q15, d3, d10 @ q15 = V - 128
+vshll.u8 q14, d2, #3 @ q14 = U * (1 << 3)
+vshll.u8 q15, d3, #3 @ q15 = V * (1 << 3)
 .endm
 
 .macro load_chroma_nv21
 vld2.8  {d2, d3}, [r6]! @ q1: interleaved chroma line
-vsubl.u8 q14, d3, d10 @ q14 = U - 128
-vsubl.u8 q15, d2, d10 @ q15 = V - 128
+vshll.u8 q14, d3, #3 @ q14 = U * (1 << 3)
+vshll.u8 q15, d2, #3

[FFmpeg-devel] [PATCH 07/10] swscale/arm/yuv2rgb: macro-ify

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 115 ++
 1 file changed, 39 insertions(+), 76 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 8abb986..f77f534 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -105,7 +105,7 @@
 compute_16px r2, d14, d15, \ofmt
 .endm
 
-.macro load_args_nvx
+.macro load_args_nv12
 push {r4-r12, lr}
 vpush   {q4-q7}
 ldr r4, [sp, #104] @ r4  = srcY
@@ -122,6 +122,10 @@
 sub r7, r7, r0 @ r7 = linesizeC - width (paddingC)
 .endm
 
+.macro load_args_nv21
+load_args_nv12
+.endm
+
 .macro load_args_yuv420p
 push {r4-r12, lr}
 vpush   {q4-q7}
@@ -146,113 +150,72 @@
 load_args_yuv420p
 .endm
 
-.macro declare_func ifmt ofmt
-function ff_\ifmt\()_to_\ofmt\()_neon, export=1
-
-.ifc \ifmt,nv12
-load_args_nvx
-.endif
-
-.ifc \ifmt,nv21
-load_args_nvx
-.endif
-
-.ifc \ifmt,yuv420p
-load_args_yuv420p
-.endif
-
-
-.ifc \ifmt,yuv422p
-load_args_yuv422p
-.endif
-
-1:
-mov r8, r0 @ r8 = width
-2:
-pld [r6, #64*3]
-pld [r4, #64*3]
-
-vmov.i8 d10, #128
-
-.ifc \ifmt,nv12
+.macro load_chroma_nv12
 vld2.8  {d2, d3}, [r6]! @ q1: interleaved chroma line
 vsubl.u8 q14, d2, d10 @ q14 = U - 128
 vsubl.u8 q15, d3, d10 @ q15 = V - 128
+.endm
 
-process_1l_16px \ofmt
-.endif
-
-.ifc \ifmt,nv21
+.macro load_chroma_nv21
 vld2.8  {d2, d3}, [r6]! @ q1: interleaved chroma line
 vsubl.u8 q14, d3, d10 @ q14 = U - 128
 vsubl.u8 q15, d2, d10 @ q15 = V - 128
+.endm
 
-process_1l_16px \ofmt
-.endif
-
-.ifc \ifmt,yuv420p
-pld [r10, #64*3]
-
-vld1.8  d2, [r6]! @ d2: chroma red line
-vld1.8  d3, [r10]! @ d3: chroma blue line
-vsubl.u8 q14, d2, d10 @ q14 = U - 128
-vsubl.u8 q15, d3, d10 @ q15 = V - 128
-
-process_1l_16px \ofmt
-.endif
-
-.ifc \ifmt,yuv422p
+.macro load_chroma_yuv420p
 pld [r10, #64*3]
 
 vld1.8  d2, [r6]! @ d2: chroma red line
 vld1.8  d3, [r10]! @ d3: chroma blue line
 vsubl.u8 q14, d2, d10 @ q14 = U - 128
 vsubl.u8 q15, d3, d10 @ q15 = V - 128
+.endm
 
-process_1l_16px \ofmt
-.endif
-
-subs r8, r8, #16 @ width -= 16
-bgt 2b
-
-add r2, r2, r3 @ dst   += padding
-add r4, r4, r5 @ srcY  += paddingY
-
-.ifc \ifmt,nv12
-tst r1, #1
-subeq   r6, r6, r0 @ if (height % 2 == 0) paddingU -= width
-addne   r6, r7 @ else paddingU += linesizeU - width
-
-subs r1, r1, #1 @ height -= 1
-.endif
+.macro load_chroma_yuv422p
+load_chroma_yuv420p
+.endm
 
-.ifc \ifmt,nv21
+.macro increment_nv12
 tst r1, #1
 subeq   r6, r6, r0 @ if (height % 2 == 0) paddingU -= width
 addne   r6, r7 @ else paddingU += linesizeU - width
+.endm
 
-subs r1, r1, #1 @ height -= 1
-.endif
+.macro increment_nv21
+increment_nv12
+.endm
 
-.ifc \ifmt,yuv420p
+.macro increment_yuv420p
 tst r1, #1
 subeq   r6, r6, r0, lsr #1 @ if (height % 2 == 0) paddingU -= (width / 2)
 addne   r6, r7 @ else paddingU += linesizeU - (width / 2)
 subeq   r10, r10, r0, lsr #1 @ if (height % 2 == 0) paddingU -= (width / 2)
 addne   r10, r12 @ else paddingV = linesizeV - (width / 2)
+.endm
 
-subs r1, r1, #1 @ height -= 1
-.endif

[FFmpeg-devel] [PATCH 09/10] swscale/arm/yuv2rgb: re-order arguments of the compute_rgba macro

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 03d15cb..fe5dd04 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -121,7 +121,7 @@
 vqrshrun.s16 \dst_comp2, q2, #6
 .endm
 
-.macro compute_rgba r1 r2 g1 g2 b1 b2 a1 a2
+.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
 compute_color   \r1, \r2, q8,  q9
 compute_color   \g1, \g2, q10, q11
 compute_color   \b1, \b2, q12, q13
@@ -176,19 +176,19 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
 vmul.s16 q15, q7 @ q15 = (srcY - y_offset) * y_coeff (right)
 
 .ifc \ofmt,argb
-compute_rgba d7, d11, d8, d12, d9, d13, d6, d10
+compute_rgba d7, d8, d9, d6, d11, d12, d13, d10
 .endif
 
 .ifc \ofmt,rgba
-compute_rgba d6, d10, d7, d11, d8, d12, d9, d13
+compute_rgba d6, d7, d8, d9, d10, d11, d12, d13
 .endif
 
 .ifc \ofmt,abgr
-compute_rgba d9, d13, d8, d12, d7, d11, d6, d10
+compute_rgba d9, d8, d7, d6, d13, d12, d11, d10
 .endif
 
 .ifc \ofmt,bgra
-compute_rgba d8, d12, d7, d11, d6, d10, d9, d13
+compute_rgba d8, d7, d6, d9, d12, d11, d10, d13
 .endif
 
 vst4.8  {q3, q4}, [r2,:128]!
-- 
2.7.4
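
For readers less familiar with NEON, the compute_color macro visible in the
context lines above can be modelled in scalar C. This is only a sketch for one
16-bit lane: the function names are made up for illustration, and it assumes
the usual arithmetic right shift and wrapping integer conversion of common
compilers. vadd.s16 wraps modulo 2^16, and vqrshrun.s16 rounds, shifts right
and saturates the result to an unsigned 8-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of "vqrshrun.s16 dst, src, #shift":
 * add half of the divisor for rounding, shift right, clamp to [0, 255]. */
static uint8_t round_shift_sat_u8(int16_t x, int shift)
{
    assert(shift >= 1 && shift <= 8);   /* valid range for a 16 -> 8 bit narrow */
    int32_t v = ((int32_t)x + (1 << (shift - 1))) >> shift;
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Scalar model of one lane of compute_color (hypothetical helper name):
 * luma term plus premultiplied chroma term, then rounding narrow by 6. */
static uint8_t compute_color_lane(int16_t y_term, int16_t chroma_term)
{
    /* vadd.s16 is a plain (non-saturating) add, so the sum wraps at 16 bits */
    uint16_t wrapped = (uint16_t)((uint16_t)y_term + (uint16_t)chroma_term);
    return round_shift_sat_u8((int16_t)wrapped, 6);
}
```

With these definitions, compute_color_lane(6400, 3200) gives 150, negative
sums clamp to 0 and large sums clamp to 255.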



[FFmpeg-devel] [PATCH 02/10] swscale/arm/yuv2rgb: fix comments and factorize lsl in load_args_yuv422p

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index f40327b..aac0773 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -172,11 +172,10 @@
 vdup.16 d0, r10 @ d0  = y_coeff
 vld1.16 {d1}, [r8] @ d1  = *table
 add r11, r2, r3 @ r11 = dst + linesize (dst2)
-lsl r8, r0, #2
-sub r3, r3, r8 @ r3 = linesize  * 2 - width * 4 (padding)
-sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
-sub r7, r7, r0, lsr #1 @ r7 = linesizeU - width / 2 (paddingU)
-sub r12,r12,r0, lsr #1 @ r12 = linesizeV- width / 2 (paddingV)
+sub r3, r3, r0, lsl #2 @ r3  = linesize  - width * 4 (padding)
+sub r5, r5, r0 @ r5  = linesizeY - width (paddingY)
+sub r7, r7, r0, lsr #1 @ r7  = linesizeU - width / 2 (paddingU)
+sub r12,r12,r0, lsr #1 @ r12 = linesizeV - width / 2 (paddingV)
 ldr r10,[sp, #120] @ r10 = srcV
 .endm
 
-- 
2.7.4
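
As a reading aid, the four sub instructions added by this patch compute the
per-line paddings for yuv422p. A scalar C equivalent could look like the
sketch below. The struct and function names are hypothetical; only the
arithmetic is taken from the comments in the diff.

```c
#include <assert.h>

/* Hypothetical container for the four paddings computed in load_args_yuv422p. */
typedef struct {
    int dst;  /* bytes to skip at the end of each RGBA output line */
    int y;    /* bytes to skip at the end of each luma line */
    int u;    /* bytes to skip at the end of each U line */
    int v;    /* bytes to skip at the end of each V line */
} Paddings;

static Paddings yuv422p_paddings(int width, int linesize,
                                 int linesizeY, int linesizeU, int linesizeV)
{
    Paddings p;

    assert(width >= 0);
    p.dst = linesize  - (width << 2);  /* sub r3,  r3,  r0, lsl #2 : 4 bytes per pixel */
    p.y   = linesizeY - width;         /* sub r5,  r5,  r0 */
    p.u   = linesizeU - (width >> 1);  /* sub r7,  r7,  r0, lsr #1 : chroma is half width */
    p.v   = linesizeV - (width >> 1);  /* sub r12, r12, r0, lsr #1 */
    return p;
}
```

Folding the shift into the sub operand (r0, lsl #2) is what removes the
separate lsl instruction and the temporary r8.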



[FFmpeg-devel] [PATCH 05/10] swscale/arm/yuv2rgb: factorize lsl in load_args_nvx

2016-03-25 Thread Matthieu Bouron
From: Matthieu Bouron 

---
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 4601a79..ef7b0a6 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -131,8 +131,7 @@
 add r12, r4, r5 @ r12 = srcY + linesizeY (srcY2)
 lsl r3, r3, #1
 lsl r5, r5, #1
-lsl r8, r0, #2
-sub r3, r3, r8 @ r3 = linesize  * 2 - width * 4 (padding)
+sub r3, r3, r0, lsl #2 @ r3 = linesize  * 2 - width * 4 (padding)
 sub r5, r5, r0 @ r5 = linesizeY * 2 - width (paddingY)
 sub r7, r7, r0 @ r7 = linesizeC - width (paddingC)
 .endm
-- 
2.7.4



[FFmpeg-devel] swscale/arm/yuv2rgb: make the code bitexact with its aarch64 counter part

2016-03-25 Thread Matthieu Bouron
The following patchset aims to make the yuv->rgba armv7 neon code path bitexact
with the aarch64 one. It also aims to make the two code bases as close as
possible.

[PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path

The current 32bit code path, which is unused, is removed.

[PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time

The code now processes only one line at a time for the yuv420p, nv12 and nv21
formats, with no performance regression observed on a rpi2 (I've even observed
a slight performance increase for the nv12 and nv21 formats).

[PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its

The last patch of the series makes the code bitexact with the aarch64 version.
The increase in precision (which introduces a performance loss) is compensated
by a refactor/optimisation that saves quite a few mov, vdup and vqdmulh.

./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -

without patchset :
[bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605

with patchset:
[bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.018846
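
To put the two avg figures in perspective, the relative time saved can be
computed as (before - after) / before. The small helper below is only an
illustration (its name is made up). With the averages quoted above it gives
roughly 8.35%.

```c
#include <assert.h>

/* Percentage of time saved when going from `before` to `after` seconds. */
static double percent_faster(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

For example, percent_faster(0.020813, 0.019075) is about 8.35.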

Matthieu


Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support

2016-03-23 Thread Matthieu Bouron
On Tue, Mar 22, 2016 at 10:04 AM, Matthieu Bouron  wrote:

>
>
> On Fri, Mar 18, 2016 at 5:50 PM, Matthieu Bouron <
> matthieu.bou...@gmail.com> wrote:
>
>> From: Matthieu Bouron 
>>
>> ---
>>
>> Hello,
>>
>> The following patch add hwaccel support to the mediacodec (h264) decoder
>> by allowing
>> the user to render the output frames directly on a surface.
>>
>> In order to do so the user needs to initialize the hwaccel through the
>> use of
>> av_mediacodec_alloc_context and av_mediacodec_default_init functions. The
>> later
>> takes a reference to an android/view/Surface as parameter.
>>
>> If the hwaccel successfully initialize, the decoder output frames pix fmt
>> will be
>> AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrate how to
>> render
>> the frames on the surface:
>>
>> AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3];
>> av_mediacodec_release_buffer(buffer, 1);
>>
>> The last argument of av_mediacodec_release_buffer enable rendering of the
>> buffer on the surface (or not if set to 0).
>>
>> Regarding the internal changes in the mediacodec decoder:
>>
>> MediaCodec.flush() discards both input and output buffers meaning that if
>> MediaCodec.flush() is called all output buffers the user has a reference
>> on are
>> now invalid (and cannot be used).
>> This behaviour does not fit well in the avcodec API.
>>
>> When the decoder is configured to output software buffers, there is no
>> issue as
>> the buffers are copied.
>>
>> Now when the decoder is configured to output to a surface, the user might
>> not
>> want to render all the frames as fast as the decoder can go and might
>> want to
>> control *when* the frame are rendered, so we need to make sure that the
>> MediaCodec.flush() call is delayed until all the frames the user retains
>> has
>> been released or rendered.
>>
>> Delaying the call to MediaCodec.flush() means buffering any inputs that
>> come
>> the decoder until the user has released/renderer the frame he retains.
>>
>> This is a limitation of this hwaccel implementation, if the user retains a
>> frame (a), then issue a flush command to the decoder, the packets he
>> feeds to
>> the decoder at that point will be queued in the internal decoder packet
>> queue
>> (until he releases the frame (a)). This scenario leads to a memory usage
>> increase to say the least.
>>
>> Currently there is no limitation on the size of the internal decoder
>> packet
>> queue but this is something that can be added easily. Then, if the queue
>> is
>> full, what would be the behaviour of the decoder ? Can it block ? Or
>> should it
>> returns something like AVERROR(EAGAIN) ?
>>
>> About the other internal decoder changes I introduced:
>>
>> The MediaCodecDecContext is now refcounted (using the lavu/atomic api)
>> since
>> the (hwaccel) frames can be retained by the user, we need to delay the
>> destruction of the codec until the user has released all the frames he
>> has a
>> reference on.
>> The reference counter of the MediaCodecDecContext is incremented each
>> time an
>> (hwaccel) frame is outputted by the decoder and decremented each time a
>> (hwaccel) frame is released.
>>
>> Also, when the decoder is configured to output to a surface the pts that
>> are
>> given to the MediaCodec API are now rescaled based on the codec_timebase
>> as
>> those timestamps values are propagated to the frames rendered on the
>> surface
>> since Android M. Not sure if it's really useful though.
>>
>> On the performance side:
>>
>> On a nexus 5, decoding an h264 stream (main profile) 1080p@60fps:
>>   - software output + rgba conversion goes at 59~60fps
>>   - surface output + render on a surface goes at 100~110fps
>>
>>
> [...]
>
> Patch updated with the following differences:
>   * the public mediacodec api is now always built (not only when
> mediacodec is available) (and the build when mediacodec is not available
> has been fixed)
>   * the documentation of av_mediacodec_release_buffer has been improved a
> bit
>

Patch updated with the following differences:
  * MediaCodecBuffer->released type is now a volatile int (instead of an int*)
  * MediaCodecContext->refcount type is now a volatile int (instead of an int*)

Matthieu

[...]
From fdbc9e38816be8ce3af2d4a85383203588f1dd7a Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Fri, 11 Mar

Re: [FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support

2016-03-22 Thread Matthieu Bouron
On Fri, Mar 18, 2016 at 5:50 PM, Matthieu Bouron 
wrote:

> From: Matthieu Bouron 
>
> ---
>
> Hello,
>
> The following patch add hwaccel support to the mediacodec (h264) decoder
> by allowing
> the user to render the output frames directly on a surface.
>
> In order to do so the user needs to initialize the hwaccel through the use
> of
> av_mediacodec_alloc_context and av_mediacodec_default_init functions. The
> later
> takes a reference to an android/view/Surface as parameter.
>
> If the hwaccel successfully initialize, the decoder output frames pix fmt
> will be
> AV_PIX_FMT_MEDIACODEC. The following snippet of code demonstrate how to
> render
> the frames on the surface:
>
> AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3];
> av_mediacodec_release_buffer(buffer, 1);
>
> The last argument of av_mediacodec_release_buffer enable rendering of the
> buffer on the surface (or not if set to 0).
>
> Regarding the internal changes in the mediacodec decoder:
>
> MediaCodec.flush() discards both input and output buffers meaning that if
> MediaCodec.flush() is called all output buffers the user has a reference
> on are
> now invalid (and cannot be used).
> This behaviour does not fit well in the avcodec API.
>
> When the decoder is configured to output software buffers, there is no
> issue as
> the buffers are copied.
>
> Now when the decoder is configured to output to a surface, the user might
> not
> want to render all the frames as fast as the decoder can go and might want
> to
> control *when* the frame are rendered, so we need to make sure that the
> MediaCodec.flush() call is delayed until all the frames the user retains
> has
> been released or rendered.
>
> Delaying the call to MediaCodec.flush() means buffering any inputs that
> come
> the decoder until the user has released/renderer the frame he retains.
>
> This is a limitation of this hwaccel implementation, if the user retains a
> frame (a), then issue a flush command to the decoder, the packets he feeds
> to
> the decoder at that point will be queued in the internal decoder packet
> queue
> (until he releases the frame (a)). This scenario leads to a memory usage
> increase to say the least.
>
> Currently there is no limitation on the size of the internal decoder packet
> queue but this is something that can be added easily. Then, if the queue is
> full, what would be the behaviour of the decoder ? Can it block ? Or
> should it
> returns something like AVERROR(EAGAIN) ?
>
> About the other internal decoder changes I introduced:
>
> The MediaCodecDecContext is now refcounted (using the lavu/atomic api)
> since
> the (hwaccel) frames can be retained by the user, we need to delay the
> destruction of the codec until the user has released all the frames he has
> a
> reference on.
> The reference counter of the MediaCodecDecContext is incremented each time
> an
> (hwaccel) frame is outputted by the decoder and decremented each time a
> (hwaccel) frame is released.
>
> Also, when the decoder is configured to output to a surface the pts that
> are
> given to the MediaCodec API are now rescaled based on the codec_timebase as
> those timestamps values are propagated to the frames rendered on the
> surface
> since Android M. Not sure if it's really useful though.
>
> On the performance side:
>
> On a nexus 5, decoding an h264 stream (main profile) 1080p@60fps:
>   - software output + rgba conversion goes at 59~60fps
>   - surface output + render on a surface goes at 100~110fps
>
>
[...]

Patch updated with the following differences:
  * the public mediacodec api is now always built (not only when mediacodec
is available) (and the build when mediacodec is not available has been
fixed)
  * the documentation of av_mediacodec_release_buffer has been improved a
bit

The development branch is located here:
https://github.com/mbouron/FFmpeg/tree/feature/mediacodec-hwaccel
From 26b21e16a93e6580ee75cc94d71fca23c111ad5b Mon Sep 17 00:00:00 2001
From: Matthieu Bouron 
Date: Fri, 11 Mar 2016 17:21:04 +0100
Subject: [PATCH] lavc: add mediacodec hwaccel support

---
 configure   |   1 +
 libavcodec/Makefile |   6 +-
 libavcodec/allcodecs.c  |   1 +
 libavcodec/mediacodec.c | 133 
 libavcodec/mediacodec.h |  88 +
 libavcodec/mediacodec_surface.c |  66 ++
 libavcodec/mediacodec_surface.h |  31 +
 libavcodec/mediacodec_wrapper.c |   5 +-
 libavcodec/mediacodecdec.c  | 272 +---
 libavcodec/mediacodecdec.h  |  17 +++
 libavcodec/mediacodecdec_h264.c |  23 
 libavutil/pixdesc.c |   4 +
 libavutil/pixfmt.h   

[FFmpeg-devel] [PATCH] lavc/mediacodec: add hwaccel support

2016-03-19 Thread Matthieu Bouron
From: Matthieu Bouron 

---

Hello,

The following patch adds hwaccel support to the mediacodec (h264) decoder by
allowing the user to render the output frames directly on a surface.

In order to do so, the user needs to initialize the hwaccel through the
av_mediacodec_alloc_context and av_mediacodec_default_init functions. The
latter takes a reference to an android/view/Surface as parameter.

If the hwaccel initializes successfully, the pix fmt of the decoder output
frames will be AV_PIX_FMT_MEDIACODEC. The following snippet of code
demonstrates how to render the frames on the surface:

AVMediaCodecBuffer *buffer = (AVMediaCodecBuffer *)frame->data[3];
av_mediacodec_release_buffer(buffer, 1);

The last argument of av_mediacodec_release_buffer enables rendering of the
buffer on the surface (or not if set to 0).

Regarding the internal changes in the mediacodec decoder:

MediaCodec.flush() discards both input and output buffers meaning that if
MediaCodec.flush() is called all output buffers the user has a reference on are
now invalid (and cannot be used).
This behaviour does not fit well in the avcodec API.

When the decoder is configured to output software buffers, there is no issue as
the buffers are copied.

Now when the decoder is configured to output to a surface, the user might not
want to render all the frames as fast as the decoder can go and might want to
control *when* the frames are rendered, so we need to make sure that the
MediaCodec.flush() call is delayed until all the frames the user retains have
been released or rendered.

Delaying the call to MediaCodec.flush() means buffering any inputs that come
to the decoder until the user has released/rendered the frames he retains.
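
To illustrate the idea, here is a minimal, self-contained sketch of a deferred
flush. It is not the code from the patch: the DecoderState struct and all
function names are hypothetical, and the real decoder additionally has to
queue incoming packets while the flush is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoder state; names are illustrative, not FFmpeg's. */
typedef struct {
    int  retained_frames;  /* output buffers still held by the user */
    bool flush_pending;    /* a flush was requested but not performed yet */
    int  codec_flushes;    /* how many times the underlying codec was flushed */
} DecoderState;

/* Stand-in for the actual MediaCodec.flush() call. */
static void do_codec_flush(DecoderState *s)
{
    s->codec_flushes++;
    s->flush_pending = false;
}

/* Called each time a frame is handed to the user. */
static void frame_output(DecoderState *s)
{
    s->retained_frames++;
}

/* Flush request: run it now if no frame is retained, otherwise defer it. */
static void decoder_flush(DecoderState *s)
{
    if (s->retained_frames == 0)
        do_codec_flush(s);
    else
        s->flush_pending = true;
}

/* Release (or render) one frame; the last release triggers a deferred flush. */
static void frame_release(DecoderState *s)
{
    assert(s->retained_frames > 0);
    if (--s->retained_frames == 0 && s->flush_pending)
        do_codec_flush(s);
}
```

In this model, a flush requested while two frames are retained only reaches
the codec once the second frame has been released.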

This is a limitation of this hwaccel implementation: if the user retains a
frame (a), then issues a flush command to the decoder, the packets he feeds to
the decoder at that point will be queued in the internal decoder packet queue
(until he releases the frame (a)). This scenario leads to a memory usage
increase to say the least.

Currently there is no limit on the size of the internal decoder packet
queue but this is something that can be added easily. Then, if the queue is
full, what would be the behaviour of the decoder? Can it block? Or should it
return something like AVERROR(EAGAIN)?
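
One possible answer to the question above, shown only as a sketch: a
fixed-capacity queue whose push fails with -EAGAIN when full, so the caller
knows to release frames and retry. The PacketQueue type, QUEUE_CAPACITY and
the function names are hypothetical. On platforms where errno values are
positive, -EAGAIN is what AVERROR(EAGAIN) expands to.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4  /* hypothetical limit, for illustration only */

/* Fixed-size ring buffer of opaque packet pointers. */
typedef struct {
    const void *items[QUEUE_CAPACITY];
    size_t head;   /* index of the oldest element */
    size_t count;  /* number of queued elements */
} PacketQueue;

/* Returns 0 on success, -EAGAIN if the queue is full. */
static int queue_push(PacketQueue *q, const void *pkt)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return -EAGAIN;
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = pkt;
    q->count++;
    return 0;
}

/* Returns the oldest packet, or NULL if the queue is empty. */
static const void *queue_pop(PacketQueue *q)
{
    const void *pkt;

    assert(q != NULL);
    if (q->count == 0)
        return NULL;
    pkt = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return pkt;
}
```

Returning an error instead of blocking leaves the scheduling decision to the
caller, which seems closer to how the rest of the avcodec API behaves.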

About the other internal decoder changes I introduced:

The MediaCodecDecContext is now refcounted (using the lavu/atomic api). Since
the (hwaccel) frames can be retained by the user, we need to delay the
destruction of the codec until the user has released all the frames he has a
reference on.
The reference counter of the MediaCodecDecContext is incremented each time an
(hwaccel) frame is output by the decoder and decremented each time a
(hwaccel) frame is released.
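
The refcounting scheme can be sketched as follows. This is not the patch's
code: it uses C11 <stdatomic.h> as a stand-in for the lavu/atomic api, and the
CodecContext struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted context; `destroyed` stands in for freeing the codec. */
typedef struct {
    atomic_int refcount;
    bool destroyed;
} CodecContext;

/* The decoder itself owns the first reference. */
static void codec_init(CodecContext *c)
{
    atomic_init(&c->refcount, 1);
    c->destroyed = false;
}

/* Called each time a (hwaccel) frame is output to the user. */
static void codec_ref(CodecContext *c)
{
    atomic_fetch_add(&c->refcount, 1);
}

/* Called when a frame is released or when the decoder is closed. */
static void codec_unref(CodecContext *c)
{
    /* atomic_fetch_sub returns the previous value: 1 means last reference */
    if (atomic_fetch_sub(&c->refcount, 1) == 1)
        c->destroyed = true;
}
```

Closing the decoder while two frames are still retained drops the count from 3
to 2. The codec is only destroyed when the last frame is released.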

Also, when the decoder is configured to output to a surface, the pts that are
given to the MediaCodec API are now rescaled based on the codec_timebase, as
those timestamp values are propagated to the frames rendered on the surface
since Android M. Not sure if it's really useful though.

On the performance side:

On a nexus 5, decoding an h264 stream (main profile) 1080p@60fps:
  - software output + rgba conversion goes at 59~60fps
  - surface output + render on a surface goes at 100~110fps

Matthieu

---
 configure   |   1 +
 libavcodec/Makefile |   6 +-
 libavcodec/allcodecs.c  |   1 +
 libavcodec/mediacodec.c | 125 ++
 libavcodec/mediacodec.h |  85 +
 libavcodec/mediacodec_surface.c |  66 ++
 libavcodec/mediacodec_surface.h |  31 +
 libavcodec/mediacodec_wrapper.c |   5 +-
 libavcodec/mediacodecdec.c  | 272 +---
 libavcodec/mediacodecdec.h  |  17 +++
 libavcodec/mediacodecdec_h264.c |  23 
 libavutil/pixdesc.c |   4 +
 libavutil/pixfmt.h  |   2 +
 13 files changed, 586 insertions(+), 52 deletions(-)
 create mode 100644 libavcodec/mediacodec.c
 create mode 100644 libavcodec/mediacodec.h
 create mode 100644 libavcodec/mediacodec_surface.c
 create mode 100644 libavcodec/mediacodec_surface.h

diff --git a/configure b/configure
index e5de306..4d66673 100755
--- a/configure
+++ b/configure
@@ -2530,6 +2530,7 @@ h264_d3d11va_hwaccel_select="h264_decoder"
 h264_dxva2_hwaccel_deps="dxva2"
 h264_dxva2_hwaccel_select="h264_decoder"
 h264_mediacodec_decoder_deps="mediacodec"
+h264_mediacodec_hwaccel_deps="mediacodec"
 h264_mediacodec_decoder_select="h264_mp4toannexb_bsf h264_parser"
 h264_mmal_decoder_deps="mmal"
 h264_mmal_decoder_select="mmal"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 6bb1af1..a3dad7e 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -10,6 +10,7 @@ HEADERS = avcodec.h   
  \
 

Re: [FFmpeg-devel] [PATCH] lavc/ffjni: remove use of private JniInvocation API to retreive the Java VM

2016-03-14 Thread Matthieu Bouron
On Sun, Mar 13, 2016 at 08:48:21PM +0100, Matthieu Bouron wrote:
> On Fri, Mar 11, 2016 at 09:36:41PM +0100, Matthieu Bouron wrote:

[...]

> 
> If nobody objects, I will push the patch (with #include  removed)
> tomorrow.
> 

Pushed.


Re: [FFmpeg-devel] [PATCH] lavc/ffjni: remove use of private JniInvocation API to retreive the Java VM

2016-03-13 Thread Matthieu Bouron
On Fri, Mar 11, 2016 at 09:36:41PM +0100, Matthieu Bouron wrote:
> From: Matthieu Bouron 
> 
> Android N will prevent users from loading non-public APIs.
> 
> Users should only rely on the av_jni_set_java_vm function to set the
> Java VM.
> ---
>  libavcodec/ffjni.c | 88 ++
>  1 file changed, 3 insertions(+), 85 deletions(-)
> 
> diff --git a/libavcodec/ffjni.c b/libavcodec/ffjni.c
> index da13699..54f3122 100644
> --- a/libavcodec/ffjni.c
> +++ b/libavcodec/ffjni.c
> @@ -35,80 +35,6 @@
>  static JavaVM *java_vm = NULL;
>  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
>  
> -/**
> - * Check if JniInvocation has been initialized. Only available on
> - * Android >= 4.4.
> - *
> - * @param log_ctx context used for logging, can be NULL
> - * @return 0 on success, < 0 otherwise
> - */
> -static int check_jni_invocation(void *log_ctx)
> -{
> -int ret = AVERROR_EXTERNAL;
> -void *handle = NULL;
> -void **jni_invocation = NULL;
> -
> -handle = dlopen(NULL, RTLD_LOCAL);
> -if (!handle) {
> -goto done;
> -}
> -
> -jni_invocation = (void **)dlsym(handle, "_ZN13JniInvocation15jni_invocation_E");
> -if (!jni_invocation) {
> -av_log(log_ctx, AV_LOG_ERROR, "Could not find JniInvocation::jni_invocation_ symbol\n");
> -goto done;
> -}
> -
> -ret = !(jni_invocation != NULL && *jni_invocation != NULL);
> -
> -done:
> -if (handle) {
> -dlclose(handle);
> -}
> -
> -return ret;
> -}
> -
> -/**
> - * Return created Java virtual machine using private JNI_GetCreatedJavaVMs
> - * function from the specified library name.
> - *
> - * @param name library name used for symbol lookups, can be NULL
> - * @param log_ctx context used for logging, can be NULL
> - * @return the current Java virtual machine in use
> - */
> -static JavaVM *get_java_vm(const char *name, void *log_ctx)
> -{
> -JavaVM *vm = NULL;
> -jsize nb_vm = 0;
> -
> -void *handle = NULL;
> -jint (*get_created_java_vms) (JavaVM ** vmBuf, jsize bufLen, jsize *nVMs) = NULL;
> -
> -handle = dlopen(name, RTLD_LOCAL);
> -if (!handle) {
> -return NULL;
> -}
> -
> -get_created_java_vms = (jint (*)(JavaVM **, jsize, jsize *)) dlsym(handle, "JNI_GetCreatedJavaVMs");
> -if (!get_created_java_vms) {
> -av_log(log_ctx, AV_LOG_ERROR, "Could not find JNI_GetCreatedJavaVMs symbol in library '%s'\n", name);
> -goto done;
> -}
> -
> -if (get_created_java_vms(&vm, 1, &nb_vm) != JNI_OK) {
> -av_log(log_ctx, AV_LOG_ERROR, "Could not get created Java virtual machines\n");
> -goto done;
> -}
> -
> -done:
> -if (handle) {
> -dlclose(handle);
> -}
> -
> -return vm;
> -}
> -
>  JNIEnv *ff_jni_attach_env(int *attached, void *log_ctx)
>  {
>  int ret = 0;
> @@ -117,21 +43,13 @@ JNIEnv *ff_jni_attach_env(int *attached, void *log_ctx)
>  *attached = 0;
>  
>  pthread_mutex_lock(&lock);
> -if (java_vm == NULL && (java_vm = av_jni_get_java_vm(log_ctx)) == NULL) {
> -
> -av_log(log_ctx, AV_LOG_INFO, "Retrieving current Java virtual machine using Android JniInvocation wrapper\n");
> -if (check_jni_invocation(log_ctx) == 0) {
> -if ((java_vm = get_java_vm(NULL, log_ctx)) != NULL ||
> -(java_vm = get_java_vm("libdvm.so", log_ctx)) != NULL ||
> -(java_vm = get_java_vm("libart.so", log_ctx)) != NULL) {
> -av_log(log_ctx, AV_LOG_INFO, "Found Java virtual machine using Android JniInvocation wrapper\n");
> -}
> -}
> +if (java_vm == NULL) {
> +java_vm = av_jni_get_java_vm(log_ctx);
>  }
>  pthread_mutex_unlock(&lock);
>  
>  if (!java_vm) {
> -av_log(log_ctx, AV_LOG_ERROR, "Could not retrieve a Java virtual machine\n");
> +av_log(log_ctx, AV_LOG_ERROR, "No Java virtual machine has been registered\n");
>  return NULL;
>  }
>  

If nobody objects, I will push the patch (with #include  removed)
tomorrow.

Matthieu
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH] lavc/ffjni: remove use of private JniInvocation API to retrieve the Java VM

2016-03-11 Thread Matthieu Bouron
From: Matthieu Bouron 

Android N will prevent users from loading non-public APIs.

Users should only rely on the av_jni_set_java_vm function to set the
Java VM.
---
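
For context, a minimal sketch of the lookup behaviour after this patch, under
stated assumptions: DemoVM, demo_set_vm and demo_get_vm are hypothetical
stand-ins (for JavaVM, av_jni_set_java_vm and the VM lookup done in
ff_jni_attach_env) so that the sketch builds without <jni.h>. It only models
the idea that the library hands back whatever VM the application registered,
and nothing otherwise; it is not the actual ffjni.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for JavaVM, so no JNI headers are needed. */
typedef struct DemoVM { int id; } DemoVM;

/* Models the slot the application fills via av_jni_set_java_vm(). */
static DemoVM *demo_registered_vm = NULL;
/* Models the java_vm pointer cached inside ffjni.c. */
static DemoVM *demo_cached_vm = NULL;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Application side: register the VM once, before any JNI-backed use. */
void demo_set_vm(DemoVM *vm)
{
    assert(vm != NULL); /* the sketch only models a successful registration */
    pthread_mutex_lock(&demo_lock);
    demo_registered_vm = vm;
    pthread_mutex_unlock(&demo_lock);
}

/* Library side: return the registered VM, or NULL if there is none.
 * There is no fallback to private platform symbols any more. */
DemoVM *demo_get_vm(void)
{
    DemoVM *vm;

    pthread_mutex_lock(&demo_lock);
    if (demo_cached_vm == NULL)
        demo_cached_vm = demo_registered_vm;
    vm = demo_cached_vm;
    pthread_mutex_unlock(&demo_lock);

    return vm;
}
```

In a real application the registration would presumably be a single early
call to av_jni_set_java_vm (for instance from JNI_OnLoad), made before any
decoder that needs JNI is opened.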
 libavcodec/ffjni.c | 88 ++
 1 file changed, 3 insertions(+), 85 deletions(-)

diff --git a/libavcodec/ffjni.c b/libavcodec/ffjni.c
index da13699..54f3122 100644
--- a/libavcodec/ffjni.c
+++ b/libavcodec/ffjni.c
@@ -35,80 +35,6 @@
 static JavaVM *java_vm = NULL;
 static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
 
-/**
- * Check if JniInvocation has been initialized. Only available on
- * Android >= 4.4.
- *
- * @param log_ctx context used for logging, can be NULL
- * @return 0 on success, < 0 otherwise
- */
-static int check_jni_invocation(void *log_ctx)
-{
-int ret = AVERROR_EXTERNAL;
-void *handle = NULL;
-void **jni_invocation = NULL;
-
-handle = dlopen(NULL, RTLD_LOCAL);
-if (!handle) {
-goto done;
-}
-
-jni_invocation = (void **)dlsym(handle, "_ZN13JniInvocation15jni_invocation_E");
-if (!jni_invocation) {
-av_log(log_ctx, AV_LOG_ERROR, "Could not find JniInvocation::jni_invocation_ symbol\n");
-goto done;
-}
-
-ret = !(jni_invocation != NULL && *jni_invocation != NULL);
-
-done:
-if (handle) {
-dlclose(handle);
-}
-
-return ret;
-}
-
-/**
- * Return created Java virtual machine using private JNI_GetCreatedJavaVMs
- * function from the specified library name.
- *
- * @param name library name used for symbol lookups, can be NULL
- * @param log_ctx context used for logging, can be NULL
- * @return the current Java virtual machine in use
- */
-static JavaVM *get_java_vm(const char *name, void *log_ctx)
-{
-JavaVM *vm = NULL;
-jsize nb_vm = 0;
-
-void *handle = NULL;
-jint (*get_created_java_vms) (JavaVM ** vmBuf, jsize bufLen, jsize *nVMs) = NULL;
-
-handle = dlopen(name, RTLD_LOCAL);
-if (!handle) {
-return NULL;
-}
-
-get_created_java_vms = (jint (*)(JavaVM **, jsize, jsize *)) dlsym(handle, "JNI_GetCreatedJavaVMs");
-if (!get_created_java_vms) {
-av_log(log_ctx, AV_LOG_ERROR, "Could not find JNI_GetCreatedJavaVMs symbol in library '%s'\n", name);
-goto done;
-}
-
-if (get_created_java_vms(&vm, 1, &nb_vm) != JNI_OK) {
-av_log(log_ctx, AV_LOG_ERROR, "Could not get created Java virtual machines\n");
-goto done;
-}
-
-done:
-if (handle) {
-dlclose(handle);
-}
-
-return vm;
-}
-
 JNIEnv *ff_jni_attach_env(int *attached, void *log_ctx)
 {
 int ret = 0;
@@ -117,21 +43,13 @@ JNIEnv *ff_jni_attach_env(int *attached, void *log_ctx)
 *attached = 0;
 
 pthread_mutex_lock(&lock);
-if (java_vm == NULL && (java_vm = av_jni_get_java_vm(log_ctx)) == NULL) {
-
-av_log(log_ctx, AV_LOG_INFO, "Retrieving current Java virtual machine using Android JniInvocation wrapper\n");
-if (check_jni_invocation(log_ctx) == 0) {
-if ((java_vm = get_java_vm(NULL, log_ctx)) != NULL ||
-(java_vm = get_java_vm("libdvm.so", log_ctx)) != NULL ||
-(java_vm = get_java_vm("libart.so", log_ctx)) != NULL) {
-av_log(log_ctx, AV_LOG_INFO, "Found Java virtual machine using Android JniInvocation wrapper\n");
-}
-}
+if (java_vm == NULL) {
+java_vm = av_jni_get_java_vm(log_ctx);
 }
 pthread_mutex_unlock(&lock);
 
 if (!java_vm) {
-av_log(log_ctx, AV_LOG_ERROR, "Could not retrieve a Java virtual machine\n");
+av_log(log_ctx, AV_LOG_ERROR, "No Java virtual machine has been registered\n");
 return NULL;
 }
 
-- 
2.7.2



[FFmpeg-devel] [PATCH] lavf/img2dec: disable parsing if frame_size is specified

2016-03-07 Thread Matthieu Bouron
From: Matthieu Bouron 

---

Hello,

The following patch disables parsing if the frame_size option is specified. The
main purpose here is to avoid the use of parsers (which have a huge
performance cost on embedded platforms) for single images when
their size is known in advance.
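
To make the selection rule concrete, here is a small standalone sketch of the
decision the patch introduces. DemoParseMode and demo_pick_parse_mode are
hypothetical names used only for illustration (they stand in for the
AVSTREAM_PARSE_* values and for the one-line ternary in ff_img_read_header);
in the actual patch the rule only applies in the pipe branch.

```c
/* Hypothetical stand-ins for AVSTREAM_PARSE_NONE / AVSTREAM_PARSE_FULL. */
enum DemoParseMode { DEMO_PARSE_NONE, DEMO_PARSE_FULL };

/* Mirrors the ternary added by the patch: a positive frame_size means the
 * size of each image is already known, so no parser is requested;
 * otherwise full parsing is kept, as before. */
enum DemoParseMode demo_pick_parse_mode(int frame_size)
{
    return frame_size > 0 ? DEMO_PARSE_NONE : DEMO_PARSE_FULL;
}
```

With frame_size left at 0 the behaviour is unchanged (full parsing); only a
caller that explicitly sets a positive size skips the parser.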

The patch feels hackish to me, though others might consider it OK (or not).

The performance of the jpeg parser still needs to be addressed at some point.

Matthieu

---
 libavformat/img2dec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/img2dec.c b/libavformat/img2dec.c
index fe0e346..9aa6dd7 100644
--- a/libavformat/img2dec.c
+++ b/libavformat/img2dec.c
@@ -206,7 +206,7 @@ int ff_img_read_header(AVFormatContext *s1)
 s->is_pipe = 0;
 else {
 s->is_pipe   = 1;
-st->need_parsing = AVSTREAM_PARSE_FULL;
+st->need_parsing = s->frame_size > 0 ? AVSTREAM_PARSE_NONE : AVSTREAM_PARSE_FULL;
 }
 
 if (s->ts_from_file == 2) {
-- 
2.7.2



Re: [FFmpeg-devel] [PATCH] lavc/mjpegdec: avoid unneeded allocation if the frame is to be skipped

2016-03-07 Thread Matthieu Bouron
On Tue, Mar 01, 2016 at 08:53:33PM +0100, Paul B Mahol wrote:
> On 3/1/16, Matthieu Bouron  wrote:
> > From: Matthieu Bouron 
> >
> > ---
> >  libavcodec/mjpegdec.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> 
> probably ok

Pushed. Thanks.

[...]


Re: [FFmpeg-devel] [PATCH 2/2] lavc: add h264 mediacodec decoder

2016-03-07 Thread Matthieu Bouron
On Thu, Mar 03, 2016 at 02:03:01PM +0100, Matthieu Bouron wrote:

[...]

> 
> Patch updated with the following differences:
>   * ff_set_dimensions return code is now used
>   * add missing exception when trying to call the MediaCodec object constructor
>   * remove leftover avctx_internal field from MediaCodecH264DecContext
>   * add ff_AMediaCodec_getName function
> 
> The dev branch can be found here:
> https://github.com/mbouron/FFmpeg/tree/feature/mediacodec-support-v7
> 
> If nobody objects I would like to push the patchset in 3 days.
> 

Pushed.

[...]


Re: [FFmpeg-devel] [PATCH 1/2] lavc: add JNI support

2016-03-07 Thread Matthieu Bouron
On Thu, Mar 03, 2016 at 01:56:16PM +0100, Matthieu Bouron wrote:
[...]
> 
> New patch attached with the following differences:
>   * added myself as a maintainer of jni* and ffjni*
> 

Pushed.

[...]

