Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-16 Thread Michael Niedermayer
On Sat, Nov 15, 2014 at 07:28:25PM -0700, Pavel Koshevoy wrote:
> On 11/15/14 18:12, James Almer wrote:
> >On 15/11/14 1:50 AM, Michael Niedermayer wrote:
> >>On Fri, Nov 14, 2014 at 09:00:31PM -0700, Pavel Koshevoy wrote:
> >>>I ran both builds twice and captured the output from the second run
> >>>of each build, it's in the attachment.  By the looks of it there is
> >>>no difference in performance.
> >>to compare START/STOP_TIMER data its generally best to run the
> >>test a few times (like 3) and compare the values from each that
> >>have some specific number or runs, like
> >>
> >>>681 UNITS in MC, 4192359 runs, 1945 skips0:01:40.88 bitrate=N/A
> >>vs.
> >>>668 UNITS in MC, 4192326 runs, 1978 skips0:01:40.16 bitrate=N/A
> >>but from these 2 tests it seems you are correct and theres no
> >>significant difference so theres probably not much point in doing
> >>further tests
> >>
> >It might be a good idea to try -threads 1 for the input file as well
> >
> 
> I've done 4 runs for each build using vec_ld and VEC_LD2, and logged
> the results of the last 3 runs for each build.
> The results are in the attachment.  This time I added -an -threads 1
> and the fps went up for both builds.  It seems VEC_LD2 is slightly
> faster.
> 
> I am not sure I've put -threads 1 in the right place on the command
> line, and I don't know if it matters -- this is a single-core ppc
> G4.
> 

> Let me know if you would like me to try something else.

no, thanks a lot

it seems the conclusion is that it's 1 cpu cycle faster for you and
slower for Carl, thus overall it's basically the same speed.

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-15 Thread Pavel Koshevoy

On 11/15/14 18:12, James Almer wrote:
> On 15/11/14 1:50 AM, Michael Niedermayer wrote:
>> On Fri, Nov 14, 2014 at 09:00:31PM -0700, Pavel Koshevoy wrote:
>>> I ran both builds twice and captured the output from the second run
>>> of each build, it's in the attachment.  By the looks of it there is
>>> no difference in performance.
>> to compare START/STOP_TIMER data its generally best to run the
>> test a few times (like 3) and compare the values from each that
>> have some specific number or runs, like
>>
>>> 681 UNITS in MC, 4192359 runs, 1945 skips0:01:40.88 bitrate=N/A
>> vs.
>>> 668 UNITS in MC, 4192326 runs, 1978 skips0:01:40.16 bitrate=N/A
>> but from these 2 tests it seems you are correct and theres no
>> significant difference so theres probably not much point in doing
>> further tests
>>
> It might be a good idea to try -threads 1 for the input file as well
>

I've done 4 runs for each build using vec_ld and VEC_LD2, and logged the 
results of the last 3 runs for each build.
The results are in the attachment.  This time I added -an -threads 1 and 
the fps went up for both builds.  It seems VEC_LD2 is slightly faster.


I am not sure I've put -threads 1 in the right place on the command 
line, and I don't know if it matters -- this is a single-core ppc G4.


Let me know if you would like me to try something else.

Pavel
$ ./ffmpeg -v 99 -i ~/Movies/matrixbench_mpeg2.mpg -an -threads 1 -f null -
ffmpeg version N-67669-g53ab784 Copyright (c) 2000-2014 the FFmpeg developers
  built on Nov 14 2014 20:14:18 with gcc 4.2.1 (GCC) (Apple Inc. build 5577)
  configuration: --prefix=/Developer/ppc --disable-debug --disable-shared 
--enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis 
--enable-libopus --enable-libtheora --enable-libschroedinger 
--enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex 
--enable-pthreads --enable-gpl --enable-version3 --enable-libopencore-amrnb 
--enable-libopencore-amrwb --enable-postproc --enable-libx264 --enable-libxvid 
--enable-libass --enable-gnutls --enable-runtime-cpudetect 
--extra-cflags=-I/opt/local/include 
--extra-ldflags='-headerpad_max_install_names -L/opt/local/lib'
  libavutil      54. 11.100 / 54. 11.100
  libavcodec     56. 12.100 / 56. 12.100
  libavformat    56. 12.103 / 56. 12.103
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument 
'99'.
Reading option '-i' ... matched as input file with argument 
'/Users/pavel/Movies/matrixbench_mpeg2.mpg'.
Reading option '-an' ... matched as option 'an' (disable audio) with argument 
'1'.
Reading option '-threads' ... matched as AVOption 'threads' with argument '1'.
Reading option '-f' ... matched as option 'f' (force format) with argument 
'null'.
Reading option '-' ... matched as output file.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 99.
Successfully parsed a group of options.
Parsing a group of options: input file 
/Users/pavel/Movies/matrixbench_mpeg2.mpg.
Successfully parsed a group of options.
Opening an input file: /Users/pavel/Movies/matrixbench_mpeg2.mpg.
[mpeg @ 0x2808010] Format mpeg probed with size=2048 and score=26
[mpeg @ 0x2808010] Before avformat_find_stream_info() pos: 0 bytes read:32768 
seeks:0
[mpeg @ 0x2808010] probing stream 1 pp:2500
[mpeg @ 0x2808010] Probe with size=2012, packets=1 detected mpegvideo with 
score=25
[mpeg @ 0x2808010] probed stream 1
[mpeg @ 0x2808010] max_analyze_duration 500 reached at 500 microseconds
[NULL @ 0x2809810] start time for stream 0 is not set in 
estimate_timings_from_pts
[mpeg @ 0x2808010] After avformat_find_stream_info() pos: 0 bytes read:4247696 
seeks:3 frames:333
Input #0, mpeg, from '/Users/pavel/Movies/matrixbench_mpeg2.mpg':
  Duration: 00:03:07.66, start: 0.22, bitrate: 5633 kb/s
Stream #0:0[0x1bf], 0, 1/9: Data: dvd_nav_packet, 1/9
Stream #0:1[0x1e0], 127, 1/9: Video: mpeg2video (Main), yuv420p(tv, 
bt470bg/bt470m/bt470m, left), 720x576 [SAR 16:15 DAR 4:3], 1/50, max. 11421 
kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc
Stream #0:2[0x1c0], 206, 1/9: Audio: mp2, 48000 Hz, stereo, s16p, 384 
kb/s
Successfully opened the file.
Parsing a group of options: output file -.
Applying option an (disable audio) with argument 1.
Applying option f (force format) with argument null.
Successfully parsed a group of options.
Opening an output file: -.
Successfully opened the file.
[graph 0 input from stream 0:1 @ 0x2226b40] Setting 'video_size' to value 
'720x576'
[graph 0 input from stream 0:1 @ 0x2226b40] Setting 'pix_fmt' to value '0'
[graph 0 input from stream 0:1 @ 0x2226b40] Setting 'time_base' to value 
'1/9'
[graph 0 input from stream 0:1 @ 0x2226b40] Setting 'pixel_aspect' to value 
'16/

Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0),vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-15 Thread Carl Eugen Hoyos
Michael Niedermayer writes:

> This needs to be benchmarked, i do not have ppc hw

Decoding mpeg2video showed a slightly lower speed 
for START_TIMER with the patch than without.

Carl Eugen



Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-15 Thread James Almer
On 15/11/14 1:50 AM, Michael Niedermayer wrote:
> On Fri, Nov 14, 2014 at 09:00:31PM -0700, Pavel Koshevoy wrote:
>> On 11/14/14 07:34, Michael Niedermayer wrote:
>>> On Fri, Nov 14, 2014 at 06:45:55AM -0700, Pavel Koshevoy wrote:
>>>> On Nov 13, 2014 4:15 PM, "Michael Niedermayer"  wrote:
>>>>> On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
>>>>>> This needs to be benchmarked, i do not have ppc hw
>>>>>> This is on big endian more similar to how the code was before
>>>> 79e0255956bc8fcdb143f39b2e45db77144ac017
>>>>>> Signed-off-by: Michael Niedermayer 
>>>>> ping
>>>>>
>>>>> can someone with a altivec PPC please benchmark this
>>>>> or do all the ppc people want code to be slow and unoptimized ?
>>>>> iam also happy to benchmark it myself if someone provides a ppc or
>>>>> account on a altivec ppc that is reasonable idle so benchmarking is
>>>>> possible with some accuracy
>>>>>
>>>> I can do it over the weekend, I have a ppc G4 800MHz iMac.  I'll need
>>>> instructions on what to do for benchmarking.
>>> patch that adds benchmarking is below
>>> that and trying to decode some mpeg2 like with
>>>  -v 99 -i matrixbench_mpeg2.mpg -f null -
>>>
>>> should result in some timing values
>>> i cant say for sure though, as this does not work under qemu
>>> under qemu i just get 0
>>>
>>>
>>> diff --git a/libavcodec/mpegvideo_motion.c b/libavcodec/mpegvideo_motion.c
>>> index e7a585d..94b140d 100644
>>> --- a/libavcodec/mpegvideo_motion.c
>>> +++ b/libavcodec/mpegvideo_motion.c
>>> @@ -976,6 +976,7 @@ void ff_mpv_motion(MpegEncContext *s,
>>> op_pixels_func (*pix_op)[4],
>>> qpel_mc_func (*qpix_op)[16])
>>>  {
>>> +START_TIMER
>>>  #if !CONFIG_SMALL
>>>  if (s->out_format == FMT_MPEG1)
>>>  mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
>>> @@ -984,4 +985,5 @@ void ff_mpv_motion(MpegEncContext *s,
>>>  #endif
>>>  mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
>>>  ref_picture, pix_op, qpix_op, 0);
>>> +STOP_TIMER("MC")
>>>  }
>>>
>>>
>>
>> git am wouldn't apply the patches for me (I just saved the message
>> from Thunderbird to .eml file and tried to feed that to git am). So,
>> I had to trim them and use patch -p1 to apply manually.  The patch
>> for util_altivec.h wouldn't apply so I patched manually.
>>
> 
>> I ran both builds twice and captured the output from the second run
>> of each build, it's in the attachment.  By the looks of it there is
>> no difference in performance.
> 
> to compare START/STOP_TIMER data its generally best to run the
> test a few times (like 3) and compare the values from each that
> have some specific number or runs, like
> 
>> 681 UNITS in MC, 4192359 runs, 1945 skips0:01:40.88 bitrate=N/A
> vs.
>> 668 UNITS in MC, 4192326 runs, 1978 skips0:01:40.16 bitrate=N/A
> 
> but from these 2 tests it seems you are correct and theres no
> significant difference so theres probably not much point in doing
> further tests
> 

It might be a good idea to try -threads 1 for the input file as well


Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-14 Thread Michael Niedermayer
On Fri, Nov 14, 2014 at 09:00:31PM -0700, Pavel Koshevoy wrote:
> On 11/14/14 07:34, Michael Niedermayer wrote:
> >On Fri, Nov 14, 2014 at 06:45:55AM -0700, Pavel Koshevoy wrote:
> >>On Nov 13, 2014 4:15 PM, "Michael Niedermayer"  wrote:
> >>>On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
> >>>>This needs to be benchmarked, i do not have ppc hw
> >>>>This is on big endian more similar to how the code was before
> >>79e0255956bc8fcdb143f39b2e45db77144ac017
> >>>>Signed-off-by: Michael Niedermayer 
> >>>ping
> >>>
> >>>can someone with a altivec PPC please benchmark this
> >>>or do all the ppc people want code to be slow and unoptimized ?
> >>>iam also happy to benchmark it myself if someone provides a ppc or
> >>>account on a altivec ppc that is reasonable idle so benchmarking is
> >>>possible with some accuracy
> >>>
> >>I can do it over the weekend, I have a ppc G4 800MHz iMac.  I'll need
> >>instructions on what to do for benchmarking.
> >patch that adds benchmarking is below
> >that and trying to decode some mpeg2 like with
> >  -v 99 -i matrixbench_mpeg2.mpg -f null -
> >
> >should result in some timing values
> >i cant say for sure though, as this does not work under qemu
> >under qemu i just get 0
> >
> >
> >diff --git a/libavcodec/mpegvideo_motion.c b/libavcodec/mpegvideo_motion.c
> >index e7a585d..94b140d 100644
> >--- a/libavcodec/mpegvideo_motion.c
> >+++ b/libavcodec/mpegvideo_motion.c
> >@@ -976,6 +976,7 @@ void ff_mpv_motion(MpegEncContext *s,
> > op_pixels_func (*pix_op)[4],
> > qpel_mc_func (*qpix_op)[16])
> >  {
> >+START_TIMER
> >  #if !CONFIG_SMALL
> >  if (s->out_format == FMT_MPEG1)
> >  mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
> >@@ -984,4 +985,5 @@ void ff_mpv_motion(MpegEncContext *s,
> >  #endif
> >  mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
> >  ref_picture, pix_op, qpix_op, 0);
> >+STOP_TIMER("MC")
> >  }
> >
> >
> 
> git am wouldn't apply the patches for me (I just saved the message
> from Thunderbird to .eml file and tried to feed that to git am). So,
> I had to trim them and use patch -p1 to apply manually.  The patch
> for util_altivec.h wouldn't apply so I patched manually.
> 

> I ran both builds twice and captured the output from the second run
> of each build, it's in the attachment.  By the looks of it there is
> no difference in performance.

to compare START/STOP_TIMER data it's generally best to run the
test a few times (like 3) and compare the values from each that
have some specific number of runs, like

> 681 UNITS in MC, 4192359 runs, 1945 skips0:01:40.88 bitrate=N/A
vs.
> 668 UNITS in MC, 4192326 runs, 1978 skips0:01:40.16 bitrate=N/A

but from these 2 tests it seems you are correct and there's no
significant difference so there's probably not much point in doing
further tests

Thanks!

PS: if the tests take too long, a shorter video can be used or the
duration can be limited
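
For anyone who wants to compare such lines mechanically rather than by eye,
here is a minimal sketch in C. It assumes the report lines look exactly like
the two quoted above ("<value> UNITS in <name>, <n> runs, <m> skips"); the
struct and function names are made up for this example and are not part of
FFmpeg. Anything glued on after "skips" (like the progress text in the lines
above) is ignored.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one START/STOP_TIMER report line. */
struct timer_report {
    unsigned long long units; /* averaged cost, in whatever UNITS the build prints */
    char name[32];            /* timer label, e.g. "MC" */
    int runs;                 /* samples that were counted */
    int skips;                /* samples that were discarded */
};

/* Parse a line such as "681 UNITS in MC, 4192359 runs, 1945 skips".
 * Returns 1 if all four fields were read, 0 otherwise.
 * Trailing text after the skip count is ignored. */
static int parse_timer_line(const char *line, struct timer_report *r)
{
    assert(line && r);
    return sscanf(line, "%llu UNITS in %31[^,], %d runs, %d skips",
                  &r->units, r->name, &r->runs, &r->skips) == 4;
}

/* Relative difference of b versus a in percent; negative means b is cheaper.
 * Returns 0 when a reports 0 units (e.g. a timer that does not work). */
static double relative_diff_percent(const struct timer_report *a,
                                    const struct timer_report *b)
{
    if (a->units == 0)
        return 0.0;
    return 100.0 * ((double)b->units - (double)a->units) / (double)a->units;
}
```

For the two lines quoted above the difference works out to roughly 1.9
percent, which is small enough that run-to-run noise can easily hide it.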


> 
> If you'd like me to try something else -- I can try again tomorrow.
> 
> Pavel.
> 

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein




Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-14 Thread Pavel Koshevoy

On 11/14/14 07:34, Michael Niedermayer wrote:
> On Fri, Nov 14, 2014 at 06:45:55AM -0700, Pavel Koshevoy wrote:
>> On Nov 13, 2014 4:15 PM, "Michael Niedermayer"  wrote:
>>> On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
>>>> This needs to be benchmarked, i do not have ppc hw
>>>> This is on big endian more similar to how the code was before
>>>> 79e0255956bc8fcdb143f39b2e45db77144ac017
>>>> Signed-off-by: Michael Niedermayer 
>>> ping
>>>
>>> can someone with a altivec PPC please benchmark this
>>> or do all the ppc people want code to be slow and unoptimized ?
>>> iam also happy to benchmark it myself if someone provides a ppc or
>>> account on a altivec ppc that is reasonable idle so benchmarking is
>>> possible with some accuracy
>>>
>> I can do it over the weekend, I have a ppc G4 800MHz iMac.  I'll need
>> instructions on what to do for benchmarking.
> patch that adds benchmarking is below
> that and trying to decode some mpeg2 like with
>   -v 99 -i matrixbench_mpeg2.mpg -f null -
>
> should result in some timing values
> i cant say for sure though, as this does not work under qemu
> under qemu i just get 0
>
>
> diff --git a/libavcodec/mpegvideo_motion.c b/libavcodec/mpegvideo_motion.c
> index e7a585d..94b140d 100644
> --- a/libavcodec/mpegvideo_motion.c
> +++ b/libavcodec/mpegvideo_motion.c
> @@ -976,6 +976,7 @@ void ff_mpv_motion(MpegEncContext *s,
>  op_pixels_func (*pix_op)[4],
>  qpel_mc_func (*qpix_op)[16])
>   {
> +START_TIMER
>   #if !CONFIG_SMALL
>   if (s->out_format == FMT_MPEG1)
>   mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
> @@ -984,4 +985,5 @@ void ff_mpv_motion(MpegEncContext *s,
>   #endif
>   mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
>   ref_picture, pix_op, qpix_op, 0);
> +STOP_TIMER("MC")
>   }
>

git am wouldn't apply the patches for me (I just saved the message from 
Thunderbird to .eml file and tried to feed that to git am). So, I had to 
trim them and use patch -p1 to apply manually.  The patch for 
util_altivec.h wouldn't apply so I patched manually.


I ran both builds twice and captured the output from the second run of 
each build, it's in the attachment.  By the looks of it there is no 
difference in performance.


If you'd like me to try something else -- I can try again tomorrow.

Pavel.

$ ./ffmpeg -v 99 -i ~/Movies/matrixbench_mpeg2.mpg -f null - > /tmp/vec_ld.txt
ffmpeg version N-67669-g53ab784 Copyright (c) 2000-2014 the FFmpeg developers
  built on Nov 14 2014 20:14:18 with gcc 4.2.1 (GCC) (Apple Inc. build 5577)
  configuration: --prefix=/Developer/ppc --disable-debug --disable-shared 
--enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis 
--enable-libopus --enable-libtheora --enable-libschroedinger 
--enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex 
--enable-pthreads --enable-gpl --enable-version3 --enable-libopencore-amrnb 
--enable-libopencore-amrwb --enable-postproc --enable-libx264 --enable-libxvid 
--enable-libass --enable-gnutls --enable-runtime-cpudetect 
--extra-cflags=-I/opt/local/include 
--extra-ldflags='-headerpad_max_install_names -L/opt/local/lib'
  libavutil      54. 11.100 / 54. 11.100
  libavcodec     56. 12.100 / 56. 12.100
  libavformat    56. 12.103 / 56. 12.103
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument 
'99'.
Reading option '-i' ... matched as input file with argument 
'/Users/pavel/Movies/matrixbench_mpeg2.mpg'.
Reading option '-f' ... matched as option 'f' (force format) with argument 
'null'.
Reading option '-' ... matched as output file.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 99.
Successfully parsed a group of options.
Parsing a group of options: input file 
/Users/pavel/Movies/matrixbench_mpeg2.mpg.
Successfully parsed a group of options.
Opening an input file: /Users/pavel/Movies/matrixbench_mpeg2.mpg.
[mpeg @ 0x2808010] Format mpeg probed with size=2048 and score=26
[mpeg @ 0x2808010] Before avformat_find_stream_info() pos: 0 bytes read:32768 
seeks:0
[mpeg @ 0x2808010] probing stream 1 pp:2500
[mpeg @ 0x2808010] Probe with size=2012, packets=1 detected mpegvideo with 
score=25
[mpeg @ 0x2808010] probed stream 1
[mpeg @ 0x2808010] max_analyze_duration 500 reached at 500 microseconds
[NULL @ 0x2809810] start time for stream 0 is not set in 
estimate_timings_from_pts
[mpeg @ 0x2808010] After avformat_find_stream_info() pos: 0 bytes read:4247696 
seeks:3 frames:333
Input #0, mpeg, from '/Users/pavel/Movies/matrixbench_mpeg2.mpg':
  Duration: 00:03:07.66, start: 0.22, bitrate: 5633 kb/s
Stream #0:0[0x1bf], 0, 1/9: Data: dvd_nav_packet, 1/9
Stream #0:1[0x1e0], 127, 1/9: Vide

Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-14 Thread Michael Niedermayer
On Fri, Nov 14, 2014 at 06:45:55AM -0700, Pavel Koshevoy wrote:
> On Nov 13, 2014 4:15 PM, "Michael Niedermayer"  wrote:
> >
> > On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
> > > This needs to be benchmarked, i do not have ppc hw
> > > This is on big endian more similar to how the code was before
> 79e0255956bc8fcdb143f39b2e45db77144ac017
> > >
> > > Signed-off-by: Michael Niedermayer 
> >
> > ping
> >
> > can someone with a altivec PPC please benchmark this
> > or do all the ppc people want code to be slow and unoptimized ?
> > iam also happy to benchmark it myself if someone provides a ppc or
> > account on a altivec ppc that is reasonable idle so benchmarking is
> > possible with some accuracy
> >
> 
> I can do it over the weekend, I have a ppc G4 800MHz iMac.  I'll need
> instructions on what to do for benchmarking.

a patch that adds benchmarking is below
applying that and then decoding some mpeg2, like with
 -v 99 -i matrixbench_mpeg2.mpg -f null -

should result in some timing values
i can't say for sure though, as this does not work under qemu
under qemu i just get 0


diff --git a/libavcodec/mpegvideo_motion.c b/libavcodec/mpegvideo_motion.c
index e7a585d..94b140d 100644
--- a/libavcodec/mpegvideo_motion.c
+++ b/libavcodec/mpegvideo_motion.c
@@ -976,6 +976,7 @@ void ff_mpv_motion(MpegEncContext *s,
op_pixels_func (*pix_op)[4],
qpel_mc_func (*qpix_op)[16])
 {
+START_TIMER
 #if !CONFIG_SMALL
 if (s->out_format == FMT_MPEG1)
 mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
@@ -984,4 +985,5 @@ void ff_mpv_motion(MpegEncContext *s,
 #endif
 mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
 ref_picture, pix_op, qpix_op, 0);
+STOP_TIMER("MC")
 }
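
To make the "runs" and "skips" columns of the resulting report less
mysterious, here is a simplified sketch of the bookkeeping behind such a
timer. The real START_TIMER/STOP_TIMER macros are in libavutil/timer.h and
are not reproduced here; the struct, the function names, the warm-up of 2
samples and the outlier factor of 8 are all assumptions for illustration.
The idea it shows: each measured interval is either added to a running sum
(a "run") or thrown away as an outlier (a "skip"), for example when an
interrupt or task switch landed inside the timed region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator mirroring the "runs" and "skips" columns. */
struct timer_stats {
    uint64_t sum; /* total ticks over the accepted samples */
    int runs;     /* accepted samples */
    int skips;    /* samples rejected as outliers */
};

/* Record one measured interval. After a short warm-up, a sample is
 * rejected when it is at least 8 times the current average. */
static void timer_record(struct timer_stats *t, uint64_t ticks)
{
    assert(t);
    if (t->runs < 2 || ticks < 8 * t->sum / (uint64_t)t->runs) {
        t->sum += ticks;
        t->runs++;
    } else {
        t->skips++;
    }
}

/* Average ticks per accepted sample, or 0 before the first sample. */
static uint64_t timer_average(const struct timer_stats *t)
{
    return t->runs ? t->sum / (uint64_t)t->runs : 0;
}
```

Because outliers are dropped per build, two builds decoding the same file
can end up with slightly different run and skip counts, which is why it
helps to compare reports taken at about the same number of runs.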



> 
> Pavel
> 

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have often repented speaking, but never of holding my tongue.
-- Xenocrates




Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-14 Thread Pavel Koshevoy
On Nov 13, 2014 4:15 PM, "Michael Niedermayer"  wrote:
>
> On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
> > This needs to be benchmarked, i do not have ppc hw
> > This is on big endian more similar to how the code was before
79e0255956bc8fcdb143f39b2e45db77144ac017
> >
> > Signed-off-by: Michael Niedermayer 
>
> ping
>
> can someone with a altivec PPC please benchmark this
> or do all the ppc people want code to be slow and unoptimized ?
> iam also happy to benchmark it myself if someone provides a ppc or
> account on a altivec ppc that is reasonable idle so benchmarking is
> possible with some accuracy
>

I can do it over the weekend, I have a ppc G4 800MHz iMac.  I'll need
instructions on what to do for benchmarking.

Pavel


Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-13 Thread Michael Niedermayer
On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
> This needs to be benchmarked, i do not have ppc hw
> This is on big endian more similar to how the code was before 
> 79e0255956bc8fcdb143f39b2e45db77144ac017
> 
> Signed-off-by: Michael Niedermayer 

ping

can someone with an altivec PPC please benchmark this
or do all the ppc people want code to be slow and unoptimized ?
i am also happy to benchmark it myself if someone provides a ppc or
an account on an altivec ppc that is reasonably idle so benchmarking is
possible with some accuracy

Thanks

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is what and why we do it that matters, not just one of them.




[FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads

2014-11-07 Thread Michael Niedermayer
This needs to be benchmarked, I do not have ppc hw.
On big endian this is more similar to how the code was before
79e0255956bc8fcdb143f39b2e45db77144ac017

Signed-off-by: Michael Niedermayer 
---
 libavcodec/ppc/hpeldsp_altivec.c |   30 ++
 libavutil/ppc/util_altivec.h |   16 
 2 files changed, 26 insertions(+), 20 deletions(-)
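
A note on the "fewer loads" part of the subject, as a minimal sketch rather
than the actual macro (VEC_LD2 itself is defined in
libavutil/ppc/util_altivec.h, not here). On AltiVec, vec_ld ignores the low
4 address bits, so an unaligned 16-byte read is typically assembled from two
aligned loads plus a vec_perm. Reading the vectors at pixels and pixels+1
separately therefore costs four aligned loads, while a combined read can
reuse the same two. The portable C below only simulates this on a byte
array; all names are hypothetical, and the memcpy at an offset stands in
for the permute. On real hardware the case where the address ends in 0xF
needs extra care with the permute mask, which this simulation sidesteps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts simulated aligned 16-byte loads, for comparison only. */
static int aligned_loads;

/* Stand-in for vec_ld(): copy the 16-byte block containing index addr,
 * ignoring the low 4 bits of addr. mem must cover that whole block. */
static void aligned_load16(uint8_t out[16], const uint8_t *mem, size_t addr)
{
    assert(out && mem);
    memcpy(out, mem + (addr & ~(size_t)15), 16);
    aligned_loads++;
}

/* One unaligned 16-byte read: two aligned loads plus a byte select. */
static void unaligned_load16(uint8_t out[16], const uint8_t *mem, size_t addr)
{
    uint8_t two[32];
    aligned_load16(two, mem, addr);
    aligned_load16(two + 16, mem, addr + 16);
    memcpy(out, two + (addr & 15), 16);
}

/* Read the vectors starting at addr and addr + 1 from the same two
 * aligned loads. The offset is at most 15, so offset + 1 + 16 <= 32 and
 * both selections stay inside the 32 loaded bytes. */
static void load2_shared(uint8_t v1[16], uint8_t v2[16],
                         const uint8_t *mem, size_t addr)
{
    uint8_t two[32];
    size_t offset = addr & 15;
    aligned_load16(two, mem, addr);
    aligned_load16(two + 16, mem, addr + 16);
    memcpy(v1, two + offset, 16);
    memcpy(v2, two + offset + 1, 16);
}
```

Whether halving the load count is visible in practice depends on the CPU
and on how well the loads were already being scheduled, so it still has to
be measured.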

diff --git a/libavcodec/ppc/hpeldsp_altivec.c b/libavcodec/ppc/hpeldsp_altivec.c
index 87a1f05..05d8b81 100644
--- a/libavcodec/ppc/hpeldsp_altivec.c
+++ b/libavcodec/ppc/hpeldsp_altivec.c
@@ -123,8 +123,7 @@ static void put_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, ptrdi
 register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
 
-pixelsv1 = VEC_LD(0, pixels);
-pixelsv2 = VEC_LD(1, pixels);
+VEC_LD2(pixelsv1, pixelsv2, 0, pixels);
 pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
 pixelsv2 = VEC_MERGEH(vczero, pixelsv2);
 
@@ -136,8 +135,7 @@ static void put_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, ptrdi
 int rightside = ((unsigned long)block & 0x000F);
 blockv = vec_ld(0, block);
 
-pixelsv1 = unaligned_load(line_size, pixels);
-pixelsv2 = unaligned_load(line_size+1, pixels);
+VEC_LD2(pixelsv1, pixelsv2, line_size, pixels);
 pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
 pixelsv2 = VEC_MERGEH(vczero, pixelsv2);
 pixelssum2 = vec_add((vector unsigned short)pixelsv1,
@@ -171,8 +169,7 @@ static void put_no_rnd_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels
 register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1);
 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
 
-pixelsv1 = VEC_LD(0, pixels);
-pixelsv2 = VEC_LD(1, pixels);
+VEC_LD2(pixelsv1, pixelsv2, 0, pixels);
 pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
 pixelsv2 = VEC_MERGEH(vczero, pixelsv2);
 pixelssum1 = vec_add((vector unsigned short)pixelsv1,
@@ -183,8 +180,7 @@ static void put_no_rnd_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels
 int rightside = ((unsigned long)block & 0x000F);
 blockv = vec_ld(0, block);
 
-pixelsv1 = unaligned_load(line_size, pixels);
-pixelsv2 = unaligned_load(line_size+1, pixels);
+VEC_LD2(pixelsv1, pixelsv2, line_size, pixels);
 pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
 pixelsv2 = VEC_MERGEH(vczero, pixelsv2);
 pixelssum2 = vec_add((vector unsigned short)pixelsv1,
@@ -218,8 +214,7 @@ static void put_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, pt
 register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
 
-pixelsv1 = VEC_LD(0, pixels);
-pixelsv2 = VEC_LD(1, pixels);
+VEC_LD2(pixelsv1, pixelsv2, 0, pixels);
 pixelsv3 = VEC_MERGEL(vczero, pixelsv1);
 pixelsv4 = VEC_MERGEL(vczero, pixelsv2);
 pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
@@ -234,8 +229,7 @@ static void put_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, pt
 for (i = 0; i < h ; i++) {
 blockv = vec_ld(0, block);
 
-pixelsv1 = unaligned_load(line_size, pixels);
-pixelsv2 = unaligned_load(line_size+1, pixels);
+VEC_LD2(pixelsv1, pixelsv2, line_size, pixels);
 
 pixelsv3 = VEC_MERGEL(vczero, pixelsv1);
 pixelsv4 = VEC_MERGEL(vczero, pixelsv2);
@@ -274,8 +268,7 @@ static void put_no_rnd_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pix
 register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1);
 register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
 
-pixelsv1 = VEC_LD(0, pixels);
-pixelsv2 = VEC_LD(1, pixels);
+VEC_LD2(pixelsv1, pixelsv2, 0, pixels);
 pixelsv3 = VEC_MERGEL(vczero, pixelsv1);
 pixelsv4 = VEC_MERGEL(vczero, pixelsv2);
 pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
@@ -288,8 +281,7 @@ static void put_no_rnd_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pix
 pixelssum1 = vec_add(pixelssum1, vcone);
 
 for (i = 0; i < h ; i++) {
-pixelsv1 = unaligned_load(line_size, pixels);
-pixelsv2 = unaligned_load(line_size+1, pixels);
+VEC_LD2(pixelsv1, pixelsv2, line_size, pixels);
 
 pixelsv3 = VEC_MERGEL(vczero, pixelsv1);
 pixelsv4 = VEC_MERGEL(vczero, pixelsv2);
@@ -329,8 +321,7 @@ static void avg_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, ptrdi
 register const vector unsigned short vctwo = (const vector unsigned short)
 vec_splat_u16(2);
 
-pixelsv1 = VEC_LD(0, pixels);
-pixelsv