Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
On Sat, Nov 15, 2014 at 07:28:25PM -0700, Pavel Koshevoy wrote:
> On 11/15/14 18:12, James Almer wrote:
> >On 15/11/14 1:50 AM, Michael Niedermayer wrote:
> >>On Fri, Nov 14, 2014 at 09:00:31PM -0700, Pavel Koshevoy wrote:
> >>>I ran both builds twice and captured the output from the second run
> >>>of each build, it's in the attachment. By the looks of it there is
> >>>no difference in performance.
> >>to compare START/STOP_TIMER data its generally best to run the
> >>test a few times (like 3) and compare the values from each that
> >>have some specific number or runs, like
> >>
> >>>681 UNITS in MC, 4192359 runs, 1945 skips0:01:40.88 bitrate=N/A
> >>vs.
> >>>668 UNITS in MC, 4192326 runs, 1978 skips0:01:40.16 bitrate=N/A
> >>but from these 2 tests it seems you are correct and theres no
> >>significant difference so theres probably not much point in doing
> >>further tests
> >>
> >It might be a good idea to try -threads 1 for the input file as well
> >
>
> I've done 4 runs for each build using vec_ld and VEC_LD2, and logged
> the results of the last 3 runs for each build.
> The results are in the attachment. This time I added -an -threads 1
> and the fps went up for both builds. It seems VEC_LD2 is slightly
> faster.
>
> I am not sure I've put -threads 1 in the right place on the command
> line, and I don't know if it matters -- this is a single-core ppc
> G4.
>
> Let me know if you would like me to try something else.

no, thanks a lot
it seems the conclusion is that it's 1 cpu cycle faster for you and slower
for Carl, thus overall it's basically the same speed.

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
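
For anyone comparing such timer lines by hand, here is a minimal C sketch that parses the two lines quoted above and reports their relative difference. The names `TimerLine`, `parse_timer_line` and `relative_diff_percent` are hypothetical helpers made up for this note; they are not part of FFmpeg. The thread does not say which line belongs to which build, so the sketch does not assume that either.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed "<units> UNITS in <name>, <runs> runs, <skips> skips" line. */
typedef struct {
    long units;  /* average cost per accepted run, in timer units */
    long runs;   /* number of measurements that were averaged */
    long skips;  /* number of measurements discarded as outliers */
} TimerLine;

/* Hypothetical helper: returns 1 on success, 0 if the line does not match.
 * Anything after "skips" (such as a fused progress line) is ignored. */
static int parse_timer_line(const char *line, TimerLine *out)
{
    assert(line && out);
    return sscanf(line, "%ld UNITS in %*[^,], %ld runs, %ld skips",
                  &out->units, &out->runs, &out->skips) == 3;
}

/* Relative difference of b against a, in percent (negative: b is cheaper). */
static double relative_diff_percent(const TimerLine *a, const TimerLine *b)
{
    return 100.0 * (double)(b->units - a->units) / (double)a->units;
}
```

Feeding it the two quoted lines gives a difference of roughly 1.9 percent (13 units out of 681), which is the order of magnitude the replies above treat as not significant.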
Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
On 11/15/14 18:12, James Almer wrote:
> On 15/11/14 1:50 AM, Michael Niedermayer wrote:
>> On Fri, Nov 14, 2014 at 09:00:31PM -0700, Pavel Koshevoy wrote:
>>> I ran both builds twice and captured the output from the second run
>>> of each build, it's in the attachment. By the looks of it there is
>>> no difference in performance.
>> to compare START/STOP_TIMER data its generally best to run the
>> test a few times (like 3) and compare the values from each that
>> have some specific number or runs, like
>>
>>> 681 UNITS in MC, 4192359 runs, 1945 skips0:01:40.88 bitrate=N/A
>> vs.
>>> 668 UNITS in MC, 4192326 runs, 1978 skips0:01:40.16 bitrate=N/A
>> but from these 2 tests it seems you are correct and theres no
>> significant difference so theres probably not much point in doing
>> further tests
>>
> It might be a good idea to try -threads 1 for the input file as well

I've done 4 runs for each build using vec_ld and VEC_LD2, and logged
the results of the last 3 runs for each build.
The results are in the attachment. This time I added -an -threads 1
and the fps went up for both builds. It seems VEC_LD2 is slightly
faster.

I am not sure I've put -threads 1 in the right place on the command
line, and I don't know if it matters -- this is a single-core ppc
G4.

Let me know if you would like me to try something else.

    Pavel

$ ./ffmpeg -v 99 -i ~/Movies/matrixbench_mpeg2.mpg -an -threads 1 -f null -
ffmpeg version N-67669-g53ab784 Copyright (c) 2000-2014 the FFmpeg developers
  built on Nov 14 2014 20:14:18 with gcc 4.2.1 (GCC) (Apple Inc. build 5577)
  configuration: --prefix=/Developer/ppc --disable-debug --disable-shared --enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --enable-pthreads --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-postproc --enable-libx264 --enable-libxvid --enable-libass --enable-gnutls --enable-runtime-cpudetect --extra-cflags=-I/opt/local/include --extra-ldflags='-headerpad_max_install_names -L/opt/local/lib'
  libavutil      54. 11.100 / 54. 11.100
  libavcodec     56. 12.100 / 56. 12.100
  libavformat    56. 12.103 / 56. 12.103
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument '99'.
Reading option '-i' ... matched as input file with argument '/Users/pavel/Movies/matrixbench_mpeg2.mpg'.
Reading option '-an' ... matched as option 'an' (disable audio) with argument '1'.
Reading option '-threads' ... matched as AVOption 'threads' with argument '1'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'null'.
Reading option '-' ... matched as output file.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 99.
Successfully parsed a group of options.
Parsing a group of options: input file /Users/pavel/Movies/matrixbench_mpeg2.mpg.
Successfully parsed a group of options.
Opening an input file: /Users/pavel/Movies/matrixbench_mpeg2.mpg.
[mpeg @ 0x2808010] Format mpeg probed with size=2048 and score=26
[mpeg @ 0x2808010] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0
[mpeg @ 0x2808010] probing stream 1 pp:2500
[mpeg @ 0x2808010] Probe with size=2012, packets=1 detected mpegvideo with score=25
[mpeg @ 0x2808010] probed stream 1
[mpeg @ 0x2808010] max_analyze_duration 500 reached at 500 microseconds
[NULL @ 0x2809810] start time for stream 0 is not set in estimate_timings_from_pts
[mpeg @ 0x2808010] After avformat_find_stream_info() pos: 0 bytes read:4247696 seeks:3 frames:333
Input #0, mpeg, from '/Users/pavel/Movies/matrixbench_mpeg2.mpg':
  Duration: 00:03:07.66, start: 0.22, bitrate: 5633 kb/s
    Stream #0:0[0x1bf], 0, 1/9: Data: dvd_nav_packet, 1/9
    Stream #0:1[0x1e0], 127, 1/9: Video: mpeg2video (Main), yuv420p(tv, bt470bg/bt470m/bt470m, left), 720x576 [SAR 16:15 DAR 4:3], 1/50, max. 11421 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:2[0x1c0], 206, 1/9: Audio: mp2, 48000 Hz, stereo, s16p, 384 kb/s
Successfully opened the file.
Parsing a group of options: output file -.
Applying option an (disable audio) with argument 1.
Applying option f (force format) with argument null.
Successfully parsed a group of options.
Opening an output file: -.
Successfully opened the file.
[graph 0 input from stream 0:1 @ 0x2226b40] Setting 'video_size' to value '720x576'
[graph 0 input from stream 0:1 @ 0x2226b40] Setting 'pix_fmt' to value '0'
[graph 0 input from stream 0:1 @ 0x2226b40] Setting 'time_base' to value '1/9'
[graph 0 input from stream 0:1 @ 0x2226b40] Setting 'pixel_aspect' to value '16/
Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0),vec_ld(1) by VEC_LD2() which has fewer loads
Michael Niedermayer writes:

> This needs to be benchmarked, i do not have ppc hw

Decoding mpeg2video showed a slightly lower speed for START_TIMER
with the patch than without.

Carl Eugen
Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
On 15/11/14 1:50 AM, Michael Niedermayer wrote:
> On Fri, Nov 14, 2014 at 09:00:31PM -0700, Pavel Koshevoy wrote:
>> On 11/14/14 07:34, Michael Niedermayer wrote:
>>> On Fri, Nov 14, 2014 at 06:45:55AM -0700, Pavel Koshevoy wrote:
>>>> On Nov 13, 2014 4:15 PM, "Michael Niedermayer" wrote:
>>>>> On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
>>>>>> This needs to be benchmarked, i do not have ppc hw
>>>>>> This is on big endian more similar to how the code was before 79e0255956bc8fcdb143f39b2e45db77144ac017
>>>>>>
>>>>>> Signed-off-by: Michael Niedermayer
>>>>> ping
>>>>>
>>>>> can someone with a altivec PPC please benchmark this
>>>>> or do all the ppc people want code to be slow and unoptimized ?
>>>>> iam also happy to benchmark it myself if someone provides a ppc or
>>>>> account on a altivec ppc that is reasonable idle so benchmarking is
>>>>> possible with some accuracy
>>>>>
>>>> I can do it over the weekend, I have a ppc G4 800MHz iMac. I'll need
>>>> instructions on what to do for benchmarking.
>>> patch that adds benchmarking is below
>>> that and trying to decode some mpeg2 like with
>>> -v 99 -i matrixbench_mpeg2.mpg -f null -
>>>
>>> should result in some timing values
>>> i cant say for sure though, as this does not work under qemu
>>> under qemu i just get 0
>>>
>>>
>>> diff --git a/libavcodec/mpegvideo_motion.c b/libavcodec/mpegvideo_motion.c
>>> index e7a585d..94b140d 100644
>>> --- a/libavcodec/mpegvideo_motion.c
>>> +++ b/libavcodec/mpegvideo_motion.c
>>> @@ -976,6 +976,7 @@ void ff_mpv_motion(MpegEncContext *s,
>>>                     op_pixels_func (*pix_op)[4],
>>>                     qpel_mc_func (*qpix_op)[16])
>>>  {
>>> +START_TIMER
>>>  #if !CONFIG_SMALL
>>>      if (s->out_format == FMT_MPEG1)
>>>          mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
>>> @@ -984,4 +985,5 @@ void ff_mpv_motion(MpegEncContext *s,
>>>  #endif
>>>          mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
>>>                              ref_picture, pix_op, qpix_op, 0);
>>> +STOP_TIMER("MC")
>>>  }
>>>
>>>
>>
>> git am wouldn't apply the patches for me (I just saved the message
>> from Thunderbird to .eml file and tried to feed that to git am). So,
>> I had to trim them and use patch -p1 to apply manually. The patch
>> for util_altivec.h wouldn't apply so I patched manually.
>>
>> I ran both builds twice and captured the output from the second run
>> of each build, it's in the attachment. By the looks of it there is
>> no difference in performance.
>
> to compare START/STOP_TIMER data its generally best to run the
> test a few times (like 3) and compare the values from each that
> have some specific number or runs, like
>
>> 681 UNITS in MC, 4192359 runs, 1945 skips0:01:40.88 bitrate=N/A
> vs.
>> 668 UNITS in MC, 4192326 runs, 1978 skips0:01:40.16 bitrate=N/A
>
> but from these 2 tests it seems you are correct and theres no
> significant difference so theres probably not much point in doing
> further tests
>

It might be a good idea to try -threads 1 for the input file as well
Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
On Fri, Nov 14, 2014 at 09:00:31PM -0700, Pavel Koshevoy wrote:
> On 11/14/14 07:34, Michael Niedermayer wrote:
> >On Fri, Nov 14, 2014 at 06:45:55AM -0700, Pavel Koshevoy wrote:
> >>On Nov 13, 2014 4:15 PM, "Michael Niedermayer" wrote:
> >>>On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
> >>>>This needs to be benchmarked, i do not have ppc hw
> >>>>This is on big endian more similar to how the code was before 79e0255956bc8fcdb143f39b2e45db77144ac017
> >>>>
> >>>>Signed-off-by: Michael Niedermayer
> >>>ping
> >>>
> >>>can someone with a altivec PPC please benchmark this
> >>>or do all the ppc people want code to be slow and unoptimized ?
> >>>iam also happy to benchmark it myself if someone provides a ppc or
> >>>account on a altivec ppc that is reasonable idle so benchmarking is
> >>>possible with some accuracy
> >>>
> >>I can do it over the weekend, I have a ppc G4 800MHz iMac. I'll need
> >>instructions on what to do for benchmarking.
> >patch that adds benchmarking is below
> >that and trying to decode some mpeg2 like with
> > -v 99 -i matrixbench_mpeg2.mpg -f null -
> >
> >should result in some timing values
> >i cant say for sure though, as this does not work under qemu
> >under qemu i just get 0
> >
> >
> >diff --git a/libavcodec/mpegvideo_motion.c b/libavcodec/mpegvideo_motion.c
> >index e7a585d..94b140d 100644
> >--- a/libavcodec/mpegvideo_motion.c
> >+++ b/libavcodec/mpegvideo_motion.c
> >@@ -976,6 +976,7 @@ void ff_mpv_motion(MpegEncContext *s,
> >                    op_pixels_func (*pix_op)[4],
> >                    qpel_mc_func (*qpix_op)[16])
> > {
> >+START_TIMER
> > #if !CONFIG_SMALL
> >     if (s->out_format == FMT_MPEG1)
> >         mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
> >@@ -984,4 +985,5 @@ void ff_mpv_motion(MpegEncContext *s,
> > #endif
> >         mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
> >                             ref_picture, pix_op, qpix_op, 0);
> >+STOP_TIMER("MC")
> > }
> >
> >
>
> git am wouldn't apply the patches for me (I just saved the message
> from Thunderbird to .eml file and tried to feed that to git am). So,
> I had to trim them and use patch -p1 to apply manually. The patch
> for util_altivec.h wouldn't apply so I patched manually.
>
> I ran both builds twice and captured the output from the second run
> of each build, it's in the attachment. By the looks of it there is
> no difference in performance.

to compare START/STOP_TIMER data it's generally best to run the
test a few times (like 3) and compare the values from each that
have some specific number of runs, like

> 681 UNITS in MC, 4192359 runs, 1945 skips0:01:40.88 bitrate=N/A
vs.
> 668 UNITS in MC, 4192326 runs, 1978 skips0:01:40.16 bitrate=N/A

but from these 2 tests it seems you are correct and there's no
significant difference, so there's probably not much point in doing
further tests

Thanks!

PS: if the tests take too long, a shorter video can be used or the
duration can be limited

>
> If you'd like me to try something else -- I can try again tomorrow.
>
> Pavel.
>

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
On 11/14/14 07:34, Michael Niedermayer wrote:
> On Fri, Nov 14, 2014 at 06:45:55AM -0700, Pavel Koshevoy wrote:
>> On Nov 13, 2014 4:15 PM, "Michael Niedermayer" wrote:
>>> On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
>>>> This needs to be benchmarked, i do not have ppc hw
>>>> This is on big endian more similar to how the code was before 79e0255956bc8fcdb143f39b2e45db77144ac017
>>>>
>>>> Signed-off-by: Michael Niedermayer
>>> ping
>>>
>>> can someone with a altivec PPC please benchmark this
>>> or do all the ppc people want code to be slow and unoptimized ?
>>> iam also happy to benchmark it myself if someone provides a ppc or
>>> account on a altivec ppc that is reasonable idle so benchmarking is
>>> possible with some accuracy
>>>
>> I can do it over the weekend, I have a ppc G4 800MHz iMac. I'll need
>> instructions on what to do for benchmarking.
> patch that adds benchmarking is below
> that and trying to decode some mpeg2 like with
> -v 99 -i matrixbench_mpeg2.mpg -f null -
>
> should result in some timing values
> i cant say for sure though, as this does not work under qemu
> under qemu i just get 0
>
>
> diff --git a/libavcodec/mpegvideo_motion.c b/libavcodec/mpegvideo_motion.c
> index e7a585d..94b140d 100644
> --- a/libavcodec/mpegvideo_motion.c
> +++ b/libavcodec/mpegvideo_motion.c
> @@ -976,6 +976,7 @@ void ff_mpv_motion(MpegEncContext *s,
>                     op_pixels_func (*pix_op)[4],
>                     qpel_mc_func (*qpix_op)[16])
>  {
> +START_TIMER
>  #if !CONFIG_SMALL
>      if (s->out_format == FMT_MPEG1)
>          mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
> @@ -984,4 +985,5 @@ void ff_mpv_motion(MpegEncContext *s,
>  #endif
>          mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
>                              ref_picture, pix_op, qpix_op, 0);
> +STOP_TIMER("MC")
>  }

git am wouldn't apply the patches for me (I just saved the message
from Thunderbird to .eml file and tried to feed that to git am). So,
I had to trim them and use patch -p1 to apply manually. The patch
for util_altivec.h wouldn't apply so I patched manually.

I ran both builds twice and captured the output from the second run
of each build, it's in the attachment. By the looks of it there is
no difference in performance.

If you'd like me to try something else -- I can try again tomorrow.

    Pavel.

$ ./ffmpeg -v 99 -i ~/Movies/matrixbench_mpeg2.mpg -f null - > /tmp/vec_ld.txt
ffmpeg version N-67669-g53ab784 Copyright (c) 2000-2014 the FFmpeg developers
  built on Nov 14 2014 20:14:18 with gcc 4.2.1 (GCC) (Apple Inc. build 5577)
  configuration: --prefix=/Developer/ppc --disable-debug --disable-shared --enable-swscale --enable-avfilter --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libspeex --enable-pthreads --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-postproc --enable-libx264 --enable-libxvid --enable-libass --enable-gnutls --enable-runtime-cpudetect --extra-cflags=-I/opt/local/include --extra-ldflags='-headerpad_max_install_names -L/opt/local/lib'
  libavutil      54. 11.100 / 54. 11.100
  libavcodec     56. 12.100 / 56. 12.100
  libavformat    56. 12.103 / 56. 12.103
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument '99'.
Reading option '-i' ... matched as input file with argument '/Users/pavel/Movies/matrixbench_mpeg2.mpg'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'null'.
Reading option '-' ... matched as output file.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 99.
Successfully parsed a group of options.
Parsing a group of options: input file /Users/pavel/Movies/matrixbench_mpeg2.mpg.
Successfully parsed a group of options.
Opening an input file: /Users/pavel/Movies/matrixbench_mpeg2.mpg.
[mpeg @ 0x2808010] Format mpeg probed with size=2048 and score=26
[mpeg @ 0x2808010] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0
[mpeg @ 0x2808010] probing stream 1 pp:2500
[mpeg @ 0x2808010] Probe with size=2012, packets=1 detected mpegvideo with score=25
[mpeg @ 0x2808010] probed stream 1
[mpeg @ 0x2808010] max_analyze_duration 500 reached at 500 microseconds
[NULL @ 0x2809810] start time for stream 0 is not set in estimate_timings_from_pts
[mpeg @ 0x2808010] After avformat_find_stream_info() pos: 0 bytes read:4247696 seeks:3 frames:333
Input #0, mpeg, from '/Users/pavel/Movies/matrixbench_mpeg2.mpg':
  Duration: 00:03:07.66, start: 0.22, bitrate: 5633 kb/s
    Stream #0:0[0x1bf], 0, 1/9: Data: dvd_nav_packet, 1/9
    Stream #0:1[0x1e0], 127, 1/9: Vide
Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
On Fri, Nov 14, 2014 at 06:45:55AM -0700, Pavel Koshevoy wrote:
> On Nov 13, 2014 4:15 PM, "Michael Niedermayer" wrote:
> >
> > On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
> > > This needs to be benchmarked, i do not have ppc hw
> > > This is on big endian more similar to how the code was before 79e0255956bc8fcdb143f39b2e45db77144ac017
> > >
> > > Signed-off-by: Michael Niedermayer
> >
> > ping
> >
> > can someone with a altivec PPC please benchmark this
> > or do all the ppc people want code to be slow and unoptimized ?
> > iam also happy to benchmark it myself if someone provides a ppc or
> > account on a altivec ppc that is reasonable idle so benchmarking is
> > possible with some accuracy
> >
>
> I can do it over the weekend, I have a ppc G4 800MHz iMac. I'll need
> instructions on what to do for benchmarking.

the patch that adds benchmarking is below
applying that and trying to decode some mpeg2, like with
 -v 99 -i matrixbench_mpeg2.mpg -f null -

should result in some timing values
I can't say for sure though, as this does not work under qemu
under qemu I just get 0


diff --git a/libavcodec/mpegvideo_motion.c b/libavcodec/mpegvideo_motion.c
index e7a585d..94b140d 100644
--- a/libavcodec/mpegvideo_motion.c
+++ b/libavcodec/mpegvideo_motion.c
@@ -976,6 +976,7 @@ void ff_mpv_motion(MpegEncContext *s,
                    op_pixels_func (*pix_op)[4],
                    qpel_mc_func (*qpix_op)[16])
 {
+START_TIMER
 #if !CONFIG_SMALL
     if (s->out_format == FMT_MPEG1)
         mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
@@ -984,4 +985,5 @@ void ff_mpv_motion(MpegEncContext *s,
 #endif
         mpv_motion_internal(s, dest_y, dest_cb, dest_cr, dir,
                             ref_picture, pix_op, qpix_op, 0);
+STOP_TIMER("MC")
 }

>
> Pavel

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have often repented speaking, but never of holding my tongue.
-- Xenocrates
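
To make the "runs" and "skips" columns of the timer output easier to read, here is a simplified C sketch of an accumulating region timer. It is only loosely modeled on what START_TIMER/STOP_TIMER appear to do and is not copied from libavutil; the type `RegionTimer`, the function names and the 8x outlier threshold are assumptions made for this note. One thing the thread's own numbers do show: 4192359 + 1945 and 4192326 + 1978 both equal 4194304 = 2^22, which is consistent with a summary being printed whenever the total call count reaches a power of two, and explains why lines with the same number of runs are the ones to compare.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Running statistics for one timed region (hypothetical type). */
typedef struct {
    uint64_t sum;    /* total of all accepted measurements */
    unsigned runs;   /* accepted measurements */
    unsigned skips;  /* measurements rejected as outliers */
} RegionTimer;

/* Record one measurement t (in arbitrary timer units).
 * Assumption: a sample is rejected when it is at least 8x the current
 * average, to filter out interrupts and context switches.
 * Returns 1 if the sample was accepted, 0 if it was skipped. */
static int region_timer_add(RegionTimer *rt, uint64_t t)
{
    assert(rt);
    if (rt->runs < 2 || t < 8 * rt->sum / rt->runs) {
        rt->sum += t;
        rt->runs++;
        return 1;
    }
    rt->skips++;
    return 0;
}

/* Average cost per accepted run; 0 if nothing was recorded yet. */
static uint64_t region_timer_average(const RegionTimer *rt)
{
    return rt->runs ? rt->sum / rt->runs : 0;
}

/* True when the total number of calls (runs + skips) is a power of two. */
static int region_timer_should_report(const RegionTimer *rt)
{
    unsigned total = rt->runs + rt->skips;
    return total != 0 && (total & (total - 1)) == 0;
}

/* Print a summary line in the same shape as the ones quoted in this thread. */
static void region_timer_report(const RegionTimer *rt, const char *name)
{
    printf("%llu UNITS in %s, %u runs, %u skips\n",
           (unsigned long long)region_timer_average(rt), name,
           rt->runs, rt->skips);
}
```

A caller would read a cycle counter before and after the timed region, pass the difference to `region_timer_add`, and call `region_timer_report` whenever `region_timer_should_report` is true.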
Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
On Nov 13, 2014 4:15 PM, "Michael Niedermayer" wrote:
>
> On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
> > This needs to be benchmarked, i do not have ppc hw
> > This is on big endian more similar to how the code was before 79e0255956bc8fcdb143f39b2e45db77144ac017
> >
> > Signed-off-by: Michael Niedermayer
>
> ping
>
> can someone with a altivec PPC please benchmark this
> or do all the ppc people want code to be slow and unoptimized ?
> iam also happy to benchmark it myself if someone provides a ppc or
> account on a altivec ppc that is reasonable idle so benchmarking is
> possible with some accuracy
>

I can do it over the weekend, I have a ppc G4 800MHz iMac. I'll need
instructions on what to do for benchmarking.

    Pavel
Re: [FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
On Fri, Nov 07, 2014 at 03:12:19PM +0100, Michael Niedermayer wrote:
> This needs to be benchmarked, i do not have ppc hw
> This is on big endian more similar to how the code was before
> 79e0255956bc8fcdb143f39b2e45db77144ac017
>
> Signed-off-by: Michael Niedermayer

ping

can someone with an altivec PPC please benchmark this
or do all the ppc people want code to be slow and unoptimized ?
I am also happy to benchmark it myself if someone provides a ppc or
an account on an altivec ppc that is reasonably idle, so benchmarking is
possible with some accuracy

Thanks

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is what and why we do it that matters, not just one of them.
[FFmpeg-devel] [PATCH] ppc: replace vec_ld(0), vec_ld(1) by VEC_LD2() which has fewer loads
This needs to be benchmarked, i do not have ppc hw
This is on big endian more similar to how the code was before 79e0255956bc8fcdb143f39b2e45db77144ac017

Signed-off-by: Michael Niedermayer
---
 libavcodec/ppc/hpeldsp_altivec.c | 30
 libavutil/ppc/util_altivec.h     | 16
 2 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/libavcodec/ppc/hpeldsp_altivec.c b/libavcodec/ppc/hpeldsp_altivec.c
index 87a1f05..05d8b81 100644
--- a/libavcodec/ppc/hpeldsp_altivec.c
+++ b/libavcodec/ppc/hpeldsp_altivec.c
@@ -123,8 +123,7 @@ static void put_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, ptrdi
     register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
     register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
-    pixelsv1 = VEC_LD(0, pixels);
-    pixelsv2 = VEC_LD(1, pixels);
+    VEC_LD2(pixelsv1, pixelsv2, 0, pixels);
     pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
     pixelsv2 = VEC_MERGEH(vczero, pixelsv2);
@@ -136,8 +135,7 @@ static void put_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, ptrdi
         int rightside = ((unsigned long)block & 0x000F);
         blockv = vec_ld(0, block);
-        pixelsv1 = unaligned_load(line_size, pixels);
-        pixelsv2 = unaligned_load(line_size+1, pixels);
+        VEC_LD2(pixelsv1, pixelsv2, line_size, pixels);
         pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
         pixelsv2 = VEC_MERGEH(vczero, pixelsv2);
         pixelssum2 = vec_add((vector unsigned short)pixelsv1,
@@ -171,8 +169,7 @@ static void put_no_rnd_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels
     register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1);
     register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
-    pixelsv1 = VEC_LD(0, pixels);
-    pixelsv2 = VEC_LD(1, pixels);
+    VEC_LD2(pixelsv1, pixelsv2, 0, pixels);
     pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
     pixelsv2 = VEC_MERGEH(vczero, pixelsv2);
     pixelssum1 = vec_add((vector unsigned short)pixelsv1,
@@ -183,8 +180,7 @@ static void put_no_rnd_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels
         int rightside = ((unsigned long)block & 0x000F);
         blockv = vec_ld(0, block);
-        pixelsv1 = unaligned_load(line_size, pixels);
-        pixelsv2 = unaligned_load(line_size+1, pixels);
+        VEC_LD2(pixelsv1, pixelsv2, line_size, pixels);
         pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
         pixelsv2 = VEC_MERGEH(vczero, pixelsv2);
         pixelssum2 = vec_add((vector unsigned short)pixelsv1,
@@ -218,8 +214,7 @@ static void put_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, pt
     register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
     register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
-    pixelsv1 = VEC_LD(0, pixels);
-    pixelsv2 = VEC_LD(1, pixels);
+    VEC_LD2(pixelsv1, pixelsv2, 0, pixels);
     pixelsv3 = VEC_MERGEL(vczero, pixelsv1);
     pixelsv4 = VEC_MERGEL(vczero, pixelsv2);
     pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
@@ -234,8 +229,7 @@ static void put_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, pt
     for (i = 0; i < h ; i++) {
         blockv = vec_ld(0, block);
-        pixelsv1 = unaligned_load(line_size, pixels);
-        pixelsv2 = unaligned_load(line_size+1, pixels);
+        VEC_LD2(pixelsv1, pixelsv2, line_size, pixels);
         pixelsv3 = VEC_MERGEL(vczero, pixelsv1);
         pixelsv4 = VEC_MERGEL(vczero, pixelsv2);
@@ -274,8 +268,7 @@ static void put_no_rnd_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pix
     register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1);
     register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
-    pixelsv1 = VEC_LD(0, pixels);
-    pixelsv2 = VEC_LD(1, pixels);
+    VEC_LD2(pixelsv1, pixelsv2, 0, pixels);
     pixelsv3 = VEC_MERGEL(vczero, pixelsv1);
     pixelsv4 = VEC_MERGEL(vczero, pixelsv2);
     pixelsv1 = VEC_MERGEH(vczero, pixelsv1);
@@ -288,8 +281,7 @@ static void put_no_rnd_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pix
     pixelssum1 = vec_add(pixelssum1, vcone);
     for (i = 0; i < h ; i++) {
-        pixelsv1 = unaligned_load(line_size, pixels);
-        pixelsv2 = unaligned_load(line_size+1, pixels);
+        VEC_LD2(pixelsv1, pixelsv2, line_size, pixels);
         pixelsv3 = VEC_MERGEL(vczero, pixelsv1);
         pixelsv4 = VEC_MERGEL(vczero, pixelsv2);
@@ -329,8 +321,7 @@ static void avg_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, ptrdi
     register const vector unsigned short vctwo = (const vector unsigned short) vec_splat_u16(2);
-    pixelsv1 = VEC_LD(0, pixels);
-    pixelsv
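
The "fewer loads" in the subject line can be illustrated without PPC hardware. The portable C sketch below emulates a machine that, like AltiVec's vec_ld, can only load 16-byte aligned blocks, and counts how many aligned loads are needed to fetch the vectors at pixels and pixels+1. It is a scalar emulation under stated assumptions, not the VEC_LD2 macro from util_altivec.h (whose definition is not shown in this thread); `Vec`, `load_aligned`, `permute`, `load_unaligned` and `load2` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16

/* Stand-in for a 128-bit vector register. */
typedef struct { uint8_t b[VEC_BYTES]; } Vec;

/* Counts how many emulated aligned loads have been issued. */
static unsigned aligned_load_count;

/* Emulated aligned load: the low 4 bits of the index are ignored. */
static Vec load_aligned(const uint8_t *mem, size_t pos)
{
    Vec v;
    memcpy(v.b, mem + (pos & ~(size_t)(VEC_BYTES - 1)), VEC_BYTES);
    aligned_load_count++;
    return v;
}

/* Emulated permute: 16 bytes starting at `shift` within the pair lo:hi. */
static Vec permute(const Vec *lo, const Vec *hi, unsigned shift)
{
    Vec v;
    assert(shift < VEC_BYTES);
    for (unsigned i = 0; i < VEC_BYTES; i++) {
        unsigned j = shift + i;
        v.b[i] = j < VEC_BYTES ? lo->b[j] : hi->b[j - VEC_BYTES];
    }
    return v;
}

/* One independent unaligned load: two aligned loads plus a permute. */
static Vec load_unaligned(const uint8_t *mem, size_t pos)
{
    Vec lo = load_aligned(mem, pos);
    Vec hi = load_aligned(mem, pos + VEC_BYTES - 1);
    return permute(&lo, &hi, (unsigned)(pos & (VEC_BYTES - 1)));
}

/* Paired load of the vectors at pos and pos + 1, sharing two aligned loads.
 * The buffer must extend to the end of the aligned block after pos. */
static void load2(Vec *v1, Vec *v2, const uint8_t *mem, size_t pos)
{
    unsigned shift = (unsigned)(pos & (VEC_BYTES - 1));
    Vec lo = load_aligned(mem, pos);
    Vec hi = load_aligned(mem, pos + VEC_BYTES);
    *v1 = permute(&lo, &hi, shift);
    /* When pos is the last byte of a block, pos + 1 is exactly `hi`. */
    *v2 = shift == VEC_BYTES - 1 ? hi : permute(&lo, &hi, shift + 1);
}
```

In this model two independent unaligned loads cost four aligned loads, while the paired version costs two and returns the same bytes. Whether that translates into a measurable speedup on real hardware is the question the benchmarks in this thread address.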