Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread Thomas Orgis
Sorry for being late to the party, but I have to say that this is a
rather unfortunate situation now. Not using the assembly-optimized
fixed-point ARM code of the arm_nofpu decoder and resorting to the
generic_fpu one (all plain C) will make mpg123 really slow in
comparison. I'm not sure what hardware we are targeting here ... is it
armel with softfloat? With gcc -mfpu=vfp, generic_fpu might be fine,
although using the neon decoder is still preferred on supporting CPUs.

There is no runtime detection in mpg123 for this and at least for the
decision of fixed or floating point decoding, it likely will never be
as that is a very basic decision on the whole decoder code, not just
some optimization. I can imagine combining generic_fpu and neon builds
with run-time detection, but this still assumes a hardware floating
point unit to make sense. We have arm_nofpu and generic_nofpu for the
cases without one.

Something which is possible right now is to produce one libmpg123.so with
the standard build to please users using slow ARM machines who just
want plain 16 bit playback and produce one libmpg123_float.so for people
using beefy machines and who are using audacious as a media player.

Well ... the least would be to offer builds that use vfp if
present (I see hints at https://wiki.debian.org/ArmPorts that that's an
option, including runtime linker choice). In any case, ARM's a real
mess in that respect. It is very hard to produce a build that is optimal
for the wide range of configurations that Debian tries to support.

I could implement a conversion step to floating point with the
arm_nofpu decoder. That would make audacious work (although wasting
precision on machines that have hardware floating point, or even NEON)
and have the benefit of the command-line mpg123 still being fast with
16 bit output. A debian build targeting modern floating-point-capable
hardware would use generic_fpu or better the neon decoder to begin with.

Is there a preference to have the faster decoder for Debian without
floating point hardware?


___
pkg-multimedia-maintainers mailing list
pkg-multimedia-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-multimedia-maintainers

Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread Reinhard Tartler
On Sun, Feb 16, 2014 at 5:46 AM, Thomas Orgis  wrote:
> Sorry for being late to the party, but I have to say that this is a
> rather unfortunate situation now.

Thomas, in libavcodec, we "solve" (or rather, work around, depending on
the PoV) this problem by compiling the libraries multiple times and
installing the resulting variants in different subdirectories:

/usr/lib/arm-linux-gnueabi/libavcodec.so.54
/usr/lib/arm-linux-gnueabi/neon/vfp/libavcodec.so.54
/usr/lib/arm-linux-gnueabi/vfp/libavcodec.so.54
/usr/lib/arm-linux-gnueabihf/libavcodec.so.54
/usr/lib/arm-linux-gnueabihf/neon/vfp/libavcodec.so.54

This way, the run-time linker will pick-up the right version depending
on whatever capabilities the kernel autodetected at run-time. Would it
be possible to do something like this for mpg123 as well? This would
then require to compile

/usr/lib/arm-linux-gnueabi/libmpg123.so.0

for different flavors. However, we would need to make sure that
/usr/bin/mpg123 works with any of them. I'm wondering if this pattern
wouldn't work for mpg123, but I'm not familiar enough with the
codebase to say if this would work. Can you please comment?
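
As a rough illustration of the directory layout above, here is a simplified
model of the selection. It is a sketch only: the real run-time linker's search
is more involved, and the function name and capability strings are made up for
this example. It takes the set of capabilities reported for the CPU and returns
the most specific variant whose directory components are all supported.

```python
# Simplified model of hwcap-based library selection; NOT the real ld.so logic.
BASE = "/usr/lib/arm-linux-gnueabi"
LIB = "libavcodec.so.54"

# Variant subdirectories from the listing above, most specific first.
# The empty string stands for the baseline build directly in BASE.
VARIANTS = ["neon/vfp", "vfp", ""]


def pick_variant(hwcaps):
    """Return the path of the most specific build the CPU can run."""
    for subdir in VARIANTS:
        # Every path component is treated as a required capability.
        needed = set(subdir.split("/")) if subdir else set()
        if needed <= set(hwcaps):
            return "/".join(p for p in (BASE, subdir, LIB) if p)
    raise RuntimeError("unreachable: the baseline variant needs no capabilities")
```

In this model a CPU reporting only vfp gets the vfp build, and one reporting
nothing gets the baseline library.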

-- 
regards,
Reinhard



Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread Thomas Orgis
On Sun, 16 Feb 2014 12:14:46 -0500, Reinhard Tartler wrote:

> Thomas, in libavcodec, we "solve" (or rather, workaround, depending on
> the PoV) this problem by compiling the libraries multiple times

A sane approach. Runtime code selection in each application can be
stretched beyond the boundary of it being fun quickly.

> ...
> be possible to do something like this for mpg123 as well? This would
> then require to compile
> 
> /usr/lib/arm-linux-gnueabi/libmpg123.so.0
> for different flavors. However, we would need to make sure that
> /usr/bin/mpg123 works with any of them.

That would certainly work and is what I imagined. If this offers
link-time choice of fixed point, vfp or neon, I'd be more than happy to
avoid the headache of making a runtime choice inside one build
(which seems rather impractical).

> I'm wondering if this pattern
> wouldn't work for mpg123, but I'm not familiar enough with the
> codebase to say if this would work. Can you please comment?

I don't see an obvious problem. As long as the basic ABI fits (also regarding
floating point args), this should work as well as libavcodec. The
mpg123 binary queries the library for things like format support and
decoder engine, and one build can work with arm_nofpu or generic_nofpu
(arm-linux-gnueabi), generic_fpu (arm-linux-gnueabi/vfp) and arm_neon
(arm-linux-gnueabi/neon/vfp/). I suppose the gnueabihf variants need a
separate build for the changed ABI of floating point arguments (?).

When setting up the environment to use a specific libmpg123, one should
see an effect in the output of

mpg123 --list-cpu

Just build mpg123 several times, with proper CFLAGS for -march and
friends and matching --with-cpu for configure. You can pick which
mpg123 binary to use, I suggest the most universal one. In the mpg123
binary itself, there is no assembly code affected by the --with-cpu
switch and the performance-relevant bits are in libmpg123. Same for the
other helper applications (mpg123-id3dump, mpg123-strip).

The only issue still present would be that the most basic libmpg123
still misses floating point output format for audacious. But my guess
is that you wouldn't use that player on a system without floating point
hardware. As I said, I can change that, for the sake of completeness
adding a bit of code converting from 16 bit to 32 bit float (and 24 and
32 bit integer) in the output buffer.
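
To make the proposed conversion concrete, here is a minimal sketch of widening
signed 16 bit samples in an output buffer. It is not mpg123's actual code: the
function names are invented, and both the float scale of 1/32768 and the choice
to place the 16 bit value in the top bits of the wider integers are assumptions
made for illustration.

```python
def s16_to_f32(samples):
    """Map signed 16 bit samples to floats (assumed scale: 1/32768)."""
    return [s / 32768.0 for s in samples]


def s16_to_s24(samples):
    """Widen to 24 bit integers by shifting the value up by 8 bits."""
    return [s * 256 for s in samples]


def s16_to_s32(samples):
    """Widen to 32 bit integers by shifting the value up by 16 bits."""
    return [s * 65536 for s in samples]
```

The widened output still carries only 16 bits of information per sample, which
is the loss of precision mentioned earlier in this thread.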

The only remaining difference between the libraries would be
performance and accuracy of output. Were you aware of the
--enable-int-quality flag to configure? Yet another set of decoders
being a bit slower but doing better rounding to 16 bit integers;-)

In any case, you might want to check out
mpg123's tests to verify that your various builds
decode properly:

svn co svn://svn.orgis.org/mpg123/test mpg123-test
cd mpg123-test
make # build helper programs (rms computation)
perl compliance.pl /path/to/mpg123

This makes a basic MPEG ISO compliance test, comparing decoder to
reference output (see http://mpg123.org, after scrolling down a bit).

Disclaimer: If anything does not work as I just promised, this is a bug
and will be addressed;-)


Alrighty then,

Thomas




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread Reinhard Tartler
Dear ARM porters,

Please see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=738981
for full context. I've uploaded a patch proposed by Riku that AFAIUI
makes mpg123 really slow on all arm targets, while unbreaking it on
some others.

As one of the maintainers of mpg123, and not being familiar with the
scope of the Debian arm ports, I'm rather confused about the best way
to proceed here and would need more input. Can someone more
experienced please advise?

Thanks,
Reinhard



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread peter green

Reinhard Tartler wrote:
> Dear ARM porters,
>
> Please see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=738981
> for full context. I've uploaded a patch proposed by Riku that AFAIUI
> makes mpg123 really slow on all arm targets, while unbreaking it on
> some others.
>
> As one of the maintainers of the mpg123 without familiarity about the
> scope of the Debian arm port

Debian has TWO official arm ports.

Armel has a minimum CPU requirement of armv4t with no guarantee that
there will be any floating point hardware at all.
Armhf has a minimum CPU requirement of armv7-a with vfpv3-d16; neon is
not guaranteed to be available (but it is very likely to be in practice).

As well as the two official arm ports there is also Raspbian, which
targets armv6 with vfpv2. Raspbian also uses the armhf architecture name
but can be distinguished by using dpkg-vendor.

> -- Forwarded message --
> From: Thomas Orgis
> Date: Sun, Feb 16, 2014 at 5:46 AM
> Subject: Bug#738981: Switch to use generic_fpu for ARM
> To: 738...@bugs.debian.org
>
> Sorry for being late to the party, but I have to say that this is a
> rather unfortunate situation now. Not using the assembly-optimized
> fixed-point ARM code of the arm_nofpu decoder and resorting to the
> generic_fpu one (all plain C) will make mpg123 really slow in
> comparison. I'm not sure what hardware we are targeting here ... is it
> armel with softfloat? With gcc -mfpu=vfp, generic_fpu might be fine,
> although using the neon decoder is still preferred on supporting CPUs.
>
> There is no runtime detection in mpg123 for this and at least for the
> decision of fixed or floating point decoding, it likely will never be
> as that is a very basic decision on the whole decoder code, not just
> some optimization. I can imagine combining generic_fpu and neon builds
> with run-time detection,

That would seem like a good approach for Debian armhf if it
were implemented.

> but this still assumes a hardware floating
> point unit to make sense. We have arm_nofpu and generic_nofpu for the
> cases without one.
>
> Something which is possible right now is to produce one libmpg123.so with
> the standard build to please users using slow ARM machines who just
> want plain 16 bit playback and produce one libmpg123_float.so for people
> using beefy machines and who are using audacious as a media player.

Seems like a good approach if the so files can be guaranteed to be ABI
compatible. There is already a mechanism for loading different versions
of libraries based on hardware capabilities.

> I could implement a conversion step to floating point with the
> arm_nofpu decoder. That would make audacious work (although wasting
> precision on machines that have hardware floating point, or even NEON)
> and have the benefit of the command-line mpg123 still being fast with
> 16 bit output.

Seems like a reasonable idea for armel; frankly, people who have vfp
hardware probably shouldn't be running armel anyway.




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread Reinhard Tartler
On Sun, Feb 16, 2014 at 6:37 PM, peter green  wrote:
> Reinhard Tartler wrote:
>>
>> Dear ARM porters,
>>
>> Please see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=738981
>> for full context. I've uploaded a patch proposed by Riku that AFAIUI
>> makes mpg123 really slow on all arm targets, while unbreaking it on
>> some others.
>>
>> As one of the maintainers of the mpg123 without familiarity about the
>> scope of the Debian arm port
>
> Debian has TWO official arm ports.
>
> Armel has a minimum CPU requirement of armv4t with no guarantee that there
> will be any floating point hardware at all.
> Armhf has a minimum CPU requirement of armv7-a with vfpv3-d16, neon is not
> guaranteed to be available (but it is very likely to be in practice).
>
> As well as the two official arm ports there is also Raspbian which targets
> armv6 with vfpv2, raspbian also uses the armhf architecture name but can be
> distinguished by using dpkg-vendor.

With this explanation, I think it would help a lot to all armhf users
if the following line was restricted to the armel port:

http://anonscm.debian.org/gitweb/?p=pkg-multimedia/mpg123.git;a=blob;f=debian/rules;h=afb018502621352914e757338a2dace6b65522cb;hb=HEAD#l25

Does anyone disagree that this is a good move, at least until
something like https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=738981#25
is implemented?

-- 
regards,
Reinhard



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread peter green

Reinhard Tartler wrote:
> With this explanation, I think it would help a lot to all armhf users
> if the following line was restricted to the armel port:
>
> http://anonscm.debian.org/gitweb/?p=pkg-multimedia/mpg123.git;a=blob;f=debian/rules;h=afb018502621352914e757338a2dace6b65522cb;hb=HEAD#l25

Umm, that link just goes to a page with a copy of your whole debian/rules
file; can you tell us what line you are referring to?




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread Reinhard Tartler
On Sun, Feb 16, 2014 at 7:02 PM, peter green  wrote:
> Reinhard Tartler wrote:
>>
>> With this explanation, I think it would help a lot to all armhf users
>> if the following line was restricted to the armel port:
>>
>>
>> http://anonscm.debian.org/gitweb/?p=pkg-multimedia/mpg123.git;a=blob;f=debian/rules;h=afb018502621352914e757338a2dace6b65522cb;hb=HEAD#l25
>>
>
> Umm that link just goes to a page with a copy of your whole debian/rules
> file, can you tell us what line you are referring to?

I mean line 55 to limit the effect of not using a FPU to just armel.



-- 
regards,
Reinhard



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread peter green

Reinhard Tartler wrote:
> I mean line 55

I've just looked at ./configure --help and AIUI there are five possible
values of --with-cpu for arm systems:

--with-cpu=generic_fpu     Use generic processor code with floating
                           point arithmetic
--with-cpu=generic_nofpu   Use generic processor code with fixed
                           point arithmetic (p.ex. ARM, experimental)
--with-cpu=generic_dither  Use generic processor code with floating
                           point arithmetic and dithering for 1to1
                           16bit decoding.
--with-cpu=neon            Use code optimized for ARM NEON SIMD engine
                           (Cortex-A series)
--with-cpu=arm_nofpu       Use code optimized for ARM processors with
                           fixed point arithmetic (experimental)

We can't use the neon option because no arm port of Debian guarantees
that neon will be available. So I believe that until the no-fpu options
are made compatible with audacious and/or some form of runtime detection
is implemented (for example using ld.so hwcaps to load different versions
of the library) we need to keep using --with-cpu=generic_fpu on all our
arm ports.

> to limit the effect of not using a FPU to just armel.

Assuming generic_fpu uses the standard floating point functionality in
the C compiler, it WILL use the FPU on Debian armhf and Raspbian (but not
on armel).




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-16 Thread Reinhard Tartler
On Sun, Feb 16, 2014 at 7:28 PM, peter green  wrote:
> Reinhard Tartler wrote:
>>
>> I mean line 55
>
> I've just looked at ./configure --help and AIUI there are five possible
> values of --with-cpu for arm systems
>
> --with-cpu=generic_fpu     Use generic processor code with floating point
>                            arithmetic
> --with-cpu=generic_nofpu   Use generic processor code with fixed point
>                            arithmetic (p.ex. ARM, experimental)
> --with-cpu=generic_dither  Use generic processor code with floating point
>                            arithmetic and dithering for 1to1 16bit decoding.
> --with-cpu=neon            Use code optimized for ARM NEON SIMD engine
>                            (Cortex-A series)
> --with-cpu=arm_nofpu       Use code optimized for ARM processors with fixed
>                            point arithmetic (experimental)
>
> We can't use the neon option because no arm port of Debian guarantees that
> neon will be available. So I believe that until the no-fpu options are made
> compatible with audacious and/or some form of runtime detection is
> implemented (for example using ld.so hwcaps to load different versions of
> the library) we need to keep using --with-cpu=generic_fpu on all our arm
> ports.

I see. In that case, I'll have to leave the package as it is until
something along those lines is implemented.

Thanks for your input!

-- 
regards,
Reinhard



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-17 Thread Riku Voipio
Hi,

On Sun, Feb 16, 2014 at 07:41:02PM -0500, Reinhard Tartler wrote:
> On Sun, Feb 16, 2014 at 7:28 PM, peter green  wrote:
> > We can't use the neon option because no arm port of Debian guarantees that
> > neon will be available. So I believe that until the no-fpu options are made
> > compatible with audacious and/or some form of runtime detection is
> > implemented (for example using ld.so hwcaps to load different versions of
> > the library) we need to keep using --with-cpu=generic_fpu on all our arm
> > ports.

Thanks Peter for explaining, this is how I ended up with the suggestion
in the bug.

> I see. In that case, I'll have to leave the package as it until
> something along those lines is implemented.

Yes. The ideal solution is for the upstream to implement cpu runtime
detection that:

1) uses neon if it is available
2) falls back to fixed point if app requested 16-bit playback
3) finally falls back to generic fpu code if neither of above applies

Any packaging level workaround is going to be suboptimal for someone.
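
The preference order listed above can be sketched as a small selection
function. This is only an illustration of the proposal: mpg123 has no such
runtime switch according to this thread, the function and parameter names are
hypothetical, and picking arm_nofpu (rather than generic_nofpu) as the fixed
point decoder is an assumption. The returned strings are the --with-cpu values
quoted earlier.

```python
def choose_decoder(has_neon, wants_16bit):
    """Pick a decoder name following the three-step preference order."""
    if has_neon:
        # 1) NEON wins whenever the CPU supports it.
        return "neon"
    if wants_16bit:
        # 2) No NEON, but the application only wants 16 bit output.
        return "arm_nofpu"
    # 3) Otherwise fall back to the plain C floating point decoder.
    return "generic_fpu"
```

For example, choose_decoder(False, True) selects the fixed point decoder, while
an application asking for float output on the same machine would get
generic_fpu.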

Riku



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-17 Thread Riku Voipio
> -- Forwarded message --
> From: Thomas Orgis 
> Date: Sun, Feb 16, 2014 at 5:46 AM
> Subject: Bug#738981: Switch to use generic_fpu for ARM
> To: 738...@bugs.debian.org
> 
 
> There is no runtime detection in mpg123 for this and at least for the
> decision of fixed or floating point decoding, it likely will never be
> as that is a very basic decision on the whole decoder code, not just
> some optimization. 

- snip-

> Something which is possible right now is to produce one libmpg123.so with
> the standard build to please users using slow ARM machines who just
> want plain 16 bit playback and produce one libmpg123_float.so for people
> using beefy machines and who are using audacious as a media player.

If it is indeed the case that doing runtime detection of float/fixed code
is not feasible, then it might make sense to provide separate
libmpg123_float.so and libmpg123.so (on armel).

Applications needing floats would then be linked against libmpg123_float.so
and all others against plain libmpg123.so.

This would mean that audacious would be slow on non-fpu hardware, but
people could still use the fast command line mpg123. Which I guess is what
most people with slow non-fpu hardware would do anyway.

Now, if audacious is the only application that needs floats, a more
brutal approach could be acceptable:

on armel: ship audacious-plugins without libmpg123 support, and libmpg123
as fixed point for other applications.

on armhf: ship libmpg123 as floats, with runtime autodetection of NEON.

This leaves in limbo only the Raspberry Pi and others that have vfp but
no NEON. It would be interesting to hear what the performance of
libmpg123 compiled with different options is on the Raspberry Pi, to work
out what the optimal solution for Raspbian would be.

> I could implement a conversion step to floating point with the
> arm_nofpu decoder. That would make audacious work (although wasting
> precision on machines that have hardware floating point, or even NEON)
> and have the benefit of the command-line mpg123 still being fast with
> 16 bit output. A debian build targeting modern floating-point-capable
> hardware would use generic_fpu or better the neon decoder to begin with.

> Is there preference to have the faster decoder for debian without
> floating point hardware?

There are as many preferences as there are Debian developers, I'm afraid.
My primary preference is that binaries in Debian work, which with the
previous state of libmpg123 and audacious was not the case - on any
hardware.

Riku



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-17 Thread Thomas Orgis
On Mon, 17 Feb 2014 10:00:48 +0200, Riku Voipio wrote:

> Thanks Peter for explaining, this was how I ended up the suggestion
> in the bug.
> 
> > I see. In that case, I'll have to leave the package as it until
> > something along those lines is implemented.
> 
> Yes. The ideal solution is for the upstream to implement cpu runtime
> detection that:
> 
> 1) uses neon if it is available
> 2) falls back to fixed point if app requested 16-bit playback
> 3) finally falls back to generic fpu code if neither of above applies
> 
> Any packaging level workaround is going to be suboptimal for someone.

Isn't the approach for the linker to select libraries like libavcodec
on the table anymore? I see that I'll have to add that float conversion
code to keep the features along all builds, but selecting a vfp and
non-vfp variant for fixed point or floating point via the linker seems
like the most clean approach you are going to get.

NEON detection may come... but if we have linker selection, that would
be covered right now.

So ... can I get away with adding that stupid float conversion, so
folks have reasonable performance in likely applications of debian on
ARM, please? ;-)


Alrighty then,

Thomas

PS: I'll have to remove those "experimental" markings from the nofpu
variants in configure help. They are getting old.




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-17 Thread Steve McIntyre
On Mon, Feb 17, 2014 at 11:43:16AM +0100, Thomas Orgis wrote:
>On Mon, 17 Feb 2014 10:00:48 +0200, Riku Voipio wrote:
>
>> Thanks Peter for explaining, this was how I ended up the suggestion
>> in the bug.
>> 
>> > I see. In that case, I'll have to leave the package as it until
>> > something along those lines is implemented.
>> 
>> Yes. The ideal solution is for the upstream to implement cpu runtime
>> detection that:
>> 
>> 1) uses neon if it is available
>> 2) falls back to fixed point if app requested 16-bit playback
>> 3) finally falls back to generic fpu code if neither of above applies
>> 
>> Any packaging level workaround is going to be suboptimal for someone.
>
>Isn't the approach for the linker to select libraries like libavcodec
>on the table anymore? I see that I'll have to add that float conversion
>code to keep the features along all builds, but selecting a vfp and
>non-vfp variant for fixed point or floating point via the linker seems
>like the most clean approach you are going to get.

Yes, you can do that - build several copies of the library and use the
hwcaps / auxv approach to pick the best one for the hardware at link
time.

>NEON detection may come... but if we have linker selection, that would
>be covered right now.

Yup.

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
Is there anybody out there?



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-20 Thread Thomas Orgis
> >> > I see. In that case, I'll have to leave the package as it until
> >> > something along those lines is implemented.

So, I got conversion to float implemented now and tested with the
generic_nofpu decoder on x86-64. It _should_ of course work with ARM,
too;-) If you'd like to check the current snapshot of mpg123,

http://mpg123.org/snapshot/mpg123-20140220132548.tar.bz2 ,

you hopefully will find that any normal build of mpg123 (unless
specifying --disable-float explicitly) now offers all usual formats. As
a bonus, I even implemented the 8 Bit A-Law output, which has always
just been a placeholder (nobody missed it, apparently).

I'd be interested on some timings of

mpg123 -t -e s16 test.mp3
mpg123 -t -e f32 test.mp3

with the various builds you'll do for the ARM variants. Best would be running

perl scripts/benchmark-cpu.pl src/mpg123 convergence_-_points_of_view/*.mp3

with

http://mpg123.orgis.org/convergence_-_points_of_view.tar.gz

as reference album, as mentioned on

http://mpg123.orgis.org/benchmarking.shtml

to be able to compare the performance of the code and machine to
others. This yields output like this:

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
x86-64          3.39    4.05
generic         6.15    6.01
generic_dither  6.36    5.97

... or this, with --with-cpu=generic_fpu:

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic         6.14    6.29

(on a Core2Duo machine).

> Yes, you can do that - build several copies of the library and use the
> hwcaps / auxv approach to pick the best one for the hardware at link
> time.
> 
> >NEON detection may come... but if we have linker selection, that would
> >be covered right now.
> 
> Yup.

Seconding the second part: Linker selection it is. NEON runtime
detection just isn't fun in user code.

The bright side: If the multiple builds are setup and tested, I can
safely release mpg123-1.19.0 with the changes and we finally have this
settled.


Alrighty then,

Thomas



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-20 Thread peter green

Thomas Orgis wrote:
> So, I got conversion to float implemented now and tested with the
> generic_nofpu decoder on x86-64.
> [...]
> Best would be running
>
> perl scripts/benchmark-cpu.pl src/mpg123 convergence_-_points_of_view/*.mp3
> [...]

Ok, on a 1GHz freescale IMX53 (cortex A8) in a (probably somewhat out
of date) debian sid armhf chroot I tested with "perl
scripts/benchmark-cpu.pl src/mpg123 convergence_-_points_of_view/*.mp3"
in the following configurations.


Built with ./configure --with-cpu=arm_nofpu
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
ARM             30.36   34.26

Built with ./configure --with-cpu=generic_fpu
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic         148.66  138.49

Built with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
NEON            0.03    0.04
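
To put the first two results in proportion, the short calculation below divides
the generic_fpu timings by the arm_nofpu ones. The numbers are copied from the
tables above; the dictionary and function names are just for this sketch. The
NEON row is left out because its timings look implausible next to the others.
On this machine generic_fpu comes out roughly 4.9 times slower for s16 output
and roughly 4.0 times slower for f32 output.

```python
# User CPU seconds from the tables above: (t_s16, t_f32)
timings = {
    "arm_nofpu": (30.36, 34.26),
    "generic_fpu": (148.66, 138.49),
}


def slowdown(slow, fast):
    """Return how many times slower `slow` is than `fast`, per output format."""
    return tuple(round(s / f, 2) for s, f in zip(timings[slow], timings[fast]))


ratio_s16, ratio_f32 = slowdown("generic_fpu", "arm_nofpu")
print("s16 slowdown:", ratio_s16)
print("f32 slowdown:", ratio_f32)
```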

I found the neon result unbelievable, so I decided to run the test program
you mentioned to me in my private mail asking about how to run the
benchmarks.
root@plugwash:/mpg123-test# LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/ \
    perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123


 Layer 1 
--> 16 bit signed integer output
fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
fl4.bit:RMS=1.510105e-01 (FAIL) maxdiff=5.277658e-01 (FAIL)
fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
--> 32 bit integer output
fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
fl4.bit:RMS=1.513207e-01 (FAIL) maxdiff=4.787517e-01 (FAIL)
fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
--> 24 bit integer output
fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
fl4.bit:RMS=1.494715e-01 (FAIL) maxdiff=4.984906e-01 (FAIL)
fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
--> 32 bit floating point output
fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
fl4.bit:RMS=1.137037e-01 (FAIL) maxdiff=4.459082e-01 (FAIL)
fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)

 Layer 2 
--> 16 bit signed integer output
fl10.bit:   RMS=3.528939e-02 (FAIL) maxdiff=6.501251e-02 (FAIL)
fl11.bit:   RMS=3.528947e-02 (FAIL) maxdiff=6.501383e-02 (FAIL)
fl12.bit:   RMS=3.528948e

Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-20 Thread Thomas Orgis
I'm adding the mpg123 assembly guru to the CC list, as I imagine he
would be interested in why his ARM NEON code doesn't work on a Cortex
A8 chip here. Needless to say, it worked before (on other systems).
Also, the precision of the arm_nofpu code does not look right. This
topic is now shifting towards mpg123 development, but as long as it's
only on this debian platform that it's not working, I guess it is
on-topic for debian, too.

On Fri, 21 Feb 2014 01:29:40 +
peter green wrote:

> Ok, on a 1GHz freescale IMX53 (cortex A8) in a (probably somewhat out
> of date) debian sid armhf chroot

> Built with ./configure --with-cpu=arm_nofpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder  t_s16/s  t_f32/s
> ARM       30.36    34.26
> 
> Built with ./configure --with-cpu=generic_fpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder  t_s16/s  t_f32/s
> generic   148.66   138.49

That seems to prove a point about trying to use the nofpu build. How
does --with-cpu=generic_nofpu stack up for this machine? Also regarding
the compliance test later on ...
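
To put the quoted numbers in perspective, here is a small Python sketch that computes how much more CPU time the generic_fpu build needed than the arm_nofpu build on that Cortex-A8 machine. The helper name `slowdown` is made up for this illustration; the timings are copied from the benchmark output quoted above.

```python
# User CPU times in seconds, copied from the quoted benchmark output.
times = {
    "arm_nofpu":   {"s16": 30.36,  "f32": 34.26},
    "generic_fpu": {"s16": 148.66, "f32": 138.49},
}

def slowdown(slow, fast, fmt):
    """Return how many times more CPU time `slow` needs than `fast`."""
    return times[slow][fmt] / times[fast][fmt]

for fmt in ("s16", "f32"):
    factor = slowdown("generic_fpu", "arm_nofpu", fmt)
    print(f"{fmt}: generic_fpu needs {factor:.1f}x the CPU time of arm_nofpu")
```

On these figures generic_fpu comes out at roughly 4.9x (s16) and 4.0x (f32) the CPU time of arm_nofpu.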

> Built with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder  t_s16/s  t_f32/s
> NEON      0.03     0.04

Yeah, as we see

> Illegal instruction

this is most interesting. I refer to Taihei, as I don't have a NEON
setup at hand (need to get a debian chroot going on my phone).

> root@plugwash:/mpg123-test# 
> LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/ 
> perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123
> 
>  Layer 1 
> --> 16 bit signed integer output
> fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)

That doesn't look pretty to me. Does it _sound_ like (metal) music? (In
case there is no audio chip, decode to WAV with -w output.wav; I happily
accept snippets, limit the number of frames via -n 500.)

> root@plugwash:/mpg123-test# 
> LD_LIBRARY_PATH=/mpg123-20140220132548-generic_fpu/src/libmpg123/.libs/ 
> perl compliance.pl /mpg123-20140220132548-generic_fpu/src/mpg123
> 
>  Layer 1 
> --> 16 bit signed integer output
> fl1.bit:RMS=8.683659e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl2.bit:RMS=8.686681e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl3.bit:RMS=8.737660e-06 (PASS) maxdiff=1.525879e-05 (PASS)

Yes, that is better. Can you compare --with-cpu=generic_nofpu to
isolate this to the assembly version for ARM? This is how it looks with
generic_nofpu on my box:

sh$ perl ../test/compliance.pl src/mpg123

 Layer 1 
--> 16 bit signed integer output
fl1.bit:RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 32 bit integer output
fl1.bit:RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 24 bit integer output
fl1.bit:RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:RMS=7.923985e-06 (PASS) maxdiff=2.604723e-05 (PASS)
--> 32 bit floating point output
fl1.bit:RMS=7.936754e-06 (PASS) maxdiff=2.533197e-05 (PASS)
fl2.bit:RMS=7.837830e-06 (PASS) maxdiff=2.342463e-05 (PASS)
fl3.bit:RMS=7.928321e-06 (PASS) maxdiff=2.485514e-05 (PASS)
fl4.bit:RMS=7.784658e-06 (PASS) maxdiff=2.521276e-05 (PASS)
fl5.bit:RMS=1.677634e-05 (LIMITED) maxdiff=6.681681e-05 (FAIL)
fl6.bit:RMS=1.071518e-05 (LIMITED) maxdiff=4.619360e-05 (PASS)
fl7.bit:RMS=7.469690e-06 (PASS) maxdiff=2.658367e-05 (PASS)
fl8.bit:RMS=7.923985e-06 (PASS) maxdiff=2.60472
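
For readers wondering how the PASS / LIMITED / FAIL labels in these listings come about, here is a Python sketch of one plausible classification. It assumes the usual ISO/IEC 11172-4 style limits (RMS below 2^-15/sqrt(12) and maximum difference of at most 2^-14 for full accuracy, RMS below 2^-11/sqrt(12) for limited accuracy). Those limits are an assumption here, not something taken from compliance.pl, although they agree with every labelled value shown in this thread. The function names are invented for the example.

```python
import math

# Assumed limits, relative to full scale (not read from compliance.pl).
RMS_FULL = 2**-15 / math.sqrt(12)     # about 8.81e-06
RMS_LIMITED = 2**-11 / math.sqrt(12)  # about 1.41e-04
MAXDIFF_FULL = 2**-14                 # about 6.10e-05

def classify_rms(rms):
    """Label an RMS error the way the listings appear to."""
    if rms < RMS_FULL:
        return "PASS"
    if rms < RMS_LIMITED:
        return "LIMITED"
    return "FAIL"

def classify_maxdiff(maxdiff):
    """Label a maximum sample difference."""
    return "PASS" if maxdiff <= MAXDIFF_FULL else "FAIL"

# (RMS, maxdiff) pairs copied from the listings in this thread.
samples = [
    (8.683659e-06, 1.525879e-05),  # generic_fpu on ARM, fl1
    (1.677634e-05, 6.681681e-05),  # generic_nofpu on x86-64, fl5
    (1.071518e-05, 4.619360e-05),  # generic_nofpu on x86-64, fl6
    (3.486054e-02, 5.002832e-02),  # arm_nofpu on ARM, fl1
]
for rms, maxdiff in samples:
    print(f"RMS={rms:e} ({classify_rms(rms)}) "
          f"maxdiff={maxdiff:e} ({classify_maxdiff(maxdiff)})")
```

Under these assumed limits the arm_nofpu figures are off by three to four orders of magnitude, which points to broken decoding rather than reduced precision.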

Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-21 Thread Lennart Sorensen
On Fri, Feb 21, 2014 at 01:29:40AM +, peter green wrote:
> Thomas Orgis wrote:
> >So, I got conversion to float implemented now and tested with the
> >generic_nofpu decoder on x86-64. It _should_ of course work with ARM,
> >too;-) If you'd like to check the current snapshot of mpg123,
> >
> > http://mpg123.org/snapshot/mpg123-20140220132548.tar.bz2 ,
> >
> >you hopefully will find that any normal build of mpg123 (unless
> >specifying --disable-float explicitly) now offers all usual formats. As
> >a bonus, I even implemented the 8 Bit A-Law output, which has always
> >just been a placeholder (nobody missed it, apparently).
> >
> >I'd be interested in some timings of
> >
> > mpg123 -t -e s16 test.mp3
> > mpg123 -t -e f32 test.mp3
> >
> >with the various builds you'll do for the ARM variants. Best would be running
> >
> > perl scripts/benchmark-cpu.pl src/mpg123 
> > convergence_-_points_of_view/*.mp3
> >
> >with
> >
> > http://mpg123.orgis.org/convergence_-_points_of_view.tar.gz
> >
> >as reference album, as mentioned on
> >
> > http://mpg123.orgis.org/benchmarking.shtml
> >
> >to be able to compare the performance of the code and machine to
> >others. This yields output like this:
> >
> >#mpg123 benchmark (user CPU time in seconds for decoding)
> >#decoder        t_s16/s  t_f32/s
> >x86-64          3.39     4.05
> >generic         6.15     6.01
> >generic_dither  6.36     5.97
> >
> >... or this, with --with-cpu=generic_fpu:
> >
> >#mpg123 benchmark (user CPU time in seconds for decoding)
> >#decoder  t_s16/s  t_f32/s
> >generic   6.14     6.29
> >
> >(on a Core2Duo machine)
> Ok, on a 1GHz freescale IMX53 (cortex A8) in a (probably somewhat
> out of date) debian sid armhf chroot I tested with "perl
> scripts/benchmark-cpu.pl src/mpg123
> convergence_-_points_of_view/*.mp3" in the following configurations.
> 
> Built with ./configure --with-cpu=arm_nofpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder  t_s16/s  t_f32/s
> ARM       30.36    34.26
> 
> Built with ./configure --with-cpu=generic_fpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder  t_s16/s  t_f32/s
> generic   148.66   138.49
> 
> Built with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder  t_s16/s  t_f32/s
> NEON      0.03     0.04
> 
> I found the neon result unbelievable, so I decided to run the test
> program you mentioned to me in my private mail asking about how to
> run the benchmarks.
> root@plugwash:/mpg123-test#
> LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/
> perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123
> 
>  Layer 1 
> --> 16 bit signed integer output
> fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
> fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
> fl4.bit:RMS=1.510105e-01 (FAIL) maxdiff=5.277658e-01 (FAIL)
> fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
> fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
> fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
> fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
> --> 32 bit integer output
> fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
> fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
> fl4.bit:RMS=1.513207e-01 (FAIL) maxdiff=4.787517e-01 (FAIL)
> fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
> fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
> fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
> fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
> --> 24 bit integer output
> fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
> fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
> fl4.bit:RMS=1.494715e-01 (FAIL) maxdiff=4.984906e-01 (FAIL)
> fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
> fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
> fl7.bit:RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
> fl8.bit:RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
> --> 32 bit floating point output
> fl1.bit:RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
> fl3.bit:RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
> fl4.bit:RMS=1.137037e-01 (FAIL) maxdiff=4.459082e-01 (FAIL)
> fl5.bit:RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
> fl6.bit:RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
> fl7.bit:RMS=2.211659e-02 (FAIL) max

Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-22 Thread Thomas Orgis
On Fri, 21 Feb 2014 11:25:12 -0500
"Lennart Sorensen" wrote:

> Testing with the neon build I get a return code of 4, and it seems to
> be failing to run.  It was a pain to even get it to compile.  Using just
> the configure option, the assembler complained about the NEON instructions
> being invalid for the chosen cpu type.  Adding -mfpu=neon to the CFLAGS
> made it able to compile, but it still crashes with illegal instruction.
> I tried with CFLAGS set to -mcpu=cortex-a15 -mfpu=neon, and that still
> gives illegal instruction when running it.

This is weird. What happened on the debian side since
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667653#35 ? We have
the current code working on this setup:

device: iPod touch 4G with iOS 5.1.1
toolchain: gcc 4.2.1(from Xcode 3.2.6) on OSX 10.6.8, clang 3.3(from Xcode 
5.0.2) on OSX 10.9.1 (double checked)
configure script option: --host=armv7-apple-darwin --with-cpu=arm_nofpu[neon] 
--with-audio=dummy --disable-shared --enable-static [--enable-int-quality]

Taihei also just checked the compliance of the decoder choices
including NEON. That illegal instruction ... care to fire up the
debugger to tell us where it actually occurs? The NEON assembly is
written as plain assembler input (cpp + as), you can see the
instructions we use right there and it doesn't differ from iOS.

> It might be a good idea to have the benchmark script actually check the
> return code of system()

Yes.

> I was building and testing under Debian armhf sid.
> gcc (Debian 4.8.2-16) 4.8.2
> 
> CPU is a dual Cortex-A15 1.5GHz (TI OMAP 57xx).


Alrighty then,

Thomas


signature.asc
Description: PGP signature
___
pkg-multimedia-maintainers mailing list
pkg-multimedia-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-multimedia-maintainers

Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-24 Thread Lennart Sorensen
On Sat, Feb 22, 2014 at 10:05:35AM +0100, Thomas Orgis wrote:
> On Fri, 21 Feb 2014 11:25:12 -0500
> "Lennart Sorensen" wrote:
> 
> > Testing with the neon build I get a return code of 4, and it seems to
> > be failing to run.  It was a pain to even get it to compile.  Using just
> > the configure option, the assembler complained about the NEON instructions
> > being invalid for the chosen cpu type.  Adding -mfpu=neon to the CFLAGS
> > made it able to compile, but it still crashes with illegal instruction.
> > I tried with CFLAGS set to -mcpu=cortex-a15 -mfpu=neon, and that still
> > gives illegal instruction when running it.
> 
> This is weird. What happened on the debian side since
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667653#35 ? We have
> the current code working on this setup:
> 
> device: iPod touch 4G with iOS 5.1.1
> toolchain: gcc 4.2.1(from Xcode 3.2.6) on OSX 10.6.8, clang 3.3(from Xcode 
> 5.0.2) on OSX 10.9.1 (double checked)
> configure script option: --host=armv7-apple-darwin --with-cpu=arm_nofpu[neon] 
> --with-audio=dummy --disable-shared --enable-static [--enable-int-quality]
> 
> Taihei also just checked the compliance of the decoder choices
> including NEON. That illegal instruction ... care to fire up the
> debugger to tell us where it actually occurs? The NEON assembly is
> written as plain assembler input (cpp + as), you can see the
> instructions we use right there and it doesn't differ from iOS.
> 
> > It might be a good idea to have the benchmark script actually check the
> > return code of system()
> 
> Yes.
> 
> > I was building and testing under Debian armhf sid.
> > gcc (Debian 4.8.2-16) 4.8.2
> > 
> > CPU is a dual Cortex-A15 1.5GHz (TI OMAP 57xx).
> 
> 
> Alrighty then,

Any help from this:

(gdb) run
Starting program: /tmp/mpginst/usr/local/bin/mpg123 -e s16 -q --cpu NEON -t 
/convergence_-_points_of_view/01\ -\ Bleed.mp3 
/convergence_-_points_of_view/02\ -\ Strike\ the\ end.mp3 
/convergence_-_points_of_view/03\ -\ Listen.mp3 
/convergence_-_points_of_view/04\ -\ Six\ feet\ under.mp3 
/convergence_-_points_of_view/05\ -\ Always\ the\ same.mp3 
/convergence_-_points_of_view/06\ -\ Breath.mp3 
/convergence_-_points_of_view/07\ -\ Vanished\ memories.mp3 
/convergence_-_points_of_view/08\ -\ Silent.mp3 
/convergence_-_points_of_view/09\ -\ Nothing\ else.mp3 
/convergence_-_points_of_view/10\ -\ Train\ to\ leave.mp3

Program received signal SIGILL, Illegal instruction.
0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
48  vpush   {q4-q7}
(gdb) where
#0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
#1  0xb6fab71c in INT123_synth_1to1_stereo_neon (bandPtr_l=, 
bandPtr_r=0x36400, fr=0x291d8) at synth.c:892
#2  0xb6fb8328 in INT123_do_layer3 (fr=0x291d8) at layer3.c:2060
#3  0xb6fa725e in decode_the_frame (fr=fr@entry=0x291d8) at libmpg123.c:699
#4  0xb6fa823e in mpg123_decode_frame_64 (mh=0x291d8, num=num@entry=0x28490 
, audio=audio@entry=0xbefff8e8, bytes=bytes@entry=0xbefff8f0) at 
libmpg123.c:838
#5  0x00012fce in play_frame () at mpg123.c:667
#6  0xb2f0 in main (sys_argc=, sys_argv=) at 
mpg123.c:1177

-- 
Len Sorensen



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Thomas Orgis
On Mon, 24 Feb 2014 12:27:36 -0500
"Lennart Sorensen" wrote:

> Any help from this:
> 
> Program received signal SIGILL, Illegal instruction.
> 0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
> 48  vpush   {q4-q7}

What the ... ? This does not make sense to me (and actually, with "me", I
mean Taihei, who knows more about ARM assembly ;-). The vpush pseudo
instruction should be harmless in our context. Quote from Taihei:

    I don't know why. Actually vpush is a pseudo instruction, and
    vpush {q4-q7} should be assembled into vstmdb sp!, {d8-d15}
    (machine code is ed2d8b10). I'm curious what their assembler
    (GNU as?) assembles it into.
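
The quoted machine code can be checked by hand against the ARM-mode (A32) VSTM/VLDM encoding for doubleword registers. The Python sketch below does that bit slicing. The function name is invented for the example, and it only handles this one encoding class.

```python
def decode_vstm_vldm(word):
    """Split an A32 VSTM/VLDM (doubleword registers) encoding into its fields."""
    # Bits 27..25 must be 0b110 and bits 11..8 must be 0b1011 for this form.
    if (word >> 25) & 0b111 != 0b110 or (word >> 8) & 0xF != 0b1011:
        raise ValueError("not a doubleword VSTM/VLDM encoding")
    fields = {
        "cond": (word >> 28) & 0xF,  # 0xE means "always"
        "P": (word >> 24) & 1,       # 1 with U=0: decrement before
        "U": (word >> 23) & 1,
        "D": (word >> 22) & 1,       # top bit of the first register number
        "W": (word >> 21) & 1,       # 1: write back the base register ("!")
        "L": (word >> 20) & 1,       # 0: store, 1: load
        "Rn": (word >> 16) & 0xF,    # base register, 13 is sp
        "Vd": (word >> 12) & 0xF,
        "imm8": word & 0xFF,         # number of 32-bit words transferred
    }
    first = (fields["D"] << 4) | fields["Vd"]
    count = fields["imm8"] // 2      # two words per doubleword register
    fields["registers"] = (first, first + count - 1)
    return fields

info = decode_vstm_vldm(0xED2D8B10)
print(info)
```

For ed2d8b10 this yields a store (L=0) with decrement-before and writeback on r13 (sp), covering d8 through d15, which matches vstmdb sp!, {d8-d15}.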


Well ... what does

sh$ objdump -S src/libmpg123/.libs/dct64_neon.o

say? Any hint from the debian ARM folks with experience of funny
behaviour with stand-alone assembly files? I also wonder if this is
generally broken on debian (since a certain toolchain version) or on
certain CPUs only. I repeat: This code worked before:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667653#35


Alrighty then,

Thomas




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Taihei Momma
Wait, code alignment issue?

#0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
             ^
             not a multiple of 4.

I've just committed a fix to the mpg123 repository to align the function to 4
bytes. I thought this had been fixed before, but the dct64 part was actually omitted:
http://www.mpg123.de/cgi-bin/scm/mpg123?view=revision&revision=3003

I hope this fixes the SIGILL issue.
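
The check done by eye above can be written down as a tiny Python sketch. ARM-mode instructions sit on 4-byte boundaries, so an ARM-mode program counter should be a multiple of 4. The helper names are made up for the example, and the result only says that the reported address is odd for ARM code, not why.

```python
def is_aligned(address, boundary=4):
    """True if address is a multiple of boundary."""
    return address % boundary == 0

def align_up(address, boundary=4):
    """Round address up to the next multiple of boundary."""
    return (address + boundary - 1) // boundary * boundary

pc = 0xB6FB9332  # address reported by gdb for the SIGILL
print(hex(pc), "4-byte aligned:", is_aligned(pc))
print("remainder:", pc % 4)
print("next 4-byte boundary:", hex(align_up(pc)))
```

The address leaves a remainder of 2, so it lies halfway between two 4-byte boundaries.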

Regards,
Taihei Momma


Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Thomas Orgis
On Tue, 25 Feb 2014 17:37:41 +0900
Taihei Momma wrote:

> #0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
>  ^
> not a multiple of 4.

Oh, d'oh! It could be that simple.

> I've just committed a fix to mpg123 repository 

I generated a new snapshot,

http://mpg123.org/snapshot/mpg123-20140225111416.tar.bz2 ,

and also attached the patch for the rather small change that hopefully
has a big effect. Care to test this?


Alrighty then,

Thomas

-- 
Thomas Orgis - Source Mage GNU/Linux Developer (http://www.sourcemage.org)
OrgisNetzOrganisation ---)=- http://orgis.org
GPG public key D446D524: http://thomas.orgis.org/public_key
Fingerprint: 7236 3885 A742 B736 E0C8 9721 9B4C 52BC D446 D524
Index: src/libmpg123/dct64_neon_float.S
===
--- src/libmpg123/dct64_neon_float.S	(Revision 3514)
+++ src/libmpg123/dct64_neon_float.S	(Revision 3515)
@@ -44,6 +44,7 @@
 	.word 1060439283
 	.word 1060439283
 	.globl ASM_NAME(dct64_real_neon)
+	ALIGN4
 ASM_NAME(dct64_real_neon):
 	vpush		{q4-q7}
 
Index: src/libmpg123/dct64_neon.S
===
--- src/libmpg123/dct64_neon.S	(Revision 3514)
+++ src/libmpg123/dct64_neon.S	(Revision 3515)
@@ -44,6 +44,7 @@
 	.word 1060439283
 	.word 1060439283
 	.globl ASM_NAME(dct64_neon)
+	ALIGN4
 ASM_NAME(dct64_neon):
 	vpush		{q4-q7}
 



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Lennart Sorensen
On Tue, Feb 25, 2014 at 11:18:50AM +0100, Thomas Orgis wrote:
> On Tue, 25 Feb 2014 17:37:41 +0900
> Taihei Momma wrote:
> 
> > #0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
> >  ^
> > not a multiple of 4.
> 
> Oh, d'oh! It could be that simple.
> 
> > I've just committed a fix to mpg123 repository 
> 
> I generated a new snapshot,
> 
>   http://mpg123.org/snapshot/mpg123-20140225111416.tar.bz2 ,
> 
> and also attached the patch for the rather small change that hopefully
> has a big effect. Care to test this?

root@rceng05:/mpg123-20140225111416# gdb --args /tmp/mpginst/usr/local/bin/mpg123 -e s16 -q --cpu NEON -t /convergence_-_points_of_view/*mp3
GNU gdb (GDB) 7.6.2 (Debian 7.6.2-1)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
For bug reporting instructions, please see:
...
Reading symbols from /tmp/mpginst/usr/local/bin/mpg123...done.
(gdb) run
Starting program: /tmp/mpginst/usr/local/bin/mpg123 -e s16 -q --cpu NEON -t 
/convergence_-_points_of_view/01\ -\ Bleed.mp3 
/convergence_-_points_of_view/02\ -\ Strike\ the\ end.mp3 
/convergence_-_points_of_view/03\ -\ Listen.mp3 
/convergence_-_points_of_view/04\ -\ Six\ feet\ under.mp3 
/convergence_-_points_of_view/05\ -\ Always\ the\ same.mp3 
/convergence_-_points_of_view/06\ -\ Breath.mp3 
/convergence_-_points_of_view/07\ -\ Vanished\ memories.mp3 
/convergence_-_points_of_view/08\ -\ Silent.mp3 
/convergence_-_points_of_view/09\ -\ Nothing\ else.mp3 
/convergence_-_points_of_view/10\ -\ Train\ to\ leave.mp3

Program received signal SIGILL, Illegal instruction.
0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:49
49  vpush   {q4-q7}
(gdb) disassemble
Dump of assembler code for function INT123_dct64_neon:
   0xb6fb9330 <+0>:     vpush      {d8-d15}
   0xb6fb9334 <+4>:     sub        r3, pc, #140    ; 0x8c
   0xb6fb9338 <+8>:     vld1.32    {d0-d3}, [r2]!
   0xb6fb933c <+12>:    vld1.32    {d4-d7}, [r2]!
   0xb6fb9340 <+16>:    vld1.32    {d8-d11}, [r2]!
   0xb6fb9344 <+20>:    vld1.32    {d12-d15}, [r2]
   0xb6fb9348 <+24>:    vld1.32    {d24-d27}, [r3 :128]!
   0xb6fb934c <+28>:    vld1.32    {d28-d31}, [r3 :128]!
   0xb6fb9350 <+32>:    vrev64.32  q4, q4
   0xb6fb9354 <+36>:    vrev64.32  q5, q5
   0xb6fb9358 <+40>:    vrev64.32  q6, q6
   0xb6fb935c <+44>:    vrev64.32  q7, q7
   0xb6fb9360 <+48>:    vswp       d8, d9
   0xb6fb9364 <+52>:    vswp       d10, d11
   0xb6fb9368 <+56>:    vswp       d12, d13
   0xb6fb936c <+60>:    vswp       d14, d15
   0xb6fb9370 <+64>:    vsub.f32   q8, q0, q7
   0xb6fb9374 <+68>:    vsub.f32   q9, q1, q6
   0xb6fb9378 <+72>:    vsub.f32   q10, q2, q5
   0xb6fb937c <+76>:    vsub.f32   q11, q3, q4
   0xb6fb9380 <+80>:    vadd.f32   q0, q0, q7
   0xb6fb9384 <+84>:    vadd.f32   q1, q1, q6
   0xb6fb9388 <+88>:    vadd.f32   q2, q2, q5
   0xb6fb938c <+92>:    vadd.f32   q3, q3, q4
   0xb6fb9390 <+96>:    vmul.f32   q4, q8, q12
   0xb6fb9394 <+100>:   vmul.f32   q5, q9, q13
   0xb6fb9398 <+104>:   vmul.f32   q6, q10, q14
   0xb6fb939c <+108>:   vmul.f32   q7, q11, q15
   0xb6fb93a0 <+112>:   vld1.32    {d24-d27}, [r3 :128]!
   0xb6fb93a4 <+116>:   vld1.32    {d28-d31}, [r3 :128]
   0xb6fb93a8 <+120>:   vrev64.32  q2, q2
   0xb6fb93ac <+124>:   vrev64.32  q3, q3
   0xb6fb93b0 <+128>:   vrev64.32  q6, q6
   0xb6fb93b4 <+132>:   vrev64.32  q7, q7
   0xb6fb93b8 <+136>:   vswp       d4, d5
   0xb6fb93bc <+140>:   vswp       d6, d7
   0xb6fb93c0 <+144>:   vswp       d12, d13
   0xb6fb93c4 <+148>:

Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Thomas Orgis
On Tue, 25 Feb 2014 11:20:06 -0500
"Lennart Sorensen" wrote:

> On Tue, Feb 25, 2014 at 11:18:50AM +0100, Thomas Orgis wrote:
> > On Tue, 25 Feb 2014 17:37:41 +0900
> > Taihei Momma wrote:
> > 
> > > #0  0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:48
> > >  ^
> > > not a multiple of 4.

> > Index: src/libmpg123/dct64_neon.S
> > ===
> > --- src/libmpg123/dct64_neon.S  (Revision 3514)
> > +++ src/libmpg123/dct64_neon.S  (Revision 3515)
> > @@ -44,6 +44,7 @@
> > .word 1060439283
> > .word 1060439283
> > .globl ASM_NAME(dct64_neon)
> > +   ALIGN4
> >  ASM_NAME(dct64_neon):
> > vpush   {q4-q7}
> >  

Now ... 

> Program received signal SIGILL, Illegal instruction.
> 0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:49
> 49  vpush   {q4-q7}

That address didn't change. I suggest we better align the function
symbol itself, seems like we accidentally missed by one line:

ALIGN4
.globl ASM_NAME(dct64_neon)
ASM_NAME(dct64_neon):

looks better to me (at least that's how we did it for all other
functions;-). Care to test the current

http://mpg123.org/snapshot/mpg123-20140225173909.tar.bz2 ?

Sorry for the inconvenience, but I don't have a setup handy to test
this myself.


Alrighty then,

Thomas




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Taihei Momma
On 2014/02/26, at 1:44, Thomas Orgis wrote:

> That address didn't change.


Well, the function itself is properly aligned (so my fix had no effect
anyway):
> 0xb6fb9330 <+0>: vpush   {d8-d15}
> 0xb6fb9334 <+4>: sub r3, pc, #140 ; 0x8c

But the processor decoded the first instruction as a 2-byte one (Thumb?), then
increased the PC by 2, and raised SIGILL at
> 0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:49


So, I guess that either
 - the assembler emits bad machine code for vpush
or
 - the kernel is not configured properly to run VFP instructions

I'd like to look at the objdump -d output to check the machine code. 
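
The "decoded as 2-byte" reading can be made concrete with a Python sketch. It assumes the assembler emitted the expected ARM encoding ed2d8b10 for the vpush (the value quoted earlier in this thread) and that the word is stored little-endian. The size rule is the general Thumb-2 one: a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, anything else is a 16-bit instruction. The function name is invented for the example.

```python
import struct

def thumb_instruction_size(halfword):
    """Size in bytes of the Thumb instruction starting with this halfword."""
    return 4 if (halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2

arm_word = 0xED2D8B10                 # assumed ARM encoding of vpush {q4-q7}
memory = struct.pack("<I", arm_word)  # bytes as they sit in memory
first, second = struct.unpack("<HH", memory)

entry = 0xB6FB9330                    # function entry from the gdb disassembly
next_pc = entry + thumb_instruction_size(first)

print(f"first halfword  {first:#06x}: {thumb_instruction_size(first)} bytes")
print(f"second halfword {second:#06x}: starts a "
      f"{thumb_instruction_size(second)}-byte instruction")
print(f"PC after the first Thumb instruction: {next_pc:#x}")
```

If the core is in Thumb state at function entry, the first halfword 0x8b10 is consumed as a 16-bit instruction and the PC moves to 0xb6fb9332, exactly the address of the SIGILL. Decoding then continues at 0xed2d, which opens a 32-bit Thumb encoding that straddles two ARM instructions.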

Regards,
Taihei Momma


Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread Lennart Sorensen
On Wed, Feb 26, 2014 at 01:59:12AM +0900, Taihei Momma wrote:
> On 2014/02/26, at 1:44, Thomas Orgis wrote:
> 
> > That address didn't change.
> 
> 
> Well, the function itself is properly aligned (so my fix didn't take effect 
> anyway).
> > 0xb6fb9330 <+0>: vpush   {d8-d15}
> > 0xb6fb9334 <+4>: sub r3, pc, #140; 0x8c
> 
> But the processor decoded the first instruction as 2-byte (thumb?), then 
> increased PC by 2. And it raised SIGILL at
> > 0xb6fb9332 in INT123_dct64_neon () at dct64_neon.S:49
> 
> 
> So, I guess
>  - assembler emits a bad machine code for vpush
> or
>  - kernel is not configured properly to run vfp instructions

Is that a kernel option?  I wouldn't have thought armhf would run without
that (unless no floating point code is ever being run).

Well the kernel that is running has this:

CONFIG_VFP=y
CONFIG_VFPv3=y
CONFIG_NEON=y

> I'd like to look into objdump -d result to check the machine code. 

Remember Debian armhf is -mthumb by default.  Any assembly code needs
to be properly flagged with .arm, or .syntax unified or whatever is
appropriate (still trying to wrap my head around this myself).  That is,
if the assembly code is written in ARM rather than Thumb-2 assembly.
At least that's my understanding so far.  If I add .syntax unified and
.fpu neon, then I no longer have to pass -mfpu=neon in the CFLAGS to
get it to compile, but it still fails.  I am just about to test the new
version to see if that helps anything.

The disassembly in gdb shows 4-byte alignment, but the address of the
illegal instruction is 2 bytes past the vpush instruction's address.

In fact, if I add -marm to the CFLAGS, then it seems to work, so the .S
files are not being flagged correctly as being ARM code, or they are
missing Thumb interworking bits or something.

root@rceng05:/mpg123-20140225173909# perl scripts/benchmark-cpu.pl src/mpg123 /convergence_-_points_of_view/*mp3
Found 1 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder  t_s16/s  t_f32/s
NEON      7.52     7.65

That was with CFLAGS=-g -mcpu=cortex-a15 -mfpu=neon -marm

Without -marm, it crashes with illegal instruction.  But since -mthumb
is the default on armhf, then passing -marm seems wrong.

-- 
Len Sorensen



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-25 Thread peter green

Taihei Momma wrote:
> But the processor decoded the first instruction as 2-byte (thumb?),

Note that debian armhf builds C code in thumb2 mode by default.



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-02-28 Thread Taihei Momma
OK, after some investigation with an armhf cross environment and qemu, the
current mpg123 svn (r3517) should finally work (including the arm_nofpu decoder).

The key is the .type directive. Without it, the linker cannot distinguish
ARM functions from Thumb functions, so interworking doesn't work
properly.
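
As a rough illustration of why the symbol type matters, here is a toy Python model. It is not how GNU ld is implemented. It only encodes the assumption that the linker turns a plain call (bl) into a state-switching call (blx) when the target symbol is typed as a function and its instruction set differs from the caller's. All names are invented for the example.

```python
def call_opcode(caller_state, callee_state, callee_has_function_type):
    """Toy model: which call instruction ends up in the linked binary."""
    if callee_has_function_type and caller_state != callee_state:
        return "blx"  # branch and switch between ARM and Thumb state
    return "bl"       # plain branch, CPU state stays as it is

def state_at_callee(caller_state, callee_state, callee_has_function_type):
    """CPU state in which the callee's first instruction gets decoded."""
    opcode = call_opcode(caller_state, callee_state, callee_has_function_type)
    return callee_state if opcode == "blx" else caller_state

# Thumb-compiled C code calling the ARM-mode assembly routine:
print("without .type:", state_at_callee("thumb", "arm", False))
print("with .type:   ", state_at_callee("thumb", "arm", True))
```

In this model the untyped symbol leaves the CPU in Thumb state while it runs ARM code, which fits the SIGILL two bytes into the function and also the observation that building everything with -marm made the problem disappear.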

Regards,
Taihei Momma


Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-01 Thread Thomas Orgis
On Sat, 01 Mar 2014 01:00:02 +0900
Taihei Momma wrote:

> OK, after some investigation with armhf cross environment and qemu, finally 
> the current mpg123 svn (r3517) should work (including arm_nofpu decoder).
> 
> The point is .type directive. Without this directive, a linker doesn't 
> distinguish arm functions from thumb functions, and interworking doesn't work 
> properly.
>
Great! So, folks, please check that

http://mpg123.de/snapshot/mpg123-2014030100.tar.bz2

does the trick with all decoders now. Performance numbers from the
benchmark script would be nice. I'll release 1.18.1 after confirmation
and we finally can settle this.


Alrighty then,

Thomas



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-01 Thread Thomas Orgis
On Sat, 1 Mar 2014 09:56:46 +0100, Thomas Orgis wrote:

> Great! So, folks, please check that
> 
>   http://mpg123.de/snapshot/mpg123-2014030100.tar.bz2
> 
> does the trick with all decoders now. Performance numbers from the
> benchmark script would be nice. I'll release 1.18.1 after confirmation

Sorry, I meant 1.18.2, of course. Also, I fixed the benchmark script to
check the return value with

http://mpg123.de/snapshot/mpg123-20140301101020.tar.bz2

just in case things are still broken.


Alrighty then,

Thomas



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-02 Thread Thomas Orgis
On Sat, 01 Mar 2014 01:00:02 +0900, Taihei Momma wrote:

> OK, after some investigation with armhf cross environment and qemu, finally 
> the current mpg123 svn (r3517) should work 

After Taihei didn't stop at this (big thanks from here!), we got a new
snapshot,

http://mpg123.org/snapshot/mpg123-20140302115523.tar.bz2 ,

that will hopefully become mpg123 1.19.0 soon (not 1.18.x
because of feature additions regarding this very debian issue). The
main points:

- float output with all decoders (also arm_nofpu)
- ARM decoders (esp. NEON) working with debian toolchain
- new --with-cpu=arm_fpu choice with runtime detection to switch
  between NEON or normal FPU

So, the number of builds for optimal treatment of differing platforms
reduces to two:

1. --with-cpu=arm_nofpu
2. --with-cpu=arm_fpu

I hope we can all be happy about that. I'd also be glad to get some
confirmation from debian that it really works now. Release will be
imminent, then.

Thanks for staying with us with all the chattering about this ...


Alrighty then,

Thomas




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-02 Thread Reinhard Tartler
On Sun, Mar 2, 2014 at 6:02 AM, Thomas Orgis  wrote:
> On Sat, 01 Mar 2014 01:00:02 +0900, Taihei Momma wrote:
>
>> OK, after some investigation with armhf cross environment and qemu, finally 
>> the current mpg123 svn (r3517) should work
>
> After Taihei didn't stop at this (big thanks from here!), we got a new
> snapshot,
>
> http://mpg123.org/snapshot/mpg123-20140302115523.tar.bz2 ,
>
> that will hopefully become mpg123 1.19.0 soon (not 1.18.x
> because of feature additions regarding this very debian issue). The
> main points:
>
> - float output with all decoders (also arm_nofpu)
> - ARM decoders (esp. NEON) working with debian toolchain
> - new --with-cpu=arm_fpu choice with runtime detection to switch
>   between NEON or normal FPU
>
> So, the number of builds for optimal treatment of differing platforms
> reduces to two:
>
> 1. --with-cpu=arm_nofpu
> 2. --with-cpu=arm_fpu
>
> I hope we can all be happy about that. I'd also be glad to get some
> confirmation from debian that it really works now. Release will be
> imminent, then.

That sounds as if the mpg123 package should use:

on armel: --with-cpu=arm_nofpu
on armhf: --with-cpu=arm_fpu


Does this make sense to everybody?
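
As a rough sketch of that mapping (not the actual debian/rules of the package), the choice could be expressed as a small shell function. The function name `mpg123_cpu_for_arch` is made up for illustration, and leaving all other architectures to configure's own default is an assumption:

```sh
#!/bin/sh
# Hypothetical helper: map a Debian architecture name to the
# --with-cpu value proposed above. Not taken from the real packaging.
mpg123_cpu_for_arch() {
    case "$1" in
        armel) echo "arm_nofpu" ;;   # no hardware FPU assumed: fixed-point decoder
        armhf) echo "arm_fpu" ;;     # FPU guaranteed: NEON/FPU runtime switch
        *)     echo "" ;;            # assumption: let configure decide elsewhere
    esac
}

mpg123_cpu_for_arch armel
mpg123_cpu_for_arch armhf
```

In a real build the argument would typically come from `dpkg-architecture -qDEB_HOST_ARCH`, and a non-empty result would be passed to ./configure as --with-cpu=<value>.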

> Thanks for staying with us with all the chattering about this ...

Thank you for handling this issue (and basically every other issue
that popped up in Debian for mpg123) so quickly!


-- 
regards,
Reinhard



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-03 Thread Riku Voipio
On Sun, Mar 02, 2014 at 12:02:40PM +0100, Thomas Orgis wrote:
> On Sat, 01 Mar 2014 01:00:02 +0900, Taihei Momma wrote:
> 
> > OK, after some investigation with armhf cross environment and qemu, finally 
> > the current mpg123 svn (r3517) should work 
> 
> After Taihei didn't stop at this (big thanks from here!), we got a new
> snapshot,
> 
>   http://mpg123.org/snapshot/mpg123-20140302115523.tar.bz2 ,
> 
> that will hopefully become mpg123 1.19.0 soon (not 1.18.x
> because of feature additions regarding this very debian issue). The
> main points:
> 
> - float output with all decoders (also arm_nofpu)
> - ARM decoders (esp. NEON) working with debian toolchain
> - new --with-cpu=arm_fpu choice with runtime detection to switch
>   between NEON or normal FPU
> 
> So, the number of builds for optimal treatment of differing platforms
> reduces to two:
> 
> 1. --with-cpu=arm_nofpu
> 2. --with-cpu=arm_fpu

Awesome work!

> I hope we can all be happy about that. I'd also be glad to get some
> confirmation from debian that it really works now. Release will be
> imminent, then.

Here's some test results

On a cortex-a15 system arm_nofpu: (ubuntu armhf)

#decoder        t_s16/s t_f32/s
ARM             24.22   25.02

On a cortex-a15 system arm_fpu: (ubuntu armhf)

#decoder        t_s16/s t_f32/s
NEON            14.33   14.90
generic         36.25   27.46
generic_dither  39.52   27.44

the A15 core was downclocked and cpufreq disabled to ensure
stable results

ARMv5 system arm_nofpu (debian armel)

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
ARM             49.12   63.17

ARMv5 system arm_fpu (debian sid)

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic         491.75  468.37
generic_dither  535.50  468.38

armel is with softfloat emulation, so horrible times were expected - 
the main point of that last run was to verify that NEON runtime
detection works (Seems so).
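
To pick the winner from such a result table mechanically, a small awk filter is enough. This is only a sketch: the function name `fastest_decoder` is hypothetical, and it assumes whitespace-separated columns, comment lines starting with `#`, and the 16 bit timing in the second column. The sample input is the Cortex-A15 arm_fpu run.

```sh
#!/bin/sh
# Hypothetical helper: read a benchmark table on stdin and print the
# decoder with the lowest t_s16/s value (second column).
fastest_decoder() {
    awk '
        /^#/ || NF < 3 { next }                  # skip comments and blank lines
        best == "" || $2 + 0 < min { min = $2 + 0; best = $1 }
        END { if (best != "") print best }
    '
}

# Numbers from the Cortex-A15 arm_fpu run.
fastest_decoder <<'EOF'
#decoder        t_s16/s t_f32/s
NEON            14.33   14.90
generic         36.25   27.46
generic_dither  39.52   27.44
EOF
```

Comparing the third column instead (`$3`) would rank the decoders by float output time.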

Riku



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-03 Thread Lennart Sorensen
On Sun, Mar 02, 2014 at 12:02:40PM +0100, Thomas Orgis wrote:
> After Taihei didn't stop at this (big thanks from here!), we got a new
> snapshot,
> 
>   http://mpg123.org/snapshot/mpg123-20140302115523.tar.bz2 ,
> 
> that will hopefully become mpg123 1.19.0 soon (not 1.18.x
> because of feature additions regarding this very debian issue). The
> main points:
> 
> - float output with all decoders (also arm_nofpu)
> - ARM decoders (esp. NEON) working with debian toolchain
> - new --with-cpu=arm_fpu choice with runtime detection to switch
>   between NEON or normal FPU
> 
> So, the number of builds for optimal treatment of differing platforms
> reduces to two:
> 
> 1. --with-cpu=arm_nofpu
> 2. --with-cpu=arm_fpu
> 
> I hope we can all be happy about that. I'd also be glad to get some
> confirmation from debian that it really works now. Release will be
> imminent, then.
> 
> Thanks for staying with us with all the chattering about this ...

I now see (with arm_fpu of course, which it seems to have auto detected 
correctly):

perl scripts/benchmark-cpu.pl `which mpg123` /convergence_-_points_of_view/*mp3
Found 3 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
NEON            7.58    7.84
generic         19.23   14.56
generic_dither  20.97   14.54

Looks good.  I ran it 3 times and they were very close, and the cpu pinned
itself at 1.5GHz during the test, and went back to 1.0GHz when idle again.
One of the two cores was very bored though with nothing to do.

-- 
Len Sorensen



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-03 Thread Lennart Sorensen
On Sun, Mar 02, 2014 at 09:06:44AM -0500, Reinhard Tartler wrote:
> That sounds like if the mpg123 package should use:
> 
> on armel: --with-cpu=arm_nofpu
> on armhf: --with-cpu=arm_fpu
> 
> 
> Does this make sense to everybody?

I think so.  armhf's current debian rules automatically picked arm_fpu
with the new version's configure script, so at least that one doesn't
seem to need any explicit help.  armel might though.

> Thank you for handling this issue (and basically every issue other
> that popped out in Debian for mpg123) so quickly!

-- 
Len Sorensen



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-03 Thread peter green

On Sun, Mar 02, 2014 at 09:06:44AM -0500, Reinhard Tartler wrote:
> That sounds like if the mpg123 package should use:
>
> on armel: --with-cpu=arm_nofpu
> on armhf: --with-cpu=arm_fpu
>
> Does this make sense to everybody?

Seems sane to me. armv7 devices without NEON are relatively uncommon, so
while it's important that they are supported, it's IMO not vitally
important to squeeze out every last drop of performance from them.

I wonder what we should use on Raspbian? I haven't tested on a Pi yet,
but in all the tests I've seen so far the generic FPU code is
quite a bit slower than the ARM nofpu code. Is there any quality
difference between an FPU and a non-FPU decoder? If so, how much
performance degradation do you believe should be accepted in exchange
for that quality improvement?

Lennart Sorensen wrote:
> I think so.  armhf's current debian rules automatically picked arm_fpu
> with the new version's configure script, so at least that one doesn't
> seem to need any explicit help.  armel might though.

IMO it's often better to be explicit about this sort of thing. While
upstream's defaults may align with Debian armhf's requirements at the
present time and on the present build hardware, such defaults are subject
to change, either as a result of upstream changes in new versions or as a
result of different build hardware.




Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Thomas Orgis
On Tue, 04 Mar 2014 02:59:44 +, peter green wrote:

> Is there any quality 
> difference from using a fpu vs nonfpu decoder?

Technically, there is. See those numbers for generic fpu and non-fpu
code with and without --enable-int-quality given to configure (enables
better rounding for small performance hit, you might want to activate
that by default).

In numbers, the difference is this:

==> src/mpg123.fpu_accurate.compliance.txt <==

 Layer 3 
--> 16 bit signed integer output
compl.bit:  RMS=4.300914e-06 (PASS) maxdiff=7.688999e-06 (PASS)
--> 32 bit integer output
compl.bit:  RMS=2.152784e-08 (PASS) maxdiff=1.769513e-07 (PASS)
--> 24 bit integer output
compl.bit:  RMS=4.206462e-08 (PASS) maxdiff=1.788139e-07 (PASS)
--> 32 bit floating point output
compl.bit:  RMS=2.153045e-08 (PASS) maxdiff=1.769513e-07 (PASS)

==> src/mpg123.fpu.compliance.txt <==

 Layer 3 
--> 16 bit signed integer output
compl.bit:  RMS=8.907757e-06 (LIMITED) maxdiff=1.531839e-05 (PASS)
--> 32 bit integer output
compl.bit:  RMS=2.152589e-08 (PASS) maxdiff=1.769513e-07 (PASS)
--> 24 bit integer output
compl.bit:  RMS=4.205495e-08 (PASS) maxdiff=1.788139e-07 (PASS)
--> 32 bit floating point output
compl.bit:  RMS=2.153045e-08 (PASS) maxdiff=1.769513e-07 (PASS)

==> src/mpg123.nofpu_accurate.compliance.txt <==

 Layer 3 
--> 16 bit signed integer output
compl.bit:  RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
--> 32 bit integer output
compl.bit:  RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
--> 24 bit integer output
compl.bit:  RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)
--> 32 bit floating point output
compl.bit:  RMS=4.344827e-06 (PASS) maxdiff=1.275539e-05 (PASS)

==> src/mpg123.nofpu.compliance.txt <==

 Layer 3 
--> 16 bit signed integer output
compl.bit:  RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 32 bit integer output
compl.bit:  RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 24 bit integer output
compl.bit:  RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)
--> 32 bit floating point output
compl.bit:  RMS=7.927192e-06 (PASS) maxdiff=2.676249e-05 (PASS)

With a nofpu decoder, you always get the precision of 16 bit output,
because floating point numbers are converted from 16 bit. But,
especially with --enable-int-quality, this is a fully compliant
MPEG audio decoder with all the precision that you need for "normal"
playback situations.

MAD claims 24 bit precision with integer math
(just about matching mpg123's 24 bit output with FPU decoder, see
http://www.underbit.com/resources/mpeg/audio/compliance, RMS=4.906e-08).
I suspect, though, that MAD will be considerably slower than mpg123's
arm_nofpu decoder. On my Core2Duo P8800, madplay with libmad 0.15.1
needs about 7.4 s to 8.5 s decoding to null output (with either speed or
accuracy optimization). The mpg123 numbers for the generic variants
(accurate == --enable-int-quality):

==> src/mpg123.fpu_accurate.bench.txt <==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic         6.16    5.85

==> src/mpg123.fpu.bench.txt <==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic         6.05    5.83

==> src/mpg123.nofpu_accurate.bench.txt <==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic         6.67    6.81

==> src/mpg123.nofpu.bench.txt <==
#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic         6.01    6.16

You see, there is some hit from accurate rounding, but it is in a
different league compared to the difference between fpu and nofpu on a
NEON-less ARM device (and yes, on an x86 CPU, generic FPU code is faster
when actually producing float output).

Oh, and remember: This is for mpg123 with handbrakes on. Using Taihei's
assembly optimizations, the decoding time is about halved on the Core2.
Similarly, I'd like to see numbers for madplay on ARM (best on
machines with and without fpu to get a picture about what difference we
talk about):

sh$ time -d -o null convergence_-_points_of_view/*.mp3

I don't know offhand how mpg123 nofpu stacks up against that, but there
should be a considerable difference in speed. My guess is that, on
limited hardware without NEON, you'd prefer stutter-free playback with
least CPU power draw. When utmost theoretical quality really matters or
you intend extensive post-processing of the data --- especially using
an audio player that works with floating point math internally, like
audacious --- then employing a more capable CPU with NEON is something
I expect. The mpg123 nofpu decoder, according to Riku's numbers, is still
a good choice for systems with a FPU but no NEON, but the generic
floating point decoder is not that far behind in speed (compared to
softfloat) and offers proper floating point accuracy as bonus.

Generally, it is a safe bet tha

Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Thomas Orgis
On Tue, 4 Mar 2014 11:49:45 +0100, Thomas Orgis wrote:

> sh$ time -d -o null convergence_-_points_of_view/*.mp3

That should be

sh$ time madplay -d -o null: convergence_-_points_of_view/*.mp3

... as you may have guessed (notice the added ":").


Alrighty then,

Thomas



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Felipe Sateler
On Mon, Mar 3, 2014 at 11:59 PM, peter green  wrote:
> wonder what we should use on raspbian? I haven't tested on a Pi yet but it
> seems that on all tests i've seen so-far the generic fpu code is quite a bit
> slower than the arm nofpu code.


Indeed, it seems to be:

==
felipe@felipepi:mpg123-20140302115523-nofpu% ./scripts/benchmark-cpu.pl src/mpg123 ../convergence_-_points_of_view/*.mp3
Found 1 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
ARM             86.26   90.66

==

felipe@felipepi:mpg123-20140302115523% ./scripts/benchmark-cpu.pl src/mpg123 ../convergence_-_points_of_view/*.mp3
Found 2 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic         102.80  100.06
generic_dither  121.10  100.84

==

-- 

Saludos,
Felipe Sateler



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Lennart Sorensen
On Tue, Mar 04, 2014 at 02:59:44AM +, peter green wrote:
> Seems sane to me. armv7 devices without neon are relatively uncommon
> so while it's important that they are supported it's IMO not vitally
> important to squeeze out every last drop of performance from them.

I don't agree.  At least the first Tegra chips did not have NEON, and the
Marvell chips often don't have NEON either.  The newer ones are starting
to, now that Marvell is moving to Cortex-A designs rather than its custom
cores (like the PJ4 used in the Armada 510 in the CuBox, for example),
but many chips don't have NEON.  Do the Qualcomm designs have NEON?
I have been mostly ignoring them due to the anti open source attitude
of Qualcomm.

If --with-cpu=arm_fpu selects NEON or VFP3 automatically, then I think
armhf is perfect for all armv7 devices.

> I wonder what we should use on raspbian? I haven't tested on a Pi
> yet but it seems that on all tests i've seen so-far the generic fpu
> code is quite a bit slower than the arm nofpu code. Is there any
> quality difference from using a fpu vs nonfpu decoder? If so how
> much performance degradation do you believe should be accepted in
> exchange for that quality improvement.

So VFP2 is slower than integer math?  Interesting.

> IMO it's often better to be explicit about this sort of thing. While
> upstreams defaults may align with debian armhf's requirements at the
> present time and on the present build hardware such defaults are
> subject to change either as a result of upstream changes in new
> versions or as a result of different build hardware.

I suppose that makes sense.  Avoids unexpected surprises later.

-- 
Len Sorensen



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Thomas Orgis
On Tue, 4 Mar 2014 11:10:25 -0300, Felipe Sateler wrote:

> #decoder        t_s16/s t_f32/s
> ARM             86.26   90.66
> generic         102.80  100.06
> generic_dither  121.10  100.84

Yes, a difference, but arguably a lot less than comparing VFP code to
NEON. With the feature to produce float output from all decoders, it is
your (debian's) option to prefer decoding speed by building a libmpg123
with arm_nofpu and use it on armhf machines without NEON via the
library loading mechanism. Or you decide to offer "proper" floating
point output that needs some 25-50 % more CPU time.

I am even more interested in a comparison with the runtime of madplay
in that configuration. Perhaps its fixed-point math with 24 bit output
is still faster than using the VFP with mpg123. Of course, I'd be
interested to know if that's not the case (mpg123 rulez!;-). But if it
is, it wouldn't totally surprise me.


Alrighty then,

Thomas

PS: You still have to decide for --enable-int-quality or not, for a
smaller impact on CPU time and basically one bit of precision.



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Felipe Sateler
On Tue, Mar 4, 2014 at 2:26 PM, Thomas Orgis  wrote:
> On Tue, 4 Mar 2014 11:10:25 -0300, Felipe Sateler wrote:
>
>> #decoder        t_s16/s t_f32/s
>> ARM             86.26   90.66
>> generic         102.80  100.06
>> generic_dither  121.10  100.84
>
> Yes, a difference, but arguably a lot less than comparing VFP code to
> NEON. With the feature to produce float output from all decoders, it is
> your (debian's) option to prefer decoding speed by building a libmpg123
> with arm_nofpu and use it on armhf machines without NEON via the
> library loading mechanism. Or you decide for offering "proper" floating
> point output that needs some 25-50 % more CPU time.
>
> I am even more interested in a comparison with the runtime of madplay
> in that configuration. Perhaps its fixed-point math with 24 bit output
> is still faster than using the VFP with mpg123. Of course, I'd be
> interested to know if that's not the case (mpg123 rulez!;-). But if it
> is, it wouldn't totally surprise me.

madplay -d -o null: convergence_-_points_of_view/*.mp3 &> /dev/null
130.22s user 1.88s system 93% cpu 2:21.91 total

That's with the following mad:

MPEG Audio Decoder 0.15.1 (beta)
  Copyright (C) 2000-2004 Underbit Technologies, Inc.
  Build options: NDEBUG FPM_ARM ASO_IMDCT ASO_INTERLEAVE1

ID3 Tag Library 0.15.1 (beta)
  Copyright (C) 2000-2004 Underbit Technologies, Inc.
  Build options: NDEBUG

madplay 0.15.2 (beta)
  Copyright (C) 2000-2004 Robert Leslie
  Build options: AUDIO_DEFAULT=audio_alsa ENABLE_NLS

This is the madplay straight from raspbian, not sure if some other
configure flag was to be tested.



-- 

Saludos,
Felipe Sateler



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-04 Thread Thomas Orgis
On Tue, 4 Mar 2014 16:25:17 -0300, Felipe Sateler wrote:

> >> #decoder        t_s16/s t_f32/s
> >> ARM             86.26   90.66
> >> generic         102.80  100.06
> >> generic_dither  121.10  100.84

> madplay -d -o null: convergence_-_points_of_view/*.mp3 &> /dev/null
> 130.22s user 1.88s system 93% cpu 2:21.91 total

Interesting. So the VFP is not that bad: You get superior output (not
noticeably, but measurable in the digital domain) from mpg123's generic
decoder in about 75 % of the decoding time.

The lower-quality 16 bit integer decoder of mpg123 is considerably
faster. So, on an armel system without VFP, it makes sense to employ
libmad to achieve 24 bit accuracy with reasonable CPU cost, if you
insist on that accuracy. But with VFP, using mpg123 gives you full 32
bit floating point output with less CPU load. For NEON, it's not even a
question.
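
The "about 75 %" figure can be rechecked from Felipe's Raspberry Pi timings quoted above. The sketch below (the helper name `percent_of` is made up for illustration) divides the mpg123 user times by madplay's 130.22 s of user time:

```sh
#!/bin/sh
# Hypothetical helper: print a as a percentage of b, one decimal place.
percent_of() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

percent_of 100.06 130.22   # mpg123 generic, float output, vs. madplay
percent_of 86.26 130.22    # mpg123 ARM fixed-point, 16 bit output, vs. madplay
```

That comes to roughly 77 % for the generic float decoder and roughly 66 % for the ARM fixed-point decoder.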

I think I can live with that situation;-) both MAD and mpg123 achieve
their goals. MAD gets the best precision out of integer math, mpg123
offers something faster everywhere, possibly with less, but also
possibly with more (irrelevant, 24 bit is _really_ enough) precision.

One might also benchmark a decoder based on ffmpeg, which has both
fixed-point and floating-point decoders, but I don't have a good
command line for that at hand (used mplayer -ac mpg123 and mplayer -ac
ffmp3[float] in the past). Anyhow, leaving scope here. I should get
going and release mpg123 1.19.0 .

> This is the madplay straight from raspbian, not sure if some other
> configure flag was to be tested.

Optimizing for speed vs. quality might be an option ... but that's
somehow missing the point of preferring libmad.


Alrighty then,

Thomas



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-05 Thread Riku Voipio
On Tue, Mar 04, 2014 at 11:49:45AM +0100, Thomas Orgis wrote:
> In any case ... Riku: Care to run timings of MAD on your
> configurations? I'm interested in how fast it is producing that 24 bit
> output on limited CPUs.

time madplay -d -o null: convergence_-_points_of_view/*.mp3 &> /dev/null  

Cortex A15:

real    0m33.154s
user    0m33.045s
sys     0m0.110s

ARMv5:

real    1m35.923s
user    1m18.290s
sys     0m0.070s

Seems mpg123 wins bragging rights :) thanks, awesome work!

Riku



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-05 Thread Riku Voipio
On Tue, Mar 04, 2014 at 02:59:44AM +, peter green wrote:
> On Sun, Mar 02, 2014 at 09:06:44AM -0500, Reinhard Tartler wrote:
> 
> >That sounds like if the mpg123 package should use:
> >on armel: --with-cpu=arm_nofpu
> >on armhf: --with-cpu=arm_fpu
> >
> >
> >Does this make sense to everybody?
> Seems sane to me. armv7 devices without neon are relatively uncommon
> so while it's important that they are supported it's IMO not vitally
> important to squeeze out every last drop of performance from them.
 
> I wonder what we should use on raspbian? I haven't tested on a Pi
> yet but it seems that on all tests i've seen so-far the generic fpu
> code is quite a bit slower than the arm nofpu code. Is there any
> quality difference from using a fpu vs nonfpu decoder? If so how
> much performance degradation do you believe should be accepted in
> exchange for that quality improvement.

I think nofpu would be good for Raspbian. Any lost audio quality would be
unnoticeable on the Raspberry's analog audio output ;)

Peter, what's the recommended way to recognize Raspbian in debian/rules?

Riku



Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

2014-03-06 Thread peter green

Riku Voipio wrote:
> I think nofpu would be good for Raspbian. Any lost audio quality would be
> unnoticeable on the Raspberry's analog audio output ;)
>
> Peter, what's the recommended way to recognize Raspbian in debian/rules?

dpkg-vendor --derives-from raspbian
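
A minimal sketch of how that check might feed the decoder choice, following Riku's suggestion of nofpu for Raspbian. The function name and the `DPKG_VENDOR` override are hypothetical (the override exists only so the logic can be tried on a machine without dpkg); `dpkg-vendor --derives-from` exits with status 0 when the current vendor derives from the named one.

```sh
#!/bin/sh
# Hypothetical helper for an armhf-style build: prefer the fixed-point
# decoder on Raspbian, the FPU/NEON build elsewhere.
# DPKG_VENDOR is an illustrative override, not a dpkg feature.
mpg123_cpu_for_vendor() {
    if "${DPKG_VENDOR:-dpkg-vendor}" --derives-from raspbian 2>/dev/null; then
        echo "arm_nofpu"
    else
        echo "arm_fpu"
    fi
}

echo "--with-cpu=$(mpg123_cpu_for_vendor)"
```

In debian/rules the same test would more likely sit inside a make conditional, but the decision logic would be the same.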
