Bug#832095: zita-resampler - debian bug

2019-08-10 Thread Steinar H. Gunderson
On Mon, Sep 05, 2016 at 01:24:59AM +0200, Steinar H. Gunderson wrote:
> I expanded on the patch; this new version supports 1, 2 and multiples of 4
> channels, and it no longer relies on the serial code (when I can't write past
> the end, I just write to a temporary buffer and copy out from there).
> Of course, you'll need to reorganize to fit the new structure once you have
> it, but hopefully, that should be simple.

I keep coming back to this. :-) Is there any progress here, three years later?
It would be really nice to have that reduced CPU usage; my laptop isn't
getting any faster. :-)

/* Steinar */
-- 
Homepage: https://www.sesse.net/



Bug#832095: zita-resampler - debian bug

2016-09-04 Thread Steinar H. Gunderson
On Mon, Aug 29, 2016 at 06:58:24PM +, Fons Adriaensen wrote:
> I will not accept the patch in its current form, but OTOH the
> code is too good to just be ignored, so I will integrate it
> in another way.
> 
> For the next release of zita-resampler I will reorganise the
> code a bit, so it will be possible to have separate optimised
> Resampler1,2,4 classes (for 1,2,4 channels respectively) using
> the SSE code, and without too much code duplication. Same for
> Vresampler.
> 
> So Steinar, could you provide optimised 1 and 4 chan versions
> as well ? Even better would be if the latter could handle any
> multiple of 4 channels. In all cases you may assume (hlen % 4 == 0). 

Hi Fons,

I expanded on the patch; this new version supports 1, 2 and multiples of 4
channels, and it no longer relies on the serial code (when I can't write past
the end, I just write to a temporary buffer and copy out from there).
Of course, you'll need to reorganize to fit the new structure once you have
it, but hopefully, that should be simple.
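
For readers following along, the temporary-buffer trick mentioned above can be sketched roughly as follows. This is not the code from the patch; `store4` and `scale_by_two` are hypothetical names, and `store4` is only a portable stand-in for a four-wide vector store such as `_mm_storeu_ps`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Stand-in for a 4-wide vector store (e.g. _mm_storeu_ps):
// always writes exactly four floats to dst.
static void store4(float *dst, const float *v)
{
	std::memcpy(dst, v, 4 * sizeof(float));
}

// Hypothetical kernel: out[i] = 2 * in[i] for i < n, produced in groups of four.
void scale_by_two(const float *in, float *out, std::size_t n)
{
	std::size_t i = 0;
	for ( ; i + 4 <= n; i += 4) {
		float v[4];
		for (int j = 0; j < 4; ++j) {
			v[j] = 2.0f * in[i + j];
		}
		store4(out + i, v);  // Safe: four slots remain in out.
	}
	if (i < n) {
		// Fewer than four slots remain, so a direct 4-wide store would
		// write past the end. Store to a temporary and copy out instead.
		float v[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
		for (std::size_t j = 0; i + j < n; ++j) {
			v[j] = 2.0f * in[i + j];
		}
		float tmp[4];
		store4(tmp, v);
		std::copy(tmp, tmp + (n - i), out + i);
	}
}
```

The point is that the tail goes through the same vector path as the main loop, so no separate scalar implementation has to be kept in sync.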

The code is pretty much straight-up; the multiples-of-4 VResampler
version isn't optimal for 4 nor for multiples-of-4, but it should be a
reasonable compromise between the two. I guess that if multiples-of-4
is the more important case, we should go to storing the coefficients
(like the scalar version does) instead of computing them anew for each
group of four. (Except for hlen <= 32, in which case it probably would
be optimal to keep them in registers, but I'm not writing special code
for that :-) )

I didn't write AVX versions yet because they would need function
multiversioning to be useful, and in the current structure, that would
mean quite a lot of duplicated code. (Unfortunately, you can't multiversion
inlined functions yet.)
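
As an illustration of what function multiversioning looks like, here is a minimal sketch using GCC's `target_clones` attribute. It is unrelated to the actual patch: `dot_product` and the `MULTIVERSION` macro are hypothetical, and the guard assumes GCC on x86-64 with glibc (elsewhere the macro expands to nothing and only one version is built).

```cpp
#include <cstddef>

// MULTIVERSION is a hypothetical helper macro: on GCC/x86-64/glibc it asks
// the compiler to build an AVX2 clone and a baseline clone of the function
// and to pick one at load time; on other toolchains it does nothing.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSION
#endif

// The whole loop lives inside the multiversioned function, since the
// dispatch happens per function call and not per inlined call site.
MULTIVERSION
float dot_product(const float *a, const float *b, std::size_t n)
{
	float sum = 0.0f;
	for (std::size_t i = 0; i < n; ++i) {
		sum += a[i] * b[i];
	}
	return sum;
}
```

Every inner loop that should benefit needs its own non-inlined, multiversioned entry point, which is where the duplication comes from.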

I wrote some test code to make sure I didn't mess up. My original test for
this was based on resampling noise and comparing it to a reference rendering,
but this required my entire audio pipeline running, so I made a simpler one
that simply resamples a sine and compares to another sine. It is simplistic,
but catches most kinds of SSE-ification mistakes instantly, so I've included
it in case you find it useful. Consider both the patch and the test file
licensed under GPLv3+ -- let me know if you want some other kind of license.
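
The core of the sine comparison can be sketched independently of zita-resampler roughly like this. It is a simplified illustration, not the attached test: `ErrorStats` and `compare_to_sine` are hypothetical names, and the small floor on the error is an assumption added only to avoid taking the logarithm of zero.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct ErrorStats {
	double max_err_db;
	double rms_err_db;
};

// Compares samples[i] against the reference sin(i * phase_speed) and
// returns the maximum and RMS error, both in dB relative to full scale.
ErrorStats compare_to_sine(const std::vector<float> &samples, double phase_speed)
{
	const double min_err = 1e-10;  // Assumed floor, i.e. -200 dB.
	double max_err = 0.0, sum_sq_err = 0.0;
	for (std::size_t i = 0; i < samples.size(); ++i) {
		const double err = std::fabs(samples[i] - std::sin(i * phase_speed));
		max_err = std::max(max_err, err);
		sum_sq_err += err * err;
	}
	const double rms_err =
		samples.empty() ? 0.0 : std::sqrt(sum_sq_err / samples.size());
	return {
		20.0 * std::log10(std::max(max_err, min_err)),
		20.0 * std::log10(std::max(rms_err, min_err))
	};
}
```

A vectorization mistake such as a swapped channel or an off-by-one in the filter taps typically shows up as an error many tens of dB above what a correct build gives.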

/* Steinar */
-- 
Homepage: https://www.sesse.net/
// zita-resampler tests; meant only to verify correctness of
// e.g. SSE optimizations, not as a means of comparing quality
// of different resamplers, or as exhaustive tests.

#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <vector>
#include <zita-resampler/resampler.h>
#include <zita-resampler/vresampler.h>

using namespace std;

constexpr int in_freq = 44100;
constexpr int out_freq = 48000;
constexpr int samples_per_block = 137;  // Prime.

vector<float> make_freqs(unsigned num_channels)
{
	vector<float> ret;
	for (unsigned i = 0; i < num_channels; ++i) {
		ret.push_back(440 - i);
	}
	return ret;
}

void setup_resampler(VResampler *resampler, unsigned in_freq, unsigned out_freq, unsigned num_channels, float rratio)
{
	resampler->setup(double(out_freq) / double(in_freq), num_channels, /*hlen=*/32);
	resampler->set_rratio(rratio);
}

void setup_resampler(Resampler *resampler, unsigned in_freq, unsigned out_freq, unsigned num_channels, float rratio)
{
	resampler->setup(in_freq, out_freq, num_channels, /*hlen=*/32);
	assert(rratio == 1.0f);
}

void print_result(VResampler *resampler, unsigned num_channels, float rratio, float max_err_db, float rms_err_db)
{
	printf("VResampler test: %u channel(s), rratio=%.2f   max error = %+7.2f dB   RMS error = %+7.2f dB\n",
		num_channels, rratio, max_err_db, rms_err_db);
}

void print_result(Resampler *resampler, unsigned num_channels, float rratio, float max_err_db, float rms_err_db)
{
	printf("Resampler test:  %u channel(s)                max error = %+7.2f dB   RMS error = %+7.2f dB\n",
		num_channels, max_err_db, rms_err_db);
}

template<class T>
void test_resampler(int num_channels, float rratio = 1.0f)
{
	vector<float> freqs = make_freqs(num_channels);
	T resampler;
	setup_resampler(&resampler, in_freq, out_freq, num_channels, rratio);

	const int initial_delay = resampler.inpsize() / 2 - 1;
	vector<float> in_phaser_speeds, out_phaser_speeds;
	for (unsigned i = 0; i < num_channels; ++i) {
		in_phaser_speeds.push_back(2.0 * M_PI * freqs[i] / in_freq);
		out_phaser_speeds.push_back(2.0 * M_PI * freqs[i] / (out_freq * rratio));
	}

	float max_err = 0.0f, sum_sq_err = 0.0f;

	unsigned num_output_samples = 0;
	vector<float> in, out;
	out.resize(samples_per_block * num_channels * 2);  // Plenty.
	for (unsigned block = 0; block < 10; ++block) {
		in.clear();
		for (unsigned i = 0; i < samples_per_block; ++i) {
			int sample_num = block * samples_per_block + i;
			for (unsigned channel = 0; channel < num_channels; ++channel) {
				in.push_back(sin((sample_num - initial_delay) * in_phaser_speeds[channel]));
			}
		}
		resampler.inp_count = in.size() / num_channels;
		resampler.inp_data = &in[0];
		resampler.out_count = out.size() / num_channels;
		resampler.out_data = &out[0];