> On 04/24/2023 1:51 PM EDT Sampo Syreeni <de...@iki.fi> wrote:
> 
>  
> On 2023-04-08, robert bristow-johnson wrote:
> 
> > Listen, people here that know me from the 1990s, know that I was a 
> > staunch fix-point advocate.
> 
> And people don't know me.

Well, I don't *know* you, but I remember you from way back in the 90s, I think. 
 Doug Repetto started this a long time ago and I thought you were there early 
in the list's history.

> For a reason: I'm not a practitioner, but a 
> theoretical amateur in the field. You have every reason to chastise me. 
> However, about floats, fixpoint and denormalisation...
> 
> It's just easier and mathematically simpler to work in fixpoint.

Whoa!  That's very interesting!  Seems to me that the common sentiment was to 
the contrary.  With floating point you don't have to worry about scaling and 
trading off headroom with quantization noise floor.

> Earlier 
> you just couldn't have the range for audio work you needed, so there had 
> to be floats, A/μ-laws, and whatnot.

With fixed-point, you can have A/μ-law and whatnot.  Even FFT.  Even LPC.  But 
you have to worry more about scaling and the tradeoff with dB headroom vs. dB 
quantization noise floor.  With floating point, the floor remains 150 dB below 
your signal level, no matter what.  And, internally, you don't have to worry 
about any ceiling.
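
A minimal sketch of the kind of bookkeeping I mean (the Q1.31 helper is my own
illustration, not from any particular library):

    #include <stdint.h>

    /* Q1.31 multiply: the 32x32 product is an exact 64-bit Q2.62 value, and
       the designer chooses where to shift it back down.  Keep everything at
       full scale and sums of products can overflow; pre-scale the inputs for
       headroom and the quantization noise floor comes up relative to the
       signal.  That is the dB-headroom vs. dB-noise-floor tradeoff.         */
    static inline int32_t q31_mul(int32_t a, int32_t b)
    {
        int64_t p = (int64_t)a * (int64_t)b;   /* exact product, Q2.62 */
        return (int32_t)(p >> 31);             /* back to Q1.31, truncated */
    }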

> But now you don't need them. 
> Definitely you don't need them in discussing 64-bit arithmetic; as I 
> said, even usual 32-bit C-float lets you have a linear range of 24 bits, 
> signed, which is more than enough.

Well, there's the cosine problem that infects even single-precision floating 
point when frequencies are close to DC, much lower than Nyquist.

https://dsp.stackexchange.com/questions/16885/how-do-i-manually-plot-the-frequency-response-of-a-bandpass-butterworth-filter-i/16911#16911
 

So even single-precision floating point doesn't save my ass, because cos(omega) 
is so close to 1 that all of the bits with information regarding frequency fall 
off the edge in the floating-point word.  Just as they would with fixed-point.  
(So I had to rejigger all of my math replacing cos(omega) with 1 - 2 
(sin(omega/2))^2. That's just a good practice anyway.)

Double-precision fixes it, but you pay the price of the word size.
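
To put a number on it, here's a toy comparison (my own throwaway example, not
from the filter code in that link):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* a low frequency relative to Nyquist: 20 Hz at a 96 kHz sample rate */
        float omega = 2.0f * 3.14159265f * 20.0f / 96000.0f;

        /* direct form: cos(omega) is so close to 1 that most of the bits
           carrying the frequency information are rounded away in single
           precision                                                          */
        float direct = 1.0f - cosf(omega);

        /* rearranged form: 1 - cos(omega) = 2*(sin(omega/2))^2 keeps the
           small quantity away from the big 1, so it keeps its precision      */
        float s      = sinf(0.5f * omega);
        float stable = 2.0f * s * s;

        /* double-precision reference */
        double ref = 1.0 - cos((double)omega);

        printf("direct: %.9e   stable: %.9e   ref: %.9e\n",
               (double)direct, (double)stable, ref);
        return 0;
    }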

> I do understand where floats and denormals come from. They're half about 
> numerical analysis, and half about software development ease. If you 
> want to deal with quantities which range wildly over orders of 
> magnitude, and their inexactitude is relative, not absolute, you'll want 
> floats.

I agree.  

> There's really no substitute for floats there. Then if you want 
> your numerical algorithm to not underflow, in many cases you'd want your 
> floats to denormalise — which is to say, suddenly behave linearly, 
> unlike floats as an exponential representation do, and like fixpoint 
> does.
> 
> Since we're talking on a sound-minded group, I perhaps should remind you 
> of "the gain structure". How analogue studios controlled their noise.
> 
> I believe the choice of digital gauge is very much the same as that one. 
> If you do it wrong, on the analogue side you'll be left with unbearable 
> noise. On the digital side, you'll be left with digital rounding noise. 

But, with floating-point, the digital rounding noise should remain a constant 
level below the signal level.  No matter what the gain structure is.

> But if you control your gain structure right, especially within nowadays 
> rather wide 24-bit architecture, you really don't even have to think 
> about it too much. It mostly just works out.

It should work out anyway with floating point.  Especially 64-bit floating 
point.  You shouldn't have to think of it at all, using floating point.  Of 
course you **do** have to worry about gain structure with analog and with 
fixed-point arithmetic.

> 
> (I'd actually say, calibrate your studio absolutely, like them movie 
> people do. Done so, a 24-bit linear stream goes below perceptual 
> thresholds when quietest, and exceeds the threshold of pain at max. So it 
> linearly covers the whole range, and the representation can be worked 
> with as wholly linear — except that nobody currently does so. Not even 
> the movie people; even they mix in relative amplitude and only then set 
> the final absolute calibration. I think that's thoroughly stupid; the 
> one little wing of our important audio work which actually *has* a set 
> amplitude reference, chooses not to utilize it.)
> 
> > If given an assignment of developing an audio processing system using 
> > fixed-point math, I will not shrink away from the challenge, but 
> > **if** the project is "Hey we got this 64-bit ARM with FPU in it and 
> > gobs of memory," I don't want my code to be checking for saturation and 
> > "minding the gain structure".  Fuck no.
> 
> You don't have to mind saturation if your gain structure is well thought 
> out.

Thinking it out is minding the gain structure.

> If you know the limits and averages of your input signals, and 
> scale your sums appropriately.

But you should not *have* to scale your sums with floating point anyway.

(Now with a sliding average or sliding sum, there **is** a problem with 
floating-point summation that you don't have with fixed-point: the add and the 
subtract don't cancel exactly, so rounding error slowly accumulates in the 
running sum.)
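
Something like this, as a sketch (the names are mine, just to illustrate the
point):

    #include <stdint.h>

    #define N 64    /* window length */

    /* fixed-point: x[n] - x[n-N] cancels exactly (assuming the sum fits in
       32 bits), so the running sum never drifts                             */
    static int32_t ibuf[N];
    static int32_t iacc = 0;

    int32_t sliding_sum_int(int32_t x, unsigned n)
    {
        unsigned i = n % N;
        iacc += x - ibuf[i];
        ibuf[i] = x;
        return iacc;
    }

    /* floating point: the add and subtract are each rounded, so a little
       residue can be left behind on every update, and it random-walks over
       millions of samples unless the sum is recomputed from scratch now
       and then                                                              */
    static float fbuf[N];
    static float facc = 0.0f;

    float sliding_sum_float(float x, unsigned n)
    {
        unsigned i = n % N;
        facc += x - fbuf[i];
        fbuf[i] = x;
        return facc;
    }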

> It's not even hard. Even for me, as a rank amateur.

It's not that hard.  And it's necessary for fixed point.

But it should be of no concern at all with floating point, especially 64-bit 
floating point.

> 
> What's really hard is controlling nonlinearity in your signal processing 
> algorithm. What's doubly hard is controlling the semi-logarithmic 
> tendency floats obviously do, *and* at the same time the linear ramp 
> denormals produce. Floats and denormals *do* make it easier for the 
> average chump to churn out his average numerical codes, but once we go 
> into numerical analysis, signal processing, for real, as this list is 
> ostensibly about, that mix of linear and semi-logarithmic is a horror. 

If denorms were dealt with routinely and the FPU gave us the most accurate 
result that fits into the 64-bit floating-point word, we shouldn't have to care 
about any horror.

> You don't want to deal with it; if you don't think it's a horror, then 
> you haven't tried to deal with it to begin with.
> 
> My favourite here is dithering.

Dithering is easier to do to fixed-point numbers.  For sure.  But with 64-bit 
floats, is the undithered quantization error ever going to have an effect at 
the output word where quantization *must* be done because we're outputting to a 
DAC?  It's only at that operation, where a 64-bit float goes in and a 24-bit 
integer comes out, that I consider dithering and noise-shaping necessary.  But, 
internally in the algorithm, with 64-bit floats, all 
samples and all parameters can be represented naturally as numbers that would 
be the same quantities in the mathematics that you're implementing in code.  
Why scale these numbers from their natural value?
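
Roughly this, as a sketch (made-up function name, plain rand() standing in for
a proper dither source, and no noise shaping):

    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* the one place quantization has to happen: a 64-bit float sample in
       (assumed scaled to +/-1.0 full scale), a dithered 24-bit integer out */
    int32_t float_to_24bit(double x)
    {
        const double full_scale = 8388608.0;     /* 2^23 */
        double d1 = (double)rand() / RAND_MAX - 0.5;
        double d2 = (double)rand() / RAND_MAX - 0.5;
        double y  = x * full_scale + d1 + d2;    /* TPDF dither, +/-1 LSB */
        long   q  = lround(y);                   /* round to nearest */
        if (q >  8388607) q =  8388607;          /* clip to the 24-bit range */
        if (q < -8388608) q = -8388608;
        return (int32_t)q;
    }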

> There's no known closed form solution of 
> how to do any of it given denormals.

That's also true for floating-point addition or subtraction.  The FPU has to 
perform the operation to see how many leading zeros to shift away (and 
compensate in the exponent).  Denorms would be no worse.

> You can do some of it using floats, 
> but only after approximating them via a true exponential. Denormals, 
> they are impossible to analyze together with floats; you can't easily do 
> mathematics in the linear and multiplicative domains, at the same time. 
> Especially when you cut off the regime arbitrarily, at your lowest float 
> bit depth; that's yet another arbitrary nonlinearity to your analysis, 
> right there.
> 
> > But, if my tool is a 64-bit processor that can do 64x64 to 64-bit 
> > result in the same nanosecond instruction cycle as anything else (like 
> > 32-bit fixed-point processing), why would I toss that headroom and 
> > legroom away?
> 
> But it can't, basically because of carry propagation in digital 
> circuitry. It is impossible to do sums or products in O(1) circuit 
> width. This means that 32-bit arithmetic will always in the end be more 
> efficient than its 64-bit bigger brother. That being when it applies; if 
> you really need 64 bits, then the hardware often helps you. But if the 
> underlying algorithm can be parallelized to two 32-bit ALU's, the 
> narrower bitwidth will necessarily win out.
> 

Well, I dunno, I was just assuming that these 64-bit processors were doing 
64-bit arithmetic (maybe divide is different) with 1 instruction cycle.  Add, 
subtract, multiply, all with one instruction cycle.  Denorms add a few more 
gates to set up a binary mantissa and exponent word in the same format that 
normals have internally in the FPU.

> 
> > It's only when a **final** sample value is getting output, that I 
> > should need to worry about gain, saturation, quantization, and 
> > noise-shaping.  I shouldn't have to worry about it anywhere else. 
> > Not if I'm using a 64-bit ARM.
> 
> I beg to differ. My ideal is that whatever you bring into your audio 
> processing chain, is absolutely referenced. It has an absolute decibel 
> level attached to it. Like them movie people attempt to do it.
> 

I think that audio samples can be scaled so that +1.0 and -1.0 are full scale.  
All other numbers (coefficients, parameters) can have their natural value.  But 
with fixed-point you might have to scale the parameters.

> If you work like that, and your gain structure follows, there's no 
> saturation or quantization or anything like that, anywhere, ever. You 
> can and *will* work within the 24-bit linear range of even a 32-bit 
> float, because 1) going lower than 1-bit would be unhearable, and 2) 
> going to the full 24 bits would literally split your ears. Then between 
> those wide limits, you have full linearity, which helps you produce 
> better and more stable algorithms.

I haven't completely grokked your meaning here, Sampo.


--

r b-j . _ . _ . _ . _ r...@audioimagination.com

"Imagination is more important than knowledge."
