Re: [music-dsp] FIR blog post & interactive demo

2020-03-09 Thread Ethan Duni
It is certainly possible to combine STFT with fast convolution in various ways. 
But doing so imposes significant overhead costs and constrains the overall 
design in strong ways. 

For example, this approach:

> On Mar 9, 2020, at 7:16 AM, Spencer Russell  wrote:
> 
> 
> if you have a KxN STFT (K frequency components and N frames) then 
> zero-padding each frame by K-1 should still eliminate any time-aliasing even 
> if your filter has hard edges in the frequency domain, right?

Right, but if you are using length K FFT and zero-padding by K-1, then the hop 
size is 1 sample and there are no windows. 

This is just applying the raw IDFT of the response as an FIR, which is not 
appropriate for something estimated in a windowed filterbank domain. Deriving 
an equivalent FIR from, say, an estimated noise reduction mask is not trivial.
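
To make the arithmetic concrete, here is a minimal NumPy sketch. K = 64 and the 
brick-wall low-pass mask are illustrative assumptions, not values from this 
thread. It takes the raw IDFT of a hard-edged mask, measures how many taps the 
resulting FIR occupies, and computes the longest frame that a length-K DFT can 
then filter without time-aliasing.

```python
import numpy as np

K = 64                      # number of DFT bins (illustrative assumption)
mask = np.zeros(K)
mask[:9] = 1.0              # keep bins 0..8 ...
mask[-8:] = 1.0             # ... and their mirror images, so the IDFT is real

h = np.fft.ifft(mask)
assert np.allclose(h.imag, 0.0)
h = h.real                  # raw IDFT of the mask, used directly as an FIR

# The hard edges spread the impulse response over every tap of the DFT.
L = np.count_nonzero(np.abs(h) > 1e-6)   # effective FIR length
N = K                                    # DFT size
F = N - L + 1                            # longest alias-free frame (N >= L + F - 1)
print(L, F)
```

With this mask every tap is nonzero, so L equals K and only a single input 
sample per frame is left over, which is the hop-size-1 case described above.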

> 
> I understand the role of time-domain windowing in STFT processing to be 
> mostly:
> 1. Reduce frequency-domain ripple (side-lobes in each band)

Right, this is the “analysis” aspect, where the window controls the spectral 
characteristics (frequency selectivity, bandwidth, leakage, etc.)

> 2. Provide a sort of cross-fade from frame-to-frame to smooth out framing 
> effects

And that is the “synthesis” aspect, where the window controls the 
characteristics of the artifacts introduced by processing. Note that “framing 
effects” are by definition time-variant: this is a form of aliasing.
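
The time-variance can be checked numerically. The sketch below is only an 
illustration under assumed parameters (frame length 32, 50% overlap, a sine 
window for both analysis and synthesis, a brick-wall mask, and the hypothetical 
helper names stft_mask and impulse). It applies the mask with no zero-padding 
and compares the responses to two impulses that differ only by a 3-sample shift.

```python
import numpy as np

def stft_mask(x, mask, N=32):
    """Sine-windowed STFT at 50% overlap; each frame's spectrum is multiplied by mask."""
    H = N // 2
    w = np.sin(np.pi * np.arange(N) / N)    # w**2 overlap-adds to 1 at hop N/2
    y = np.zeros(len(x))
    for start in range(0, len(x) - N + 1, H):
        X = np.fft.rfft(w * x[start:start + N])           # no zero-padding
        y[start:start + N] += w * np.fft.irfft(X * mask, N)
    return y

def impulse(pos, n=128):
    x = np.zeros(n)
    x[pos] = 1.0
    return x

N = 32
flat = np.ones(N // 2 + 1)       # identity mask: perfect reconstruction
brick = np.zeros(N // 2 + 1)
brick[:5] = 1.0                  # hard-edged low-pass mask

y_flat = stft_mask(impulse(40), flat)
y_a = stft_mask(impulse(40), brick)
y_b = stft_mask(impulse(43), brick)

# A time-invariant system would give y_b equal to y_a delayed by 3 samples.
time_invariant = np.allclose(np.roll(y_a, 3), y_b)
print(np.allclose(y_flat, impulse(40)), time_invariant)
```

The identity mask reconstructs the input, but with the brick-wall mask the 
shifted response does not match: the output depends on where the impulse falls 
relative to the frame grid.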

Ethan
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp

Re: [music-dsp] FIR blog post & interactive demo

2020-03-09 Thread robert bristow-johnson

> On March 8, 2020 7:55 PM Ethan Duni  wrote:
> 
> Fast FIR is a different thing than an FFT filter bank.
> 
> You can combine the two approaches but I don’t think that’s what is being 
> done here?

> On March 9, 2020 10:15 AM Spencer Russell  wrote:
> 
> 
> I think we're mostly on the same page, Ethan.

well, i think that i am on the same page as Ethan.

> Though even with STFT-domain time-variant filtering (such as with noise 
> reduction, or mask-based source separation) it would seem you could still 
> zero-pad each input frame to eliminate any issues due to time-aliasing.

zero-padding is the sole technique that gets rid of time-aliasing.

let's say your FIR is of length L.  let's say that your frame hop is H and 
frame length is F ≥ H and we're doing overlap-add.  then your F samples of 
input (H samples are *new* samples in the current frame, F-H samples are 
remaining from the previous frame) are considered zero-padded out to infinity 
in both directions.  then the length of the result of linear convolution is 
L+F-1.  now if you can guarantee that the size of the DFT, which we'll call "N" 
(and most of the time is a power of 2) is at least as large as the non-zero 
length of the linear convolution, then the result of circular convolution of 
the zero-padded FIR and the zero-padded frame of samples will be exactly the 
same.  that means

   N ≥ L + F - 1

this is always true whether the windowing is rectangular or something else.  
and, whether your FIR varies in definition or not, the length L must never be 
longer than N-F+1.  all frequency responses (which is what you multiply with in 
the frequency domain) must be the N-point DFT of an FIR limited in length to 
N-F+1.
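
A quick NumPy check of the N ≥ L + F - 1 condition, as a sketch with arbitrary 
illustrative sizes (L = 8, F = 25) and random data. The helper name circular is 
hypothetical. With N = 32 the circular convolution matches the linear one; with 
N = 28 the last four samples of the linear result wrap around onto the first 
four.

```python
import numpy as np

rng = np.random.default_rng(0)
L, F = 8, 25                       # FIR length and frame length (illustrative)
h = rng.standard_normal(L)
x = rng.standard_normal(F)
linear = np.convolve(x, h)         # length L + F - 1 = 32

def circular(x, h, N):
    """N-point circular convolution via the DFT (inputs are zero-padded to N)."""
    return np.fft.irfft(np.fft.rfft(x, N) * np.fft.rfft(h, N), N)

ok = circular(x, h, 32)            # N = L + F - 1: no time-aliasing
short = circular(x, h, 28)         # N too small by 4 samples

wrapped = linear[:28].copy()
wrapped[:4] += linear[28:]         # the tail folds back onto the start
print(np.allclose(ok, linear), np.allclose(short, wrapped))
```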

if it is a rectangular window, the frame length and frame hop are the same, 
F=H, and the number of generated output samples that are valid is H, and the 
most you can hope to get is:

H = F = N - L + 1

now, if you want to window that input data with a complementary window, such as 
the Hann window, that's fine, but instead of having the frame hop equal to the 
frame length, the relationship between the two is

F = 2H - 1   or   H = (F+1)/2   (50% overlap)

so now, the number of valid output samples is about half as before.

H = (F+1)/2 = (N-L)/2 + 1

so the input buffer to the FFT will still be zero padded with N-F zeros, 
independent of the hop size.  but you get a bigger hop size and more output 
samples per frame with a rectangular window.  and in both cases you get exactly 
the same results (up to rounding error).
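
The claim that both window choices give the same output can be sketched in 
NumPy. Everything specific here is an assumption for illustration: N = 64, 
L = 16, random data, the helper name ola_filter, and building the length-(2H-1) 
Hann window as np.hanning(F + 2)[1:-1] so that it overlap-adds to exactly 1 at 
hop H. The signal is padded with F zeros at each end so that the partially 
covered edges fall on silence.

```python
import numpy as np

def ola_filter(x, h, window, H, N):
    """Overlap-add fast convolution: window a frame, zero-pad to N, multiply spectra."""
    F, L = len(window), len(h)
    assert N >= L + F - 1, "DFT too short: circular convolution would time-alias"
    Hf = np.fft.rfft(h, N)
    y = np.zeros(len(x) + N)
    for start in range(0, len(x) - F + 1, H):
        frame = window * x[start:start + F]
        y[start:start + N] += np.fft.irfft(np.fft.rfft(frame, N) * Hf, N)
    return y[:len(x) + L - 1]

rng = np.random.default_rng(1)
N, L = 64, 16
F = N - L + 1                                  # 49 samples per frame
h = rng.standard_normal(L)
sig = rng.standard_normal(500)
x = np.concatenate([np.zeros(F), sig, np.zeros(F)])
ref = np.convolve(x, h)                        # direct linear convolution

# Rectangular window: hop equals frame length, H = F.
y_rect = ola_filter(x, h, np.ones(F), H=F, N=N)

# Hann window with F = 2H - 1 nonzero samples, hop H = (F + 1) / 2.
hann = np.hanning(F + 2)[1:-1]
y_hann = ola_filter(x, h, hann, H=(F + 1) // 2, N=N)

print(np.allclose(y_rect, ref), np.allclose(y_hann, ref))
```

The Hann version runs roughly twice as many FFTs for the same output.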

now, if it is overlap-scrap (or "overlap-save") and a rectangular window (which 
is no window at all, because the data is not zero-padded), the output samples 
are "butt spliced" (no crossfade).  if your FIR filter changes in frequency 
response, the new timbre of the filter is applied instantly with the new frame.
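
A sketch of that butt splice, under assumed sizes (N = 64, L = 16, eight 
blocks) and with a hypothetical helper overlap_save that takes one FIR per 
output block. The filter is switched halfway through; every output sample 
before the switch matches the old filter and every sample from the switch 
onward matches the new one.

```python
import numpy as np

def overlap_save(x, filters, N):
    """Overlap-save: filters[m] is the length-L FIR applied to output block m."""
    L = len(filters[0])
    H = N - L + 1                                  # valid output samples per block
    xp = np.concatenate([np.zeros(L - 1), x])      # history for the first block
    y = np.zeros(len(filters) * H)
    for m, h in enumerate(filters):
        block = xp[m * H:m * H + N]                # N input samples, no zero-padding
        out = np.fft.irfft(np.fft.rfft(block) * np.fft.rfft(h, N), N)
        y[m * H:(m + 1) * H] = out[L - 1:]         # scrap the L-1 aliased samples
    return y

rng = np.random.default_rng(2)
N, L = 64, 16
H = N - L + 1
h1 = rng.standard_normal(L)
h2 = rng.standard_normal(L)
x = rng.standard_normal(8 * H)

y = overlap_save(x, [h1] * 4 + [h2] * 4, N)
switch = 4 * H
ref1 = np.convolve(x, h1)[:len(x)]
ref2 = np.convolve(x, h2)[:len(x)]
print(np.allclose(y[:switch], ref1[:switch]), np.allclose(y[switch:], ref2[switch:]))
```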

but if it is overlap-add then the F samples in the frame are zero-padded with 
N-F zeros and there is a form of crossfading, even with a rectangular window, 
from one frame to the next if the FIR filter definition changes.
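
The same experiment with overlap-add and a rectangular window, again only a 
sketch with assumed sizes and a hypothetical helper overlap_add. Here the switch 
is not instantaneous: for L-1 samples after the frame boundary the output is the 
decaying tail of the old filter's response plus the onset of the new filter's 
response.

```python
import numpy as np

def overlap_add(x, filters, N):
    """Rectangular-window overlap-add: filters[m] is the length-L FIR for frame m."""
    L = len(filters[0])
    F = N - L + 1                                   # frame length = hop
    y = np.zeros(len(filters) * F + L - 1)
    for m, h in enumerate(filters):
        frame = x[m * F:(m + 1) * F]                # F samples, zero-padded to N below
        y[m * F:m * F + N] += np.fft.irfft(np.fft.rfft(frame, N) * np.fft.rfft(h, N), N)
    return y

rng = np.random.default_rng(3)
N, L = 64, 16
F = N - L + 1
h1 = rng.standard_normal(L)
h2 = rng.standard_normal(L)
x = rng.standard_normal(8 * F)

y = overlap_add(x, [h1] * 4 + [h2] * 4, N)
s = 4 * F                                           # first sample of the first h2 frame
ref1 = np.convolve(x, h1)
ref2 = np.convolve(x, h2)

# Transition region: old-filter tail plus new-filter onset.
mix = np.convolve(x[:s], h1)[s:] + np.convolve(x[s:], h2)[:L - 1]
print(np.allclose(y[:s], ref1[:s]),
      np.allclose(y[s:s + L - 1], mix),
      np.allclose(y[s + L - 1:], ref2[s + L - 1:]))
```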

if you cut your frame hop size, H, from F to nearly half, (F+1)/2 (and use a 
complementary window such as Hann), it is half as efficient, but the crossfade 
is even smoother (and the frame rate is faster, so the filter definition can 
change more often).

all of this is well-established knowledge regarding frame-by-frame processing 
with windows and the FFT.

--
 
r b-j  r...@audioimagination.com
 
"Imagination is more important than knowledge."

Re: [music-dsp] FIR blog post & interactive demo

2020-03-09 Thread Spencer Russell
I think we're mostly on the same page, Ethan. Though even with STFT-domain 
time-variant filtering (such as with noise reduction, or mask-based source 
separation) it would seem you could still zero-pad each input frame to 
eliminate any issues due to time-aliasing. As you mention (paraphrasing), you 
can smooth out the mask which will reduce the amount of zero-padding you need, 
but if you have a KxN STFT (K frequency components and N frames) then 
zero-padding each frame by K-1 should still eliminate any time-aliasing even if 
your filter has hard edges in the frequency domain, right?

I understand the role of time-domain windowing in STFT processing to be mostly:
1. Reduce frequency-domain ripple (side-lobes in each band)
2. Provide a sort of cross-fade from frame-to-frame to smooth out framing 
effects

In my mind doing STFT-domain masking/filtering is _roughly_ equivalent to a 
filter bank with time-varying response. In the STFT case though you're keeping 
things invariant within each frame and then cross-fading from frame to frame. 
This is a pretty intuitive/ad-hoc way of thinking on my part though - I'd love 
to see some literature that gives a more formal treatment.

-s

On Mon, Mar 9, 2020, at 12:52 AM, Ethan Duni wrote:
> 
> 
> On Sun, Mar 8, 2020 at 8:02 PM Spencer Russell  wrote:
>> In fact, the standard STFT analysis/synthesis pipeline is the same thing 
>> as overlap-add "fast convolution" if you:
>> 
>> 1. Use a rectangular window with a length equal to your hop size
>> 2. zero-pad each input frame by the length of your FIR kernel minus 1
> 
> Indeed, the two ideas are closely related and can be combined. It's more a 
> difference in the larger approach. 
> 
> If you can specify the desired response in terms of an FIR of some fixed 
> length, then you can account for the circular effects and use fast FIR. Note 
> that this is a time-variant MIMO system constructed to be equivalent to a 
> time-invariant SISO system (modulo finite word length effects, as you say). 
> 
> Alternatively, the desired response can be specified in the STFT domain. This 
> comes up naturally in situations where it is estimated in the frequency 
> domain to begin with, such as noise suppression or channel equalization. 
> Then, circular convolution effects are controlled through a combination of 
> pre/post windowing and smoothing/conditioning of the frequency response. 
> Unlike the fast FIR case, the time-variant effects are only approximately 
> suppressed: this is a time-variant MIMO system that is *not* equivalent to 
> any time-invariant SISO system. 
> 
> So there is an extra layer of engineering needed in STFT systems to ensure 
> that time domain aliasing is adequately suppressed. With fast FIR, you just 
> calculate the correct size to zero-pad (or delete), and then there is no 
> aliasing to worry about. 
> 
> Ethan