Re: [music-dsp] FIR blog post & interactive demo
It is certainly possible to combine STFT with fast convolution in various ways. But doing so imposes significant overhead costs and constrains the overall design in strong ways. For example, this approach:

> On Mar 9, 2020, at 7:16 AM, Spencer Russell wrote:
>
> if you have a KxN STFT (K frequency components and N frames) then
> zero-padding each frame by K-1 should still eliminate any time-aliasing even
> if your filter has hard edges in the frequency domain, right?

Right, but if you are using length K FFT and zero-padding by K-1, then the hop size is 1 sample and there are no windows. This is just applying the raw IDFT of the response as an FIR, which is not appropriate for something estimated in a windowed filterbank domain. Deriving an equivalent FIR from, say, an estimated noise reduction mask is not trivial.

> I understand the role of time-domain windowing in STFT processing to be
> mostly:
> 1. Reduce frequency-domain ripple (side-lobes in each band)

Right, this is the “analysis” aspect, where the window controls the spectral characteristics (frequency selectivity, bandwidth, leakage, etc.)

> 2. Provide a sort of cross-fade from frame-to-frame to smooth out framing
> effects

And that is the “synthesis” aspect, where the window controls the characteristics of the artifacts introduced by processing. Note that “framing effects” are by definition time-variant: this is a form of aliasing.

Ethan

___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp
Re: [music-dsp] FIR blog post & interactive demo
> On March 8, 2020 7:55 PM Ethan Duni wrote:
>
> Fast FIR is a different thing than an FFT filter bank.
>
> You can combine the two approaches but I don’t think that’s what is being
> done here?

> On March 9, 2020 10:15 AM Spencer Russell wrote:
>
> I think we're mostly on the same page, Ethan.

well, i think that i am on the same page as Ethan.

> Though even with STFT-domain time-variant filtering (such as with noise
> reduction, or mask-based source separation) it would seem you could still
> zero-pad each input frame to eliminate any issues due to time-aliasing.

zero-padding is the sole technique that gets rid of time-aliasing.

let's say your FIR is of length L. let's say that your frame hop is H and frame length is F ≥ H and we're doing overlap-add. then your F samples of input (H samples are *new* samples in the current frame, F-H samples are remaining from the previous frame) are considered zero-padded out to infinity in both directions. then the length of the result of linear convolution is L+F-1.

now if you can guarantee that the size of the DFT, which we'll call "N" (and most of the time is a power of 2), is at least as large as the non-zero length of the linear convolution, then the result of circular convolution of the zero-padded FIR and the zero-padded frame of samples will be exactly the same as the linear convolution. that means

    N ≥ L + F - 1

this is always true whether the windowing is rectangular or something else. and, whether your FIR varies in definition or not, the length L must never be longer than N-F+1. all frequency responses (which is what you multiply with in the frequency domain) must be the N-point DFT of an FIR limited in length to N-F+1.
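
The condition N ≥ L + F - 1 can be checked numerically for a single frame. The following is only an illustrative NumPy sketch: the sizes L = 5 and F = 12, the random data, and the helper name `circular` are assumptions made for the example, not anything from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)
L, F = 5, 12                        # FIR length and frame length (arbitrary)
h = rng.standard_normal(L)          # hypothetical FIR
x = rng.standard_normal(F)          # one frame of input

linear = np.convolve(x, h)          # linear convolution, length L + F - 1 = 16

def circular(x, h, N):
    """N-point circular convolution via zero-padded real FFTs."""
    return np.fft.irfft(np.fft.rfft(x, N) * np.fft.rfft(h, N), N)

ok = circular(x, h, L + F - 1)      # N = L + F - 1: nothing wraps around
bad = circular(x, h, F)             # N = F < L + F - 1: the tail wraps

# Time-aliasing is exactly the last L - 1 samples folded onto the start.
folded = linear[:F].copy()
folded[:L - 1] += linear[F:]

print(np.allclose(ok, linear))      # circular equals linear when N is big enough
print(np.allclose(bad, folded))     # the too-short DFT gives the folded result
```

With N = F the DFT has no room for the L - 1 extra output samples, so they land on top of the first L - 1 samples of the frame.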
if it is a rectangular window, the frame length and frame hop are the same, F=H, and the number of generated output samples that are valid is H, and the most you can hope to get is:

    H = F = N - L + 1

now, if you want to window that input data with a complementary window, such as the Hann window, that's fine, but instead of having the frame hop equal to the frame length, the relationship between the two is

    F = 2H - 1   or   H = (F+1)/2     (50% overlap)

so now, the number of valid output samples is about half as before.

    H = (F+1)/2 = (N-L)/2 + 1

so the input buffer to the FFT will still be zero padded with N-F zeros, independent of the hop size. but you get a bigger hop size and more output samples per frame with a rectangular window. and in both cases you get exactly the same results (up to rounding error).

now, if it is overlap-scrap (or "overlap-save") and a rectangular window (which is no window at all, because the data is not zero-padded), the output samples are "butt spliced" (no crossfade). if your FIR filter changes in frequency response, the new timbre of the filter is applied instantly with the new frame.

but if it is overlap-add then the F samples in the frame are zero-padded with N-F zeros and there is a form of crossfading, even with a rectangular window, from one frame to the next if the FIR filter definition changes. if you cut your frame hop size, H, from F to nearly half, (F+1)/2 (and use a complementary window such as Hann), it is half as efficient, but the crossfade is even smoother (and the frame rate is faster, so the filter definition can change more often).

all of this is well-established knowledge regarding frame-by-frame processing with windows and the FFT.

--

r b-j                  r...@audioimagination.com

"Imagination is more important than knowledge."
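
The frame and hop relations above can be tried out with a small overlap-add routine. This is a hedged sketch, not code from the thread: the function name `ola_filter`, the sizes, and the edge handling (zero-padding the ends of the signal so every sample sees a full set of windows) are all assumptions. The Hann variant uses a Hann window of period 2H with its zero endpoint dropped, which gives F = 2H - 1 nonzero samples and sums to one at hop H.

```python
import numpy as np

def ola_filter(x, h, N, hann=False):
    """Overlap-add FIR filtering of x by a fixed h using N-point FFTs."""
    L = len(h)
    F = N - L + 1                       # longest frame with no time-aliasing
    if hann:
        if F % 2 == 0:
            F -= 1                      # F = 2H - 1 has to be odd
        H = (F + 1) // 2                # 50% overlap
        # Hann of period 2H without its zero endpoint: w[n] + w[n + H] = 1
        w = 0.5 - 0.5 * np.cos(np.pi * (np.arange(F) + 1) / H)
        lead = H                        # leading zeros so early samples get two windows
    else:
        H, w, lead = F, np.ones(F), 0   # rectangular: hop equals frame length
    Hf = np.fft.rfft(h, N)              # N-point DFT of the zero-padded FIR
    xp = np.concatenate([np.zeros(lead), x, np.zeros(F)])
    y = np.zeros(len(xp) + N)
    for start in range(0, len(xp) - F + 1, H):
        frame = xp[start:start + F] * w
        # rfft(frame, N) zero-pads the frame with N - F zeros
        y[start:start + N] += np.fft.irfft(np.fft.rfft(frame, N) * Hf, N)
    return y[lead:lead + len(x) + L - 1]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)           # arbitrary test signal
h = rng.standard_normal(33)             # arbitrary 33-tap FIR
ref = np.convolve(x, h)

y_rect = ola_filter(x, h, 128)              # F = H = 96
y_hann = ola_filter(x, h, 128, hann=True)   # F = 95, H = 48
print(np.allclose(y_rect, ref), np.allclose(y_hann, ref))
```

For a filter that does not change, both variants reproduce the direct convolution up to rounding; the Hann version simply needs about twice as many FFTs for the same amount of output.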
Re: [music-dsp] FIR blog post & interactive demo
I think we're mostly on the same page, Ethan. Though even with STFT-domain time-variant filtering (such as with noise reduction, or mask-based source separation) it would seem you could still zero-pad each input frame to eliminate any issues due to time-aliasing. As you mention (paraphrasing), you can smooth out the mask which will reduce the amount of zero-padding you need, but if you have a KxN STFT (K frequency components and N frames) then zero-padding each frame by K-1 should still eliminate any time-aliasing even if your filter has hard edges in the frequency domain, right?

I understand the role of time-domain windowing in STFT processing to be mostly:

1. Reduce frequency-domain ripple (side-lobes in each band)
2. Provide a sort of cross-fade from frame-to-frame to smooth out framing effects

In my mind doing STFT-domain masking/filtering is _roughly_ equivalent to a filter bank with time-varying response. In the STFT case though you're keeping things invariant within each frame and then cross-fading from frame to frame. This is a pretty intuitive/ad-hoc way of thinking on my part though - I'd love to see some literature that gives a more formal treatment.

-s

On Mon, Mar 9, 2020, at 12:52 AM, Ethan Duni wrote:
>
> On Sun, Mar 8, 2020 at 8:02 PM Spencer Russell wrote:
>> In fact, the standard STFT analysis/synthesis pipeline is the same thing
>> as overlap-add "fast convolution" if you:
>>
>> 1. Use a rectangular window with a length equal to your hop size
>> 2. zero-pad each input frame by the length of your FIR kernel minus 1
>
> Indeed, the two ideas are closely related and can be combined. It's more a
> difference in the larger approach.
>
> If you can specify the desired response in terms of an FIR of some fixed
> length, then you can account for the circular effects and use fast FIR. Note
> that this is a time-variant MIMO system constructed to be equivalent to a
> time-invariant SISO system (modulo finite word length effects, as you say).
>
> Alternatively, the desired response can be specified in the STFT domain. This
> comes up naturally in situations where it is estimated in the frequency
> domain to begin with, such as noise suppression or channel equalization.
> Then, circular convolution effects are controlled through a combination of
> pre/post windowing and smoothing/conditioning of the frequency response.
> Unlike the fast FIR case, the time-variant effects are only approximately
> suppressed: this is a time-variant MIMO system that is *not* equivalent to
> any time-invariant SISO system.
>
> So there is an extra layer of engineering needed in STFT systems to ensure
> that time domain aliasing is adequately suppressed. With fast FIR, you just
> calculate the correct size to zero-pad (or delete), and then there is no
> aliasing to worry about.
>
> Ethan
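
One possible reading of the "smoothing/conditioning of the frequency response" mentioned in the quoted message is to limit the impulse response of an estimated mask to a fixed number of taps before applying it. The NumPy sketch below is only an assumption-laden illustration of that idea and not necessarily what was meant in the thread: the function name `condition_mask`, the sizes N = 256 and F = 128, the hard-edged band-pass mask, and the choice of a Hann taper are all hypothetical.

```python
import numpy as np

def condition_mask(mask, N, L):
    """Limit a real, zero-phase mask to an L-tap impulse response (L odd).

    mask : gains on the N // 2 + 1 rfft bins of an N-point DFT
    Returns (N-point rfft of the conditioned FIR, the L-tap FIR itself).
    """
    h = np.fft.irfft(mask, N)           # zero-phase response, wrapped around n = 0
    h = np.roll(h, L // 2)[:L]          # keep the L central taps, made causal
    h = h * np.hanning(L)               # taper the truncation (smooths the mask edges)
    return np.fft.rfft(h, N), h

N, F = 256, 128                         # DFT size and frame length (arbitrary)
L = N - F + 1                           # longest FIR with no time-aliasing: 129
mask = np.zeros(N // 2 + 1)
mask[20:60] = 1.0                       # hard-edged band-pass mask

# Fraction of the raw mask's impulse-response energy outside the L central taps.
h_raw = np.fft.irfft(mask, N)
centred = np.roll(h_raw, L // 2)
leak = np.sum(centred[L:] ** 2) / np.sum(h_raw ** 2)

H_cond, h = condition_mask(mask, N, L)
x = np.random.default_rng(0).standard_normal(F)       # one input frame
y_fft = np.fft.irfft(np.fft.rfft(x, N) * H_cond, N)   # frame filtered via the DFT
y_lin = np.convolve(x, h)                             # direct linear convolution
print(leak, np.allclose(y_fft, y_lin))
```

The raw hard-edged mask has a nonzero `leak`, meaning part of its impulse response lies beyond what an F-sample frame in an N-point DFT can hold without wrapping. The conditioned mask fits exactly, at the cost of a delay of L // 2 samples and softer band edges.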