Re: [Clam-devel] residual spectrum line segment approximation?

roumbaba Fri, 08 Aug 2008 17:16:38 -0700

Hi Xavier, and thank you for your reply,

I will ask this question on another forum as you suggest. One question, though, that another forum might not be able to address is how specifically SMS deals internally with the analysis window size: when I specify an analysis window of 1024 I get 1STF frames of window size 1025 and FFT size 513. What happens internally?
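For reference, here is a small sketch of one reading that would produce exactly these numbers. It assumes (nothing in this thread confirms it) that the 513 values are the non-negative-frequency bins of a 1024-point real FFT, and that 1025 comes from rounding an even window size up to the next odd length so that the window has a centre sample. Both function names are made up for illustration and are not CLAM API.

```python
def half_spectrum_bins(fft_size):
    """Number of unique bins (DC up to Nyquist) of an even-size real FFT."""
    return fft_size // 2 + 1


def next_odd(n):
    """Smallest odd integer >= n (hypothetical rounding rule, an assumption)."""
    return n if n % 2 == 1 else n + 1


print(half_spectrum_bins(1024))  # 513
print(next_odd(1024))            # 1025
```

If that reading is right, the 513th value would be the Nyquist bin, which appears only once in the full spectrum.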


Anyhow, I have actually tried different input spectrums:

- the original spectrum with phase randomization

- the original spectrum subsampled (in the frequency domain), then linearly interpolated to reconstruct it with its full number of values.
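The second variant can be sketched roughly as follows. This is an illustration in Python rather than the matlab script actually used; the step of 16 bins and both function names are assumptions chosen for the example.

```python
def subsample(values, step):
    """Keep every step-th value, always including the last one."""
    idx = list(range(0, len(values), step))
    if idx[-1] != len(values) - 1:
        idx.append(len(values) - 1)
    return idx, [values[i] for i in idx]


def interpolate(idx, kept, n):
    """Rebuild n values by linear interpolation between the kept points."""
    out = []
    seg = 0
    for i in range(n):
        # advance to the segment [idx[seg], idx[seg + 1]] that contains i
        while idx[seg + 1] < i:
            seg += 1
        x0, x1 = idx[seg], idx[seg + 1]
        t = (i - x0) / (x1 - x0)
        out.append(kept[seg] + t * (kept[seg + 1] - kept[seg]))
    return out


mags = [float(k % 7) for k in range(513)]   # stand-in magnitude spectrum
idx, kept = subsample(mags, 16)
rebuilt = interpolate(idx, kept, len(mags))
print(len(kept), len(rebuilt))  # 33 513
```

Only the magnitudes are interpolated here; what to do with the phases is a separate choice.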

I always get the same type of audio results, which I suspect might be due to the way I choose window sizes and what I do with that 513th sample.

Following your recommendations, today I also did some tries where I "deconvolved" the effect of the original analysis window before I even do the linear subsampling (or the phase randomization). Still I get the same type of artifacts, which actually are not "phase discontinuities" as I wrongly stated in my previous message. (What I meant by that was clicks caused by discontinuities in the synthesized audio signal.) In fact the artifact now is that the resynthesized signal seems to be composed of "packets" or "wavelet kernels", which seems to indicate that the overlap-add is wrong somewhere, or that my window shapes are off, or something.
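One way to test the overlap-add suspicion outside SMS is to sum shifted copies of the window and look at how flat the result is. The sketch below uses a periodic Hann window as a stand-in; which window and hop size SMSSynthesis really uses is not stated in this thread, so both are assumptions.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def ola_ripple(window, hop, frames=16):
    """Overlap-add shifted copies of window and return (min, max) of the
    sum, ignoring one window length at each end."""
    n = len(window)
    total = [0.0] * (hop * (frames - 1) + n)
    for f in range(frames):
        for k, w in enumerate(window):
            total[f * hop + k] += w
    middle = total[n:len(total) - n]
    return min(middle), max(middle)


print(ola_ripple(hann(64), 32))  # both values close to 1: frames sum to a constant
print(ola_ripple(hann(64), 64))  # swings between about 0 and 1: audible "packets"
```

A sum that swings between 0 and 1 modulates the output at the frame rate, which would sound like separate bursts rather than continuous noise.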

I agree that a short sound can save lines of text. I have 4 audio examples (about 184 KB each) that I can send to you directly to avoid sending them to the whole list. (I don't have a place to post them unless you know of one.)

1- the interpolated noise spectrum resynth with convolution by bh92 in the freq domain to compensate for the 'lost' window

2- the interpolated noise spectrum resynth without convolution:

3- the original noise spectrum resynth: originalSpec.noconv.synth_res.wav

4- the original noise spectrum resynth with convolution by bh92 in the spec domain (window applied twice, but sounds more or less OK as opposed to 1 and 2)



Thank you again for your time,

Roumbaba




On 7 Aug 08, at 14:18, Xavier Amatriain wrote:

Hi Baba,
Sorry for the late response, but I think that this discussion is getting a bit off-topic for this mailing list, as it is more a discussion on DSP issues than on CLAM itself. I encourage you to take the thread to the music-dsp mailing list [1], where you will probably get much more (and quicker) feedback on general DSP questions... Unfortunately I don't have as much time as I wished to get to these questions that require more thinking than writing ;-)
In any case, I don't see anything fundamentally wrong in your procedure except in the way you have decided on the input spectrum. The idea behind applying the BH92 to the residual spectrum was that when doing the line approximation out of a few spectral points you are "losing" the effect of the analysis window. It is similar to what happens when you do the peak detection process in the sinusoidal component. If you use the original spectrum you are in fact applying the window twice, right? Or am I missing something? As a quick test you could try doing a peak detection + sinusoidal synthesis (without phase continuation) also on the residual component. This should mimic the effect of what I was proposing... more or less.
Also, what exactly do you mean when you mention phase discontinuities? Could you post some audio examples somewhere? Listening to the result can sometimes save a few lines of email text :-)
X


[1] http://music.columbia.edu/mailman/listinfo/music-dsp

roumbaba wrote:
So I have *not* managed to correctly apply the bh92 window to my modified residual spectrum and thus I have *not* eliminated phase discontinuities at resynth time.
One thing I still do not understand is why SMS needs odd analysis window sizes and how I should handle this. I specify the analysis window size to be 1024 and internally it seems to become 1025 and my 1STF frames are 513 in size. The fact that I do not understand that issue might be one of the sources of what I am not doing right.
Here is where I am at so far. Any hint on what I am doing wrong or should do otherwise is welcome of course:
- For testing purposes the only modification I make to the original 513 values of the noise spectrum is to randomize the phases.
- Then I expand the 513 spectrum to a 1026 spectrum by an even symmetry across the 513.5 axis and complex conjugate of the last 513 values.
- Then I do a circular convolution of my 1026 spectrum with the FFT of a 1026 bh92 time window.
    The way I compute the bh92 time window is (matlab code for now):

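        w1Length = 1026;
        fConst = 2*pi/(w1Length+1-1);   % the cosines have a period of w1Length samples
        w1 = [1:w1Length];              % sample index n = 1..w1Length
        % 4-term Blackman-Harris (bh92) coefficients; the near-zero sample falls at n = w1Length
        w1 = .35875 - .48829*cos(fConst*w1) + .14128*cos(fConst*2*w1) - .01168*cos(fConst*3*w1);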
- When I check the real part (and the magnitude) of the ifft of the 1026-value spectrum resulting from the convolution, I do see that the windowing worked and that the resulting time signal smooths to 0 at the beginning and end.
- Then I take the first 513 values of the resulting spectrum and replace the corresponding 1STF frame in the original sdif analysis file.
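For comparison with the 1026-point expansion described above, here is a sketch of the usual Hermitian extension. It assumes the 513 values are bins 0..512 of a 1024-point FFT of a real signal (an assumption; nothing in this thread confirms how SMS stores them). Under that assumption the DC and Nyquist bins appear only once, so the full spectrum has 2*(513-1) = 1024 points rather than 1026. The demonstration uses 5 bins instead of 513 so that a naive inverse DFT stays cheap.

```python
import cmath


def hermitian_extend(half):
    """Bins 0..N/2 of a real signal's DFT -> all N bins.
    Bins N/2+1..N-1 are the conjugates of bins N/2-1..1."""
    return list(half) + [z.conjugate() for z in half[-2:0:-1]]


def inverse_dft(spec):
    """Naive O(N^2) inverse DFT, good enough for a small demonstration."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]


half = [1 + 0j, 2 - 1j, 0.5 + 0.25j, -1 + 3j, 4 + 0j]  # DC and Nyquist are real
full = hermitian_extend(half)
signal = inverse_dft(full)
print(len(full))                                # 8
print(max(abs(x.imag) for x in signal) < 1e-9)  # True: the time signal is real
```

Duplicating the first or last bin while mirroring would change the transform length and break this symmetry, which may or may not be related to the artifacts described here.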
Still I get phase discontinuities in the resynth signal.

What am I missing?

Thanks,

Baba






On 15 Jul 08, at 14:55, Xavier Amatriain wrote:
Hi Roumbaba, and congrats for your progress!
You are right about the source of your problem: SMSSynthesis expects your residual to come with an analysis window and, if not, things are likely to mess up.
The lines that are "guilty" for that are around SMSSynthesis.cxx:252
http://clam.iua.upf.edu/doc/CLAM-doxygen/SMSSynthesis_8cxx-source.html#l00252
First the peaks are synthesized into a sinusoidal spectrum. Then the two spectrums are added. Already at that point the spectrums are supposed to have the same analysis window (BH92) and size. The effect of that window is undone in line 261 when the global spectral synthesis is performed.
The issue here is that you need to guarantee that both spectrums come from a similar place before adding them... The sinusoidal peaks are reconstructed by convolving with the transform of the main lobe of the window (BH92), but you are reconstructing the residual in a different way. So... you either apply the BH92 transform to your spectrum or avoid doing that in the peak synthesis (and then avoid multiplying by the inverse in the global spectral synthesis). Neither of the two options is immediate, but I'd say the first one should be easier to work out.
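The first option rests on the convolution theorem: multiplying a frame by a window in time is the same as circularly convolving its spectrum with the window's transform, scaled by 1/N. Here is a minimal check of that identity, using the 4-term Blackman-Harris coefficients quoted in this thread. It is a sketch with a full-length window, not the main-lobe-only convolution that the CLAM peak synthesis performs, and the function names are illustrative.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT (unnormalized)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def bh92(n):
    """Periodic 4-term Blackman-Harris window, samples 0..n-1."""
    a = (0.35875, 0.48829, 0.14128, 0.01168)
    return [a[0] - a[1] * math.cos(2 * math.pi * k / n)
            + a[2] * math.cos(4 * math.pi * k / n)
            - a[3] * math.cos(6 * math.pi * k / n) for k in range(n)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


n = 16
x = [math.sin(0.7 * t) + 0.3 * math.cos(2.1 * t) for t in range(n)]
w = bh92(n)
direct = dft([xi * wi for xi, wi in zip(x, w)])            # window in time
via_conv = [c / n for c in circular_convolve(dft(x), dft(w))]  # convolve in frequency
error = max(abs(p - q) for p, q in zip(direct, via_conv))
print(error < 1e-9)  # True
```

Note that the identity needs the complex spectra of both the frame and the window, at the same transform length.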
Hope it helps... and if you get it to work don't forget to report back.
roumbaba wrote:
Hello all and thanks again for your previous help,
So I have written a matlab script to perform noise spectrum line segment approximation.
- As input the script takes an sdif file generated by analysis with SMSConsole.
- It then reads all sdif frames, in particular the 1STF frames containing the noise spectrums in complex form.
- It converts these complex spectrums into magPhase form
- It performs line segment approximation on the amplitudes.
To check the impact of the approximation on the quality of resynthesis the script does the following:
- It reconstructs full noise magnitude spectrums from the line approximations (by linear interpolation)
- It randomizes the phases
- It converts the new "smoothed" magPhase spectrums back to complex spectrums
- It writes back the sdif file with these new "smoothed" spectrums instead of the original raw noise spectrums.
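The magPhase round trip with phase randomization in the steps above can be sketched like this (Python rather than the matlab script, which is not shown in this thread). Leaving the first and last bins untouched is an assumption made here so that a real-signal half spectrum keeps real DC and Nyquist values; the function name is illustrative.

```python
import cmath
import math
import random


def randomize_phases(spectrum, rng):
    """Keep every magnitude and draw new phases uniformly in [-pi, pi].
    The first and last bins are left as they are (assumption: they are
    the DC and Nyquist bins of a real signal)."""
    out = list(spectrum)
    for i in range(1, len(spectrum) - 1):
        mag = abs(spectrum[i])
        phase = rng.uniform(-math.pi, math.pi)
        out[i] = cmath.rect(mag, phase)   # magPhase -> complex
    return out


rng = random.Random(0)
spec = [complex(k + 1, -k) for k in range(9)]   # stand-in for one 1STF frame
new_spec = randomize_phases(spec, rng)
worst = max(abs(abs(a) - abs(b)) for a, b in zip(spec, new_spec))
print(worst < 1e-9)  # True: magnitudes are unchanged
```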
Then I run SMSConsole to synthesize that sdif file with the exact same parameters as for the original sdif file.
My problem is that the resulting synthesized noise sounds like something is wrong in the synthesis overlap-add (like lots of discontinuities in the resynthesis).
I think that this might be due to what is described in the Serra/Smith 1990 CMJ paper concerning line segment approximation noise resynthesis:
" ...Since the [new] phase spectrum used is not the result of an analysis process (with windowing of a waveform, zero padding, and FFT computation), the resulting signal does not taper to 0 at the boundaries. This is because a phase spectrum with random values corresponds to a phase spectrum of a rectangular-windowed noise waveform of size N. In order to succeed in the overlap-add resynthesis (i.e., to obtain smooth transitions between frames) we need a smoothly windowed waveform of size M, where M is the synthesis-window length. ...."
So what might be happening is that by default SMSConsole assumes that the 1STF frames are *NOT* line segment approximations and therefore does *NOT* perform that last windowing at synthesis time. I have gone a little bit through the SMS/Clam code but I cannot find where I can change this behavior, or even whether that is the default behavior. Where should I look in the SMS/Clam code?
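The point made in the quoted passage can be checked numerically. This is a sketch only: it builds one frame from a flat-magnitude, random-phase spectrum and compares the energy at the frame edge before and after applying a smooth window. The Hann window, the frame length of 64 and the function names are assumptions chosen for the example, not what SMS uses.

```python
import cmath
import math
import random


def noise_frame(n, rng):
    """Real frame of even length n from a flat-magnitude, random-phase spectrum."""
    half = ([0j]
            + [cmath.rect(1.0, rng.uniform(-math.pi, math.pi))
               for _ in range(n // 2 - 1)]
            + [0j])
    full = half + [z.conjugate() for z in half[-2:0:-1]]
    return [sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def edge_energy(frame, width=4):
    """Energy in the first `width` samples of the frame."""
    return sum(x * x for x in frame[:width])


rng = random.Random(1)
n = 64
frame = noise_frame(n, rng)
windowed = [x * w for x, w in zip(frame, hann(n))]
print(edge_energy(windowed) <= 0.01 * edge_energy(frame))  # True
```

The unwindowed frame has as much energy at its boundaries as anywhere else, so consecutive frames cannot join smoothly unless a window is applied somewhere before the overlap-add.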
Thanks,

Roumbaba



On 27 May 08, at 23:25, Xavier Amatriain wrote:
Hi Roumbaba,
In the paper you cite it says "you can", which does not mean "you have to" :-) Doing an approximation of the residual model is indeed an interesting thing to do, especially if you want to reduce the amount of data in your transformed signal; however, it is not a must. Note that there are many other ways to model the residual apart from the one mentioned in that paper.
So far, in CLAM we are using the residual as is, with no modeling or approximation. The "only" downside is that the transformed signal (SMS Data) is in fact larger than the original audio when it could be much smaller with not much loss in quality. If for whatever reason you do need to do the residual modeling you can look at the SpectralEnvelopeExtract processing. This processing generates a spectral approximation (spectrum in bpf format) but from an array of peaks; it would not be hard to modify it to work with an input spectrum.

X


roumbaba wrote:
Hi all,
I am trying to understand how the residual spectrum gets modeled in clam/SMS. I have read the Serra/Smith 1990 CMJ paper and, as I understand it, it describes two steps:
1- subtract the harmonic spectrum from the original spectrum
2- perform a line-segment approximation of the residual spectrum obtained in 1
I have stepped through the clam and SMS code and I think I can see where step 1 gets performed:
SMSAnalysisCore::Do()
{
    mSinSpectralAnalysis.Do();
    mResSpectralAnalysis.Do();
    ...
    ...
    ...
    mSynthSineSpectrum.Do();
    mSpecSubstracter.Do(); /* step 1 gets performed here I think */
}
but I cannot find where step 2 (line approximation) gets performed. Where should I look in the code?
Thank you very much,
Cheers,

Roumbaba

ps:

Here is a quote from the paper I mentioned above:

"Approximation of the Spectral Residual
Assuming the residual signal is quasi-stochastic, each magnitude-spectrum residual can be approximated by its envelope since only its shape contributes to the sound characteristics. [...] The particular line-segment approximation performed here is done by stepping through the magnitude spectrum and finding local maxima in every section, ..."
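As a rough illustration of what the quoted passage describes (the quote is cut off, so the details here are guesses): step through the magnitude spectrum in fixed-size sections and keep the largest value of each as a breakpoint of the envelope. The section width, the tie-breaking rule (first maximum wins) and the function name are all assumptions.

```python
def envelope_breakpoints(mags, section):
    """Return the (bin, magnitude) of the maximum in each consecutive section."""
    points = []
    for start in range(0, len(mags), section):
        chunk = mags[start:start + section]
        peak = max(range(len(chunk)), key=chunk.__getitem__)
        points.append((start + peak, chunk[peak]))
    return points


mags = [0.1, 0.9, 0.3, 0.2, 0.4, 0.8, 0.7, 0.1, 0.5]
print(envelope_breakpoints(mags, 3))  # [(1, 0.9), (5, 0.8), (6, 0.7)]
```

Joining the breakpoints with straight lines would then give the line-segment envelope.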
_______________________________________________
Clam-devel mailing list
[email protected]
https://llistes.projectes.lafarga.org/cgi-bin/mailman/listinfo/clam-devel
