So I have *not* managed to correctly apply the bh92 window to my
modified residual spectrum and thus I have *not* eliminated the phase
discontinuities at resynth time.
One thing I still do not understand is why SMS needs odd analysis
window sizes and how I should handle this. I specify an analysis window
size of 1024, internally it seems to become 1025, and my 1STF
frames are 513 in size. The fact that I do not understand that issue
might be one of the sources of what I am not doing right.
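A note on the sizes, as a sketch only: a real signal of N samples (N even) has N/2 + 1 non-redundant DFT bins, so 513 bins correspond to a 1024-point transform. The Python below assumes, without having checked CLAM, that the 513 values of a 1STF frame are bins 0..512 of such a transform; the helper names are made up for illustration. An odd window length such as 1025 is commonly chosen in SMS-style analysis so that the window has a centre sample to be symmetric about.

```python
import cmath

def num_real_fft_bins(fft_size):
    """Number of non-redundant bins for a real signal of even length fft_size."""
    return fft_size // 2 + 1

def hermitian_extend(half):
    """Rebuild the full spectrum from bins 0..N/2 (N even).

    Bins 1..N/2-1 are mirrored and conjugated; DC and Nyquist are not duplicated.
    """
    mirrored = [z.conjugate() for z in reversed(half[1:-1])]
    return list(half) + mirrored

def idft(spectrum):
    """Naive inverse DFT, O(N^2), fine for a small demonstration."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

# Small example: 5 bins correspond to an 8-point transform.
half = [1 + 0j, 2 - 1j, 0.5 + 0.5j, -1 + 2j, 3 + 0j]   # DC and Nyquist are real
full = hermitian_extend(half)
signal = idft(full)
max_imag = max(abs(x.imag) for x in signal)   # close to 0: the signal is real
print(len(full), max_imag)
```

Under that assumption, mirroring all 513 values gives 1026 values and repeats the DC and Nyquist bins, whereas mirroring only bins 1..511 gives the 1024 values of the full spectrum.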
Here is where I am at so far. Any hint on what I am doing wrong or should
do otherwise is welcome of course:
- For testing purposes, the only modification I make to the original 513
values of the noise spectrum is to randomize the phases.
- Then I expand the 513-value spectrum to a 1026-value spectrum by an even
symmetry across the 513.5 axis and complex conjugation of the last 513
values.
- Then I do a circular convolution of my 1026-value spectrum with the FFT
of a 1026-point bh92 time window.
The way I compute the bh92 time window is (matlab code for now):
    w1Length = 1026;
    fConst = 2*pi/(w1Length+1-1);   % i.e. 2*pi/w1Length
    w1 = [1:w1Length];              % sample indices 1..w1Length
    w1 = .35875 - .48829*cos(fConst*w1) + .14128*cos(fConst*2*w1) ...
         - .01168*cos(fConst*3*w1);
- When I check the real part (and the magnitude) of the ifft of the
1026-value spectrum resulting from the convolution, I do
see that the windowing worked and that the resulting time signal
smooths to 0 at the beginning and end.
- Then I take the first 513 values of the resulting spectrum and
replace the corresponding 1STF frame in the original sdif analysis file.
Still I get phase discontinuities in the resynth signal.
What am I missing?
Thanks,
Baba
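The windowing step described in the message above can be cross-checked numerically. The sketch below, in Python and on a small frame, compares windowing in the time domain with circular convolution of the two spectra; with an unnormalized forward DFT the convolution has to be divided by N for the two to agree. The window index runs from 0 to N-1 here, whereas the MATLAB snippet starts at 1, which is a one-sample shift of the same periodic window. Function names are illustrative, not taken from CLAM.

```python
import cmath
import math
import random

def bh92(n_points):
    """Periodic 4-term Blackman-Harris window, same coefficients as in the message."""
    c = 2 * math.pi / n_points
    return [0.35875 - 0.48829 * math.cos(c * n) + 0.14128 * math.cos(2 * c * n)
            - 0.01168 * math.cos(3 * c * n) for n in range(n_points)]

def dft(x):
    """Naive unnormalized forward DFT, O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(a, b):
    """Circular convolution of two sequences of equal length."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]

random.seed(0)
N = 16
x = [random.uniform(-1, 1) for _ in range(N)]
w = bh92(N)

# Route 1: window in the time domain, then transform.
direct = dft([xi * wi for xi, wi in zip(x, w)])
# Route 2: circular convolution of the two full-length spectra, scaled by 1/N.
via_conv = [v / N for v in circular_convolve(dft(x), dft(w))]

max_err = max(abs(a - b) for a, b in zip(direct, via_conv))
print(max_err)
```

Both routes only agree when the two spectra have the same full length N and the 1/N factor is applied, so a mismatch in either is worth ruling out first.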
On 15 Jul. 08, at 14:55, Xavier Amatriain wrote:
Hi Roumbaba, and congrats for your progress!
You are right about the source of your problem: SMSSynthesis expects
your residual to come with an analysis window, and if it does not, things
are likely to get messed up.
The lines that are "guilty" of that are around SMSSynthesis.cxx:252
http://clam.iua.upf.edu/doc/CLAM-doxygen/SMSSynthesis_8cxx-source.html#l00252
First the peaks are synthesized into a sinusoidal spectrum. Then
the two spectrums are added. Already at that point the spectrums
are supposed to have the same analysis window (BH92) and size. The
effect of that window is undone in line 261 when the global
spectral synthesis is performed.
The issue here is that you need to guarantee that both spectrums
come from a similar place before adding them... The sinusoidal
peaks are reconstructed by convolving with the transform of the main
lobe of the window (BH92), but you are reconstructing the residual
in a different way. So you either apply the BH92 transform to
your spectrum or avoid doing that in the peak synthesis (and then
avoid multiplying by the inverse in the global spectral synthesis).
Neither of the two options is immediate, but I'd say the first one
should be easier to work out.
Hope it helps... and if you get it to work don't forget to report
back.
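To make the "same place" requirement concrete, here is a rough Python sketch that is not CLAM code. It assumes that the synthesis undoes the window by dividing the time-domain frame by it, which is only one reading of "multiplying by the inverse" above; the signals, sizes and function names are invented for the illustration.

```python
import cmath
import math
import random

def bh92(n_points):
    """Periodic 4-term Blackman-Harris window."""
    c = 2 * math.pi / n_points
    return [0.35875 - 0.48829 * math.cos(c * n) + 0.14128 * math.cos(2 * c * n)
            - 0.01168 * math.cos(3 * c * n) for n in range(n_points)]

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

random.seed(1)
N = 32
w = bh92(N)
sines = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
noise = [random.uniform(-0.1, 0.1) for _ in range(N)]
target = [s + r for s, r in zip(sines, noise)]

def synthesize(sin_spec, res_spec):
    """Add the two spectra, go back to time, then divide the window out."""
    total = [a + b for a, b in zip(sin_spec, res_spec)]
    return [v.real / wi for v, wi in zip(idft(total), w)]

sin_spec = dft([s * wi for s, wi in zip(sines, w)])

# Residual carrying the same window: the sum of both parts is recovered.
good = synthesize(sin_spec, dft([r * wi for r, wi in zip(noise, w)]))
# Residual without the window: dividing by w inflates it towards the frame edges.
bad = synthesize(sin_spec, dft(noise))

err_good = max(abs(a - b) for a, b in zip(good, target))
err_bad = max(abs(a - b) for a, b in zip(bad, target))
print(err_good, err_bad)
```

In this toy setting the unwindowed residual comes out strongly amplified near the frame boundaries, which is at least consistent with audible discontinuities at the overlap-add stage.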
roumbaba wrote:
Hello all and thanks again for your previous help,
So I have written a MATLAB script to perform noise spectrum
line segment approximation.
- As input the script takes an sdif file generated by analysis
with SMSConsole.
- It then reads all sdif frames, in particular the 1STF frames
containing the noise spectrums in complex form.
- It converts these complex spectrums into magPhase form
- It performs line segment approximation on the amplitudes.
To check the impact of the approximation on the quality of
resynthesis the script does the following:
- It reconstructs full noise magnitude spectrums from the line
approximations (by linear interpolation)
- It randomizes the phases
- It converts the new "smoothed" magPhase spectrums back to
complex spectrums
- It writes back the sdif file with these new "smoothed"
spectrums instead of the original raw noise spectrums.
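The reconstruction steps listed above can be sketched as follows. This is an illustration in Python rather than the MATLAB script itself; the breakpoint format, the function names and the choice to keep the first and last bins real (DC and Nyquist of a half spectrum) are assumptions of this sketch.

```python
import cmath
import math
import random

def to_mag_phase(spectrum):
    """Split complex bins into magnitude and phase lists."""
    return [abs(z) for z in spectrum], [cmath.phase(z) for z in spectrum]

def interpolate_envelope(breakpoints, n_bins):
    """Linear interpolation between (bin, magnitude) breakpoints sorted by bin.

    The first breakpoint is assumed to sit at bin 0 and the last at n_bins - 1.
    """
    mags = []
    seg = 0
    for k in range(n_bins):
        while breakpoints[seg + 1][0] < k:   # advance to the segment containing k
            seg += 1
        (k0, m0), (k1, m1) = breakpoints[seg], breakpoints[seg + 1]
        mags.append(m0 + (m1 - m0) * (k - k0) / (k1 - k0))
    return mags

def with_random_phases(mags, rng):
    """Keep the magnitudes and draw phases uniformly between -pi and pi.

    The first and last bins get phase 0 so that they stay real.
    """
    last = len(mags) - 1
    out = []
    for k, m in enumerate(mags):
        phi = 0.0 if k in (0, last) else rng.uniform(-math.pi, math.pi)
        out.append(cmath.rect(m, phi))
    return out

rng = random.Random(0)
breakpoints = [(0, 1.0), (4, 0.5), (8, 0.25)]   # made-up envelope
mags = interpolate_envelope(breakpoints, 9)
frame = with_random_phases(mags, rng)
new_mags, new_phases = to_mag_phase(frame)
print(mags)
```

The magnitudes survive the round trip unchanged; only the phases differ from frame to frame.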
Then I run SMSConsole to synthesize that sdif file with the exact
same parameters as for the original sdif file.
My problem is that the resulting synthesised noise sounds like
something is wrong in the synthesis overlap-add (like lots of
discontinuities in the resynthesis).
I think that this might be due to what is described in the Serra/Smith
1990 CMJ paper concerning line segment approximation noise
resynthesis:
" ...Since the [new] phase spectrum used is not the result of an
analysis process (with windowing of a waveform, zero padding, and
FFT computation), the resulting signal does not taper to 0 at the
boundaries. This is because a phase spectrum with random values
corresponds to a phase spectrum of a rectangular-windowed noise
waveform of size N. In order to succeed in the overlap-add
resynthesis (ie, to obtain smooth transitions between frames) we
need a smoothly windowed waveform of size M, where M is the
synthesis-window length. ...."
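The quoted statement is easy to illustrate on a toy frame. The Python sketch below builds a flat-magnitude spectrum with random phases, transforms it back to time, and compares how much of the energy sits in the first and last few samples before and after a BH92 window is applied. Frame size, edge width and function names are arbitrary choices for this illustration.

```python
import cmath
import math
import random

def bh92(n_points):
    """Periodic 4-term Blackman-Harris window."""
    c = 2 * math.pi / n_points
    return [0.35875 - 0.48829 * math.cos(c * n) + 0.14128 * math.cos(2 * c * n)
            - 0.01168 * math.cos(3 * c * n) for n in range(n_points)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def edge_ratio(x, edge=4):
    """Energy in the first and last `edge` samples relative to the total energy."""
    total = sum(v * v for v in x)
    ends = sum(v * v for v in x[:edge] + x[-edge:])
    return ends / total

rng = random.Random(0)
N = 64
# Half spectrum with unit magnitude and random phases; DC and Nyquist kept real.
half = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(N // 2 + 1)]
half[0] = 1.0 + 0j
half[-1] = 1.0 + 0j
full = half + [z.conjugate() for z in reversed(half[1:-1])]

frame = [v.real for v in idft(full)]                  # behaves like rectangular-windowed noise
windowed = [f * wi for f, wi in zip(frame, bh92(N))]  # tapers towards both ends

ratio_raw = edge_ratio(frame)
ratio_windowed = edge_ratio(windowed)
print(ratio_raw, ratio_windowed)
```

Only the windowed frame has negligible energy at its boundaries, which is what overlap-add needs for smooth transitions between frames.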
So what might be happening is that by default SMSConsole assumes
that the 1STF frames are *NOT* line segment approximations and
therefore does *NOT* perform that last windowing at synthesis
time. I have gone through the SMS/Clam code a little, but I cannot
find where I can change this behavior, or even whether that is the
default behavior. Where should I look in the SMS/Clam code?
Thanks,
Roumbaba
On 27 May 08, at 23:25, Xavier Amatriain wrote:
Hi Roumbaba,
In the paper you cite it says "you can", which does not mean "you
have to" :-) Doing an approximation of the residual model is indeed
an interesting thing to do, especially if you want to reduce the
amount of data in your transformed signal, however it is not a must.
Note that there are many other ways to model the residual apart
from the one mentioned in that paper.
So far, in CLAM we are using the residual as is, with no modeling
or approximation. The "only" downside is that the transformed
signal (SMS Data) is in fact larger than the original audio when
it could be much smaller with not much loss in quality. If for
whatever reason you do need to do the residual modeling you can
look at the SpectralEnvelopeExtract processing. This processing
generates a spectral approximation (spectrum in bpf format), but
from an array of peaks; it would not be hard to modify it to work
with an input spectrum.
X
roumbaba wrote:
Hi all,
I am trying to understand how the residual spectrum gets modeled
in clam/SMS. I have read the Serra/Smith 1990 CMJ paper and, as I
understand it, it describes two steps:
1- subtract the harmonic spectrum from the original spectrum
2- perform a line-segment approximation of the residual spectrum
obtained in 1
I have stepped through clam and SMS code and I think I can see
where step 1 gets performed:
SMSAnalysisCore::Do()
{
mSinSpectralAnalysis.Do();
mResSpectralAnalysis.Do();
...
...
...
mSynthSineSpectrum.Do();
mSpecSubstracter.Do(); /* step 1 gets performed here I think*/
}
but I cannot find where step 2 (line approximation) gets
performed. Where should I look in the code?
Thank you very much,
Cheers,
Roumbaba
ps:
Here is a quote from the paper I mentioned above:
"Approximation of the Spectral Residual
Assuming that the residual signal is quasi-stochastic, each
magnitude-spectrum residual can be approximated by its envelope
since only its shape contributes to the sound characteristics.
[...] The particular line-segment approximation performed here
is done by stepping through the magnitude spectrum and finding
local maxima in every section, ..."
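One possible reading of the quoted procedure, as a sketch only: the quote is truncated, so the fixed section size, the handling of the first and last bins and the function name below are assumptions of this Python illustration, not details taken from the paper or from CLAM.

```python
def line_segment_envelope(mags, section_size):
    """Return (bin, magnitude) breakpoints: the largest value in each section.

    The first and last bins are always included so that the envelope spans
    the whole spectrum (an assumption of this sketch).
    """
    points = []
    for start in range(0, len(mags), section_size):
        section = mags[start:start + section_size]
        # Index of the maximum within the section (first one on ties).
        offset = max(range(len(section)), key=section.__getitem__)
        points.append((start + offset, section[offset]))
    last = len(mags) - 1
    if points[0][0] != 0:
        points.insert(0, (0, mags[0]))
    if points[-1][0] != last:
        points.append((last, mags[last]))
    return points

mags = [0.2, 0.9, 0.4, 0.1, 0.3, 0.8, 0.7, 0.2, 0.5]   # made-up magnitudes
envelope = line_segment_envelope(mags, 3)
print(envelope)
```

Joining the returned breakpoints with straight lines gives the envelope; a smaller section size follows the spectrum more closely at the cost of more data.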
_______________________________________________
Clam-devel mailing list
[email protected]
https://llistes.projectes.lafarga.org/cgi-bin/mailman/listinfo/clam-devel