Do you mean as a time-scaler or as a pitch-shifter?

WSOLA can and does work in real time as a pitch-shifter.  But a time-scaler can't 
be real-time, whether it's WSOLA or a phase-vocoder, because a real-time process 
has to keep turning input into output indefinitely without the input and output 
pointers either colliding or diverging from each other without bound.
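
To make the pointer argument concrete, here is a minimal sketch in C (my own 
numbers, not from the thread): a 48 kHz stream delivered in 10 ms blocks and 
time-stretched by a factor of 2.  The device hands over, and accepts back, only 
480 samples per block, while the stretcher wants to emit 960, so the backlog of 
unplayed output grows by 480 samples every block without bound:

/* hypothetical numbers: 48 kHz stream, 10 ms callback blocks, 2x time-stretch */
#include <stdio.h>

int main(void)
{
    const int block = 480;        /* samples per 10 ms block at 48 kHz  */
    const double stretch = 2.0;   /* time-stretch factor                */
    long backlog = 0;             /* stretched samples not yet played   */

    for (int n = 1; n <= 5; n++) {
        long produced = (long)(block * stretch);  /* stretcher output   */
        long consumed = block;                    /* device consumption */
        backlog += produced - consumed;
        printf("block %d: backlog = %ld samples (%.1f ms)\n",
               n, backlog, 1000.0 * backlog / 48000.0);
    }
    return 0;    /* backlog grows by 480 samples every block, forever */
}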

--r b-j                     r...@audioimagination.com
"Imagination is more important than knowledge."




-------- Original message --------
From: Alex Dashevski <alexd...@gmail.com> 
Date: 5/28/2018  10:22 PM  (GMT-08:00) 
To: robert bristow-johnson <r...@audioimagination.com>, 
music-dsp@music.columbia.edu 
Subject: Re: [music-dsp] WSOLA 

Hi,
I mean WSOLA in real time.  How can I prove to my instructor that it's not 
possible?
Why do I need to do resampling?  Android samples and resamples at the same 
frequency (in my case, 48 kHz).  Or do you mean doing the processing at 
8 kHz (subsampled)?
I also want to achieve high performance and minimum latency.
How can I prove to my instructor that the correct way to implement this is 
pitch shifting and not WSOLA in real time?
Thanks,
Alex

2018-05-29 4:19 GMT+03:00 robert bristow-johnson <r...@audioimagination.com>:



---------------------------- Original Message ----------------------------
Subject: Re: [music-dsp] WSOLA
From: "Alex Dashevski" <alexd...@gmail.com>
Date: Sun, May 27, 2018 2:56 pm
To: philb...@mobileer.com
    music-dsp@music.columbia.edu
--------------------------------------------------------------------------

> Hi,
>
> I don't understand your answer.
> I already have an audio echo application on Android.  Buffer size and
> sample frequency influence the latency.
> Could you explain to me how to implement WSOLA in real time?  It is a bit
> more difficult.

yes WSOLA is a little difficult, but less difficult than a phase-vocoder.  now, 
when you say "WSOLA" and "real-time" in the same breath, do you mean a pitch 
shifter?  not a time-scaler, right?  because pitch shifting can be done 
real-time, but time-scaling has to be done with an input buffer (with some 
number of samples) getting made into a longer (more samples) or shorter (fewer 
samples) buffer with the same sample rate.  that can't be done in an operation 
that runs on indefinitely, even with a long throughput delay.  eventually the 
input and output pointers will collide.
but you can combine time-scaling and resampling (the latter is mathematically 
well defined) to get pitch shifting that can run on forever.  one operation 
increases the number of samples and the other reduces the number of samples 
exactly in reciprocal proportion, so the number of samples coming out in every 
buffer of time is the same as the number going in.
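
to put numbers on that bookkeeping, here is a tiny sketch in C (my own example 
figures, not from the post: an output hop of 512 samples and a stretch factor 
of 1.25) showing that the samples consumed from the input and the samples 
finally emitted stay equal every frame:

#include <stdio.h>

int main(void)
{
    const double stretch = 1.25;   /* time-stretch factor = pitch ratio         */
    const int    out_hop = 512;    /* hop of the time-scaled (stretched) output */

    double in_consumed = 0.0, out_emitted = 0.0;
    for (int frame = 1; frame <= 4; frame++) {
        in_consumed += out_hop / stretch;  /* input hop = out_hop / stretch     */
        out_emitted += out_hop / stretch;  /* out_hop stretched samples, read   */
                                           /* by the resampler in steps of      */
                                           /* 'stretch' samples each            */
        printf("frame %d: %.1f samples in, %.1f samples out\n",
               frame, in_consumed, out_emitted);
    }
    return 0;   /* equal every frame, so the process can run on forever */
}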
for "Similarity", so you have to position the windows in the input waveform to 
be similar to the waveform in
the output.  the waveform in the first-half of the input window should match 
the similarity to the waveform in the last-half of the output window of the 
previous frame.  normally the frame hop is exactly half of the window width.  
and the window shape should be complementary like a
Hann window.i believe that 240 sample buffer in the Android is an input/output 
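
as a quick numerical check (my own, with an assumed window length of 1024), a 
periodic Hann window really is complementary at a hop of half the window width; 
the overlapped halves sum to unity gain:

#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

int main(void)
{
    const int N = 1024;                      /* window length (assumed)        */
    double worst = 0.0;
    for (int n = 0; n < N / 2; n++) {
        /* periodic Hann: w[n] = 0.5*(1 - cos(2*pi*n/N))                       */
        double w1 = 0.5 * (1.0 - cos(2.0 * PI * n / N));
        double w2 = 0.5 * (1.0 - cos(2.0 * PI * (n + N / 2) / N));
        double err = fabs(w1 + w2 - 1.0);
        if (err > worst) worst = err;
    }
    printf("max deviation of w[n] + w[n+N/2] from 1: %.3g\n", worst);
    return 0;                                /* prints essentially zero        */
}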
i believe that 240-sample buffer in Android is an input/output sample buffer 
for the media I/O.  you can't really do anything with that buffer except pull 
in input samples and push out output samples.  you will have to (using whatever 
programming environment one uses to make Android apps) allocate memory and 
create your own buffers to hold about 100 ms of sound.
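
a loose illustration of that private buffer (my own sketch with assumed 
numbers, 48 kHz and about 100 ms = 4800 samples; this is not Android API code): 
a small circular buffer that the 240-sample callbacks feed and that the WSOLA 
analysis reads from at its own frame hop:

#include <stdio.h>

#define RING_LEN 4800              /* ~100 ms at 48 kHz (assumed)        */
#define BLOCK    240               /* size of the Android I/O block      */

typedef struct {
    float data[RING_LEN];
    long  written;                 /* total samples written so far       */
} Ring;

/* called once per audio callback with a 240-sample input block */
void ring_push(Ring *r, const float *block, int n)
{
    for (int i = 0; i < n; i++)
        r->data[(r->written + i) % RING_LEN] = block[i];
    r->written += n;
}

/* read n samples starting at absolute sample position pos; the caller
   must keep pos within the most recent RING_LEN samples               */
void ring_read(const Ring *r, long pos, float *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = r->data[(pos + i) % RING_LEN];
}

int main(void)
{
    static Ring  ring;
    static float in[BLOCK], frame[1024];

    for (int b = 0; b < 20; b++)       /* pretend 20 callbacks arrived   */
        ring_push(&ring, in, BLOCK);

    ring_read(&ring, ring.written - 1024, frame, 1024);  /* one analysis frame */
    printf("%ld samples buffered, last frame starts at %ld\n",
           ring.written, ring.written - 1024);
    return 0;
}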
in that buffer, you will use a technique called AMDF, ASDF, or autocorrelation 
to measure waveform similarity.  your input frame hop distance (which has both 
integer and fractional parts) is the output frame hop size times the reciprocal 
of the time-stretch factor.  so, if you're time-stretching (instead of 
time-compressing), your input frame will advance more slowly than your output 
frame, and that increases the number of samples.  but in that output buffer, 
you will resample (interpolate) with a step size that is the time-stretch 
factor (do this only for output samples that have already been overlapped and 
added), thus reducing the final number of output samples back to the original 
number.  you will allow some jitter on the input window position that is 
informed by the result of the waveform similarity analysis.
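
to make the similarity step concrete, here is a stripped-down AMDF search in C 
(my own sketch with made-up window and jitter sizes, not code from this post): 
it slides the candidate input window over a small plus/minus jitter range and 
keeps the offset whose first half-window best matches the tail of the previous 
output frame:

#include <math.h>
#include <stdio.h>

/* average magnitude difference between two segments of length n */
double amdf(const float *a, const float *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += fabs((double)a[i] - (double)b[i]);
    return sum / n;
}

/* find the jitter offset in [-max_jitter, +max_jitter] at which the input
   window starting at nominal+offset is most similar to out_tail (the last
   half-window of the previously synthesized output)                       */
int best_offset(const float *input, long nominal, const float *out_tail,
                int half_win, int max_jitter)
{
    int    best   = 0;
    double best_d = INFINITY;
    for (int off = -max_jitter; off <= max_jitter; off++) {
        double d = amdf(input + nominal + off, out_tail, half_win);
        if (d < best_d) { best_d = d; best = off; }
    }
    return best;
}

int main(void)
{
    /* toy test: a periodic waveform (period 480 samples); put the nominal
       position 7 samples off a whole period and see the search correct it */
    enum { LEN = 4096, HALF = 256, JIT = 40 };
    static float sig[LEN];
    for (int i = 0; i < LEN; i++)
        sig[i] = (float)sin(2.0 * 3.14159265358979 * i / 480.0);

    const float *tail    = sig + 1000;   /* pretend: previous output tail   */
    long         nominal = 1000 + 480 + 7;
    printf("best jitter offset: %d (expect about -7)\n",
           best_offset(sig, nominal, tail, HALF, JIT));
    return 0;
}

a real implementation would normalize or bound the search and might swap in 
ASDF or cross-correlation for the AMDF, but the structure of the search is the 
same.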
that's how you do WSOLA, as best as i understand it.

--


r b-j                         r...@audioimagination.com
"Imagination is more important than knowledge."

_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp
