Dear all, I am struggling to get sound recognition working. Essentially I would like to use the microphone and recognise when a specific note has been played (like a guitar tuner).
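For context, this is the kind of mapping I want to end up with once I have a reliable frequency: nearest note in 12-tone equal temperament, assuming A4 = 440 Hz. This is just my own plain-Java sketch (nothing Android-specific; class and method names are placeholders):

```java
// Sketch: map a detected frequency to the nearest equal-tempered note.
// Assumes A4 = 440 Hz; names are my own, not from any library.
public class NoteMapper {
    private static final String[] NAMES =
        {"A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"};

    // Number of semitones away from A4 (negative below A4).
    static int semitonesFromA4(double freqHz) {
        return (int) Math.round(12.0 * Math.log(freqHz / 440.0) / Math.log(2.0));
    }

    static String nearestNote(double freqHz) {
        int n = semitonesFromA4(freqHz);
        int idx = ((n % 12) + 12) % 12;                    // wrap into 0..11, A = 0
        int octave = 4 + (int) Math.floor((n + 9) / 12.0); // A4 is 9 semitones above C4
        return NAMES[idx] + octave;
    }

    public static void main(String[] args) {
        System.out.println(nearestNote(440.0));  // A4
        System.out.println(nearestNote(82.41));  // low E guitar string -> E2
    }
}
```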
I am trying to use the libgdx library for the FFT: http://code.google.com/p/libgdx/

What is happening is that with the same frequency coming from a signal generator I don't always get the same results, and most of all I get very different results from my ZTE Orange San Francisco versus a Galaxy phone (although I suspect the ZTE is the one to blame :-)

This is my code:

    private static final int RECORDER_SAMPLERATE = 44100;
    private static final int RECORDER_CHANNELS = AudioFormat.CHANNEL_CONFIGURATION_MONO;
    private static final int RECORDER_AUDIO_ENCODING = AudioFormat.ENCODING_PCM_16BIT;

When I press the 'record' button I initialise the AudioRecord and start listening in a separate thread:

    private AudioRecord recorder = null;
    recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
            RECORDER_SAMPLERATE, RECORDER_CHANNELS,
            RECORDER_AUDIO_ENCODING, bufferSize);

and the listening function follows:

    int MULTIPLIER = 1; // <-- I increase this if I wish to concatenate input buffers coming from the microphone
    short[] data = new short[bufferSize];
    float[] fft_cpx;
    float[] tmpR, tmpI;
    float[] new_array = new float[bufferSize * MULTIPLIER];
    double[] real = new double[bufferSize * MULTIPLIER];
    double[] imag = new double[bufferSize * MULTIPLIER];
    double[] mag  = new double[bufferSize * MULTIPLIER];

    Log.i(TAG, "Ready to go, buffer size: " + bufferSize); // <-- it is 4096 when using 44.1 kHz, but each read only retrieves 2048

    fft = new FFT(bufferSize * MULTIPLIER, RECORDER_SAMPLERATE);
    long t1, t2, deltaT;
    while (isRecording) {
        t1 = System.currentTimeMillis();
        int index = 0;
        for (int j = 0; j < MULTIPLIER; ++j) {
            int n = recorder.read(data, 0, bufferSize); // <-- it reads n = 2048
            Log.i(TAG, "Data read (j=" + j + "): " + n);
            for (int z = 0; z < n; ++z) {
                new_array[index] = data[z];
                ++index;
            }
        }
        t2 = System.currentTimeMillis();
        deltaT = t2 - t1;
        Log.i(TAG, "Data read in TIME: " + deltaT + " (" + t1 + "," + t2 + ")");
        Log.i(TAG, "Global index: " + index);

        fft.forward(new_array);
        fft_cpx = fft.getSpectrum();

        // Find max and max position in the complex spectrum
        tmpR = fft.getRealPart();
        tmpI = fft.getImaginaryPart();
        arrDet.peakValue = 0; // arrDet is a dummy structure where I store peak value and position
        for (int i = 0; i < new_array.length; i++) {
            real[i] = (double) tmpR[i];
            imag[i] = (double) tmpI[i];
            mag[i] = Math.sqrt((real[i] * real[i]) + (imag[i] * imag[i]));
            if (mag[i] > arrDet.peakValue) {
                arrDet.peakValue = mag[i];
                arrDet.position = i;
            }
        }

        int FrequencyN = 0;
        if (new_array.length != 0)
            FrequencyN = RECORDER_SAMPLERATE * arrDet.position / new_array.length;
        int FrequencyBS = RECORDER_SAMPLERATE * arrDet.position / bufferSize;
        Log.i(TAG, "@ F(N) is: " + FrequencyN + "\tF(BS) is: " + FrequencyBS + "\tpos: " + arrDet.position);
    }

Results. For example, with a 1000 Hz input I get runs like:

    INPUT FREQUENCY = 1000   Log: Frequency (N) is 8010 @ 744   // <-- position
    INPUT FREQUENCY = 1000   Log: Frequency (N) is 3003 @ 279   // <-- position
    INPUT FREQUENCY = 1000   Log: Frequency (N) is 6007 @ 558   // <-- position

QUESTIONS:
1) Any idea what silly mistake I am making?
2) Is it enough to read from the microphone only once, or should I rather concatenate buffers? I am not noticing any more stability, to be honest...

Thanks for any help!
Mik

--
You received this message because you are subscribed to the Google Groups "Android Developers" group.
To post to this group, send email to android-developers@googlegroups.com
To unsubscribe from this group, send email to android-developers+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/android-developers?hl=en
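P.S. To sanity-check the peak-bin to frequency conversion outside Android and libgdx, I put together this plain-Java sketch. It runs a naive O(N^2) DFT over a synthetic 1000 Hz sine and converts the peak bin back with frequency = sampleRate * bin / N, scanning only bins 1..N/2 (skipping DC and the mirrored half). All names here are my own test harness, not libgdx API, and it assumes N really is the number of samples that went into the transform:

```java
// Sketch: naive DFT peak detection on a synthetic tone, to verify that
// frequency = sampleRate * peakBin / N holds when N matches the actual
// number of samples transformed.
public class DftSanityCheck {
    // Returns the peak bin of |DFT(x)| over bins 1..N/2
    // (skip DC at bin 0 and the mirrored upper half of the spectrum).
    static int peakBin(double[] x) {
        int n = x.length;
        int best = 1;
        double bestMag = 0;
        for (int k = 1; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double ang = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(ang);
                im -= x[t] * Math.sin(ang);
            }
            double m = Math.sqrt(re * re + im * im);
            if (m > bestMag) { bestMag = m; best = k; }
        }
        return best;
    }

    public static void main(String[] args) {
        int sampleRate = 44100, n = 2048;   // 2048 shorts, as one read() delivers
        double tone = 1000.0;
        double[] x = new double[n];
        for (int t = 0; t < n; t++)
            x[t] = Math.sin(2 * Math.PI * tone * t / sampleRate);

        int bin = peakBin(x);
        double freq = (double) sampleRate * bin / n;
        // With n = 2048 the bin spacing is ~21.5 Hz, so the estimate should
        // land within one bin of 1000 Hz.
        System.out.println("peak bin " + bin + " -> " + freq + " Hz");
    }
}
```

With this harness the conversion is stable, which makes me suspect the instability on the device comes from the length mismatch (4096-sample FFT fed by 2048-sample reads) rather than from the formula itself.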