Dear all, I am struggling to get sound recognition working. Essentially I would like to use the microphone and recognise when a specific note has been played (like a guitar tuner).
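For context, this is the kind of mapping I want to end up with once I have a reliable frequency: nearest note in 12-tone equal temperament, assuming A4 = 440 Hz. This is just my own plain-Java sketch (nothing Android-specific; class and method names are placeholders):

```java
// Sketch: map a detected frequency to the nearest equal-tempered note.
// Assumes A4 = 440 Hz; names are my own, not from any library.
public class NoteMapper {
    private static final String[] NAMES =
        {"A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"};

    // Number of semitones away from A4 (negative below A4).
    static int semitonesFromA4(double freqHz) {
        return (int) Math.round(12.0 * Math.log(freqHz / 440.0) / Math.log(2.0));
    }

    static String nearestNote(double freqHz) {
        int n = semitonesFromA4(freqHz);
        int idx = ((n % 12) + 12) % 12;                    // wrap into 0..11, A = 0
        int octave = 4 + (int) Math.floor((n + 9) / 12.0); // A4 is 9 semitones above C4
        return NAMES[idx] + octave;
    }

    public static void main(String[] args) {
        System.out.println(nearestNote(440.0));  // A4
        System.out.println(nearestNote(82.41));  // low E guitar string -> E2
    }
}
```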
I am trying to use the libgdx library for the FFT: http://code.google.com/p/libgdx/

What is happening is that with the same frequency coming from a signal generator I don't always get the same results, and most of all I get very different results from my ZTE Orange San Francisco versus a Galaxy phone (although I suspect the ZTE is the one to blame :-)

This is my code:

    private static final int RECORDER_SAMPLERATE = 44100;
    private static final int RECORDER_CHANNELS = AudioFormat.CHANNEL_CONFIGURATION_MONO;
    private static final int RECORDER_AUDIO_ENCODING = AudioFormat.ENCODING_PCM_16BIT;

When I press the 'record' button I initialise the AudioRecord and start listening in a separate thread:

    private AudioRecord recorder = null;
    recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
            RECORDER_SAMPLERATE, RECORDER_CHANNELS,
            RECORDER_AUDIO_ENCODING, bufferSize);

and the listening function follows:

    int MULTIPLIER = 1; // <-- I increase this if I wish to concatenate input buffers coming from the microphone
    short[] data = new short[bufferSize];
    float[] fft_cpx;
    float[] tmpR, tmpI;
    float[] new_array = new float[bufferSize * MULTIPLIER];
    double[] real = new double[bufferSize * MULTIPLIER];
    double[] imag = new double[bufferSize * MULTIPLIER];
    double[] mag  = new double[bufferSize * MULTIPLIER];

    Log.i(TAG, "Ready to go, buffer size: " + bufferSize); // <-- it is 4096 when using 44.1 kHz, but each read only retrieves 2048

    fft = new FFT(bufferSize * MULTIPLIER, RECORDER_SAMPLERATE);
    long t1, t2, deltaT;
    while (isRecording) {
        t1 = System.currentTimeMillis();
        int index = 0;
        for (int j = 0; j < MULTIPLIER; ++j) {
            int n = recorder.read(data, 0, bufferSize); // <-- it reads n = 2048
            Log.i(TAG, "Data read (j=" + j + "): " + n);
            for (int z = 0; z < n; ++z) {
                new_array[index] = data[z];
                ++index;
            }
        }
        t2 = System.currentTimeMillis();
        deltaT = t2 - t1;
        Log.i(TAG, "Data read in TIME: " + deltaT + " (" + t1 + "," + t2 + ")");
        Log.i(TAG, "Global index: " + index);

        fft.forward(new_array);
        fft_cpx = fft.getSpectrum();

        // Find max and max position in the complex spectrum
        tmpR = fft.getRealPart();
        tmpI = fft.getImaginaryPart();
        arrDet.peakValue = 0; // arrDet is a dummy structure where I store peak value and position
        for (int i = 0; i < new_array.length; i++) {
            real[i] = (double) tmpR[i];
            imag[i] = (double) tmpI[i];
            mag[i] = Math.sqrt((real[i] * real[i]) + (imag[i] * imag[i]));
            if (mag[i] > arrDet.peakValue) {
                arrDet.peakValue = mag[i];
                arrDet.position = i;
            }
        }

        int FrequencyN = 0;
        if (new_array.length != 0)
            FrequencyN = RECORDER_SAMPLERATE * arrDet.position / new_array.length;
        int FrequencyBS = RECORDER_SAMPLERATE * arrDet.position / bufferSize;
        Log.i(TAG, "@ F(N) is: " + FrequencyN + "\tF(BS) is: " + FrequencyBS + "\tpos: " + arrDet.position);
    }

Results. For example, with a 1000 Hz input I get runs like:

    INPUT FREQUENCY = 1000   Log: Frequency (N) is 8010 @ 744   // <-- position
    INPUT FREQUENCY = 1000   Log: Frequency (N) is 3003 @ 279   // <-- position
    INPUT FREQUENCY = 1000   Log: Frequency (N) is 6007 @ 558   // <-- position

QUESTIONS:
1) Any idea what silly mistake I am making?
2) Is it enough to read from the microphone only once, or should I rather concatenate buffers? I am not noticing any more stability, to be honest...

Thanks for any help!
Mik

--
You received this message because you are subscribed to the Google Groups "Android Developers" group.
To post to this group, send email to android-developers@googlegroups.com
To unsubscribe from this group, send email to android-developers+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/android-developers?hl=en
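P.S. To sanity-check the peak-bin to frequency conversion outside Android and libgdx, I put together this plain-Java sketch. It runs a naive O(N^2) DFT over a synthetic 1000 Hz sine and converts the peak bin back with frequency = sampleRate * bin / N, scanning only bins 1..N/2 (skipping DC and the mirrored half). All names here are my own test harness, not libgdx API, and it assumes N really is the number of samples that went into the transform:

```java
// Sketch: naive DFT peak detection on a synthetic tone, to verify that
// frequency = sampleRate * peakBin / N holds when N matches the actual
// number of samples transformed.
public class DftSanityCheck {
    // Returns the peak bin of |DFT(x)| over bins 1..N/2
    // (skip DC at bin 0 and the mirrored upper half of the spectrum).
    static int peakBin(double[] x) {
        int n = x.length;
        int best = 1;
        double bestMag = 0;
        for (int k = 1; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double ang = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(ang);
                im -= x[t] * Math.sin(ang);
            }
            double m = Math.sqrt(re * re + im * im);
            if (m > bestMag) { bestMag = m; best = k; }
        }
        return best;
    }

    public static void main(String[] args) {
        int sampleRate = 44100, n = 2048;   // 2048 shorts, as one read() delivers
        double tone = 1000.0;
        double[] x = new double[n];
        for (int t = 0; t < n; t++)
            x[t] = Math.sin(2 * Math.PI * tone * t / sampleRate);

        int bin = peakBin(x);
        double freq = (double) sampleRate * bin / n;
        // With n = 2048 the bin spacing is ~21.5 Hz, so the estimate should
        // land within one bin of 1000 Hz.
        System.out.println("peak bin " + bin + " -> " + freq + " Hz");
    }
}
```

With this harness the conversion is stable, which makes me suspect the instability on the device comes from the length mismatch (4096-sample FFT fed by 2048-sample reads) rather than from the formula itself.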