[FFmpeg-user] Increasing lags when using ffmpeg for streaming audio from microphone

2021-10-22 Thread Alex R
Hi everyone,

I'm attempting to use ffmpeg as a DIY baby audio monitor. Although I have a
working prototype, there is a delay that grows progressively: the lag is a
few seconds at the beginning, but it reaches a few minutes after several
hours of streaming.

My set-up is:
- Raspberry Pi 2B with a USB microphone is the streaming server
- My client is usually VLC running on Android or on a computer elsewhere in
the house
- There is only one client at a time, but clients differ depending on who
watches the child (so streaming directly to a specific IP is not a suitable
solution)
- All the devices are in the same network
- ffmpeg is started by another process with these parameters:

ffmpeg -re -f alsa -i plughw:1,0 -vn -acodec libmp3lame -b:a 8k -ac 1 -ar 22050 -f mp3 -

The parent process continuously reads stdout and exposes the chunks over
HTTP. This makes it convenient, as the stream can be played in a browser
too. The tool that does it is micstream:
https://github.com/BlackLight/micstream/blob/main/micstream/server.py
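
For example, from another machine the stream can be consumed like this (a
rough sketch; host, port and path are placeholders for whatever micstream
exposes on my Pi):

curl -s http://raspberrypi.local:8080/audio.mp3 | ffplay -nodisp -autoexit -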


I've tried tweaking the ffmpeg parameters and have gotten some small
improvements, for example by reducing the bit-rate. However, I believe the
overall approach needs to be reviewed, because the delays still pile up over
time.




Hypotheses I've had:
1. The client has a buffer of its own
However, VLC allows me to set the cache size by specifying a duration,
which is currently at 1000ms. I tried lower values too, but there was no
noticeable difference.

2. The hardware is not fast enough
I doubt it, because next to the RasPi 2B streaming the microphone I have a
RasPi ZeroW that streams video from a camera - it is very smooth, and the
delay is ~1s even after weeks of uptime.
Further, if I inspect the status line ffmpeg prints (on stderr), it says
`size=  343273kB time=97:38:43.52 bitrate=8.0kbits/s speed=   1x`, which I
take to mean that real-time encoding keeps up.

3. The network itself
Both RasPis mentioned above work over Wi-Fi from a remote part of the house,
yet the video stream works reliably. Also, if I disable the video, the audio
still lags. Moreover, if I modify the ffmpeg parameters to stream directly
to another address over RTP:

ffmpeg -re -f alsa -i plughw:1,0 -vn -acodec libmp3lame -b:a 8k -ac 1 -ar 22050 -f rtp rtp://192.168.1.10:5002

and on 192.168.1.10 (the receiver, not the same system as the streamer) run
netcat to see what arrives (`nc -u -l 5002`), I do see small chunks of what
appear to be mp3 headers and some payload arriving at regular intervals, and
there are no periods where these datagrams stop arriving.



I am wondering what else I could try to establish the root cause of the
problem and reduce the delays. Perhaps the culprit is `micstream`; I'd be
happy to replace it with something else that is known to work better.
However, the references I found on the Internet point to `ffserver`, which
is no longer available, while the tutorials I've found for Apache and nginx
are tailored to video streaming.
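
One thing I am considering, to take micstream out of the picture, is to let
ffmpeg serve the HTTP stream itself via the http protocol's listen mode. A
rough, untested sketch (the port is arbitrary; I also dropped -re here,
since the documentation advises against it for live capture devices):

ffmpeg -f alsa -i plughw:1,0 -vn -acodec libmp3lame -b:a 8k -ac 1 -ar 22050 -f mp3 -listen 1 http://0.0.0.0:8080

As far as I understand, with -listen 1 ffmpeg serves a single client at a
time, which matches my use case.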


Is the scenario I described feasible at all? What troubleshooting steps
could I try?


Best wishes to everyone, and I look forward to your feedback,
Alex


Re: [FFmpeg-user] Advice on using silence removal

2021-09-18 Thread Alex R
Hi everyone,

Thank you for providing valuable feedback about silence removal last month.
For the benefit of future archaeologists, I summarize the steps I've taken
and the key elements of the solution. Note that while this worked for me, I
do not claim that this is the optimal approach.

- As Carl pointed out, don't normalize before silence removal. This is
obvious in retrospect, but I didn't think of it myself.
- The "compand" filter makes a substantial contribution to the quality of
the output.
- This article provides a clear, step by step explanation of how to use
this feature of ffmpeg; there are also illustrations that show how the
waveform changes after each step
https://medium.com/@jud.dagnall/dynamic-range-compression-for-audio-with-ffmpeg-and-compand-621fe2b1a892
- Use the mean volume as a threshold for the silence detector (in the past
I used the maximum value)

In case the site above is not available, here is a relevant excerpt:

```
ffmpeg -i in.mp3  -filter_complex
 "compand=attacks=0:points=-30/-900|-20/-20" out.wav

- attacks=0 means that I wanted to measure absolute volume, not averaging
the sound over a short (or long period of time)
- followed by points, which is a series of "from->to" mappings that are to
be interpreted as:
  - -30/-900, which means that volume below -30db in the original input
track gets converted to -900db (completely silent)
  - -20/-20 means that at -20db the volume remains unchanged
```



In practical terms, here are the steps I currently use in my noise gate
function:
1. cut the leading and trailing 200ms of the file (this is where I usually
had the sound of a click/tap when users begin/stop the recording)
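One way to express this trim (a sketch, untested as written; the input file
name below is a placeholder):
ffmpeg -i out-01-input.wav -af "atrim=start=0.2,areverse,atrim=start=0.2,areverse,asetpts=PTS-STARTPTS" out-02-trim-ex.wav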

2. use a combination of a high-pass and a low-pass filter for the range
200..4000 Hz, which should cover a typical human voice
ffmpeg -i out-02-trim-ex.wav -af "highpass=f=200, lowpass=f=4000"
out-03-range-filter.wav

3. apply the compand filter
ffmpeg -i out-03-range-filter.wav  -filter_complex
"compand=attacks=0:points=-30/-900|-20/-20" out-04-compand.wav

4. apply the silence removal filter
ffmpeg -i out-04-compand.wav -af
silenceremove=start_periods=1:start_duration=0:start_threshold=-6dB:start_silence=0.5,areverse,silenceremove=start_periods=1:start_duration=0:start_threshold=-6dB:start_silence=0.5,afade=t=in:st=0:d=0.3,areverse,afade=t=in:st=0:d=0.3
out-05-silence-fade.wav

Notes:
- the threshold of -6dB in the command line above is not hardcoded; it is
the mean value as detected by `volumedetect` (see the sketch after these notes)
- we remove silence from the beginning, then reverse the signal and repeat
the process, then reverse it again, so that both ends are free of silence
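
For reference, a sketch of one way to obtain that mean value (not
necessarily the exact code I run):
MEAN=$(ffmpeg -i out-04-compand.wav -af volumedetect -f null /dev/null 2>&1 | grep mean_volume | awk '{print $5}')
and then ${MEAN}dB is substituted for the -6dB above.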

5. normalize it to the max value returned by `volumedetect`
ffmpeg -i out-05-silence-fade.wav -af "volume=18.2dB" out-06-normalized.wav


Thanks again for your assistance; I greatly appreciate it. If anyone comes
up with refinements of the described approach, please share your methodology.

Alex


[FFmpeg-user] Advice on using silence removal

2021-08-20 Thread Alex R
Hi everyone,

I am attempting to leverage ffmpeg in a project that involves recording
short audio clips. So far I have gotten some mixed results and I'd like to
tap into your collective knowledge to ensure my approach is sound.

Context:
- a person records an audio clip of themselves pronouncing a word (imagine
that you read aloud a flash-card that says "tree" or "helicopter")
- the recording is usually made on a mobile phone

The clip contains some silence at both ends, because there is a delay
between the moment the user presses the record button, the moment they
pronounce their word, and the moment they press "stop". Depending on the
device, there may also be an audible click in the beginning.

My objective is to trim the silence at both ends and apply fade-in/out to
soften the clicks, if any.

The challenges are:
- ffmpeg's silenceremove filter needs a threshold value, however,
- each user is in their own environment, with different levels of ambient
noise
- each device is unique in terms of sensitivity

Thus, I can achieve my desired result with one specific clip through trial
and error, tinkering with thresholds until I get what I need. But I cannot
figure out how to detect these thresholds automatically, such that I can
replicate the result with a broad range of users, environments and
recording devices.

Note that there is no expectation to produce perfect results that match the
quality of an audio recording studio, I'm more in the "rough, but good
enough for practical purposes" territory.

Having read the documentation and various forums, I put together this
pipeline (actual commands in the appendix):

1. run volumedetect to see what the maximum level is
1a. parse stdout to extract `max_volume`
2. normalize audio to `max_volume`
3. apply silenceremove in two passes:
3a. for the beginning of the file
3b. invert the stream and run another silenceremove for the beginning
(which is actually the end)
3c. invert it back and save the output



What I read in the forums gave me the impression that step #2 is needed so
that at step #3 the threshold could simply be 0. However, that is not the
case; I still had to find a reasonable threshold via trial and error.

After I found a value that produced a good result, I assumed it might be
good enough for practical purposes and that it would be OK to simply
hardcode it as a magic number. However, the next day I attempted to
replicate the results using the same recording device in the same room, and
this time ffmpeg told me the filtered stream was empty, with nothing to
write. The environment wasn't 100% identical, since I'm not doing this in a
controlled lab, but most of the variables were the same; perhaps the windows
were open and it was a different time of day, so the baseline noise level
outside was somewhat different.

Clearly, my approach is not robust. I'd like to understand whether there
are any low-hanging fruits that I can try, or if I'm not on the right track.

I imagine that the solution I need would somehow determine the silence
threshold relative to the rest of the file, instead of using a
one-size-fits-all value. However, I did not find such a filter or analyzer
in ffmpeg.


Your guidance will be greatly appreciated,
Alex




Appendix, pipeline commands

1. ffmpeg -i input.mp3 -af "volumedetect"  -f null /dev/null
here I parse the output (ffmpeg prints it on stderr), looking for something
like "[Parsed_volumedetect_0 @ 0x559dbe815f00] max_volume: -15.9 dB"
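for example, with a one-liner along these lines (my actual parsing may differ):
ffmpeg -i input.mp3 -af volumedetect -f null /dev/null 2>&1 | grep max_volume | awk '{print $5}'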

2. ffmpeg -i input.mp3 -af "volume=15.9dB" out2-normalized.mp3

3. ffmpeg -i out2-normalized.mp3 -af
silenceremove=start_periods=1:start_duration=0:start_threshold=-6dB:start_silence=0.5,areverse,silenceremove=start_periods=1:start_duration=0:start_threshold=-6dB:start_silence=0.5,afade=t=in:st=0:d=0.3,areverse,afade=t=in:st=0:d=0.3
out3-trimmed.mp3


An example of an input file is available at
railean.net/files/public-temp/in-fresh.mp3; after normalization you can
hear some church bells in the distance. I'm totally fine with them
remaining audible in the result, as long as the leading and trailing
silence is removed.