Dear all,

My co-authors and I would like to share our publication from IEEE Oceans.
We use latent diffusion models as a data augmentation method to improve the
performance of binary classifiers deployed in new locations.

Abstract: Passive acoustic monitoring provides a reliable way to monitor
marine mammal populations, but one deployment can yield terabytes of data,
creating a bottleneck at the analysis stage. As a solution, supervised
Convolutional Neural Networks (CNNs) have been extensively used to
automatically detect cetacean calls. However, generalizing a model so that it
performs satisfactorily in new target domains is challenging. A common
solution is to train a model for a target site. CNN training requires
substantial amounts of data, and expert labeling for each target domain is
a time-consuming task. Data augmentation methods can artificially extend
the size of bioacoustic training datasets. Here, we leverage Stable
Diffusion, a latent diffusion model, to create augmented spectrogram images
as training inputs for a CNN that classifies a spectrogram image as either
a fin whale (Balaenoptera physalus) call or background noise. We eliminate
the need to label calls of interest in the target domain (here fin whale
calls, in Antarctica and Bermuda) by creating training spectrograms
composed of fin whale calls from an existing labeled dataset (Long Island,
NY) inserted into background noise from the target domains. We use Stable
Diffusion to blend the call more seamlessly into the target background
noise, using a Canny edge map and an inpainting mask to direct the diffusion
process to run on the area around the call. Models trained on inserted and
Stable Diffusion-augmented spectrograms were tested on data from each target
domain and compared to a baseline model trained on the Long Island, NY
dataset and applied to the target sites. Using Stable Diffusion-augmented
data resulted in an 11% increase in Area Under the Curve (AUC) compared to
the baseline model. With the field of generative AI
image models rapidly evolving, latent diffusion models are likely to become
an efficient method of augmenting CNN datasets for bioacoustic detection
and classification.
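
For readers who want a concrete picture of the insertion step described in
the abstract, here is a minimal sketch in plain Python. It is not the code
from the paper. The function names (insert_call, inpainting_mask), the
nested-list spectrogram representation, and the choice of mask (the call's
bounding box expanded by a fixed margin) are all assumptions made for
illustration, and the Canny edge map and Stable Diffusion steps are left out
entirely.

```python
# Illustrative sketch only: hypothetical helpers, not the paper's code.
# Spectrograms are nested lists of floats (rows = frequency, cols = time).

def insert_call(background, call, top, left):
    """Return a copy of `background` with `call` pasted at (top, left)."""
    rows, cols = len(call), len(call[0])
    height, width = len(background), len(background[0])
    if top < 0 or left < 0 or top + rows > height or left + cols > width:
        raise ValueError("call patch does not fit inside the background")
    out = [row[:] for row in background]  # copy so the input is untouched
    for r in range(rows):
        for c in range(cols):
            out[top + r][left + c] = call[r][c]
    return out


def inpainting_mask(height, width, top, left, call_rows, call_cols, margin):
    """Return a 0/1 mask marking the region a diffusion step may repaint.

    Assumption: the region is the call's bounding box grown by `margin`
    pixels on every side and clipped to the image bounds.
    """
    r0 = max(0, top - margin)
    r1 = min(height, top + call_rows + margin)
    c0 = max(0, left - margin)
    c1 = min(width, left + call_cols + margin)
    return [
        [1 if r0 <= r < r1 and c0 <= c < c1 else 0 for c in range(width)]
        for r in range(height)
    ]


# Toy example: a 6x8 silent background and a 2x3 "call" patch.
background = [[0.0] * 8 for _ in range(6)]
call = [[1.0] * 3 for _ in range(2)]
composite = insert_call(background, call, top=2, left=3)
mask = inpainting_mask(6, 8, top=2, left=3, call_rows=2, call_cols=3, margin=1)
```

In a real pipeline, the composite and the mask would be handed to an
inpainting model, so that only the masked region is regenerated and the rest
of the target-site background stays as recorded.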

The article can be found here:
https://doi.org/10.1109/OCEANS55160.2024.10754260

Best,
Dr. Dawn Parry
Department of Natural Resources and the Environment, Cornell University
K. Lisa Yang Center for Conservation Bioacoustics
Cornell Lab of Ornithology, Cornell University

Email: [email protected]
Web: https://www.birds.cornell.edu/ccb/dawn-parry/
_______________________________________________
MARMAM mailing list
[email protected]
https://lists.uvic.ca/mailman/listinfo/marmam