Dear all,

My co-authors and I would like to share our publication from IEEE Oceans. We use latent diffusion models for data augmentation to improve the performance of a binary classifier when it is deployed in new locations.
Abstract: Passive acoustic monitoring provides a reliable way to monitor marine mammal populations, but one deployment can yield terabytes of data, creating a bottleneck at the analysis stage. As a solution, supervised Convolutional Neural Networks (CNNs) have been extensively used to automatically detect cetacean calls. However, model generalization enabling satisfactory performance in new target domains is challenging. A common solution is to train a model for a target site. CNN training requires substantial amounts of data, and expert labeling for each target domain is a time-consuming task. Data augmentation methods can artificially extend the size of bioacoustic training datasets. Here, we leverage Stable Diffusion, a latent diffusion model, to create augmented spectrogram images as training inputs for a CNN that classifies a spectrogram image as either a fin whale (Balaenoptera physalus) call or background noise. We eliminate the need to label calls of interest in the target domain (here fin whale calls, in Antarctica and Bermuda) by creating training spectrograms composed of fin whale calls from an existing labeled dataset (Long Island, NY) inserted into background noise from the target domains. We use Stable Diffusion to blend the call more seamlessly into the target background noise, using a Canny edge map and inpainting mask to direct the diffusion process to run on the area around the call. The performance of models trained on inserted and Stable Diffusion-augmented spectrograms was tested on data from each target domain and compared to a baseline model trained on the Long Island, NY dataset and applied to the target sites. Using Stable Diffusion-augmented data resulted in an 11% increase in Area Under the Curve (AUC) compared to the baseline model. With the field of generative AI image models rapidly evolving, latent diffusion models are likely to become an efficient method of augmenting CNN datasets for bioacoustic detection and classification.
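For readers who would like a concrete picture of the insertion step described in the abstract, here is a minimal sketch in Python. It is not the code used in the paper. The function name `insert_call`, the `pad` parameter, the rule of keeping the louder value in each bin, and the choice to let the mask cover the call itself as well as a margin around it are all assumptions made for illustration, and the diffusion step itself is not shown.

```python
import numpy as np


def insert_call(background, call, top, left, pad=2):
    """Paste a labeled call patch into a background-noise spectrogram.

    Hypothetical helper, not taken from the paper. Returns the composite
    spectrogram and a uint8 mask (255 = region a diffusion model could
    be allowed to repaint, 0 = region left untouched).
    """
    bg_rows, bg_cols = background.shape
    call_rows, call_cols = call.shape
    if (top < 0 or left < 0
            or top + call_rows > bg_rows or left + call_cols > bg_cols):
        raise ValueError("call patch does not fit inside the background")

    # Work on a copy so the caller's background array is not modified.
    composite = background.copy()
    region = composite[top:top + call_rows, left:left + call_cols]
    # Assumption: keep the louder of call and noise in each bin.
    composite[top:top + call_rows, left:left + call_cols] = np.maximum(region, call)

    # Mask = call bounding box grown by `pad` bins, clipped to the image.
    mask = np.zeros((bg_rows, bg_cols), dtype=np.uint8)
    row_start, row_end = max(top - pad, 0), min(top + call_rows + pad, bg_rows)
    col_start, col_end = max(left - pad, 0), min(left + call_cols + pad, bg_cols)
    mask[row_start:row_end, col_start:col_end] = 255
    return composite, mask


# Synthetic stand-ins for real data: low-level noise and a flat, loud "call".
rng = np.random.default_rng(0)
background = rng.uniform(0.0, 0.3, size=(64, 64))
call = np.full((8, 16), 0.9)
composite, mask = insert_call(background, call, top=40, left=20, pad=4)
```

In a full pipeline, `composite` and `mask` would be saved as images and passed, together with an edge map of the call, to an inpainting model, so that only the masked area around the call is regenerated.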
The article can be found here: https://doi.org/10.1109/OCEANS55160.2024.10754260

Best,
Dr. Dawn Parry
Department of Natural Resources and the Environment, Cornell University
K. Lisa Yang Center for Conservation Bioacoustics
Cornell Lab of Ornithology, Cornell University
Email: [email protected]
Web: https://www.birds.cornell.edu/ccb/dawn-parry/
_______________________________________________ MARMAM mailing list [email protected] https://lists.uvic.ca/mailman/listinfo/marmam
