I will have to look at how offline changes the loading of the files. That is an odd and unexpected behavior.
They aren't actually duplicate files. What is happening is that, when offline, the hub displays the BAM file (.bam) and its index file (.bai) as separate entries instead of associating them as one entry. I'll investigate more.

Lori Shepherd - Kern
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263

________________________________
From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Robert Castelo <robert.cast...@upf.edu>
Sent: Thursday, April 4, 2024 2:40 PM
To: bioc-devel@r-project.org <bioc-devel@r-project.org>
Subject: [Bioc-devel] duplicated entries with 'ExperimentHub(localHub=TRUE)'

hi,

I'm getting duplicated entries when loading **offline** previously cached ExperimentHub resources. This code reproduces the problem:

1. If in a fresh empty cache of ExperimentHub I download 9 resources through the gDNAinRNAseqData package:

    library(gDNAinRNAseqData)
    bamfiles <- LiYu22subsetBAMfiles()
    length(bamfiles)
    [1] 9

2. Try to load them again from the local cache either going offline or using the 'offline=TRUE' argument to the loader function, which sets 'localHub=TRUE' in the call to 'ExperimentHub()':

    bamfiles <- LiYu22subsetBAMfiles(offline=TRUE)
    Using 'localHub=TRUE'
      If offline, please also see BiocManager vignette section on offline use
    snapshotDate(): 2024-04-02
    see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for documentation
    loading from cache
    [...]
    length(bamfiles)
    [1] 18

3. If I examine the resources offline directly with 'ExperimentHub()' I see them duplicated with some IDs getting a '.1' suffix:

    library(ExperimentHub)
    eh <- ExperimentHub(localHub=TRUE)
    Using 'localHub=TRUE'
      If offline, please also see BiocManager vignette section on offline use
    snapshotDate(): 2024-04-02
    length(eh)
    [1] 18
    eh
    ExperimentHub with 18 records
    # snapshotDate(): 2024-04-02
    # $dataprovider: NGDC
    # $species: Homo sapiens
    # $rdataclass: BamFile
    # additional mcols(): taxonomyid, genome, description,
    #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
    #   rdatapath, sourceurl, sourcetype
    # retrieve records with, e.g., 'object[["EH8079"]]'

               title
      EH8079   | RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
      EH8079.1 | RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
      EH8080   | RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
      EH8080.1 | RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
      EH8081   | RNA-seq data BAM file subset of HRR589634 contaminated with 0% gDNA
      ...        ...
      EH8085.1 | RNA-seq data BAM file subset of HRR589623 contaminated with 10% ...
      EH8086   | RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
      EH8086.1 | RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
      EH8087   | RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
      EH8087.1 | RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...

Does anybody have an idea what might be going on with 'ExperimentHub(localHub=TRUE)'?

Thanks!

robert.
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel