Hi,

I'm getting duplicated entries when loading previously cached ExperimentHub resources **offline**. The following steps reproduce the problem:

1. Starting from a fresh, empty ExperimentHub cache, I download 9 resources through the gDNAinRNAseqData package:

library(gDNAinRNAseqData)

bamfiles <- LiYu22subsetBAMfiles()
length(bamfiles)
[1] 9

2. I then try to load them again from the local cache, either by going offline or by using the 'offline=TRUE' argument to the loader function, which sets 'localHub=TRUE' in the call to 'ExperimentHub()'. This returns 18 entries instead of 9:

bamfiles <- LiYu22subsetBAMfiles(offline=TRUE)
Using 'localHub=TRUE'
  If offline, please also see BiocManager vignette section on offline use
snapshotDate(): 2024-04-02
see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for documentation
loading from cache
[...]

length(bamfiles)
[1] 18

3. If I examine the resources offline directly with 'ExperimentHub()', I see them duplicated, with some IDs getting a '.1' suffix:

library(ExperimentHub)

eh <- ExperimentHub(localHub=TRUE)
Using 'localHub=TRUE'
  If offline, please also see BiocManager vignette section on offline use
snapshotDate(): 2024-04-02
length(eh)
[1] 18
eh
ExperimentHub with 18 records
# snapshotDate(): 2024-04-02
# $dataprovider: NGDC
# $species: Homo sapiens
# $rdataclass: BamFile
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH8079"]]'


  EH8079   |
  EH8079.1 |
  EH8080   |
  EH8080.1 |
  EH8081   |
  ...
  EH8085.1 |
  EH8086   |
  EH8086.1 |
  EH8087   |
  EH8087.1 |
           title
  EH8079   RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
  EH8079.1 RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
  EH8080   RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
  EH8080.1 RNA-seq data BAM file subset of HRR589633 contaminated with 0% gDNA
  EH8081   RNA-seq data BAM file subset of HRR589634 contaminated with 0% gDNA
  ...      ...
  EH8085.1 RNA-seq data BAM file subset of HRR589623 contaminated with 10% ...
  EH8086   RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
  EH8086.1 RNA-seq data BAM file subset of HRR589624 contaminated with 10% ...
  EH8087   RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
  EH8087.1 RNA-seq data BAM file subset of HRR589625 contaminated with 10% ...
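
To check that the '.1' records simply repeat the original IDs, one option is to strip the suffix and count how often each base ID occurs. This is only a minimal base-R sketch: the 'ids' vector is a hypothetical stand-in for 'names(eh)', filled with a few of the IDs printed above, and it only diagnoses the duplication without addressing its cause.

```r
# Hypothetical stand-in for names(eh); with a live hub object one would
# use ids <- names(eh) instead.
ids <- c("EH8079", "EH8079.1", "EH8080", "EH8080.1",
         "EH8086", "EH8086.1", "EH8087", "EH8087.1")

# Strip a trailing ".<digits>" suffix to recover the base ExperimentHub ID.
base_ids <- sub("\\.[0-9]+$", "", ids)

# Count how many records share each base ID; a count above 1 is a duplicate.
counts <- table(base_ids)
counts[counts > 1]

# Keep only the first record seen for each base ID.
unique_ids <- ids[!duplicated(base_ids)]
unique_ids
```

If every base ID shows a count of 2, the local hub is listing each cached resource twice rather than exposing additional, distinct resources.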

Does anybody have an idea what might be going on with 'ExperimentHub(localHub=TRUE)'?

Thanks!

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel