Hi Kasper -- we'll try to act on these, but first some comments and requests 
for clarification...

> -----Original Message-----
> From: Bioc-devel [mailto:bioc-devel-boun...@r-project.org] On Behalf Of
> Kasper Daniel Hansen
> Sent: Monday, September 14, 2015 10:45 PM
> To: bioc-devel@r-project.org
> Subject: [Bioc-devel] AnnotationHub: cleanup
> 
> I currently have the `pleasure` of dealing with students who have problems
> with installing AnnotationHub and/or downloading resources.  Here are some
> comments including some possible bug reports.

I hope this is on the whole a positive experience, and we'll do what we can to 
make it better.

> 
> 1) I think it is extremely dangerous that `cache(ahub)` starts by asking to
> download all resources!  May I suggest this only happens with a specific
> setting like `cache(ahub, download=TRUE)` or something similar.

> 
> 2) `cache(ahub)` deletes all cached information, except the sqlite database.
> Could we get a way to remove everything?
> 
> 3) While I can understand the difference between cache and hubCache, I
> would suggest that hubCache(ahub) = NULL removes all cached material
> including the sqlite database.

For each of the above, the envisioned use case was that 'hub' is a subset, e.g.,

  subhub = query(hub, c("homo", "ensembl", "81"))

and the user wanted to manipulate all records in the sub-hub. cache(subhub) 
asks whether to 'really download' if the size of the (sub)hub is greater than 
hubOption("MAX_DOWNLOADS"), which by default is 10; it seems like asking is the 
same as requiring an argument? fileName(subhub) may be closer to what you're 
looking for: it gives the path to the file, or NA if it is not in the cache.

For cache(subhub) = NULL it wouldn't make sense to delete 5 resources AND the 
sqlite file for the entire hub.

The sqlite file can be discovered with dbfile(hub) / dbfile(subhub), and 
removed with file.remove(dbfile(subhub)). In some ways it wasn't envisioned 
that this manual manipulation would be a common use case (!).
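
As a rough sketch of the sub-hub workflow described above, using only the 
functions named in this message. It assumes AnnotationHub is installed and the 
network is reachable; the query terms are just the example from above, and what 
comes back depends on the hub snapshot.

```r
library(AnnotationHub)

hub = AnnotationHub()                              # opens (or fetches) the sqlite metadata
subhub = query(hub, c("homo", "ensembl", "81"))   # subset of records to work with

length(subhub)               # number of records cache(subhub) would act on
hubOption("MAX_DOWNLOADS")   # size above which cache() asks before downloading
fileName(subhub)             # local file paths; NA for records not in the cache
dbfile(hub)                  # location of the sqlite file for the whole hub
```

Nothing here downloads or deletes resources; it only inspects what 
cache(subhub) or cache(subhub) = NULL would touch.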

> 
> 4) It seems that AnnotationHub in the release version of Bioconductor
> defaults to using https://.  Wasn't full support for https:// introduced in R
> 3.2.2; if so, it seems to be a critical bug that it is using https://

AnnotationHub uses httr::GET and ultimately curl::curl_fetch_disk rather than 
native R support, so what R does is not directly relevant. From ?curl

     Drop-in replacement for base 'url' that supports https, ftps,
     gzip, deflate, etc. Default behavior is identical to 'url', but
     request can be fully configured by passing a custom 'handle'.

So I wonder what the actual problem is?

> 5) Perhaps it should be considered that the default hubCache path is
> versioned, perhaps with Bioc version, perhaps with something else.  This
> might cause problems for people running multiple versions of R.

The database is supposed to handle versioning, so if you've populated the 
cache with Bioc 3.2 and are now accessing the cache with Bioc 3.1, only the 3.1 
resources are visible. The hope was to avoid multiple copies of these possibly 
large resources.

> 6) I strongly suggest that the output printed when retrieving an
> AnnotationHub resource includes the download url.

OK, something that's easy to do! Sometimes this will be cryptic (when the 
resource is cached in the AnnotationHub server, rather than being retrieved 
from the original source).

> 7) If you run AnnotationHub without having GenomicRanges / rtracklayer
> installed, it downloads the resource and then pangs out with an error.  To me
> it seems more natural to pang out with an error immediately, especially since
> when it works, it appears from message printing that loading the library
> happens prior to download.

I guess by 'run AnnotationHub' you mean retrieve a specific resource?

The import recipes generally start by require()ing the necessary libraries. I 
spotted a couple of recipes that didn't follow this convention (for 2bit and 
chain file resources from rtracklayer; none that involved GenomicRanges). Are 
there specific examples?
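
One way a recipe could fail before any download is to check its packages up 
front. This is only a sketch in base R: checkRecipePackages is a hypothetical 
helper, not something in AnnotationHub, and it uses requireNamespace() rather 
than the require() the recipes currently call.

```r
## Hypothetical helper (not part of AnnotationHub): stop immediately if any
## package needed by an import recipe is not installed.
checkRecipePackages = function(pkgs) {
    ## requireNamespace() returns FALSE, without error, for a missing package
    installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    if (!all(installed))
        stop("install before retrieving this resource: ",
             paste(pkgs[!installed], collapse = ", "))
    invisible(TRUE)
}

checkRecipePackages("utils")   # 'utils' ships with R, so this passes silently
```

For a 2bit or chain file resource the call would presumably be 
checkRecipePackages("rtracklayer"), made before the download starts.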

Martin

> 
> Best,
> Kasper
> 
> 
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

