I followed part of this interchange with interest.  I would love to see very
wide adoption and appreciation of AnnotationHub, and what I describe below
does not seem to be an important obstacle to that, but I have to confess that
aspects of the model and grammar are confusing to me.

I use "cache" mainly as a noun.  And in computing applications, IMHO, a cache
is something to be hidden far from the active interface.  In AnnotationHub
"cache" names an important function and a key data structure for annotation
archiving.

What I understand is (2.1.40):

  ah = AnnotationHub()  # creates object for file and database access, will update db if appropriate
  cache(ah)             # will offer to acquire all available hub resources for local caching, upon decline will provide a named vector of paths

  > cache(ah)
  download 40503 resources? [y/n] n
                              AH5086                             AH5087
  "/Users/stvjc/.AnnotationHub/5086" "/Users/stvjc/.AnnotationHub/5087"
                             AH14108                            AH15146

I am not sure this vector is going to get much use.  Maybe a negative
response should return NULL?

The help page says

  ‘cache(x)’ and ‘cache(x) <- value’: Adds (downloads) all resources in ‘x’,
  or removes all local resources corresponding to the records in ‘x’ from
  the cache.

"download" seems like a reasonable name for part of this functionality.
"cache<-" seems to be concerned mainly with deletion.  I can certainly define
private alternate terms for these tasks in my .Rprofile but I do think a
closer correspondence of function name to action could pay off.

On Tue, Sep 15, 2015 at 10:34 AM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:

> On Tue, Sep 15, 2015 at 12:25 AM, Morgan, Martin <martin.mor...@roswellpark.org> wrote:
>
> > Hi Kasper -- we'll try to act on these, but some comments / looking for
> > clarification...
> >
> > > -----Original Message-----
> > > From: Bioc-devel [mailto:bioc-devel-boun...@r-project.org] On Behalf Of
> > > Kasper Daniel Hansen
> > > Sent: Monday, September 14, 2015 10:45 PM
> > > To: bioc-devel@r-project.org
> > > Subject: [Bioc-devel] AnnotationHub: cleanup
> > >
> > > I currently have the `pleasure` of dealing with students who have
> > > problems with installing AnnotationHub and/or downloading resources.
> > > Here are some
> > > comments including some possible bug reports.
> >
> > I hope this is on the whole a positive experience, and we'll do what we
> > can to make it better.
> >
>
> Well, I love the package and I love it even more having prepared material
> on it.  And the people who complain are of course enriched for people who
> have problems - no way to know if it just works for most people.
>
> And of course right now it is more troublesome since I prepared the class
> using R-3.2.1 and then 3.2.2 was released just before we started and had
> the http -> https change which is an obvious suspect when people have
> download problems :)
>
> > > 1) I think it is extremely dangerous that `cache(ahub)` starts by
> > > asking to download all resources!  May I suggest this only happens
> > > with a specific setting like `cache(ahub, download=TRUE)` or something
> > > similar.
> > >
> > > 2) `cache(ahub)` deletes all cached information, except the sqlite
> > > database.  Could we get a way to remove everything?
> > >
> > > 3) While I can understand the difference between cache and hubCache, I
> > > would suggest that hubCache(ahub) = NULL removes all cached material
> > > including the sqlite database.
> >
> > For each of the above the envisioned use case was that 'hub' is a
> > subset, eg.,
> >
> >     subhub = query(hub, c("homo", "ensembl", "81"))
> >
> > and the user wanted to manipulate all records in the sub-hub.
> > cache(subhub) asks about the 'really download' if the size of the
> > (sub)hub is greater than hubOption("MAX_DOWNLOADS"), which by default is
> > 10; it seems like asking is the same as requiring an argument?
> > fileName(subhub) may be closer to what you're looking for...? the path
> > to the file name, or NA if it is not in the cache.
> >
> > For cache(subhub) = NULL it wouldn't make sense to delete 5 resources AND
> > the sqlite file for the entire hub.
> >
> > The sqlite file can be discovered with dbfile(hub) / dbfile(subhub), and
> > removed with file.remove(dbfile(subhub)).
> > In some ways it wasn't
> > envisioned that this manual manipulation would be a common use case (!).
> >
>
> Ok.  Let me perhaps rephrase my wish list
>   1) some easy way to reset the entire cache issue, with emphasis on easy.
> This is most likely to be used by beginners.  How it's done, I don't care
> too much about.  And I suggest a heading in ?AnnotationHub called something
> like "Flushing the cache" or something.
>   2) It seems natural that there is a way (for problem reporting) to report
> which resources are cached, which is (again) easy and does not involve
> download.  I don't care if it is cache() or some other name.
>
> > > 4) It seems that AnnotationHub in the release version of Bioconductor
> > > defaults to using https://.  Wasn't full support for https://
> > > introduced in R 3.2.2; if so, it seems to be a critical bug that it is
> > > using https://
> >
> > AnnotationHub uses httr::GET and ultimately curl::curl_fetch_disk rather
> > than native R support, so what R does is not directly relevant. From
> > ?curl
> >
> >      Drop-in replacement for base 'url' that supports https, ftps,
> >      gzip, deflate, etc. Default behavior is identical to 'url', but
> >      request can be fully configured by passing a custom 'handle'.
> >
> > So I wonder what the actual problem is?
> >
>
> Interesting.  Well, at least one user is behind a proxy and uses the tips
> in ?download.file to set a proxy server.  Perhaps that doesn't work with
> httr?  I don't know.  But there is more than one person with problems.
>
> > > 5) Perhaps it should be considered that the default hubCache path is
> > > versioned, perhaps with Bioc version, perhaps with something else.
> > > This might cause problems for people running multiple versions of R.
> >
> > The data base is supposed to handle versioning, so if you've populated
> > the cache with Bioc 3.2 and are now accessing the cache with Bioc 3.1,
> > only the 3.1 resources are visible.
> > The hope was to avoid multiple copies of these
> > possibly large resources.
> >
>
> That sounds pretty nifty..  I was thinking re-design of the database issues.
>
> > > 6) I strongly suggest that the output printed when retrieving an
> > > AnnotationHub resource includes the download url.
> >
> > Ok something that's easy to do! Sometimes this will be cryptic (when the
> > resource is cached in the AnnotationHub server, rather than being
> > retrieved from the original source)
> >
>
> Perhaps it should just say "loading from cache"
>
> > > 7) If you run AnnotationHub without having GenomicRanges / rtracklayer
> > > installed, it downloads the resource and then pangs out with an error.
> > > To me it seems more natural to pang out with an error immediately,
> > > especially since when it works, it appears from message printing that
> > > loading the library happens prior to download.
> >
> > I guess by 'run AnnotationHub' you mean retrieve a specific resource?
> >
> > The import recipes generally start by require()ing the necessary
> > libraries. I spotted a couple of recipes that didn't follow this
> > convention (for 2bit and chain file resources from rtracklayer; none
> > that involved GenomicRanges). Are there specific examples?
> >
>
> As a test case I got a Windows virtual machine up and running, total clean,
> and just did biocLite("AnnotationHub").  Then I picked two random
> resources and tried to download them; one was a UCSC chain file and I don't
> know the other one.  In both cases I totally got a decent error message,
> which I can fully understand.  But looking at it with beginner eyes, I just
> thought it was weird that the error on missing a library happened after
> download.  It's not a big deal, but if you don't know what you're doing you
> might get confused.
>
> Best,
> Kasper

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel