Re: [R] Function hints
hadley wickham wrote:
>> what I really would love to see would be an improved help.search():
>> on r-devel I found a reference to the \concept tag in .Rd files and the
>> fact that it is rarely used (again: I was not aware of this :-( ...),
>> which might serve as keyword container suitable for improving
>> help.search() results. what about changing the syntax here to
>> something like
>> \concept {
>> keyword = score,
>> keyword = score
>> ...
>> }
>> where score would be restricted to a small range of values (say, 1-3 or
>> 1-5). if package maintainers then would choose a handful of sensible
>> keywords (and scores) for a package and its functions one could expect
>> improved search results. this might be a naive idea, but could a
>> sort-by-relevance in the help.search() output profit from this?
>
> This is not something I think you can solve automatically. Good
> keywording requires a lot of effort, and needs to be consistent to be
> useful. The only way to achieve consistency is to have only one person
> keywording (difficult/expensive), or exhaustively document the process
> of keywording and then require all package authors to read and use it
> (impossible).
>
> Hadley

I was thinking of manual keywording (by the package authors, nobody else!) as a means to give the search engine (help.search()) reasonable information, including a (subjective) relevance score for each keyword. of course, the problem is the same as with every (especially permuted) index: finding the best compromise between indexing next to nothing and indexing everything in the documents at hand (the best probably meaning to index comprehensively, but not excessively, with reasonable index terms).

sure, consistency could not be enforced, but it's not consistent right now either, simply because the real \keyword tag is far too restrictive for indexing purposes (only a handful of predefined keywords are allowed), and otherwise only the name/alias and title in the Rd files seem to be searched. so the author must be really aware that, at the moment, these are the fields that have to contain the relevant 'keywords' if the function is to be found by help.search. this sometimes imposes rather artificial constraints on the wording, especially if you try to include some general keyword in the title of a very specialized function.

looking at the example I gave (help.search("fitting") etc.), it's quite clear that `nls' simply is not found because 'fitting' does not occur in its title, but I trust that, if asked to provide, say, three keywords, the author would make one of them contain "fit" or "fitting". I mean, every scientific journal asks you to do just this: provide some free-text keywords which you think are relevant for the paper. usually there are no restrictions/directives, but the purpose (to categorize the paper a bit) is served quite well.

and maybe the \concept tag really is meant for something different, I'm not sure. what I have in mind is really similar to providing index terms (plus scores to guide `help.search' in sorting). to stay with the `nls' example:

\concept {
non-linear fitting = 4
non-linear least-squares = 5
non-linear models = 3
parameter estimation = 2
gauss-newton = 1
}

would probably ensure that `nls' is usually found (if this syntax were allowed). apart from the scores (which would be nice, I think), my main point is that extensive use of \concept (or of a new tag, `\index' for instance, if \concept's purpose is actually different; I'm not sure) should be encouraged to get better hits from help.search().

I personally have decided to start using the \concept tag in its present form extensively in our local .Rd files, to "inject" a sufficient number of relevant free-text keywords into help.search().

joerg

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
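Joerg's scored-keyword idea can be made concrete with a toy ranking. This is a minimal sketch under stated assumptions: neither the Rd format nor help.search() supports scores, and the `concept_scores` table and `rank_topics()` function are hypothetical. The `nls` scores are the ones from the post; the `lm` rows are invented purely for illustration.

```r
# Hypothetical keyword/score table: one row per (topic, keyword) pair.
# The nls scores follow the proposal; the lm rows are made up for the example.
concept_scores <- data.frame(
  topic   = c(rep("nls", 5), rep("lm", 3)),
  keyword = c("non-linear fitting", "non-linear least-squares",
              "non-linear models", "parameter estimation", "gauss-newton",
              "linear fitting", "linear models", "least-squares"),
  score   = c(4, 5, 3, 2, 1, 3, 5, 4),
  stringsAsFactors = FALSE
)

# Hypothetical ranking: sum the scores of every keyword containing the query
# (case-insensitive fixed-string match), highest total first.
rank_topics <- function(query, table = concept_scores) {
  hit <- grepl(tolower(query), tolower(table$keyword), fixed = TRUE)
  if (!any(hit)) return(setNames(numeric(0), character(0)))
  totals <- sapply(split(table$score[hit], table$topic[hit]), sum)
  sort(totals, decreasing = TRUE)
}

rank_topics("fitting")
```

The substring matching is deliberately naive: a query such as "linear" also hits every "non-linear ..." keyword and so ranks `nls` above `lm`, which shows that scores can only partly offset inconsistent nomenclature.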
Re: [R] Function hints
> what I really would love to see would be an improved help.search():
> on r-devel I found a reference to the \concept tag in .Rd files and the
> fact that it is rarely used (again: I was not aware of this :-( ...),
> which might serve as keyword container suitable for improving
> help.search() results. what about changing the syntax here to something like
> \concept {
> keyword = score,
> keyword = score
> ...
> }
> where score would be restricted to a small range of values (say, 1-3 or
> 1-5). if package maintainers then would choose a handful of sensible
> keywords (and scores) for a package and its functions one could expect
> improved search results. this might be a naive idea, but could a
> sort-by-relevance in the help.search() output profit from this?

This is not something I think you can solve automatically. Good keywording requires a lot of effort, and needs to be consistent to be useful. The only way to achieve consistency is to have only one person keywording (difficult/expensive), or exhaustively document the process of keywording and then require all package authors to read and use it (impossible).

Hadley
Re: [R] Function hints
Jonathan Baron wrote:
> On 06/19/06 13:13, Duncan Murdoch wrote:
>>> `help.search' does not allow full text search in the manpages (I can
>>> imagine why (1000 hits...), but without such a thing google, for
>>> instance, would probably not be half as useful as it is, right?) and
>>> there is no "sorting by relevance" in the `help.search' output, I think.
>>> how this sorting could be achieved is a different question, of course.
>> You probably want RSiteSearch("keyword", restrict="functions") (or even
>> without the "restrict" part).
>
> Yes. The restrict part will speed things up quite a bit, if you
> want to restrict to functions.
>
> Or, alternatively, you could use Namazu (which I use to generate
> what RSiteSearch provides) to generate an index specific to your
> own installed functions and packages. The trick is to cd to the
> directory /usr/lib/R/library, or the equivalent, and then say
>
> mknmz -q */html
>
> which will pick up the html version of all the man pages
> (assuming you have generated them, and I have no idea whether
> this can be done on Windows). To update, say
>
> mknmz --update=. -q */html
>
> Then make a bookmark for the Namazu search page in your browser,
> as a local file. (I haven't given all the details. You have to
> install Namazu and follow the instructions.)
>
> Or, if you have a web server, you could let Google do it for
> you. But, I warn you, Google will fill up your web logs pretty
> fast if you don't exclude it with robots.txt. I don't let it
> search my R stuff.
>
> I think that Macs and various Linux versions also have other
> alternative built-in search capabilities, but I haven't tried
> them. Beagle is the new Linux search tool, but I don't know what
> it does.
>
> Jon

thanks for these tips. I was not aware of the `RSiteSearch' function (I did know of the existence of the web sites, though) and this helps, but of course it depends on web access (off-line laptop usage...), does not know of 'local' (non-CRAN) packages, and knows of maybe "too many" contributed packages, which I might not want to consider for one reason or another.

thanks also for the hint on `Namazu'. maybe I will do as advised to get an index which is aware of my local configuration and private packages. (under MacOS there is a very good and fast full-text search engine, but it cannot be told to search only the R documentation, for instance, so one gets lots of other hits as well.)

what I really would love to see would be an improved help.search(): on r-devel I found a reference to the \concept tag in .Rd files and the fact that it is rarely used (again: I was not aware of this :-( ...), which might serve as a keyword container suitable for improving help.search() results. what about changing the syntax here to something like

\concept {
keyword = score,
keyword = score
...
}

where score would be restricted to a small range of values (say, 1-3 or 1-5). if package maintainers then chose a handful of sensible keywords (and scores) for a package and its functions, one could expect improved search results. this might be a naive idea, but could a sort-by-relevance in the help.search() output profit from this?

to make it short: I'm not happy with the output of, for instance,

help.search("fitting")            #1
help.search("linear fitting")     #2
help.search("non-linear fitting") #3

I somehow feel that `lm' and `nls' should both be found by the first search and that they should be near the top of the lists when they are found. but `lm' is found only in #1 (near the bottom of the list) and `nls' not at all (which is really bad). this is partly a problem of inconsistent nomenclature in the manpages, of course, but also due to the fact that help.search() only accepts a single phrase as pattern (and maybe to the absence of "concept" keywords including a score?)
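Since help.search() takes a single pattern, a crude multi-word search can be layered on top of it. A minimal sketch, not part of utils: `multi_term_search()` is a hypothetical helper that runs one exact (non-fuzzy) search per term over the alias, concept and title fields and ranks topics by how many terms they matched. It assumes a recent R, where help.search() accepts `fields` and `agrep` and returns an object whose `matches` component is a data frame with `Topic` and `Package` columns.

```r
# Hypothetical helper: one exact help.search() per term, then count how many
# of the terms each help topic matched.
multi_term_search <- function(terms, fields = c("alias", "concept", "title")) {
  keys <- lapply(terms, function(term) {
    m <- utils::help.search(term, fields = fields, agrep = FALSE)$matches
    # one key per topic, so a term hitting both alias and title counts once
    unique(paste(m$Package, m$Topic, sep = "::"))
  })
  all_keys <- unlist(keys)
  if (!length(all_keys))
    return(data.frame(topic = character(0), hits = integer(0)))
  counts <- table(all_keys)
  out <- data.frame(topic = names(counts), hits = as.integer(counts),
                    stringsAsFactors = FALSE)
  # topics matching the most terms first, then alphabetically
  out[order(-out$hits, out$topic), , drop = FALSE]
}

head(multi_term_search(c("linear", "fitting")))
```

Counting matched terms is only a stand-in for real relevance ranking; it rewards topics whose title happens to contain every word, which is exactly where the scored keywords discussed in this thread would help.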
Re: [R] Function hints
On 06/19/06 13:13, Duncan Murdoch wrote:
> > `help.search' does not allow full text search in the manpages (I can
> > imagine why (1000 hits...), but without such a thing google, for
> > instance, would probably not be half as useful as it is, right?) and
> > there is no "sorting by relevance" in the `help.search' output, I think.
> > how this sorting could be achieved is a different question, of course.
>
> You probably want RSiteSearch("keyword", restrict="functions") (or even
> without the "restrict" part).

Yes. The restrict part will speed things up quite a bit, if you want to restrict to functions.

Or, alternatively, you could use Namazu (which I use to generate what RSiteSearch provides) to generate an index specific to your own installed functions and packages. The trick is to cd to the directory /usr/lib/R/library, or the equivalent, and then say

mknmz -q */html

which will pick up the html version of all the man pages (assuming you have generated them, and I have no idea whether this can be done on Windows). To update, say

mknmz --update=. -q */html

Then make a bookmark for the Namazu search page in your browser, as a local file. (I haven't given all the details. You have to install Namazu and follow the instructions.)

Or, if you have a web server, you could let Google do it for you. But, I warn you, Google will fill up your web logs pretty fast if you don't exclude it with robots.txt. I don't let it search my R stuff.

I think that Macs and various Linux versions also have other alternative built-in search capabilities, but I haven't tried them. Beagle is the new Linux search tool, but I don't know what it does.

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)
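Before running mknmz it can help to check from within R where "the equivalent" of /usr/lib/R/library is, and whether generated HTML pages actually exist there. A minimal sketch, assuming Namazu is installed separately and that only the main library (`.Library`) is to be indexed; `has_static_html()` and `namazu_command()` are hypothetical helpers, and the command is only built, not executed. Recent R versions normally render help pages on demand, so `with_html` may well be empty unless packages were installed with static HTML.

```r
# Main R library directory: the "/usr/lib/R/library, or the equivalent" above.
lib_dir <- .Library

# Hypothetical helper: TRUE if a package's html/ directory holds help pages
# beyond the package index (00Index.html), i.e. static HTML was generated.
has_static_html <- function(pkg_dir) {
  pages <- list.files(file.path(pkg_dir, "html"), pattern = "\\.html$")
  length(setdiff(pages, "00Index.html")) > 0
}

pkg_dirs <- list.dirs(lib_dir, full.names = TRUE, recursive = FALSE)
with_html <- basename(pkg_dirs[vapply(pkg_dirs, has_static_html, logical(1))])

# Hypothetical helper: build (but do not run) the indexing command from the
# post; update = TRUE gives the incremental "--update=." form.
namazu_command <- function(dir, update = FALSE) {
  opts <- if (update) "--update=. -q" else "-q"
  paste("cd", shQuote(dir), "&&", "mknmz", opts, "*/html")
}

namazu_command(lib_dir)
# system(namazu_command(lib_dir))  # only once Namazu is installed
```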
Re: [R] Function hints
On 6/19/2006 12:14 PM, Joerg van den Hoff wrote:
> just a feedback: that's a useful function, thank you.
>
> but the problem is probably more general: frequently I do not really
> want to know what I generally can do with a data frame, for instance,
> but rather I would like to use `help.search' as I would use, say, Google
> (and with the same rate of success...).
> but the actual `keywords' in the manpages seem insufficient and
> `help.search' does not allow full text search in the manpages (I can
> imagine why (1000 hits...), but without such a thing google, for
> instance, would probably not be half as useful as it is, right?) and
> there is no "sorting by relevance" in the `help.search' output, I think.
> how this sorting could be achieved is a different question, of course.

You probably want RSiteSearch("keyword", restrict="functions") (or even without the "restrict" part).

Duncan Murdoch
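RSiteSearch() covers the online case, while off-line work falls back on the local help.search(). A small sketch of a wrapper choosing between them, assuming the `restrict = "functions"` value from the reply is accepted by RSiteSearch() in the R version at hand; `find_functions()` is a hypothetical name, not an existing utils function.

```r
# Hypothetical wrapper: search the R site when online, local help otherwise.
find_functions <- function(query, online = FALSE) {
  if (online && interactive()) {
    # Opens the site search results in a browser; needs web access.
    utils::RSiteSearch(query, restrict = "functions")
    invisible(NULL)
  } else {
    # Off-line fallback: exact (non-fuzzy) search of locally installed help,
    # which also covers local, non-CRAN packages.
    utils::help.search(query, agrep = FALSE)
  }
}

res <- find_functions("least squares")
res
```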
Re: [R] Function hints
hadley wickham wrote:
> One of the recurring themes in the recent UserR conference was that many people find it difficult to find the functions they need for a particular task. Sandy Weisberg suggested a small idea he would like to see: a hints function that, given an object, lists likely operations. I've done my best to implement this function using the tools currently available in R, and my code is included at the bottom of this email (I hope that I haven't just duplicated something already present in R). I think Sandy's idea is genuinely useful, even in the limited form provided by my implementation, and I have already discovered a few useful functions that I was unaware of.
>
> While developing and testing this function, I ran into a few problems which, I think, represent underlying problems with the current documentation system. These are typified by the results of running hints on an object produced by glm (having class c("glm", "lm")). I have outlined (very tersely) some possible solutions. Please note that while these solutions are largely technological, the problem is at heart sociological: writing documentation is no easier (and perhaps much harder) than writing a scientific publication, but the rewards are fewer.
>
> Problems:
>
> * Many functions share the same description (eg. head, tail). Solution: each rdoc file should only describe one method. Problem: writing rdoc files is tedious, there is a lot of information duplicated between the code and the documentation (eg. the usage statement), and some functions share a lot of similar information. Solution: make it easier to write documentation (eg. documentation inline with code), and easier to include certain common descriptions in multiple methods (eg. a new include command).
>
> * It is difficult to tell which functions are commonly used/important. Solution: break down by keywords. Problem: keywords are not useful at the moment. Solution: make a better list of keywords available and encourage people to use it. Problem: people won't unless there is a strong incentive, plus good keywording requires considerable expertise (especially in building up the list). This is probably insoluble unless one person systematically keywords all of the base packages.
>
> * Some functions aren't documented (eg. simulate.lm, formula.glm); typically, these are methods where the documentation is in the generic. Solution: these methods should all be aliased to the generic (by default?), and R CMD check should be amended to check for this situation. You could also argue that this is a deficiency of my function, easily fixed by automatically referring to the generic if the specific method isn't documented.
>
> * It can't supply suggestions when there isn't an explicit method (ie. .default is used), which makes it pretty useless for basic vectors. This may not really be a problem, as all possible operations are probably too numerous to list.
>
> * It provides the full name of a function, when best practice is to use only the generic part when calling it. However, getting precise documentation may require the full name. I do the best I can (returning the generic if the specific method is an alias to a documentation file with the same method name), but this reflects a deeper problem: the name you should use when calling a function may be different from the name you use to get documentation.
>
> * It can only display methods from currently loaded packages. This is a shortcoming of the methods function, but I suspect it is difficult to find S3 methods without loading a package.
>
> Relatively trivial problems:
>
> * Needs a wide display to be effective. This could be dealt with by breaking the description in a sensible manner (there may already be R code to do this; please let me know if you know of any).
>
> * Doesn't currently include S4 methods. Solution: add some more code to wrap showMethods.
>
> * Personally, I think sentence case is more aesthetically pleasing (and more flexible) than title case.
>
> Hadley
>
> hints <- function(x) {
>   db <- eval(utils:::.hsearch_db())
>   if (is.null(db)) {
>     help.search("abcd!", rebuild=TRUE, agrep=FALSE)
>     db <- eval(utils:::.hsearch_db())
>   }
>
>   base <- db$Base
>   alias <- db$Aliases
>   key <- db$Keywords
>
>   m <- all.methods(class=class(x))
>   m_id <- alias[match(m, alias[,1]), 2]
>   keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])
>
>   f.names <- cbind(m, base[match(m_id, base[,3]), 4])
>   f.names <- unlist(lapply(1:nrow(f.names), function(i) {
>     if (is.na(f.names[i, 2])) return(f.names[i, 1])
>     a <- methodsplit(f.names[i, 1])
>     b <- methodsplit(f.names[i, 2])
>
>     if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
>   }))
[R] Function hints
One of the recurring themes in the recent UserR conference was that many people find it difficult to find the functions they need for a particular task. Sandy Weisberg suggested a small idea he would like to see: a hints function that, given an object, lists likely operations. I've done my best to implement this function using the tools currently available in R, and my code is included at the bottom of this email (I hope that I haven't just duplicated something already present in R). I think Sandy's idea is genuinely useful, even in the limited form provided by my implementation, and I have already discovered a few useful functions that I was unaware of.

While developing and testing this function, I ran into a few problems which, I think, represent underlying problems with the current documentation system. These are typified by the results of running hints on an object produced by glm (having class c("glm", "lm")). I have outlined (very tersely) some possible solutions. Please note that while these solutions are largely technological, the problem is at heart sociological: writing documentation is no easier (and perhaps much harder) than writing a scientific publication, but the rewards are fewer.

Problems:

* Many functions share the same description (eg. head, tail). Solution: each rdoc file should only describe one method. Problem: writing rdoc files is tedious, there is a lot of information duplicated between the code and the documentation (eg. the usage statement), and some functions share a lot of similar information. Solution: make it easier to write documentation (eg. documentation inline with code), and easier to include certain common descriptions in multiple methods (eg. a new include command).

* It is difficult to tell which functions are commonly used/important. Solution: break down by keywords. Problem: keywords are not useful at the moment. Solution: make a better list of keywords available and encourage people to use it. Problem: people won't unless there is a strong incentive, plus good keywording requires considerable expertise (especially in building up the list). This is probably insoluble unless one person systematically keywords all of the base packages.

* Some functions aren't documented (eg. simulate.lm, formula.glm); typically, these are methods where the documentation is in the generic. Solution: these methods should all be aliased to the generic (by default?), and R CMD check should be amended to check for this situation. You could also argue that this is a deficiency of my function, easily fixed by automatically referring to the generic if the specific method isn't documented.

* It can't supply suggestions when there isn't an explicit method (ie. .default is used), which makes it pretty useless for basic vectors. This may not really be a problem, as all possible operations are probably too numerous to list.

* It provides the full name of a function, when best practice is to use only the generic part when calling it. However, getting precise documentation may require the full name. I do the best I can (returning the generic if the specific method is an alias to a documentation file with the same method name), but this reflects a deeper problem: the name you should use when calling a function may be different from the name you use to get documentation.

* It can only display methods from currently loaded packages. This is a shortcoming of the methods function, but I suspect it is difficult to find S3 methods without loading a package.

Relatively trivial problems:

* Needs a wide display to be effective. This could be dealt with by breaking the description in a sensible manner (there may already be R code to do this; please let me know if you know of any).

* Doesn't currently include S4 methods. Solution: add some more code to wrap showMethods.

* Personally, I think sentence case is more aesthetically pleasing (and more flexible) than title case.
Hadley

hints <- function(x) {
  db <- eval(utils:::.hsearch_db())
  if (is.null(db)) {
    help.search("abcd!", rebuild=TRUE, agrep=FALSE)
    db <- eval(utils:::.hsearch_db())
  }

  base <- db$Base
  alias <- db$Aliases
  key <- db$Keywords

  m <- all.methods(class=class(x))
  m_id <- alias[match(m, alias[,1]), 2]
  keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])

  f.names <- cbind(m, base[match(m_id, base[,3]), 4])
  f.names <- unlist(lapply(1:nrow(f.names), function(i) {
    if (is.na(f.names[i, 2])) return(f.names[i, 1])
    a <- methodsplit(f.names[i, 1])
    b <- methodsplit(f.names[i, 2])

    if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
  }))

  hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
  hints <- hints[order(tolower(hints[,1])),]
  hints <- rbind(c("", "---"), hints)
  rown
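On the "needs a wide display" point: base R already provides strwrap(), which breaks text at word boundaries to a target width. A minimal sketch of a two-column layout built on it, assuming the hints come as parallel vectors of names and descriptions like the matrix `hints()` builds; `format_hints()` and the sample rows are illustrative only.

```r
# Hypothetical helper: lay out name/description pairs in two columns,
# wrapping each description with strwrap() so the output fits `width`.
format_hints <- function(topics, descriptions, width = getOption("width")) {
  name_width <- max(nchar(topics))
  # keep at least 10 columns for the description on very narrow displays
  desc_width <- max(10, width - name_width - 2)
  indent <- strrep(" ", name_width + 2)
  unlist(lapply(seq_along(topics), function(i) {
    lines <- strwrap(descriptions[i], width = desc_width)
    if (!length(lines)) lines <- ""
    first <- paste0(format(topics[i], width = name_width), "  ", lines[1])
    # continuation lines are indented to line up under the description column
    rest <- if (length(lines) > 1) paste0(indent, lines[-1])
    c(first, rest)
  }))
}

out <- format_hints(c("anova", "predict"),
                    c("Analysis of Deviance for Generalized Linear Model Fits",
                      "Predict Method for GLM Fits"),
                    width = 40)
cat(out, sep = "\n")
```

Passing `width = getOption("width")` (the default) lets the layout follow whatever console width the user has set.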