Re: [R] Function hints

2006-06-20 Thread Joerg van den Hoff
hadley wickham wrote:
>> what I really would love to see would be an improved help.search():
>> on r-devel I found a reference to the \concept tag in .Rd files and the
>> fact that it is rarely used (again: I was not aware of this :-( ...),
>> which might serve as keyword container suitable for improving
>> help.search() results. what about changing the syntax here to 
>> something like
>> \concept {
>> keyword = score,
>> keyword = score
>> ...
>> }
>> where score would be restricted to a small range of values (say, 1-3 or
>> 1-5). if package maintainers would then choose a handful of sensible
>> keywords (and scores) for a package and its functions one could expect
>> improved search results. this might be a naive idea, but could a
>> sort-by-relevance in the help.search() output profit from this?
> 
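> This is not something I think you can solve automatically.  Good
> keywording requires a lot of effort, and needs to be consistent to be
> useful.  The only way to achieve consistency is to have only one person
> keywording (difficult/expensive), or exhaustively document the process
> of keywording and then require all package authors to read and use
> (impossible).
>
> Hadley
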
I was thinking of manual keywording (by the package authors, nobody 
else!) as a means to give the search engine (help.search()) reasonable 
information including a (subjective) relevance score for each keyword.
of course, the problem is the same as with every (especially permuted)
index: to find the best compromise between indexing next to nothing and
indexing everything (the best probably meaning to index comprehensively
but not excessively, with reasonable index terms) in the documents at hand.
sure, consistency could not be enforced, but it's not consistent right
now, simply because the real \keyword tag is far too restrictive for
indexing purposes (only a handful of predefined allowed keywords) and
otherwise only the name/alias and title in the Rd files seem to be
searched. (here the author must be really aware that these fields are
at the moment the ones which have to contain the relevant
'keywords' if the function is to be found by help.search -- this sometimes
imposes rather artificial constraints on the wording, especially if
you try to include some general keyword in the title of a very
specialized function.)

looking at the example I gave
(help.search("fitting") etc.) it's quite clear that `nls' simply is not 
found because 'fitting' does not occur in the title, but I trust, if 
asked to provide, say, three keywords, one of them would contain "fit" 
or "fitting". I mean, every scientific journal asks you to do just this: 
provide some free-text keywords, which you think to be relevant for the 
paper. there are no restrictions/directives, usually, but the purpose 
(to categorize the paper a bit) is served quite well.

and maybe the \concept tag really is meant for something different, I'm 
not sure. what I have in mind really is similar to providing index terms 
(plus scores to guide `help.search' in sorting). to stay with the `nls' 
example:
\concept {
non-linear fitting = 4
non-linear least-squares = 5
non-linear models = 3
parameter estimation = 2
gauss-newton = 1
}
would probably achieve that `nls' usually is correctly found (if this 
syntax were allowed). apart from the scores (which would be nice, I 
think) my main point is that extensive use of \concept (or a new tag 
`\index', for instance, if \concept's purpose is actually different -- 
I'm not sure) should be pushed to get better hits from help.search().
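
to illustrate, here is a minimal sketch in R of how such scores could drive a sort-by-relevance. nothing like this exists in help.search(): `concept_scores' and `rank_topics' are made-up names, the `nls' scores are the ones proposed above, and the `lm' scores are invented purely for illustration.

```r
# hypothetical index: topic -> named vector of keyword scores
# (`nls' scores as proposed above; `lm' scores are invented)
concept_scores <- list(
  nls = c("non-linear fitting"       = 4,
          "non-linear least-squares" = 5,
          "non-linear models"        = 3,
          "parameter estimation"     = 2,
          "gauss-newton"             = 1),
  lm  = c("linear fitting" = 4,
          "linear models"  = 5,
          "least-squares"  = 3)
)

# rank topics by the best score among the keywords that contain the pattern
rank_topics <- function(pattern, index = concept_scores) {
  best <- vapply(index, function(scores) {
    hit <- grepl(pattern, names(scores), fixed = TRUE)
    if (any(hit)) max(scores[hit]) else NA_real_
  }, numeric(1))
  # drop topics without any matching keyword, highest score first
  sort(best[!is.na(best)], decreasing = TRUE)
}

rank_topics("fitting")             # both `nls' and `lm' have a matching keyword
rank_topics("non-linear fitting")  # only `nls' has a matching keyword
```

taking the maximum score per topic is just one possible choice; summing the scores of all matching keywords would be another.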

I personally have decided to start using the \concept tag in its present 
form for our local .Rd files extensively to "inject" a sufficient number 
of free-text relevant keywords into help.search()
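
for reference, a sketch of what this looks like with the existing tag: one free-text entry per \concept, no scores. the Rd lines below are a made-up stub (not the real nls.Rd), and tools::parse_Rd() is used only to show that the entries can be read back from the file.

```r
# a made-up Rd stub using \concept in its present form
rd_text <- c(
  "\\name{nls}",
  "\\alias{nls}",
  "\\title{Nonlinear Least Squares}",
  "\\concept{non-linear fitting}",
  "\\concept{non-linear least-squares}",
  "\\concept{parameter estimation}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_text, rd_file)

# parse the file and look up the tag of every top-level element
rd   <- unclass(tools::parse_Rd(rd_file))
tags <- vapply(rd, function(el) attr(el, "Rd_tag"), character(1))

# collect the text of every \concept entry
concepts <- vapply(rd[tags == "\\concept"],
                   function(el) paste(unlist(el), collapse = ""),
                   character(1))
concepts
```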


joerg

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Function hints

2006-06-20 Thread hadley wickham
> what I really would love to see would be an improved help.search():
> on r-devel I found a reference to the \concept tag in .Rd files and the
> fact that it is rarely used (again: I was not aware of this :-( ...),
> which might serve as keyword container suitable for improving
> help.search() results. what about changing the syntax here to something like
> \concept {
> keyword = score,
> keyword = score
> ...
> }
> where score would be restricted to a small range of values (say, 1-3 or
> 1-5). if package maintainers would then choose a handful of sensible
> keywords (and scores) for a package and its functions one could expect
> improved search results. this might be a naive idea, but could a
> sort-by-relevance in the help.search() output profit from this?

This is not something I think you can solve automatically.  Good
keywording requires a lot of effort, and needs to be consistent to be
useful.  The only way to achieve consistency is to have only one person
keywording (difficult/expensive), or exhaustively document the process
of keywording and then require all package authors to read and use
(impossible).

Hadley



Re: [R] Function hints

2006-06-20 Thread Joerg van den Hoff
Jonathan Baron wrote:
> On 06/19/06 13:13, Duncan Murdoch wrote:
>>> `help.search' does not allow full text search in the manpages (I can
>>> imagine why (1000 hits...), but without such a thing google, for
>>> instance, would probably not be half as useful as it is, right?) and
>>> there is no "sorting by relevance" in the `help.search' output, I think.
>>> how this sorting could be achieved is a different question, of course.
>> You probably want RSiteSearch("keyword", restrict="functions") (or even
>> without the "restrict" part).
> 
> Yes.  The restrict part will speed things up quite a bit, if you
> want to restrict to functions.
> 
> Or, alternatively, you could use Namazu (which I use to generate
> what RSiteSearch provides) to generate an index specific to your
> own installed functions and packages.  The trick is to cd to the
> directory /usr/lib/R/library, or the equivalent, and then say
> 
> mknmz -q */html
> 
> which will pick up the html version of all the man pages
> (assuming you have generated them, and I have no idea whether
> this can be done on Windows).  To update, say
> 
> mknmz --update=. -q */html
> 
> Then make a bookmark for the Namazu search page in your browser,
> as a local file.  (I haven't given all the details.  You have to
> install Namazu and follow the instructions.)
> 
> Or, if you have a web server, you could let Google do it for
> you.  But, I warn you, Google will fill up your web logs pretty
> fast if you don't exclude it with robots.txt.  I don't let it
> search my R stuff.
> 
> I think that Macs and various Linux versions also have other
> alternative built-in search capabilities, but I haven't tried
> them.  Beagle is the new Linux search tool, but I don't know what
> it does.
> 
> Jon

thanks for these tips. I was not aware of the `RSiteSearch' function
(I did know of the existence of the web sites, though) and this helps,
but of course this is dependent on web access (off-line laptop
usage...) and does not know of 'local' (non-CRAN) packages (and knows of
maybe "too many" contributed packages, which I might not want to
consider for one reason or the other).

thanks also for the hint on `Namazu'. maybe I'll do as advised to get an
index which is aware of my local configuration and private packages.
(under MacOS there is a very good and fast full text search engine, but 
it cannot be told to only search the R documentation, for instance, so 
one gets lots of other hits as well.)

what I really would love to see would be an improved help.search():
on r-devel I found a reference to the \concept tag in .Rd files and the
fact that it is rarely used (again: I was not aware of this :-( ...), 
which might serve as keyword container suitable for improving 
help.search() results. what about changing the syntax here to something like
\concept {
keyword = score,
keyword = score
...
}
where score would be restricted to a small range of values (say, 1-3 or 
1-5). if package maintainers would then choose a handful of sensible
keywords (and scores) for a package and its functions one could expect 
improved search results. this might be a naive idea, but could a 
sort-by-relevance in the help.search() output profit from this?

to make it short: I'm not happy with the output, for instance, of
help.search("fitting")              #1
vs.
help.search("linear fitting")       #2
vs.
help.search("non-linear fitting")   #3
I somehow feel that `lm' and `nls' should both be found in the first 
search and that they should be near the top of the lists when they are 
found.

but `lm' is found only in #1 (near the bottom of the list) and `nls' not 
at all (which is really bad). this is partly a problem, of course, of 
inconsistent nomenclature in the manpages but also due to the fact that 
help.search() only accepts single phrases as pattern (and maybe the 
absence of "concept" keywords including a score?)
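
for what it's worth, two things can be tried with the present help.search(): the pattern is taken as a regular expression as soon as it contains anything beyond letters, digits, blanks or dashes, so alternatives can be combined with `|', and the `fields' argument selects which parts of the Rd files are searched. this is only a sketch; what is found depends on the installed packages and on what their authors put into these fields.

```r
# one search for several phrases: `|' makes the pattern a regular expression
res <- help.search("fitting|least squares",
                   fields = c("alias", "title", "concept"))

# search only the \concept entries
res_concept <- help.search("fitting", fields = "concept")

# number of matching help pages for each search
nrow(res$matches)
nrow(res_concept$matches)
```

this does not give any sorting by relevance, it only widens or narrows what is matched.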



Re: [R] Function hints

2006-06-19 Thread Jonathan Baron
On 06/19/06 13:13, Duncan Murdoch wrote:
> > `help.search' does not allow full text search in the manpages (I can
> > imagine why (1000 hits...), but without such a thing google, for
> > instance, would probably not be half as useful as it is, right?) and
> > there is no "sorting by relevance" in the `help.search' output, I think.
> > how this sorting could be achieved is a different question, of course.
> 
> You probably want RSiteSearch("keyword", restrict="functions") (or even
> without the "restrict" part).

Yes.  The restrict part will speed things up quite a bit, if you
want to restrict to functions.

Or, alternatively, you could use Namazu (which I use to generate
what RSiteSearch provides) to generate an index specific to your
own installed functions and packages.  The trick is to cd to the
directory /usr/lib/R/library, or the equivalent, and then say

mknmz -q */html

which will pick up the html version of all the man pages
(assuming you have generated them, and I have no idea whether
this can be done on Windows).  To update, say

mknmz --update=. -q */html

Then make a bookmark for the Namazu search page in your browser,
as a local file.  (I haven't given all the details.  You have to
install Namazu and follow the instructions.)

Or, if you have a web server, you could let Google do it for
you.  But, I warn you, Google will fill up your web logs pretty
fast if you don't exclude it with robots.txt.  I don't let it
search my R stuff.

I think that Macs and various Linux versions also have other
alternative built-in search capabilities, but I haven't tried
them.  Beagle is the new Linux search tool, but I don't know what
it does.

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)



Re: [R] Function hints

2006-06-19 Thread Duncan Murdoch
On 6/19/2006 12:14 PM, Joerg van den Hoff wrote:

> just a feedback: that's a useful function, thank you.
> 
> but the problem is probably more general: frequently I do not really 
> want to know what I generally can do with a data frame, for instance, 
> but rather I would like to use `help.search' as I would use, say, Google 
> (and with the same rate of success...).
> but the actual `keywords' in the manpages seem insufficient and 
> `help.search' does not allow full text search in the manpages (I can 
> imagine why (1000 hits...), but without such a thing google, for 
> instance, would probably not be half as useful as it is, right?) and 
> there is no "sorting by relevance" in the `help.search' output, I think. 
> how this sorting could be achieved is a different question, of course.

You probably want RSiteSearch("keyword", restrict="functions") (or even 
without the "restrict" part).

Duncan Murdoch



Re: [R] Function hints

2006-06-19 Thread Joerg van den Hoff
hadley wickham wrote:
> One of the recurring themes in the recent useR! conference was that
> many people find it difficult to find the functions they need for a
> particular task.  Sandy Weisberg suggested a small idea he would like
> to see: a hints function that given an object, lists likely
> operations.  I've done my best to implement this function using the
> tools currently available in R, and my code is included at the bottom
> of this email (I hope that I haven't just duplicated something already
> present in R).  I think Sandy's idea is genuinely useful, even in the
> limited form provided by my implementation, and I have already
> discovered a few useful functions that I was unaware of.
> 
> While developing and testing this function, I ran into a few problems
> which, I think, represent underlying problems with the current
> documentation system.  These are typified by the results of running
> hints on an object produced by glm (having class c("glm", "lm")).  I
> have outlined (very tersely) some possible solutions.  Please note
> that while these solutions are largely technological, the problem is
> at heart sociological: writing documentation is no easier (and perhaps
> much harder) than writing a scientific publication, but the rewards
> are fewer.
> 
> Problems:
> 
>  * Many functions share the same description (eg. head, tail).
> Solution: each rdoc file should only describe one method. Problem:
> Writing rdoc files is tedious, there is a lot of information
> duplicated between the code and the documentation (eg. the usage
> statement) and some functions share a lot of similar information.
> Solution: make it easier to write documentation (eg. documentation
> inline with code), and easier to include certain common descriptions
> in multiple methods (eg. new include command)
> 
>  * It is difficult to tell which functions are commonly
> used/important. Solution: break down by keywords. Problem: keywords
> are not useful at the moment.  Solution:  make better list of keywords
> available and encourage people to use it.  Problem: people won't
> unless there is a strong incentive, plus good keywording requires
> considerable expertise (especially in building up the list).  This is
> probably insoluble unless one person systematically keywords all of
> the base packages.
> 
>  * Some functions aren't documented (eg. simulate.lm, formula.glm) -
> typically, these are methods where the documentation is in the
> generic.  Solution: these methods should all be aliased to the generic
> (by default?), and R CMD check should be amended to check for this
> situation.  You could also argue that this is a deficiency with my
> function, and easily fixed by automatically referring to the generic
> if the specific isn't documented.
> 
>  * It can't supply suggestions when there isn't an explicit method
> (ie. .default is used), this makes it pretty useless for basic
> vectors.  This may not really be a problem, as all possible operations
> are probably too numerous to list.
> 
>  * Provides full name for function, when best practice is to use
> generic part only when calling function.  However, getting precise
> documentation may require the full name.  I do the best I can
> (returning the generic if specific is alias to a documentation file
> with the same method name), but this reflects a deeper problem that
> the name you should use when calling a function may be different to
> the name you use to get documentation.
> 
>  * Can only display methods from currently loaded packages.  This is a
> shortcoming of the methods function, but I suspect it is difficult to
> find S3 methods without loading a package.
> 
> Relatively trivial problems:
> 
>  * Needs wide display to be effective.  Could be dealt with by
> breaking description in a sensible manner (there may already be R code
> to do this.  Please let me know if you know of any)
> 
>  * Doesn't currently include S4 methods.  Solution: add some more code
> to wrap showMethods
> 
>  * Personally, I think sentence case is more aesthetically pleasing
> (and more flexible) than title case.
> 
> 
> Hadley
> 
> 
> hints <- function(x) {
>   db <- eval(utils:::.hsearch_db())
>   if (is.null(db)) {
>   help.search("abcd!", rebuild=TRUE, agrep=FALSE)
>   db <- eval(utils:::.hsearch_db())
>   }
> 
>   base <- db$Base
>   alias <- db$Aliases
>   key <- db$Keywords
> 
>   m <- all.methods(class=class(x))
>   m_id <- alias[match(m, alias[,1]), 2]
>   keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])
> 
>   f.names <- cbind(m, base[match(m_id, base[,3]), 4])
>   f.names <- unlist(lapply(1:nrow(f.names), function(i) {
>   if (is.na(f.names[i, 2])) return(f.names[i, 1])
>   a <- methodsplit(f.names[i, 1])
>   b <- methodsplit(f.names[i, 2])
>   
>   if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]  
>   }))
>   
>

[R] Function hints

2006-06-19 Thread hadley wickham
One of the recurring themes in the recent useR! conference was that
many people find it difficult to find the functions they need for a
particular task.  Sandy Weisberg suggested a small idea he would like
to see: a hints function that given an object, lists likely
operations.  I've done my best to implement this function using the
tools currently available in R, and my code is included at the bottom
of this email (I hope that I haven't just duplicated something already
present in R).  I think Sandy's idea is genuinely useful, even in the
limited form provided by my implementation, and I have already
discovered a few useful functions that I was unaware of.

While developing and testing this function, I ran into a few problems
which, I think, represent underlying problems with the current
documentation system.  These are typified by the results of running
hints on an object produced by glm (having class c("glm", "lm")).  I
have outlined (very tersely) some possible solutions.  Please note
that while these solutions are largely technological, the problem is
at heart sociological: writing documentation is no easier (and perhaps
much harder) than writing a scientific publication, but the rewards
are fewer.

Problems:

 * Many functions share the same description (eg. head, tail).
Solution: each rdoc file should only describe one method. Problem:
Writing rdoc files is tedious, there is a lot of information
duplicated between the code and the documentation (eg. the usage
statement) and some functions share a lot of similar information.
Solution: make it easier to write documentation (eg. documentation
inline with code), and easier to include certain common descriptions
in multiple methods (eg. new include command)

 * It is difficult to tell which functions are commonly
used/important. Solution: break down by keywords. Problem: keywords
are not useful at the moment.  Solution:  make better list of keywords
available and encourage people to use it.  Problem: people won't
unless there is a strong incentive, plus good keywording requires
considerable expertise (especially in building up the list).  This is
probably insoluble unless one person systematically keywords all of
the base packages.

 * Some functions aren't documented (eg. simulate.lm, formula.glm) -
typically, these are methods where the documentation is in the
generic.  Solution: these methods should all be aliased to the generic
(by default?), and R CMD check should be amended to check for this
situation.  You could also argue that this is a deficiency with my
function, and easily fixed by automatically referring to the generic
if the specific isn't documented.

 * It can't supply suggestions when there isn't an explicit method
(ie. .default is used), this makes it pretty useless for basic
vectors.  This may not really be a problem, as all possible operations
are probably too numerous to list.

 * Provides full name for function, when best practice is to use
generic part only when calling function.  However, getting precise
documentation may require the full name.  I do the best I can
(returning the generic if specific is alias to a documentation file
with the same method name), but this reflects a deeper problem that
the name you should use when calling a function may be different to
the name you use to get documentation.

 * Can only display methods from currently loaded packages.  This is a
shortcoming of the methods function, but I suspect it is difficult to
find S3 methods without loading a package.

Relatively trivial problems:

 * Needs wide display to be effective.  Could be dealt with by
breaking description in a sensible manner (there may already be R code
to do this.  Please let me know if you know of any)

 * Doesn't currently include S4 methods.  Solution: add some more code
to wrap showMethods

 * Personally, I think sentence case is more aesthetically pleasing
(and more flexible) than title case.
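
A rough sketch of the lookup behind several of the points above, using only methods() from base R. It is not the hints() code given below: generics_for is a made-up name, and the "generic" column of the "info" attribute is assumed to be present, which holds for recent versions of R but may not hold for older ones.

```r
# list the generics that have an S3 method for any class of x
generics_for <- function(x) {
  per_class <- lapply(class(x), function(cl) {
    info <- attr(utils::methods(class = cl), "info")
    data.frame(generic = as.character(info$generic),
               class   = rep(cl, nrow(info)),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, per_class)
  # keep the first (most specific) class for each generic
  out[!duplicated(out$generic), ]
}

fit <- glm(am ~ wt, data = mtcars, family = binomial)
head(generics_for(fit))
```

Returning the generic rather than the full method name matches the calling convention, but the full name (generic plus class) is still what you would pass to help() for method-specific documentation. Like methods() itself, this only sees packages that are currently loaded.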


Hadley


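hints <- function(x) {
  # Fetch the cached help.search() database; build it first if it is missing
  db <- eval(utils:::.hsearch_db())
  if (is.null(db)) {
    help.search("abcd!", rebuild=TRUE, agrep=FALSE)
    db <- eval(utils:::.hsearch_db())
  }

  base <- db$Base
  alias <- db$Aliases
  key <- db$Keywords

  # all.methods() and methodsplit() are helpers that are not shown in this excerpt
  m <- all.methods(class=class(x))
  # Map each method name to the ID of its help page, then to its keywords
  m_id <- alias[match(m, alias[,1]), 2]
  keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])

  # Use the documented name when methodsplit() gives the same first part for both
  f.names <- cbind(m, base[match(m_id, base[,3]), 4])
  f.names <- unlist(lapply(1:nrow(f.names), function(i) {
    if (is.na(f.names[i, 2])) return(f.names[i, 1])
    a <- methodsplit(f.names[i, 1])
    b <- methodsplit(f.names[i, 2])

    if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
  }))

  # Pair each name with column 5 of the help database and sort alphabetically
  hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
  hints <- hints[order(tolower(hints[,1])),]
  hints <- rbind(c("", "---"), hints)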
rown