Re: [Rd] The default behaviour of a missing entry in an environment

2009-11-16 Thread Robert Gentleman
Hi,

On Fri, Nov 13, 2009 at 4:55 PM, Duncan Murdoch  wrote:
> On 13/11/2009 7:26 PM, Gabor Grothendieck wrote:
>>
>> On Fri, Nov 13, 2009 at 7:21 PM, Duncan Murdoch 
>> wrote:
>>>
>>> On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:
>>>>
>>>> Note that one should use inherits = FALSE argument on get and exists
>>>> to avoid returning objects from the parent, the parent of the parent,
>>>> etc.
>>>
>>> I disagree.  Normally you would want to receive those objects.  If you
>>> didn't, why didn't you set the parent of the environment to emptyenv()
>>> when
>>> you created it?
>>>
>>
>> $ does not look into the parent so if you are trying to get those
>> semantics you must use inherits = FALSE.
>
> Whoops, yes.  That's another complaint about $ on environments.

 That was an intentional choice. AFAIR, neither $ nor [[ on
environments was meant to mimic get, but rather to work on the
current environment as if it were a hash-like object. One can always
get the inherits semantics by simple programming, but under the model
you seem to be suggesting, preventing such behavior when you don't own
the environments in question would be problematic.
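 A small sketch of the two lookup semantics (assuming it is run at top level):

```r
# $ and [[ treat the environment as a hash table: no inherited lookup.
x <- 3
e <- new.env(parent = globalenv())

e$x                                       # NULL: only e itself is searched
exists("x", envir = e)                    # TRUE: inherits by default
exists("x", envir = e, inherits = FALSE)  # FALSE
get("x", envir = e)                       # 3: found in the parent environment
```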

  Robert

>
> Duncan Murdoch
>
>>
>>> x <- 3
>>> e <- new.env()
>>> "x" %in% names(e)
>>
>> [1] FALSE
>>>
>>> get("x", e) # oops
>>
>> [1] 3
>



-- 
Robert Gentleman
rgent...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Non-GPL packages for R

2009-09-11 Thread Robert Gentleman

Hi,

Peter Dalgaard wrote:

Prof. John C Nash wrote:

The responses to my posting yesterday seem to indicate more consensus
than I expected:


 Umm, I had thought that it was well established that responders need 
not represent the population being surveyed.  I doubt that there is 
consensus at the level you are suggesting (certainly I don't agree) and 
as Peter indicates below the issue is: what is maintainable with the 
resources we have, not what is the best solution given unlimited resources.


  Personally, I would like to see something, a bit easier to deal with 
programmatically, that indicated when a package was GPL-compatible (or 
open-source compatible, actually) and when it was not.  This could then be 
used to write a decent function to identify suspect packages, so that 
users would know when they should be concerned.


  It is also the case that things are not so simple, as dependencies 
can make a package unusable even if it is itself GPL-compatible.  This 
also makes the notion of some simple split into free and non-free (or 
whatever split you want) less trivial than is being suggested.


  Robert



1) CRAN should be restricted to GPL-equivalent licensed packages


GPL-_compatible_ would be the word. However, this is not what has been
done in the past. There are packages with "non-commercial use" licences,
and the survival package was among them for quite a while. As far as I
know, the CRAN policy has been to ensure only that redistribution is
legal and that whatever license is used is visible to the user. People
who have responded on the list do not necessarily speak for CRAN. In the
final analysis, the maintainers must decide what is maintainable.

The problem with Rdonlp2 seems to have been that the interface packages
claimed to be LGPL2 without the main copyright holder's consent (and it
seems that he cannot grant consent for reasons of TU-Darmstadt
policies). It is hard to safeguard against that sort of thing. CRAN
maintainers must assume that legalities have been cleared and accept the
license in good faith.

(Even within the Free Software world there are current issues with,
e.g., incompatibilities between GPL v.2 and v.3, and also with the
Eclipse license. Don't get me started...)


2) r-forge could be left "buyer beware" using DESCRIPTION information
3) We may want a specific repository for restricted packages (RANC?)

How to proceed? A short search on Rseek did not turn up a chain of
command for CRAN.

I'm prepared to help out with documentation etc. to move changes
forward. They are not, in my opinion, likely to cause a lot of trouble
for most users, and should simplify things over time.

JN



[Rd] debug

2009-07-27 Thread Robert Gentleman

Hi,
  I just committed a change to R-devel so that if debug is called on an 
S3 generic function, all methods will also automatically have debug 
turned on for them (if they are dispatched to from the generic).
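 A hypothetical illustration of what the change enables (function names made up):

```r
# With the change, debug() on an S3 generic also turns debugging on for
# whichever method the generic dispatches to.
area <- function(shape) UseMethod("area")
area.circle <- function(shape) pi * shape$r^2

debug(area)
circ <- structure(list(r = 2), class = "circle")
# area(circ)  # enters the browser in area(), and then in area.circle()
```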


  I hope to be able to extend this to S4 and a few other cases that are 
currently not being handled, over the next few weeks.


  Please let me know if you have problems, or suggested improvements.

 Robert
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgent...@fhcrc.org



Re: [Rd] tabulate can accept NA values?

2009-07-20 Thread Robert Gentleman
should be in devel now; NAs are ignored (as are non-integers and values 
outside the range given by the nbins argument).
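With that change, a sketch of the use case Martin describes, no work-around needed:

```r
# Tabulate each row of a matrix containing NAs; the NAs are silently dropped.
m <- rbind(c(1, 2, 2, NA),
           c(3, 3, NA, 1))
apply(m, 1, tabulate, nbins = 3)
# each input row becomes a column of counts for the values 1, 2, 3:
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    0
# [3,]    0    2
```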


Martin Morgan wrote:

tabulate has

.C("R_tabulate", as.integer(bin), as.integer(length(bin)),
   as.integer(nbins), ans = integer(nbins), PACKAGE="base")$ans

The implementation of R_tabulate has

if(x[i] != R_NaInt && x[i] > 0 && x[i] <= *nbin)

and so copes with (silently drops) NA. Perhaps the .C could have
NAOK=TRUE? This is useful in apply'ing tabulate to the rows or columns
of a (large) matrix, where the work-around involves introducing some
artificial NA value (and consequently copying the matrix) outside the
range of tabulate's nbin argument.

Martin  




Re: [Rd] Can a function know what other function called it?

2009-05-23 Thread Robert Gentleman
Hi Kynn,


Kynn Jones wrote:
> Suppose function foo calls function bar.  Is there any way in which
> bar can find out the name of the function that called it, "foo"?

 essentially yes. You can find out about the call stack by using sys.calls and
sys.parents etc. The man page plus additional manuals should be sufficient, but
let us know if there are things that are not clear.
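 A minimal sketch:

```r
# bar() inspects the call stack to learn who called it.
foo <- function() bar()
bar <- function() {
  caller <- sys.call(-1)    # the call one frame up, i.e. foo()
  deparse(caller[[1]])      # name of the calling function
}
foo()  # "foo"
```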

> 
> There are two generalizations to this question that interest me.
> First, can this query go farther up the call stack?  I.e. if bar now
> calls baz, can baz find out the name of the function that called the
> function that called it, i.e. "foo"?  Second, what other information,

 yes - you can (at least currently) get access to the entire calling stack and
some manipulations can be performed.


> beside its name, can bar find about the environment where it was
> called?  E.g. can it find out the file name and line number of the

 there is no real concept of file and line number associated with a function
definition (nor need there even be a name - functions can be anonymous).

 If you want to map back to source files then I think that currently we do not
keep quite enough information when a function is sourced. Others may be able to
elaborate more (or correct my mistakes).  I think we currently store the actual
text for the body of the function so that it can be used for printing, but we
don't store a file name/location/line number or anything of that sort. It could
probably be added, but would be a lot of work, so it would need someone who
really wanted it to do that.

 However, you can find out lots of other things if you want.  Do note that while
 it is possible to determine which function initiated the call, it is not
necessarily possible to figure out which of the calls (if there is more than one
in the body of the function) is active.  R does not keep track of things in that
way. To be clear, if foo looks like:

  foo <- function(x) {
bar(x)
x = sqrt(x)
bar(x)
  }
  and you have a breakpoint in bar, you could not (easily) distinguish which of
the two calls to bar was active. There is no line counter or anything of that
sort available.

 best wishes
   Robert

> function call?
> 
> Thanks!
> 



Re: [Rd] [R] step by step debugger in R?

2009-05-23 Thread Robert Gentleman
Hi,
  I stripped the cc's as I believe that all read this list.

Romain Francois wrote:
> [moving this to r-devel]
> 
> Robert Gentleman wrote:
>> Hi,
>>
>> Romain Francois wrote:
>>  
>>> Duncan Murdoch wrote:
>>>
>>>> On 5/22/2009 10:59 AM, Michael wrote:
>>>>  
>>>>> Really I think if there is a Visual Studio strength debugger, our
>>>>> collective time spent in developing R code will be greatly reduced.
>>>>> 
>>>> If someone who knows how to write a debugger plugin for Eclipse wants
>>>> to help, we could have that fairly easily.  All the infrastructure is
>>>> there; it's the UI part that's missing.
>>>>
>>>> Duncan Murdoch
>>>>   
>>> [I've copied Mark Bravington and Robert Gentleman to the list as they
>>> are likely to have views here, and I am not sure they monitor R-help]
>>>
>>> Hello,
>>>
>>> Making a front-end to debugging was one of the proposed google summer of
>>> code for this year [1], it was not retained eventually, but I am still
>>> motivated.
>>>
>>> Pretty much all infrastructure is there, and some work has been done
>>> __very recently__ in R's debugging internals (ability to step up). As I
>>> see it, the ability to call some sort of hook each time the debugger
>>> waits for input would make it much easier for someone to write
>>> 
>>
>>  I have still not come to an understanding of what this is supposed to
>> do. When
>> you have the browser prompt you can call any function or code you want
>> to. There
>> is no need for something special to allow you to do that.
>>   
> Sure. What I have in mind is something that gets __automatically__
> called, similar to the task callback but happening right before the user
> is given the browser prompt.

 I am trying to understand the scenario you have in mind. Is it that the user is
running R directly and your debugger is essentially a helper function that gets
updated etc as R runs?

 If so, then I don't think that works very well, and given the constraints we
have with R I don't think it will be able to solve many of the problems that an
IDE should.  The hook you want will give you some functionality, but nowhere
near enough.

 Let me suggest instead that the IDE should be running the show. It should
initialize an instance of R, but it controls all communication and hence
controls what is rendered on the client side.  If that is what you mean by
embedding R, then yes that is what is needed. There is no way that I can see to
support most of the things that IDE type debuggers support without the IDE
controlling the communication with R.

 And if I am wrong about what your debugger will look like please let me know.

 best wishes
   Robert


> 
>>> front-ends. A recent post of mine (patch included) [2] on R-devel
>>> suggested a custom prompt for browser which would do the trick, but I
>>> now think that a hook would be more appropriate. Without something
>>> similar to that, there is no way that I know of for making a front-end,
>>> unless maybe if you embed R ... (please let me know how I am wrong)
>>> 
>>
>>  I think you are wrong. I can't see why it is needed. The external
>> debugger has
>> lots of options for handling debugging. It can rewrite code (see
>> examples in
>> trace for how John Chambers has done this to support tracing at a
>> location),
>> which is AFAIK a pretty standard approach to writing debuggers. It can
>> figure
>> out where the break point is (made a bit easier by allowing it to put
>> in pieces
>> of text in the call to browser).  These are things the internal
>> debugger can't do.
>>
>>   
> Thanks. I'll have another look into that.
> 
>>> There is also the debug package [3,4] which does __not__ work with R
>>> internals but rather works with instrumenting tricks at the R level.
>>> debug provides a tcl/tk front-end. It is my understanding that it does
>>> not work using R internals (do_browser, ...) because it was not possible
>>> at the time, and I believe this is still not possible today, but I might
>>> be wrong. I'd prefer to be wrong actually.
>>> 
>>
>>   I don't understand this statement. It has always been possible to
>> work with
>> the internal version - but one can also take the approach of rewriting
>> code.
>> There are some difficulties supporting all the operations that one
>> would 

[Rd] a statement about package licenses

2009-05-01 Thread Robert Gentleman
We are writing on behalf of the R Foundation, to clarify our position on
the licenses under which developers may distribute R packages.
Readers should also see FAQ 2.11: this message is not legal advice,
which we never offer.  Readers should also be aware that besides
the R Foundation, R has many other copyright holders, listed in the
copyright notices in the source.  Each of those copyright holders may
have a different opinion on the issues discussed here.

We welcome packages that extend the capabilities of R, and believe
that their value to the community is increased if they can be offered
with open-source licenses.  At the same time, we have no desire to
discourage other license forms that developers feel are required. Of
course, such licenses as well as the contents of the package and the
way in which it is distributed must respect the rights of the copyright
holders and the terms of the R license.

When we think that a package is in violation of these rights, we
contact the author directly, and so far package authors have always agreed to
comply with our license (or convinced us that they are already in compliance).
We have no desire to be involved in legal actions---our interest is in providing
good software.  However, everyone should understand that there are conceivable
circumstances in which we would be obliged to take action. Our experience to
date and the assurances of some fine commercial developers make us optimistic
that these circumstances will not arise.

The R Foundation



Re: [Rd] install.packages and dependency version checking

2008-12-15 Thread Robert Gentleman
Hi,

Prof Brian Ripley wrote:
> I've started to implement checks for package versions on dependencies in
> install.packages().  However, this is revealing a number of
> problems/misconceptions.
> 
> 
> (A) We do not check versions when loading namespaces, and the namespace
> registry does not contain version information.  So that for example
> (rtracklayer)
> 
> Depends: R (>= 2.7.0), Biobase, methods, RCurl
> Imports: XML (>= 1.98-0), IRanges, Biostrings
> 
> will never check the version of namespace XML that is loaded, either
> already loaded or resulting from loading this package's namespace.  For
> this to be operational we would need to extend the syntax of the
> imports() and importsFrom() directive in a NAMESPACE file to allow
> version restrictions. I am not sure this is worth doing, as an
> alternative is to put the imported package in Depends.
> 
> The version dependence will in a future release cause an update of XML
> when rtracklayer is installed, if needed (and available).
> 
> 

  I think we need to have this functionality in both Imports and Depends,
  see my response to another point for why.

> (B) Things like (package stam)
> 
> Depends: R (>= 2.7.0), GO.db (>= 2.1.3), Biobase (>= 1.99.5), pamr (>=
> 1.37.0), cluster (>= 1.11.10), annaffy (>= 1.11.5), methods (>=
> 2.7.0), utils (>= 2.7.0)
> 
> are redundant: the versions of methods and utils are always the same as
> that of R.
> 
> And there is no point in having a package in both Depends: and Imports:,
> as Biostrings has.

  I don't think that is true.  There are cases where both Imports and Depends
are reasonable.  The purpose of importing is to ensure correct resolution of
symbols in the internal functions of a package. I would do that in almost all
cases.  In some instances I want users to see functionality from another package
- and I can then either a) (re)export those functions, or if there are lots of
them, then b) just put the package also in Depends.  Now, a) is a bit less
useful than it could be since R CMD check gets annoyed about these re-exported
functions (I don't think it should care, the man page exists and is findable).
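 As an illustration of option a), a hypothetical NAMESPACE fragment (package and function names made up):

```r
# NAMESPACE for a package whose internals use pkgA, with one
# pkgA function re-exported for users:
import(pkgA)        # resolve the package's internal symbol references
export(myFun)       # the package's own API
export(pkgAhelper)  # option a): re-export a function imported from pkgA
# option b) is to also list pkgA in the Depends: field of DESCRIPTION
```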

> 
> 
> (C) There is no check on the version of a package suggested by
> Suggests:, unless the package itself provides one (and I found no
> instances).

  It may be worthwhile, but this is a less frequent use case and I would
prioritize it lower than having that functionality in Imports.

> 
> 
> (D) We can really only handle >= dependencies on package versions (but
> then I can see no other ops in use).  install.packages() will find the
> latest version available on the repositories, and we possibly need to
> check version requirements on the same dependency many times.  Given
> that BioC has a penchant for having version dependencies on unavailable
> versions (e.g. last week on IRanges (>= 1.1.7) with 1.1.4 available), we
> may be able to satisfy the requirements of some packages and not others.
> (In that case the strategy used is to install the latest available
> version if the one installed does not suffice for those we can satisfy,
> and report the problem(s).)
> 

  I suspect one also needs = (exact versions; basically, as Gabor pointed 
out, some packages have issues).

> 
> (E) One of the arguments that has been used to do this version checking
> at install time is to avoid installing packages that cannot work. It
> would be possible to extend the approach to do so, but I am going to
> leave that to those who advocated it.
> 
> 
> The net effect of the current changes will be that if there is a
> dependence that is already installed but a later version is available
> and will help satisfy a >= dependence, it will be added to the list of
> packages to be installed.  As we have seen with Matrix this last week,
> that can have downsides in stopping previously functional packages working.
> 
> This is work in progress: there is no way to write a test suite that
> will encapsulate all the possible scenarios, so we need to get experience
> until 2.9.0 is released.  Please report any quirks to R-devel if they
> are completely reproducible (and preferably with the code change needed
> to fix them, since the chance of anyone else being able to reproduce
> them is fairly slim).
> 
  thanks
Robert



Re: [Rd] wish: exportClassPattern

2008-12-04 Thread Robert Gentleman
should be in the most recent devel.
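A sketch of the new directive in a NAMESPACE file (class names hypothetical):

```r
# Export every S4 class whose name starts with "My" in one directive,
exportClassPattern("^My")
# rather than enumerating them:
# exportClasses("MyMatrix", "MyVector", "MyArray")
```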


Prof Brian Ripley wrote:
> Michael,
> 
> This seems a reasonable request, but most of us will not have enough
> classes in a package for it to make a difference.
> 
> A patch to do this would certainly speed up implementation.
> 
> On Fri, 21 Nov 2008, Michael Lawrence wrote:
> 
>> It would be nice to have a more convenient means of exporting multiple
>> classes from a package namespace. Why not have something like
>> exportClassPattern() that worked like exportPattern() except for classes?
>>
>> Thanks,
>> Michael
>>
>> [[alternative HTML version deleted]]
>>
> 



Re: [Rd] why is \alias{anRpackage} not mandatory?

2008-10-07 Thread Robert Gentleman

Hi,
  Not that I have time, but I suspect that there are not that many ways 
to document a package, probably only five or six variants in the wild 
and an overview function, with a name like packageOverview, would be 
relatively easy to write and would be able to extract all available 
information into a single place for a user.


An advantage of such an approach is that it can be updated, independent 
of what particular developers/packages do, leaving us free to choose our 
own manner of documentation.  Also, one could easily imagine giving 
overviews of the class structure and of functions and their relationships 
(a thread on r-help), as could extracting doxygen-type comments. 
But, as with all things R, someone needs to actually put it together - 
the tools are all there.


best wishes
  Robert


Duncan Murdoch wrote:

On 07/10/2008 10:17 AM, hadley wickham wrote:
This shows up in the HTML help system.  It would be better if it 
showed up
in all help formats, but there are other ways to do that, e.g. 
creating an

Rd help page pointing to those files.


Or you can just link to them from your website.

I don't think you'd argue with the statement that there's already too
many different ways to find R documentation.  


I think that's a paraphrase of one of my earlier posts.


There are plenty of
hacks and work-arounds to jam different types of documentation in
different places, but they are just hacks and work-arounds.  My
feeling is that evolutionary modification of the documentation system
is only going to get us so far, and at some point the entire
foundations need to be rethought.


I don't agree with this.  Back in 2001 when this was first proposed it 
might have worked, but there's far too much inertia now to make a big 
change.  Weren't you the one who objected to a requirement for a 
foo-package help topic?  How would you like to rewrite all the help 
files for all of your packages?  (I imagine not much.  I'm certainly not 
going to do that for mine.)


I think any change we make now needs to be incremental, but there's a 
tremendous amount of friction against anything at all, and very few 
offers of support to actually do the work.


Here are things I'm currently working on, that I'd appreciate support for:

 - Formalizing the Rd format and writing a parser for it.  (The current 
parser finds errors in about 2-5% of base package man files.  Should it 
be more permissive? I would guess it will find more errors in 
contributed packages.)  Can it make changes?  I would really love to say 
that % is nothing special in an R code section in an Rd file, but there 
are lots of pages that use it as a comment, as it is documented to be.


 - Allowing macros in an Rd file.  This will give a way to avoid 
duplication of information, will allow you to include an index of 
whatever sort of files you want, generated on the fly, and will slice 
bread if you write a macro for it.


 - Source level debugging support.  Gabor mentioned that it's hard to 
debug Sweave files; this could help.




Of course, the problem is having enough time to do that, and then to
code up the solution!


That's the main problem.  I find the coding is much easier than the 
design, though.  I can code on my own, but the design really needs 
careful thought and criticism.  (It's easy to get shallow criticism; the 
hard thing is to get useful criticism.)  That means at least two people 
need to find time to work together on the problem, and in my experience, 
that has almost never happened with any of the problems above.  So I 
move very, very slowly on them.


Duncan Murdoch






Re: [Rd] Suggestion for the optimization code

2008-08-08 Thread Robert Gentleman



Duncan Murdoch wrote:

On 8/8/2008 8:56 AM, Mathieu Ribatet wrote:

Dear list,

Here's a suggestion about the different optimization code. There are 
several optimization procedures in the base package (optim, optimize, 
nlm, nlminb, ..). However, the output of these functions are slightly 
different. For instance,


   1. optim returns a list with arguments par (the estimates), value the
  minimum (maxima) of the objective function, convergence (optim
  .convergence)
   2. optimize returns a list with arguments minimum (or maximum) giving
  the estimates, objective the value of the obj. function
   3. nlm returns a list with arguments minimum giving the minimum of
  the obj. function, minimum the estimates, code the optim. 
convergence

   4. nlminb returns a list with arguments par (the estimates),
  objective, convergence (conv. code), evaluations

Furthermore, optim keeps the names of the parameters while nlm, nlminb 
don't.

I believe it would be nice if all these optimizers have a kind of 
homogenized output. This will help in writing functions that can call 
different optimizers. Obviously, we can write our own function that 
homogenized the output after calling the optimizer, but I still 
believe this will be more user-friendly.


Unfortunately, changing the names within the return value would break a 
lot of existing uses of those functions.  Writing a wrapper to 
homogenize the output is probably the right thing to do.


  And potentially to harmonize inputs. The MLInterfaces package 
(Bioconductor) has done this for many machine learning algorithms, 
should you want an example to look at.
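 Such a wrapper might look like the following sketch (the name unify_optim is made up, not an existing API):

```r
# Homogenize optim() and nlminb() results to a common list layout:
# par (estimates), value (objective at optimum), convergence (code).
unify_optim <- function(fn, par, method = c("optim", "nlminb"), ...) {
  method <- match.arg(method)
  if (method == "optim") {
    res <- optim(par, fn, ...)
    list(par = res$par, value = res$value, convergence = res$convergence)
  } else {
    res <- nlminb(par, fn, ...)
    list(par = res$par, value = res$objective, convergence = res$convergence)
  }
}

unify_optim(function(x) (x - 2)^2, par = 0, method = "nlminb")$par  # close to 2
```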


  Robert




Duncan Murdoch

Do you think this is a reasonable feature to implement - even though it 
isn't an important point?

Best,
Mathieu

* BTW, if this is relevant, I could try to do it.







Re: [Rd] RFC: What should ?foo do?

2008-04-25 Thread Robert Gentleman


Duncan Murdoch wrote:
> On 4/25/2008 10:16 AM, Robert Gentleman wrote:
>>
>> Duncan Murdoch wrote:
>>> Currently ?foo does help("foo"), which looks for a man page with 
>>> alias foo.  If foo happens to be a function call, it will do a bit 
>>> more, so
>>>
>>> ?mean(something)
>>>
>>> will find the mean method for something if mean happens to be an S4 
>>> generic.  There are also the type?foo variations, e.g. methods?foo, 
>>> or package?foo.
>>>
>>> I think these are all too limited.
>>>
>>> The easiest search should be the most permissive.  Users should need 
>>> to do extra work to limit their search to man pages, with exact 
>>> matches, as ? does.
>>
>>While I like the idea, I don't really agree with the sentiment 
>> above. I think that the easiest search should be the one that you want 
>> the result of most often.
>> And at least for me that is the man page for the function, so I can 
>> check some detail; and it works pretty well.  I use site searches much 
>> less frequently and would be happy to type more for them.
> 
> That's true.
> 
> What's your feeling about what should happen when ?foo fails?

   present a list of man pages with spellings close to foo (we have the 
tools to do this in many places right now, and it would be a great help, 
IMHO, as spelling and capitalization behavior varies both between and 
within individuals), so the user can select one
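   The tools are indeed available; e.g. approximate matching via agrep 
(a sketch of the idea, not what ?foo actually does):

```r
# Suggest help topics with spellings close to a misspelled query:
candidates <- ls("package:stats")
agrep("anovva", candidates, max.distance = 2, value = TRUE)
```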

> 
> 
>>
>>>
>>> We don't currently have a general purpose search for "foo", or 
>>> something like it.  We come close with RSiteSearch, and so possibly 
>>> ?foo should mean RSiteSearch("foo"), but
>>> there are problems with that: it can't limit itself to the current 
>>> version of R, and it doesn't work when you're offline (or when 
>>> search.r-project.org is down.)  We also have help.search("foo"), but 
>>> it is too limited. I'd like to have a local search that looks through 
>>> the man pages, manuals, FAQs, vignettes, DESCRIPTION files, etc., 
>>> specific to the current R installation, and I think ? should be 
>>> attached to that search.
>>
>>   I think that would be very useful (although there will be some 
>> decisions on which tool to use to achieve this). But, it will also be 
>> problematic, as one will get tons of hits for some things, and then 
>> selecting the one you really want will be a pain.
>>
>>   I would rather see that be one of the dyadic forms, say
>>
>>site?foo
>>
>>   or
>>all?foo
>>
>>   one could even imagine refining that for different subsets of the 
>> docs you have mentioned;
>>
>>help?foo #only man pages
>>guides?foo #the manuals, R Extensions etc
>>
>> and so on.
>>
>>You did not make a suggestion as to how we would get the 
>> equivalent of ?foo now, if a decision to move were taken.
> 
> I didn't say, but I would assume there would be a way to do it, and it 
> shouldn't be hard to invoke.  Maybe help?foo as you suggested, or man?foo.

   If not, then I would be strongly opposed -- I really think we want to 
make the most common thing the easiest to do.  And if we really think 
that might be different for different people, then disambiguating the 
"short-cut" (? in this case) from the command, so that users have some 
freedom to customize, would be my favored alternative.

   I also wonder if one could not also provide some mechanism to provide 
distinct information on what is local vs what is on the internet. 
Something that would make tools like spotlight much more valuable, IMHO, 
is to tell me what I have on my computer, and what I can get, if I want 
to; at least as some form of option.


   Robert

> 
> Duncan Murdoch
> 



Re: [Rd] RFC: What should ?foo do?

2008-04-25 Thread Robert Gentleman


Duncan Murdoch wrote:
> Currently ?foo does help("foo"), which looks for a man page with alias 
> foo.  If foo happens to be a function call, it will do a bit more, so
> 
> ?mean(something)
> 
> will find the mean method for something if mean happens to be an S4 
> generic.  There are also the type?foo variations, e.g. methods?foo, or 
> package?foo.
> 
> I think these are all too limited.
> 
> The easiest search should be the most permissive.  Users should need to 
> do extra work to limit their search to man pages, with exact matches, as 
> ? does.

   While I like the idea, I don't really agree with the sentiment above: 
I think that the easiest search should be the one whose result you 
want most often.
And at least for me that is the man page for the function, so I can 
check some detail; and it works pretty well.  I use site searches much 
less frequently and would be happy to type more for them.

> 
> We don't currently have a general purpose search for "foo", or something 
> like it.  We come close with RSiteSearch, and so possibly ?foo should 
> mean RSiteSearch("foo"), but
> there are problems with that: it can't limit itself to the current 
> version of R, and it doesn't work when you're offline (or when 
> search.r-project.org is down.)  We also have help.search("foo"), but it 
> is too limited. I'd like to have a local search that looks through the 
> man pages, manuals, FAQs, vignettes, DESCRIPTION files, etc., specific 
> to the current R installation, and I think ? should be attached to that 
> search.

  I think that would be very useful (although there will be some 
decisions on which tool to use to achieve this).  But it will also be 
problematic, as one will get tons of hits for some things, and then 
selecting the one you really want will be a pain.

  I would rather see that be one of the dyadic forms, say

   site?foo

  or
   all?foo

  one could even imagine refining that for different subsets of the docs 
you have mentioned;

   help?foo #only man pages
   guides?foo #the manuals, R Extensions etc

and so on.
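To make the dyadic idea concrete, here is a minimal sketch of what an `all?foo`-style search might do locally, built on the existing `help.search()` machinery (the helper name `all_search` is hypothetical, not an existing function):

```r
# Hypothetical helper sketching what an all?foo could do with existing
# tools: help.search() scans aliases, concepts, and titles of the man
# pages of all installed packages.
all_search <- function(topic)
  help.search(topic, fields = c("alias", "concept", "title"))

res <- all_search("median")   # assign to avoid auto-printing the browser
stopifnot(inherits(res, "hsearch"))
```

The returned "hsearch" object prints as a browsable list of local matches, which is roughly the behavior the dyadic forms above would dispatch to.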

   You did not make a suggestion as to how we would get the equivalent 
of ?foo now, if a decision to move were taken.


> 
> Comments, please.
> 
> Duncan Murdoch
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] R CMD check should check date in description

2008-04-04 Thread Robert Gentleman


hadley wickham wrote:
>>   Please no.  If people want one then they should add it manually. It is
>> optional, and some of us have explicitly opted out and would like to
>> continue to do so.
> 
> To clarify, do you mean you have decided not to provide a date field
> in the DESCRIPTION file?  If so, would you mind elaborating why?

  Sure: The date of what?


> 
> Hadley
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] R CMD check should check date in description

2008-04-04 Thread Robert Gentleman


Kurt Hornik wrote:
>>>>>> hadley wickham writes:
> 
>>> I recently thought about this.  I see several issues.
>>>
>>> * How can we determine if it is "old"?  Relative to the time when the
>>> package was uploaded to a repository?
>>>
>>> * Some developers might actually want a different date for a variety of
>>> reasons ...
>>>
>>> * What we currently say in R-exts is
>>>
>>> The optional `Date' field gives the release date of the current
>>> version of the package.  It is strongly recommended to use the
>>> yyyy-mm-dd format conforming to the ISO standard.
>>>
>>> Many packages do not comply with the latter (but I have some code to
>>> sanitize most of these), and "release date" may be a moving target.
>>>
>>> The best that I could think of is to teach R CMD build to *add* a Date
>>> field if there was none.
> 
>> That sounds like a good solution to me.
> 
> Ok.  However, 2.7.0 feature freeze soon ...

   Please no.  If people want one then they should add it manually. It 
is optional, and some of us have explicitly opted out and would like to 
continue to do so.


> 
>> Otherwise, maybe just a message from R CMD check?  i.e. just like
>> failing the codetools checks, it might be perfectly ok, but you should
>> be doing it consciously, not by mistake.
> 
> I am working on that, too (e.g. a simple NOTE in case the date spec
> cannot be canonicalized, etc.).  If file time stamps were reliable, we
> could compare these to the given date.  This is I guess all we can do
> for e.g. CRAN's daily checking (where comparing to the date the check
> is run is not too useful) ...

   But definitely not a warning.

   Robert

> 
> Best
> -k
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] Problem with new("externalptr")

2008-01-29 Thread Robert Gentleman
Hi,

Herve Pages wrote:
> Hi,
> 
> It seems that new("externalptr") is always returning the same instance, and
> not a new one as one would expect from a call to new(). Of course this is hard
> to observe:
> 
>   > new("externalptr")
>   
>   > new("externalptr")
>   
> 
> since not a lot of details are displayed.
> 
> For example, it's easy to see that 2 consecutive calls to new("environment")
> create different instances:
> 
>   > new("environment")
>   
>   > new("environment")
>   

   getMethod("initialize", "environment")
and

   getMethod("initialize", "externalptr")

  will give some hints about the difference.
> 
> But for new("externalptr"), I had to use the following C routine:
> 
>   SEXP sexp_address(SEXP s)
>   {
> SEXP ans;
> char buf[40];
> 
> snprintf(buf, sizeof(buf), "%p", s);
> PROTECT(ans = NEW_CHARACTER(1));
> SET_STRING_ELT(ans, 0, mkChar(buf));
> UNPROTECT(1);
> return ans;
>   }
> 
> Then I get:
> 
>   > .Call("sexp_address", new("externalptr"))
>   [1] "0xde2ce0"
>   > .Call("sexp_address", new("externalptr"))
>   [1] "0xde2ce0"
> 
> Isn't that wrong?

   Not what you want, but not wrong. In the absence of an initialize 
method all calls to "new" are guaranteed to return the prototype; so I 
think it behaves as documented.

   new("environment") would also always return the same environment, 
were it not for the initialize method.  So you might want to contribute 
an initialize method for externalptr, but as you said, they are not 
useful at the R level so I don't know just what problem is being solved.

   This piece of code might be useful in such a construction:
.Call("R_externalptr_prototype_object", PACKAGE = "methods")

  which does what you would like.
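At the R level the contrast is easiest to see with a reference-semantics class that *does* have an initialize method; a small sketch of the behavior described above:

```r
library(methods)
# "environment" has an initialize method, so each new() call yields a
# fresh instance rather than the shared class prototype.
e1 <- new("environment")
e2 <- new("environment")
e1$x <- 1
stopifnot(!identical(e1, e2))  # distinct environments (reference semantics)
stopifnot(is.null(e2$x))       # a binding in e1 does not appear in e2
```

Without such a method, every `new("externalptr")` returns the one prototype object, exactly as discussed above.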

  best wishes
Robert

> 
> I worked around this problem by writing the following C routine:
> 
>   SEXP xp_new()
>   {
> return R_MakeExternalPtr(NULL, R_NilValue, R_NilValue);
>   }
> 
> so I can create new "externalptr" instances from R with:
> 
>   .Call("xp_new")
> 
> I understand that there is not much you can do from R with an "externalptr"
> instance and that you will have to manipulate them at the C level anyway.
> But since new("externalptr") exists and seems to work, wouldn't that be
> better if it was really creating a new instance at each call?
> 
> Thanks!
> H.
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



[Rd] course announcement

2008-01-07 Thread Robert Gentleman
Hi,
   We will be holding an advanced course in R programming at the FHCRC 
(Seattle), Feb 13-15. There will be some emphasis on Bioinformatic 
applications, but not much.

   Sign up at:
https://secure.bioconductor.org/SeattleFeb08/index.php

   please note space is very limited so make sure you have a 
registration before making any travel plans. Also, this is definitely 
not a course for beginners.

   Best wishes
 Robert

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] Evaluating R expressions from C

2008-01-04 Thread Robert Gentleman
Hi Terry,

Terry Therneau wrote:
> I am currently puzzled by a passage in the R Extensions manual, section 5.10:
> 
> SEXP lapply(SEXP list, SEXP expr, SEXP rho)
>  {
>R_len_t i, n = length(list);
>SEXP ans;
>  
>if(!isNewList(list)) error("`list' must be a list");
>if(!isEnvironment(rho)) error("`rho' should be an environment");
>PROTECT(ans = allocVector(VECSXP, n));
>for(i = 0; i < n; i++) {
>  defineVar(install("x"), VECTOR_ELT(list, i), rho);
>  SET_VECTOR_ELT(ans, i, eval(expr, rho));
>}
> 
> I'm trying to understand this code beyond just copying it, and don't find 
> definitions for many of the calls.  PROTECT and SEXP have been well discussed 
> previously in the document, but what exactly are
>   R_len_t
>   defineVar

this function binds the variable (a SYMSXP, one type of SEXP) given 
by its first argument to the value given by its second argument, in the 
environment given by its third argument. There are lots of 
variants; these are largely in envir.c


>   install

   all symbols in R are unique (there is only one symbol named x, even 
though it might have bindings in many different environments). So to get 
the unique "thing" (a SYMSXP) you call install (line 1067 in names.c has 
a pretty brief comment to this effect). This makes it efficient to do 
variable lookup, as we only need to compare pointers (within an 
environment), not compare names.
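The uniqueness of symbols is visible from R as well; a quick sketch:

```r
# Two different ways of obtaining the symbol named "x" yield the very
# same unique object, mirroring what install() returns at the C level.
s1 <- as.name("x")
s2 <- quote(x)
stopifnot(identical(s1, s2), is.symbol(s1))
```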

>   VECTOR_ELT

 access the indicated element (2nd arg) of the vector (first arg)

>   SET_VECTOR_ELT

  set the indicated element (2nd arg), of the vector (1st arg) to 
the value (3rd arg)

>   
> The last I also found in 5.7.4, but it's not defined there either.  
> 
> So:
>What do these macros do?  Some I could guess, like is.Environment; and I'm 
> fairly confident of R_len_t.  Others I need some help.
>Perhaps they are elswhere in the document?  (My version of acrobat can't 
> do 
> searches.)  Is there another document that I should look at first?
>Why "isNewList"?  I would have guessed "isList".  What's the difference?

"old lists" are of the CAR-CDR variant, and largely only used 
internally these days.  "new lists", are generic vectors, and are what 
users will almost always encounter (even users that program internals, 
you pretty much need to be messing with the language itself to run into 
the CAR-CDR variety).
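The two kinds of list can be told apart from R itself; a quick sketch:

```r
pl <- pairlist(1, 2)  # "old" CAR-CDR-style list (a LISTSXP)
nl <- list(1, 2)      # "new" list, i.e. a generic vector (a VECSXP)
stopifnot(typeof(pl) == "pairlist", typeof(nl) == "list")
stopifnot(is.list(pl), is.list(nl))          # is.list() is TRUE for both
stopifnot(is.pairlist(pl), !is.pairlist(nl)) # is.pairlist() separates them
```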

best wishes
     Robert

>    
>   Thanks for any help,
>   Terry Therneau
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] install.packages() and configure.args

2007-10-22 Thread Robert Gentleman
Since, in Herve's example, only one package was named, it would be nice 
either to make sure the configure args are associated with it, or to 
force configure.args entries to be named, and possibly check the names.
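A sketch of the name check suggested above (the helper `check_configure_args` is hypothetical, not part of install.packages()):

```r
# Hypothetical sanity check: every named configure.args entry should
# refer to one of the packages actually being requested.
check_configure_args <- function(pkgs, configure.args) {
  nm <- names(configure.args)
  if (is.null(nm) && length(pkgs) > 1L)
    warning("unnamed configure.args with multiple packages is ambiguous")
  else if (!all(nm %in% pkgs))
    warning("configure.args names matching no requested package: ",
            paste(setdiff(nm, pkgs), collapse = ", "))
  invisible(TRUE)
}

check_configure_args("Rgraphviz",
                     c(Rgraphviz = "--with-graphviz=/some/place"))
```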


Duncan Temple Lang wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Hi Herve
> 
>  The "best" way to specify configure.args when there are multiple
> packages (either directly or via dependencies) is to use names on
> the character vector, i.e.
> 
>install.packages("Rgraphviz",
> rep="http://bioconductor.org/packages/2.1/bioc",
> 
> configure.args=c(Rgraphviz="--with-graphviz=/some/non/standard/place"))
> 
> 
> This allows one to specify command line arguments for many packages
> simultaneously and unambiguously.
> 
> install.packages() only uses configure.args when there are no names
> if there is only one package being installed.  It could be made
> smarter to apply this to the first of the pkgs only, or
> to identify the packages as direct and dependent.  But it is not
> obvious it is worth the effort as using names on configure.args
> provides a complete solution and is more informative.
> 
> Thanks for pointing this out.
> 
>  D.
> 
> Herve Pages wrote:
>> Hi,
>>
>> In the case where install.packages("packageA") also needs to install
>> required package "packageB", then what is passed thru the 'configure.args'
>> argument seems to be lost when it's the turn of packageA to be installed
>> (the last package to get installed).
>>
>> This is not easy to reproduce but let's say you have the graphviz libraries
>> installed on your system, but you don't have the graph package installed yet.
>> Then this
>>
>>   install.packages("Rgraphviz",
>>rep="http://bioconductor.org/packages/2.1/bioc",
>>configure.args="--with-graphviz=/some/non/standard/place")
>>
>> will fail because --with-graphviz=/some/non/standard/place doesn't seem to be
>> passed to Rgraphviz's configure script. But if you already have the graph 
>> package,
>> then it will work.
>>
>> Cheers,
>> H.
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.7 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFHGnBg9p/Jzwa2QP4RAp0NAJ9Qe/thxdrX8CpFVcRP2UoHk1txFACeL9uM
> twmID5hsclilHhIfPsuFt7A=
> =vCz1
> -END PGP SIGNATURE-
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] gregexpr (PR#9965)

2007-10-11 Thread Robert Gentleman
Yes, we had originally wanted it to find all matches, but user 
complaints that it did not perform as Perl does were taken to prevail. 
There are different ways to do this, but the convention that one does 
not start looking for the next match until after the previous one 
seems more common.  I consciously decided not to have a switch; instead 
we wrote something that does what we wanted it to do and put it in the 
Biostrings package (from Bioconductor) as gregexpr2 (sorry, but only 
fixed = TRUE is supported, since that is all we needed).
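For readers without Biostrings, here is a sketch of overlap-aware fixed-string matching in plain R (the helper name `find_overlapping` is hypothetical; gregexpr2's actual implementation may differ):

```r
# Find all (possibly overlapping) start positions of a fixed pattern,
# restarting the search one character after each match start.
find_overlapping <- function(pattern, x) {
  starts <- integer(0)
  from <- 1L
  while (from <= nchar(x)) {
    m <- regexpr(pattern, substr(x, from, nchar(x)), fixed = TRUE)
    if (m == -1L) break
    starts <- c(starts, from + as.integer(m) - 1L)
    from <- from + as.integer(m)  # step just past the match start
  }
  starts
}

find_overlapping("abab", "ababab")  # 1 3, unlike gregexpr()'s single match
```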

best wishes
   Robert


Prof Brian Ripley wrote:
> This was a deliberate change for R 2.4.0 with SVN log:
> 
> r38145 | rgentlem | 2006-05-20 23:58:14 +0100 (Sat, 20 May 2006) | 2 lines
> fixing gregexpr infelicity
> 
> So it seems the author of gregexpr believed that the bug was in 2.3.1, not 
> 2.5.1.
> 
> On Wed, 10 Oct 2007, [EMAIL PROTECTED] wrote:
> 
>> Full_Name: Peter Dolan
>> Version: 2.5.1
>> OS: Windows
>> Submission from: (NULL) (128.193.227.43)
>>
>>
>> gregexpr does not find all matching substrings if the substrings overlap:
>>
>>> gregexpr("abab","ababab")
>> [[1]]
>> [1] 1
>> attr(,"match.length")
>> [1] 4
>>
>> It does work correctly in Version 2.3.1 under linux.
> 
> 'correctly' is a matter of definition, I believe: this could be considered 
> to be vaguely worded in the help.
> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] 'load' does not properly add 'show' methods for classes extending 'list'

2007-09-25 Thread Robert Gentleman
I think it would be best, then, not to load the package, as loading 
it this way means it is almost impossible to get the methods 
registered correctly. That does seem to be a bug, or at least a major 
inconvenience.  And one might wonder at the purpose of attaching if not 
to make methods available.

That said, the documentation indeed does not state that anything good 
will happen. It also does not state that anything bad will happen.

best wishes
   Robert


Prof Brian Ripley wrote:
> I am not sure why you expected this to work: I did not expect it to and 
> could not find relevant documentation to suggest it should.
> 
> Loading an object created from a non-attached package does not in general 
> attach that package and make the methods for the class of 'x' available. 
> We have talked about attaching the package defining the class when an S4 
> object is loaded, and that is probably possible now S4 objects can be 
> unambiguously distinguished (although I still worry about multiple 
> packages with the same generic and their order on the search path).
> 
> In your example there is no specific 'show' method on the search path when 
> 'show' is called via autoprinting in the second session, so 'showDefault' 
> is called.  Package GSEABase gets attached as an (undocumented) side 
> effect of calling 'getClassDef' from 'showDefault'.  I can see no 
> documentation (and in particular not in ?showDefault) that 'showDefault' 
> is supposed to attach the package defining the class and re-dispatch to a 
> 'show' method that package contains.  Since attaching packages behind the 
> user's back can have nasty side effects (the order of the search path does 
> matter), I think the pros and cons need careful consideration: a warning 
> along the lines of
> 
>'object 'x' is of class "GeneSetCollection" from package 'GSEABase'
>which is not on the search path
> 
> might be more appropriate.  Things would potentially be a lot smoother if 
> namespaces could be assumed, as loading a namespace has few side effects 
> (and if loading a namespace registered methods for visible S4 generics 
> smoothly).
> 
> Until I see documentation otherwise, I will continue to assume that I do 
> need to attach the class-defining package(s) for things to work correctly.
> 
> 
> On Mon, 24 Sep 2007, Martin Morgan wrote:
> 
>> The GeneSetCollection class in the Bioconductor package GSEABase
>> extends 'list'
>>
>>> library(GSEABase)
>>> showClass("GeneSetCollection")
>> Slots:
>>
>> Name:  .Data
>> Class:  list
>>
>> Extends:
>> Class "list", from data part
>> Class "vector", by class "list", distance 2
>> Class "AssayData", by class "list", distance 2
>>
>> If I create an instance of this class and serialize it
>>
>>> x <- GeneSetCollection(GeneSet("X"))
>>> x
>> GeneSetCollection
>>  names: NA (1 total)
>>> save(x, file="/tmp/x.rda")
>> and then start a new R session and load the data object (without first
>> library(GSEABase)), the 'show' method is not added to the appropriate
>> method table.
>>
>>> load("/tmp/x.Rda")
>>> x
>> Loading required package: GSEABase
>> Loading required package: Biobase
>> Loading required package: tools
>>
>> Welcome to Bioconductor
>>
>>  Vignettes contain introductory material. To view, type
>>  'openVignette()'. To cite Bioconductor, see
>>  'citation("Biobase")' and for packages 'citation(pkgname)'.
>>
>> Loading required package: AnnotationDbi
>> Loading required package: DBI
>> Loading required package: RSQLite
>> An object of class "GeneSetCollection"
>> [[1]]
>> setName: NA
>> geneIds: X (total: 1)
>> geneIdType: Null
>> collectionType: Null
>> details: use 'details(object)'
>>
>> Actually, the behavior is more complicate than appears; in a new R
>> session after loading /tmp/x.Rda, if I immediately do x[[1]] I get the
>> show,GeneSetCollection-method but not show,GeneSet-method.
>>
>> Sorry for the somewhat obscure example.
>>
>> Martin
>>
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



[Rd] Jobs in Seattle

2007-09-13 Thread Robert Gentleman
Hi,
   As many of you will realize, Seth is going to be leaving us (pretty 
much immediately), so we will be looking to replace him.  In addition, 
Martin Morgan is going to be moving into another role, one that 
will require an assistant.  Finally, I am looking for at least one 
post-doc (preferably with an interest in sequence-related work).

   If any of these interest you, please check out the job descriptions
at:
http://www.fhcrc.org/about/jobs/

  and you can get some idea of salary level as well

  Feel free to ask me about either the lead programmer job or the 
post-doc, and please direct questions about the bioinformatics 
position to Martin.

   All applications must go through the FHCRC web site.

  thanks
Robert



-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] S4 and Namespaces problems {was "error message from lmer"}

2007-07-18 Thread Robert Gentleman
Hi,

Martin Maechler wrote:
> Here is a reproducible example for the Bug that both Sebastian
> and Dale Barr found.
> 
> As Brian mentioned in an another thread,
> the problem is in the interaction of Namespaces and S4 generics
> and which S4 generic should keep which methods.
> 
> We know there are workarounds, but till now they seem either
> ugly or very much against the idea that conceptually there
> should be only one generic which may have methods defined in
> many different packages / namespaces.

   There should *not* be one generic. Generics are no different from any 
other function: package A can have a generic named foo, and so can 
package B.  Other packages that want to add methods to a generic named 
foo need to know which one they would like to add to.  These generics 
can be masked: if package A is first on the search path then that is the 
foo that is found first (and if package B is first then that is the foo; 
users that specifically want foo from B should use B::foo).


> 
> I would like us (R-core, mostly) to resolve this as quickly as
> possible.
> 
> -
> 
> ### Do this in a fresh  R session:
> 
> summary # S3 generic
> find("summary") # base
> 
> library(stats4)
> summary # S4 generic
> find("summary") # stats4 , base
> 
> library(lme4)
> ## -> loads Matrix (and lattice)
> find("summary") # lme4, Matrix, stats4 , base   --- 4 times ! ---

   Have they all defined generics? If that is the case then there are 4.

   We did discuss, and I hope to make progress on, the following 
proposal: for functions in base that have an S4 method defined for them 
(and hence need a generic function), we create a new package 
that lives slightly above base (and potentially other recommended 
packages) where these generics will live.  Developers can then rely on 
finding the generic there and using it, if their intention is to 
extend the base generic.  Note that they may instead want their own 
generic with the same name as the one from base, and that is fine; it 
will mask the one in base.

   best wishes
Robert
> 
> fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
> ## -->
> ## Error in lmer(Reaction ~ Days + (Days | Subject), sleepstudy) :
> ##cannot get a slot ("Dim") from an object of type "NULL"
> 
> -
> Martin Maechler
> 
> 
>>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]>
>>>>>> on Thu, 28 Jun 2007 06:08:45 +0100 (BST) writes:
> 
> BDR> See the thread starting
> BDR> https://stat.ethz.ch/pipermail/r-devel/2007-June/046157.html
> BDR> https://stat.ethz.ch/pipermail/r-devel/2007-June/046160.html
> 
> BDR> I can't reproduce this without knowing what is in your
> BDR> startup files: it should work with --vanilla, so please
> BDR> try that and try to eliminate whatever is in your
> BDR> .Rprofile etc that is causing the problem.
> 
> BDR> Incidentally, using rcompletion is counterproductive in
> BDR> R 2.5.1 RC: the base functionality using rcompgen is a
> BDR> more sophisticated version.
> 
> BDR> On Wed, 27 Jun 2007, Sebastian P. Luque wrote:
> 
> >> Hi,
> >> 
> >> I've begun to use the lme4 package, rather than nlme, for more 
> flexibility
> >> during modelling, and running the examples in lmer I receive this error
> >> message:
> >> 
> >> ---<---cut here---start-->---
> R> (fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
> >> Error in printMer(object) : no slot of name "status" for this object 
> of class "table"
> >> 
> R> sessionInfo()
> >> R version 2.5.1 RC (2007-06-25 r42057)
> >> x86_64-pc-linux-gnu
> >> 
> >> locale:
> >> 
> LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8;LC_MONETARY=en_CA.UTF-8;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C
> >> 
> >> attached base packages:
> >> [1] "stats4""stats" "graphics"  "grDevices" "utils" 
> "datasets"
> >> [7] "methods"   "base"
> >> 
> >> other attached packages:
>> lme4  Matrix  rcompletion  rcompgen  lattice  diveMove

Re: [Rd] HTML vignette browser

2007-06-04 Thread Robert Gentleman


Deepayan Sarkar wrote:
> On 6/4/07, Seth Falcon <[EMAIL PROTECTED]> wrote:
>> Friedrich Leisch <[EMAIL PROTECTED]> writes:
>>> Looks good to me, and certainly something worth being added to R.
>>>
>>> 2 quick (related) comments:
>>>
>>> 1) I am not sure if we want to include links to the Latex-Sources by
>>>default, those might confuse unsuspecting novices a lot. Perhaps
>>>make those optional using an argument to browseVignettes(), which
>>>is FALSE by default?
>> I agree that the Rnw could confuse folks.  But I'm not sure it needs
>> to be hidden or turned off by default...  If the .R file was also
>> included then it would be less confusing I suspect as the curious
>> could deduce what Rnw is about by triangulation.
>>
>>> 2) Instead links to .Rnw files we may want to include links to the R
>>>code -> should we R CMD INSTALL a tangled version of each vignette
>>>such that we can link to it? Of course it is redundant information
>>>given the .Rnw, but we also have the help pages in several formats
>>>ready.
>> Including, by default, links to the tangled .R code seems like a
>> really nice idea.  I think a lot of users who find vignettes don't
>> realize that all of the code used to generate the entire document is
>> available to them -- I just had a question from someone who wanted to
>> know how to make a plot that appeared in a vignette, for example.
> 
> I agree that having a Stangled .R file would be a great idea (among
> other things, it would have the complete code, which many PDFs will
> not).
> 
> I don't have a strong opinion either way about linking to the .Rnw
> file. It should definitely be there if the PDF file is absent (e.g.
> for grid, and other packages installed with --no-vignettes, which I
> always do for local installation). Maybe we can keep them, but change
> the name to something more scary than "source", e.g. "LaTeX/Noweb
> source".

   I would very much prefer to keep the source, with some name, scary or 
not...

> 
> -Deepayan
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] Calling R_PolledEvents from R_CheckUserInterrupt

2007-05-31 Thread Robert Gentleman
It should be there shortly.  I have no way of testing Windows (right 
now, at least), so hopefully Duncan M will have time to take a look.


Deepayan Sarkar wrote:
> On 5/5/07, Luke Tierney <[EMAIL PROTECTED]> wrote:
> 
> [...]
> 
>> However, R_PolledEvents is only called from a limited set of places
>> now (including the socket reading code to keep things responsive
>> during blocking reads).  But it is not called from the interupt
>> checking code, which means if a user does something equivalent to
>>
>> while (TRUE) {}
>>
>> there is not point where events get looked at to see a user interrupt
>> action. The current definition of R_CheckUserInterrupt is
>>
>> void R_CheckUserInterrupt(void)
>> {
>>  R_CheckStack();
>>  /* This is the point where GUI systems need to do enough event
>> processing to determine whether there is a user interrupt event
>> pending.  Need to be careful not to do too much event
>> processing though: if event handlers written in R are allowed
>> to run at this point then we end up with concurrent R
>> evaluations and that can cause problems until we have proper
>> concurrency support. LT */
>> #if  ( defined(HAVE_AQUA) || defined(Win32) )
>>  R_ProcessEvents();
>> #else
>>  if (R_interrupts_pending)
>>  onintr();
>> #endif /* Win32 */
>> }
>>
>> So only on Windows or Mac do we do event processing.  We could add a
>> R_PolledEvents() call in the #else bit to support this, though the
>> cautions in the comment do need to be kept in mind.
> 
> I have been using the following patch to src/main/errors.c for a while
> without any obvious ill effects. Could we add this to r-devel (with
> necessary changes for Windows, if any)?
> 
> -Deepayan
> 
> Index: errors.c
> ===
> --- errors.c(revision 41764)
> +++ errors.c(working copy)
> @@ -39,6 +39,8 @@
>  #include <R_ext/GraphicsEngine.h> /* for GEonExit */
>  #include <Rmath.h> /* for imax2 */
> 
> +#include <R_ext/eventloop.h>
> 
>  #ifndef min
>  #define min(a, b) (a < b ? a : b)
>  #endif
> @@ -117,6 +119,8 @@
>  #if  ( defined(HAVE_AQUA) || defined(Win32) )
>  R_ProcessEvents();
>  #else
> +R_PolledEvents();
>  if (R_interrupts_pending)
> onintr();
>  #endif /* Win32 */
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]



Re: [Rd] Native implementation of rowMedians()

2007-05-14 Thread Robert Gentleman
We did think about this a lot, and decided it was better to have 
something like rowQ, which returns the requested order statistics, 
letting the user manipulate them on return for their own version of 
the median, or other quantiles.  I would be happy to 
have this in R itself, if there is sufficient interest and we can remove 
the one in Biobase (without the need for deprecation/defunct, as long as 
the args are compatible).  But if the decision is to return a particular 
estimate of a quantile, then we would probably want to keep our function 
around, with its current name.
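A sketch in plain R of the division of labor described above (function names `row_stat` and `row_median` are hypothetical; rowQ itself is implemented in C for speed):

```r
# An order-statistic helper in the spirit of rowQ: k-th smallest per row.
row_stat <- function(x, k) apply(x, 1, function(r) sort(r)[k])

# One particular median estimate built on top of the order statistics.
row_median <- function(x) {
  n <- ncol(x)
  if (n %% 2 == 1) row_stat(x, (n + 1L) %/% 2L)
  else (row_stat(x, n %/% 2L) + row_stat(x, n %/% 2L + 1L)) / 2
}

m <- rbind(c(3, 1, 2), c(9, 7, 8))
row_median(m)  # 2 8
```

Other quantile estimates can be built on the same helper, which is exactly why returning the raw order statistics is the more flexible primitive.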

best wishes
   Robert


Martin Maechler wrote:
>>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]>
>>>>>> on Mon, 14 May 2007 11:39:18 +0100 (BST) writes:
> 
> BDR> On Mon, 14 May 2007, Henrik Bengtsson wrote:
> >> On 5/14/07, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> >>> 
> >>> > Hi Henrik,
> >>> >>>>>> "HenrikB" == Henrik Bengtsson <[EMAIL PROTECTED]>
> >>> >>>>>> on Sun, 13 May 2007 21:14:24 -0700 writes:
> >>> >
> >>> >HenrikB> Hi,
> >>> >HenrikB> I've got a version of rowMedians(x, na.rm=FALSE) for 
> >>> matrices that
> >>> >HenrikB> handles missing values implemented in C.  It has been
> 
> BDR> [...]
> 
> >>> Also, the 'a version of rowMedians' made me wonder what other version
> >>> there was, and it seems there is one in Biobase which looks a more
> >>> natural home.
> >> 
> >> The rowMedians() in Biobase utilizes rowQ() in ditto.  I actually
> >> started of by adding support for missing values to rowQ() resulting in
> >> the method rowQuantiles(), for which there are also internal functions
> >> for both integer and double matrices.  rowQuantiles() is in R.native
> >> too, but since it has much less CPU milage I wanted to wait with that.
> >> The rowMedians() is developed from my rowQuantiles() optimized for
> >> the 50% quantile.
> >> 
> >> Why do you think it is more natural to host rowMedians() in Biobase
> >> than in one of the core R packages?  Biobase comes with a lot of
> >> overhead for people not in the Bio-world.
> 
> BDR> Because that is where there seems to be a need for it, and having 
> multiple 
> BDR> functions of the same name in different packages is not ideal (and 
> even 
> BDR> with namespaces can cause confusion).
> 
> That's correct, of course.
> However, I still think that quantiles (and statistics derived
> from them) in general and medians in particular are under-used
> by many user groups. For some useRs, speed can be an important
> reason and for that I had made a big effort to provide runmed()
> in R, and I think it would be worthwhile to provide fast rowwise
> medians and quantiles, here as well.
> 
> Also, BTW, I think it will be worthwhile to provide (R<->C) API
> versions of median() and quantile() {with less options than the
> R functions, most probably!!}, 
> such that we'd hopefully see less re-invention of the wheel
> happening in every package that needs such quantiles in its C code.
> 
> Biobase is in quite active maintenance, and I'd assume its
> maintainers will remove rowMedians() from there (or first
> replace it with a wrapper in order to deal with the namespace
> issue you mentioned) as soon as R has its own function
> with the same (or better) functionality.  
> In order to facilitate the transition, we'd have to make sure
> that such a 'stats' function does behave " >= " to the bioBase
> one. 
> 
> Martin
> 
> 



Re: [Rd] One for the wish list - var.default etc

2007-05-09 Thread Robert Gentleman


Jeffrey J. Hallman wrote:
> Prof Brian Ripley <[EMAIL PROTECTED]> writes:
> 
>> On Wed, 9 May 2007, S Ellison wrote:
>>
>>> Brian,
>>>
>>>> If we make functions generic, we rely on package writers implementing 
>>>> the documented semantics (and that is not easy to check).  That was 
>>>> deemed to be too easy to get wrong for var().
>>> Hard to argue with a considered decision, but the alternative facing 
>>> increasing numbers of package developers seems to me to be pretty bad 
>>> too ...
>>>
>>> There are two ways a package developer can currently get a function 
>>> tailored to their own new class. One is to rely on a generic function to 
>>> launch their class-specific instance, and write only the class-specific 
>>> instance. That may indeed be hard to check, though I would be inclined 
>>> to think that is the package developer's problem, not the core team's. 
>>> But it has (as far as I know today ...?) no wider impact.
>> But it does: it gives the method privileged access, in this case to the 
>> stats namespace, even allowing a user to change the default method
>> which namespaces to a very large extent protect against.
>>
>> If var is not generic, we can be sure that all uses within the stats 
>> namespace and any namespace that imports it are of stats::var.  That is 
>> not something to give up lightly.
> 
> No, but neither is the flexibility afforded by generics. What we have here is
> a false tradeoff between flexibility vs. the safety of locking stuff down. 

   Yes, that is precisely one of the points, and as some of us recently 
experienced, a reasonably dedicated programmer can over-ride any base 
function through an add-on package. It is, in my opinion, a bad idea to 
become the police here.

   AFAIK, Brian's considered decision was his own; I am aware of no 
discussion of that particular point of view about var (and, as noted 
above, it simply doesn't work). It also, AFAICS, confuses what happens 
(implementation) with what should happen (which is easy to do, because 
with most of the methods, either S3 or S4, there is very little written 
about what should happen).

   That said, there has been some relatively open discussion on one 
solution to this problem, and I am hopeful that we will have something 
in place before the end of July.

   A big problem with S4 generics is who owns them, and what seems to be 
a reasonable medium term solution is to provide a package that lives 
slightly above base in the search path that will hold generic functions 
for any base functions that do not have them. Authors of add on packages 
can then at least share a common generic when that is appropriate. But 
do realize that there are lots of reasons to have generics with the same 
name, in different packages that are not compatible, and normal scoping 
rules apply. For example the XML package has a generic function addNode, 
as does the graph package, and they are not compatible, nor should they 
be. Anyone wanting to use both packages (and I often do) needs to manage 
the name conflicts (and that is where namespaces are essential).
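As a toy illustration of the point about same-named generics and normal scoping: a package can own its own S4 generic and method, and when two attached packages export incompatible generics of the same name, a fully qualified call (pkg::fun) picks the intended one. The names addThing and Bag below are invented for this sketch; they are not the XML/graph functions.

```r
library(methods)

## A package-local S4 generic plus one method, standing in for the
## incompatible addNode generics mentioned above.
setGeneric("addThing", function(x, ...) standardGeneric("addThing"))
setClass("Bag", slots = c(items = "list"))
setMethod("addThing", "Bag", function(x, ...) {
  x@items <- c(x@items, list(...))   # append the new elements
  x
})

b <- addThing(new("Bag", items = list()), 1, 2)
length(b@items)   # 2
```

With namespaces, a second package could define a wholly different addThing without either one clobbering the other; the user disambiguates at the call site.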

best wishes
   Robert



> 
> The tradeoff is false because unit tests are a better way to assure safety.
> If the major packages (like stats) had a suite of tests, a package developer
> could load his own package, run all the unit tests, and see if he broke
> something.  If it turns out that he broke something that wasn't covered by the
> tests, he could create a new test for that and submit it somewhere, perhaps
> on the R Wiki. 
> 



[Rd] boundary case anomaly

2007-04-08 Thread Robert Gentleman
Hi,
  Any reason these should be different?

  x=matrix(0, nr=0, nc=3)
  colnames(x) = letters[1:3]
  data.frame(x)
#[1] a b c
#<0 rows> (or 0-length row.names)
  y=vector("list", length=3)
  names(y) = letters[1:3]
  data.frame(y)
#NULL data frame with 0 rows


Both should have names (the second one does not), and why print something 
different for y?

  Two of the last three examples refer to a "NULL data frame", e.g.
   (d00 <- d0[FALSE,])  # NULL data frame with 0 rows

  but there is no description of what a NULL data frame should be (zero 
rows or zero columns, or either or both; and why a special name?)
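For what it's worth, a workaround sketch that gets a named zero-row data frame from a named list, matching the matrix case: coerce each (empty) list element to a zero-length column before calling data.frame(). This is purely illustrative, not a proposed fix.

```r
## Named list of length 3 with NULL elements, as in the example above.
y <- setNames(vector("list", 3), letters[1:3])

## Replacing each element with a zero-length vector makes data.frame()
## keep the names and produce a 0-row, 3-column frame.
d <- data.frame(lapply(y, function(col) logical(0)))
names(d)   # "a" "b" "c"
nrow(d)    # 0
```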


  best wishes
Robert




[Rd] buglet in terms calculations

2007-04-08 Thread Robert Gentleman
Hi,
   Vince and I have noticed a problem with non-syntactic names in data 
frames and some modeling code (but not all modeling code).

   The following, while almost surely behaving as documented, could be a 
bit more helpful:

  m = matrix(rnorm(100), nc=10)
  colnames(m) = paste(1:10, letters[1:10], sep="_")

  d = data.frame(m, check.names=FALSE)

  f = formula(`1_a` ~ ., data=d)

  tm = terms(f, data=d)

  ##failure here, as somehow back-ticks have become part of the name
  ##not a quoting mechanism
  d[attr(tm, "term.labels")]

   The 'variables' attribute of the terms object keeps the back-ticks as 
a quoting mechanism, so modeling code that uses that attribute seems 
fine, but code that uses term.labels fails. In particular, of those 
tested, glm, lda, and randomForest seem to work fine, while nnet and 
rpart cannot handle non-syntactic names in formulae.

   In particular, rpart contains this code:

  lapply(m[attr(Terms, "term.labels")], tfun)

   which fails for the reasons given.


  One way to get around this might be to modify the do_termsform code; 
right now we have:
PROTECT(varnames = allocVector(STRSXP, nvar));
for (v = CDR(varlist), i = 0; v != R_NilValue; v = CDR(v))
    SET_STRING_ELT(varnames, i++,
                   STRING_ELT(deparse1line(CAR(v), 0), 0));

  and then for term.labels, we copy over the varnames (with :, as 
needed) and perhaps we need to save the unquoted names somewhere?

  Or is there some other approach that will get us there? Certainly 
cleaning up the names via
   cleanTick = function(x) gsub("`", "", x)

  works, but it seems a bit ugly, and it might be better if the modeling 
code was modified.
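Another possible workaround, for what it's worth: all.vars() already returns variable names without back-ticks, so for plain variables (though not derived terms such as interactions) it sidesteps the quoting problem without gsub() surgery.

```r
## Reconstruct the example from above.
m <- matrix(rnorm(100), ncol = 10)
colnames(m) <- paste(1:10, letters[1:10], sep = "_")
d <- data.frame(m, check.names = FALSE)

f <- `1_a` ~ .
tm <- terms(f, data = d)          # expands the "." against d

## all.vars() on the response-free terms gives unquoted names,
## so they index the data frame directly.
vars <- all.vars(delete.response(tm))
vars[1]      # "2_b", no back-ticks
str(d[vars]) # indexing by the cleaned names works
```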

   best wishes





Re: [Rd] possible bug in model.frame.default

2007-03-07 Thread Robert Gentleman
Please update to the latest snapshot
R version 2.5.0 Under development (unstable) (2007-03-05 r40816)
where all is well,

Thibaut Jombart wrote:
> Dear list,
> 
> I may have found a bug in model.frame.default (called by the lm function).
> The problem arises in my R dev version but not in my R 2.4.0.
> Here is my config :
> 
>  > version
>
> _
> platform       x86_64-unknown-linux-gnu
> arch           x86_64
> os             linux-gnu
> system         x86_64, linux-gnu
> status         Under development (unstable)
> major          2
> minor          5.0
> year           2007
> month          03
> day            04
> svn rev        40813
> language       R
> version.string R version 2.5.0 Under development (unstable) (2007-03-04 r40813)
> 
> Now a simple example to (hopefully) reproduce the bug (after a 
> rm(list=ls())):
> 
>  > dat=data.frame(y=rnorm(10),x1=runif(10),x2=runif(10))
>  > weights=1:10/(sum(1:10))
>  > form <- as.formula("y~x1+x2")
> # here is the error
>  > lm(form,data=dat,weights=weights)
> Erreur dans model.frame(formula, rownames, variables, varnames, extras, 
> extranames,  :
> type (closure) incorrect pour la variable '(weights)'
> 
> (sorry, error message is in French)
> 
> As I said, these commands works using R.2.4.0 (same machine, same OS).
> Moreover, the following commands work:
>  > temp=weights
>  > lm(form,data=dat,weights=temp)
> 
> This currently seems to cause a check fail in the ade4 package. I tried 
> to find out where the bug came from: all I found is the (potential) bug 
> comes from model.frame.default, and more precisely:
> debug: data <- .Internal(model.frame(formula, rownames, variables, 
> varnames,
> extras, extranames, subset, na.action))
> Browse[1]>
> Erreur dans model.frame(formula, rownames, variables, varnames, extras, 
> extranames,  :
> type (closure) incorrect pour la variable '(weights)'
> 
> I couldn't go further because of the .Internal. I tried to googlise 
> this, but I found no such problem reported recently.
> 
> Can anyone tell if this is actually a bug? (In case not, please tell me 
> where I got wrong).
> 
> Regards,
> 
> Thibaut.
> 



Re: [Rd] Wish list

2007-01-01 Thread Robert Gentleman
>> structurally missing ones.
>> I think this is something that you need to do in a different way.  There
>> are tons of possible semantics for what NA should mean.  I don't think
>> this should be made more complicated for everyone.
>>
> 
> Although one does not want to overcomplicate things the fact is that
> there are two issues here: structural and non-structural and trying to
> force them into a single construct is not simplifying -- rather it
> fails to model
> what is required adequately.
> 
> 
>>> 9. bidirectional pipes in Windows
>>>
>>> 10. Create a log updated at a regular frequency (daily or real time)
>>> that tracks all changes on CRAN, e.g.
>>>
>>>   Date(GMT)   Package Version Action
>>>   2006-09-20 21:22:01 mypkg   1.0.1   new
>>>   2006-09-20 22:00:23 mypkg2  0.2.1   updated
>>>
>>> 11. make integrate generic.  Ryacas could use that.
>>>
>>> 12. Remove all R CMD dependencies on the find.exe command.  find is a built
>>> in command in Windows and having find.exe on my path causes
>>> problems with other programs.
>> A simpler fix for this would be for you to define a wrapper for R CMD
>> that installed the R tools path before executing, and uninstalls it
>> afterwards.  But this is unnecessary for most people, because
>> Microsoft's find.exe is pretty rarely used.
>>
> 
> Anyone who uses batch files will use it quite a bit.  It certainly causes
> me problems on an ongoing basis and is an unacceptable conflict in
> my opinion.
> 
> I realize that its not entirely of R's doing but it would be best if R did not
> make it worse by requiring the use of find.
> 
>>> 13. Make upper/lower case of simplify/SIMPLIFY consistent on all
>>> apply commands and add a simplify= arg to by.
>> It would have been good not to introduce the inconsistency years ago,
>> but it's too late to change now.
> 
> Its not too late to add it to by().
> 
> Also note that the gsubfn package does have a workaround for this.  In gsubfn
> one can preface any R function with fn$ and if that is done then the function
> can have a simplify= argument which fn$ intercepts and processes.  e.g.
> 
> library(gsubfn)
> fn$by(CO2[4:5], CO2[2], x ~ coef(lm(uptake ~ ., x)), simplify = rbind)
> 
> fn$ can also interpret formulas as functions (and does quasi perl 
> interpolation
> in strings) so the formula in the third argument is regarded to be the same
> as the anonymous function:  function(x) coef(lm(uptake ~., x)) .
> 
> More examples are in the gsubfn vignette.
> 
>>> 14. better reporting of location of errors and warnings in R CMD check.
>> This is in the works, but probably not for 2.5.x.
> 
> Great.  This will be very welcome.
> 
>>> 15. tcl tile library (needs tcl 8.5 or to be compiled in with 8.4)
>>>
>>> 16. extend aggregate to allow vector valued functions:
>>> aggregate(CO2[4:5], CO2[1:2], function(x) c(mean = mean(x), sd = sd(x)))
>>> [summaryBy in doBy package and cast in reshape package can already
>>> do similar things but this seems sufficiently fundamental that it
>>> ought to be in the base of R]
>>>
>>> 17. All OSes should support input= arg of system.
>>>
>>> My previous New Year wishlists are here:
>>>
>>> https://www.stat.math.ethz.ch/pipermail/r-devel/2006-January/035949.html
>>> https://www.stat.math.ethz.ch/pipermail/r-help/2005-January/061984.html
>>> https://www.stat.math.ethz.ch/pipermail/r-devel/2004-January/028465.html
>> To anyone still reading:
>>
>> Many of the suggestions above would improve R, but they're unlikely to
>> happen unless someone volunteers to do them.  I'd suggest picking
>> whichever one of these or some other list that you think is the highest
>> priority, and post a specific proposal to this list about how to do it.
>>  If you get a negative response or no response, move on to the next
>> one, or put it into a contributed package instead.
>>
> 
> I think it works best when contributors develop their software in
> contributed packages since it avoids squabbles with the core group.
> 
> The core group can then integrate these into R itself if it seems warranted.
> 
>> When you make the proposal, consider how much work you're asking other
>> people to do, and how much you're volunteering to do yourself.  If
>> you're asking others to do a lot, then the suggestion had better be
>> really valuable to *them*.
>>
> 
> The implementation effort should not be a significant consideration in
> generating wish lists.  What should be considered is what is really needed.
> Its better to know what you need and then later decide whether to implement
> it or not than to suppress articulating the need.  Otherwise the development
> is driven by what is easy to do rather than what is needed.
> 
> 



Re: [Rd] A possible improvement to apropos

2006-12-14 Thread Robert Gentleman
I would vastly prefer apropos to be case insensitive by default. The 
point of it is to find things similar to a string, not the same as, and 
given that capitalization in R is somewhat erratic (due to many authors, 
and some of those changing their minds over the years), I find the 
current apropos of little use.

I would also, personally prefer some sort of approximate matching since 
there are different ways to spell some words, and some folks abbreviate 
parts of words.
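A sketch of what approximate matching could look like, using base R's agrep() for fuzzy, case-insensitive matching over the search path. The name apropos2 is invented, and the 0.2 distance is an arbitrary choice, not a proposal for the default.

```r
## Fuzzy apropos: tolerate small misspellings and case differences.
apropos2 <- function(what, max.distance = 0.2) {
  found <- unlist(lapply(seq_along(search()), function(i)
    agrep(what, ls(pos = i, all.names = TRUE),
          ignore.case = TRUE, max.distance = max.distance,
          value = TRUE)))
  sort(unique(found))
}

"rownames" %in% apropos2("rowname")   # TRUE: found despite the typo
```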


Martin Maechler wrote:
> Hi Seth,
> 
>>>>>> "Seth" == Seth Falcon <[EMAIL PROTECTED]>
>>>>>> on Wed, 13 Dec 2006 16:38:02 -0800 writes:
> 
> Seth> Hello all, I've had the following apropos alternative
> Seth> in my ~/.Rprofile for some time, and have found it
> Seth> more useful than the current version.  Basically, my
> Seth> version ignores case when searching.
> 
> Seth> If others find this useful, perhaps apropos could be
> Seth> suitably patched (and I'd be willing to create such a
> Seth> patch).
> 
> Could you live with typing 'i=T' (i.e.  ignore.case=TRUE)?
> 
> In principle, I'd like to keep the default  as ignore.case=FALSE,
> since we'd really should teach the users that R 
> *is* case sensitive.
> Ignoring case is the exception in the S/R/C world, not the rule
> 
> I have a patch ready which implements your suggestion
> (but not quite with the code below), but as said, not as
> default.
> 
> Martin
> 
> Seth> + seth
> 
> Seth> Here is my version of apropos:
> 
>>> APROPOS <- function (what, where = FALSE, mode = "any") 
>>> {
>>> if (!is.character(what))
>>>   stop("argument ", sQuote("what"), " must be a character vector")
>>> x <- character(0)
>>> check.mode <- mode != "any"
>>> for (i in seq(search())) {
>>> contents <- ls(pos = i, all.names = TRUE)
>>> found <- grep(what, contents, ignore.case = TRUE, value = TRUE)
>>> if (length(found)) {
>>> if (check.mode) {
>>> found <- found[sapply(found, function(x) {
>>> exists(x, where = i, mode = mode, inherits = FALSE)
>>>     })]
>>> }
>>> numFound <- length(found)
>>> x <- c(x, if (where)
>>>structure(found, names = rep.int(i, numFound)) else 
>>> found)
>>> }
>>> }
>>> x
>>> }
> 
> 



Re: [Rd] caching frequently used values

2006-12-13 Thread Robert Gentleman
e1 = new.env(hash=TRUE)

e1[["1"]] = whateveryouwant

i.e., just transform to characters. But I don't see why you want to do 
that - surely there are more informative names to be used.
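A small sketch of the character-key pattern just described. The helper names setByNum and getByNum are invented for illustration; they are not base functions.

```r
## A hashed environment keyed by the (stringified) derivative order.
cache <- new.env(hash = TRUE)

setByNum <- function(env, i, value)
  assign(as.character(i), value, envir = env)

getByNum <- function(env, i) {
  key <- as.character(i)
  if (exists(key, envir = env, inherits = FALSE))
    get(key, envir = env)
  else
    NULL   # cache miss
}

setByNum(cache, 1, diag(2))
getByNum(cache, 1)   # the stored 2x2 identity matrix
getByNum(cache, 2)   # NULL: nothing cached for derivative 2 yet
```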



Tamas K Papp wrote:
> Hi Robert,
> 
> Thanks for your answer.  I would create and environment with
> new.env(), but how can I assign and retrieve values based on a
> numerical index (the derivative)?  The example of the help page of
> assign explicitly shows that assign("a[1]") does not work for this
> purpose.
> 
> Thanks,
> 
> Tamas
> 
> On Wed, Dec 13, 2006 at 01:54:28PM -0800, Robert Gentleman wrote:
> 
>> the idea you are considering is also, at times, referred to as 
>> memoizing. I would not use a list, but rather an environment, and 
>> basically you implement something that first looks to see if there is a 
>> value, and if not, compute and store. It can speed things up a lot in 
>> some examples (and slow them down a lot in others).
>>
>> Wikipedia amongst other sources:
>>  http://en.wikipedia.org/wiki/Memoization
>>
>> Environments have advantages over lists here (if there are lots of 
>> matrices the lookup can be faster - make sure you use hash=TRUE), and 
>> reference semantics, which you probably want.
>>
>> Tamas K Papp wrote:
>>> Hi,
>>>
>>> I am trying to find an elegant way to compute and store some
>>> frequently used matrices "on demand".  The Matrix package already uses
>>> something like this for storing decompositions, but I don't know how
>>> to do it.
>>>
>>> The actual context is the following:
>>>
>>> A list has information about a basis of a B-spline space (nodes,
>>> order) and gridpoints at which the basis functions would be evaluated
>>> (not necessarily the nodes).  Something like this:
>>>
>>> bsplinegrid <- list(nodes=1:8,order=4,grid=seq(2,5,by=.2))
>>>
>>> I need the design matrix (computed by splineDesign) for various
>>> derivatives (not necessarily known in advance), to be calculated by
>>> the function
>>>
>>> bsplinematrix <- function(bsplinegrid, deriv=0) {
>>>  x <- bsplinegrid$grid
>>>  Matrix(splineDesign(bsplinegrid$nodes, x, ord=bsplinegrid$order,
>>>  derivs = rep(deriv, length(x))))
>>> }
>>>
>>> However, I don't want to call splineDesign all the time.  A smart way
>>> would be storing the calculated matrices in a list inside bsplinegrid.
>>> Pseudocode would look like this:
>>>
>>> bsplinematrix <- function(bsplinegrid, deriv=0) {
>>>  if (is.null(bsplinegrid$matrices[[deriv+1]])) {
>>>## compute the matrix and put it in the list bsplinegrid$matrices,
>>>## but not of the local copy
>>>  }
>>>  bsplinegrid$matrices[[deriv+1]]
>>> }
>>>
>>> My problem is that I don't know how to modify bsplinegrid$matrices
>>> outside the function -- assignment inside would only modify the local
>>> copy.
>>>
>>> Any help would be appreciated -- I wanted to learn how Matrix does it,
>>> but don't know how to display the source with s3 methods (getAnywhere
>>> doesn't work).
>>>
>>> Tamas
>>>
>>>
> 



Re: [Rd] caching frequently used values

2006-12-13 Thread Robert Gentleman
the idea you are considering is also, at times, referred to as 
memoizing. I would not use a list, but rather an environment, and 
basically you implement something that first looks to see if there is a 
value, and if not, compute and store. It can speed things up a lot in 
some examples (and slow them down a lot in others).

Wikipedia amongst other sources:
  http://en.wikipedia.org/wiki/Memoization

Environments have advantages over lists here (if there are lots of 
matrices the lookup can be faster - make sure you use hash=TRUE), and 
reference semantics, which you probably want.
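A minimal memoize() sketch along these lines: look up the (stringified) arguments in a hashed environment first, compute and store on a miss. The reference semantics of environments make the cache persist across calls without any list-copying gymnastics. The name memoize is invented here, not a base function.

```r
memoize <- function(f) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(...) {
    ## Crude but serviceable key: deparse the argument list.
    key <- paste(deparse(list(...)), collapse = " ")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)   # cache miss: compute and store
    }
    get(key, envir = cache)
  }
}

calls <- 0
slow_square <- function(x) { calls <<- calls + 1; x^2 }
fast_square <- memoize(slow_square)
fast_square(4); fast_square(4)
calls   # 1: the second call was served from the cache
```

As noted above, this helps when the computation dominates the lookup cost, and hurts when it does not.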

Tamas K Papp wrote:
> Hi,
> 
> I am trying to find an elegant way to compute and store some
> frequently used matrices "on demand".  The Matrix package already uses
> something like this for storing decompositions, but I don't know how
> to do it.
> 
> The actual context is the following:
> 
> A list has information about a basis of a B-spline space (nodes,
> order) and gridpoints at which the basis functions would be evaluated
> (not necessarily the nodes).  Something like this:
> 
> bsplinegrid <- list(nodes=1:8,order=4,grid=seq(2,5,by=.2))
> 
> I need the design matrix (computed by splineDesign) for various
> derivatives (not necessarily known in advance), to be calculated by
> the function
> 
> bsplinematrix <- function(bsplinegrid, deriv=0) {
>   x <- bsplinegrid$grid
>   Matrix(splineDesign(bslinegrid$knots, x, ord=basis$order,
>   derivs = rep(deriv,length(x
> }
> 
> However, I don't want to call splineDesign all the time.  A smart way
> would be storing the calculated matrices in a list inside bsplinegrid.
> Pseudocode would look like this:
> 
> bsplinematrix <- function(bsplinegrid, deriv=0) {
>   if (is.null(bsplinegrid$matrices[[deriv+1]])) {
> ## compute the matrix and put it in the list bsplinegrid$matrices,
> ## but not of the local copy
>   }
>   bsplinegrid$matrices[[deriv+1]]
> }
> 
> My problem is that I don't know how to modify bsplinegrid$matrices
> outside the function -- assignment inside would only modify the local
> copy.
> 
> Any help would be appreciated -- I wanted to learn how Matrix does it,
> but don't know how to display the source with s3 methods (getAnywhere
> doesn't work).
> 
> Tamas
> 
> 



Re: [Rd] data frame subset patch, take 2

2006-12-13 Thread Robert Gentleman


Robert Gentleman wrote:
> Hi,
>We had the "names" discussion and, AFAIR, the idea that someone might 
> misinterpret the output as suggesting that one could index by number, 
> seemed to kill it. A more reasonable argument against is that names<- is 
> problematic.
> 
> You can use $, [[ (with character subscripts), and yes ls does sort of 
> do what you want (but sorts the values, not sure if that is good). I 
> think it is also inefficient in that I believe it copies the CHARSXP's 
> (not sure we really need to do that, but I have not had time to sort out 

  I misremembered - it does not copy CHARSXPs.

> the issues). And there is an eapply as well, so ls() is not always needed.
> 
> mget can be used to retrieve multiple values (and should be much more 
> efficient than multiple calls to get). There is no massign (no one seems 
> to have asked for it), and a better design choice might be to vectorize 
> assign.
> 
> best wishes
>Robert
> 
> 
> 
> 
> 
> Vladimir Dergachev wrote:
>> On Wednesday 13 December 2006 1:23 pm, Marcus G. Daniels wrote:
>>> Vladimir Dergachev wrote:
>>>> 2. It would be nice to have true hashed arrays in R (i.e. O(1) access
>>>> times). So far I have used named lists for this, but they are O(n):
>>> new.env(hash=TRUE) with get/assign/exists works ok.  But I suspect its
>>> just too easy to use named lists because it is easy, and that has bad
>>> performance ramifications for user code (perhaps the R developers are
>>> more vigilant about this for the R code itself).
>> Cool, thank you ! 
>>
>> I wonder whether environments could be extended to allow names() to work 
>> (although I see that ls() does the same function) and to allow for(i in E) 
>> loops.
>>
>>thank you
>>
>>Vladimir Dergachev
>>
>>
> 



Re: [Rd] data frame subset patch, take 2

2006-12-13 Thread Robert Gentleman
Hi,
   We had the "names" discussion and, AFAIR, the idea that someone might 
misinterpret the output as suggesting that one could index by number, 
seemed to kill it. A more reasonable argument against is that names<- is 
problematic.

You can use $, [[ (with character subscripts), and yes ls does sort of 
do what you want (but sorts the values, not sure if that is good). I 
think it is also inefficient in that I believe it copies the CHARSXP's 
(not sure we really need to do that, but I have not had time to sort out 
the issues). And there is an eapply as well, so ls() is not always needed.

mget can be used to retrieve multiple values (and should be much more 
efficient than multiple calls to get). There is no massign (no one seems 
to have asked for it), and a better design choice might be to vectorize 
assign.
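A sketch of what that could look like: mget() already retrieves several values in one call, and a massign() helper can be built from mapply() over assign(). massign is hypothetical, not a base function.

```r
e <- new.env()

## Hypothetical vectorized assign: one name per value, all into envir.
massign <- function(names, values, envir) {
  invisible(mapply(assign, names, values, MoreArgs = list(envir = envir)))
}

massign(c("a", "b", "c"), list(1, "two", 3:4), envir = e)
mget(c("a", "c"), envir = e)   # retrieves both bindings in one call
```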

best wishes
   Robert





Vladimir Dergachev wrote:
> On Wednesday 13 December 2006 1:23 pm, Marcus G. Daniels wrote:
>> Vladimir Dergachev wrote:
>>> 2. It would be nice to have true hashed arrays in R (i.e. O(1) access
>>> times). So far I have used named lists for this, but they are O(n):
>> new.env(hash=TRUE) with get/assign/exists works ok.  But I suspect its
>> just too easy to use named lists because it is easy, and that has bad
>> performance ramifications for user code (perhaps the R developers are
>> more vigilant about this for the R code itself).
> 
> Cool, thank you ! 
> 
> I wonder whether environments could be extended to allow names() to work 
> (although I see that ls() does the same function) and to allow for(i in E) 
> loops.
> 
>thank you
> 
>Vladimir Dergachev
> 
> 



Re: [Rd] data frame subset patch, take 2

2006-12-12 Thread Robert Gentleman
Hi,
  I tried take 1, and it failed. I have been traveling (and, with 
Martin's changes, also waiting for things to stabilize) before trying 
take 2, probably later this week, and I will send an email if it goes in. 
Anyone wanting to try it and run R through check and check-all is 
welcome to do so and report success or failure.

  best wishes
Robert


Martin Maechler wrote:
>>>>>> "Marcus" == Marcus G Daniels <[EMAIL PROTECTED]>
>>>>>> on Tue, 12 Dec 2006 09:05:15 -0700 writes:
> 
> Marcus> Vladimir Dergachev wrote:
> >> Here is the second iteration of data frame subset patch.
> >> It now passes make check on both 2.4.0 and 2.5.0 (svn as
> >> of a few days ago).  Same speedup as before.
> >> 
> Marcus> Hi,
> 
> Marcus> I was wondering if this patch would make it into the
> Marcus> next release.  I don't see it in SVN, but it's hard
> Marcus> to be sure because the mailing list apparently
> Marcus> strips attachments.  If it isn't in, or going to be
> Marcus> in, is this patch available somewhere else?
> 
> I was wondering too.
>   http://www.r-project.org/mail.html
> explains what kind of attachments are allowed on R-devel.
> 
> I'm particularly interested, since during the last several days
> I've made (somewhat experimental) changes to R-devel,
> which makes some dealings with large data frames that have
> "trivial rownames" (those represented as  1:nrow(.))
> much more efficient.
> 
> Notably, as.matrix() of such data frames now no longer produces
> huge row names, and e.g.  dim(.) of such data frames has become
> lightning fast [compared to what it was].
> 
> Some measurements:
> 
> N <- 1e6
> set.seed(1)
> ## we round (for later dump().. reasons)
> x <- round(rnorm(N),2)
> y <- round(rnorm(N),2)
> mOrig <- cbind(x = x, y = y)
> df <- data.frame(x = x, y = y)
> mNew <- as.matrix(df)
> (sizes <- sapply(list(mOrig=mOrig, df=df, mNew=mNew), object.size))
> ## R-2.4.0 (64-bit):
> ##mOrig   df mNew
> ## 16000520 16000776 72000560
> 
> ## R-2.4.1 beta (32-bit):
> ##mOrig   df mNew
> ## 16000296 16000448 52000320
> 
> ## R-pre-2.5.0 (32-bit):
> ##mOrig   df mNew
> ## 16000296 16000448 16000296
> 
> ##
> 
> N <- 1e6
> df <- data.frame(x = 0+ 1:N, y = 1+ 1:N)
> system.time(for(i in 1:1000) d <- dim(df))
> 
> ## R-2.4.1 beta (32-bit) [deb1]:
> ## [1] 1.920 3.748 7.810 0.000 0.000
> 
> ## R-pre-2.5.0 (32-bit) [deb1]:
> ##user  system elapsed
> ##   0.012   0.000   0.011
> 
> 
> --- --- --- --- --- --- --- --- --- --- 
> 
> However, currently
> 
>   df[2,] ## still internally produces the  character(1e6)  row names!
> 
> something I think we should eliminate as well,
> i.e., at least make sure that only  seq_len(1e6) is internally
> produced and not the character vector.
> 
> Note however that some of these changes are backward
> incompatible. I do hope that the changes gaining efficiency
> for such large data frames are worth some adaption of
> current/old R source code..
> 
> Feedback on this topic is very welcome!
> 
> Martin
> 
> 
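The "trivial rownames" Martin mentions are stored compactly rather than as a character vector; a sketch of how this can be observed, using the internal helper .row_names_info (in R versions that have it):

```r
df <- data.frame(x = 1:5)
.row_names_info(df)           # negative value signals compact "automatic" row names
attr(df, "row.names")         # expanded to 1:5 on access
rownames(df)                  # character vector produced only on demand
```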



Re: [Rd] Error condition in evaluating a promise

2006-10-18 Thread Robert Gentleman


Simon Urbanek wrote:
> Seth,
> 
> thanks for the suggestions.
> 
> On Oct 18, 2006, at 11:23 AM, Seth Falcon wrote:
> 
>> Simon Urbanek <[EMAIL PROTECTED]> writes:
>>> thanks, but this is not what I want (the symbols in the environment
>>> are invisible outside) and it has nothing to do with the question I
>>> posed: as I was saying in the previous e-mail the point is to have
>>> exported variables in a namespace, but their value is known only
>>> after the namespace was attached (to be precise I'm talking about
>>> rJava here and many variables are valid only after the VM was
>>> initialized - using them before is an error).
>> We have a similar use case and here is one workaround:
>>
>> Define an environment in your name space and use it to store the
>> information that you get after VM-init.
>>
>> There are a number of ways to expose this:
>>
>> * Export the env and use vmEnv$foo
>>
>> * Provide accessor functions, getVmFooInfo()
>>
>> * Or you can take the accessor function approach a bit further to make
>>   things look like a regular variable by using active bindings.  I can
>>   give more details if you want.  We are using this in the BSgenome
>>   package in BioC.
>>
> 
> I'm aware of all three solutions and I've tested all three of them  
> (there is in fact a fourth one I'm actually using, but I won't go  
> into detail on that one ;)). Active bindings are the closest you can  
> get, but then the value is retrieved each time which I would like to  
> avoid.
> 
> The solution with promises is very elegant, because it guarantees  
> that on success the final value will be locked. It also makes sense  
> semantically, because the value is determined by code bound to the  
> variable and premature evaluation is an error - just perfect.
> 
> Probably I should have been more clear in my original e-mail - the  
> question was not to find a work-around, I have plenty of them ;), the  
> question was whether the behavior of promises under error conditions  
> is desirable or not (see subject ;)). For the internal use of  
> promises it is irrelevant, because promises as function arguments are  
> discarded when an error condition arises. However, if used in the  
> "wild", the behavior as described would be IMHO more useful.
> 

   Promises were never intended for use at the user level, and I don't 
think that they can easily be made useful at that level without exposing 
a lot of stuff that cannot easily be explained or made bulletproof.  As 
Brian said, you have not told us what you want, and I am pretty sure 
that there are good solutions available at the R level for most problems.

   Although the discussion has not really started, things like dispatch 
in the S4 system are likely to make lazy evaluation a thing of the past, 
since it is pretty hard to dispatch on class without knowing what the 
class is. That means that as we move to more S4 methods/dispatch we 
will be doing more evaluation of arguments.

best wishes
  Robert
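The promise-in-an-environment pattern under discussion can be sketched at the R level with delayedAssign(); this is only an approximation and does not settle the error-condition semantics being debated:

```r
e <- new.env()
delayedAssign("vm", {
  # this expression runs only when 'vm' is first touched,
  # e.g. after the VM has been initialized
  21 * 2
}, assign.env = e)

e$vm   # forces the promise: 42
e$vm   # cached value thereafter; the expression is not re-run
```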


> Cheers,
> Simon
> 
> 



Re: [Rd] Feature request: names(someEnv) same as ls(someEnv)

2006-10-15 Thread Robert Gentleman


Duncan Murdoch wrote:
> On 10/15/2006 2:48 PM, Seth Falcon wrote:
>> Hi,
>>
>> I would be nice if names() returned the equivalent of ls() for
>> environments.
> 
> Wouldn't that just confuse people into thinking that environments are 
> vectors?  Wouldn't it then be reasonable to assume that 
> env[[which(names(env) == "foo")]] would be a synonym for env$foo?

  Absolutely not: environments can only be subscripted by name, not by 
logicals or integer subscripts, so I hope that most users would figure 
that one out.


> 
> I don't see why this would be nice:  why not just use ls()?

   Why? Environments do get used by many as vectors (well, hash tables), 
modulo the restrictions on subscripting; the analogy is quite useful 
and should be encouraged, IMHO.

  Robert
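A short sketch of the by-name-only subscripting described above:

```r
e <- new.env()
e$foo <- 1                     # $<- works on environments
e[["foo"]]                     # 1: [[ by name works too
ls(e)                          # "foo" -- what names(e) would return
# integer subscripts are an error for environments:
res <- tryCatch(e[[1]], error = function(err) "error")
res                            # "error"
```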

> 
> Duncan Murdoch
> 
>> --- a/src/main/attrib.c
>> +++ b/src/main/attrib.c
>> @@ -687,6 +687,8 @@ SEXP attribute_hidden do_names(SEXP call
>>  s = CAR(args);
>>  if (isVector(s) || isList(s) || isLanguage(s))
>> return getAttrib(s, R_NamesSymbol);
>> +if (isEnvironment(s))
>> +return R_lsInternal(s, 0);
>>  return R_NilValue;
>>  }
>>
>>
>> + seth
>>
>> --
>> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
>> http://bioconductor.org
>>
> 
> 



Re: [Rd] Strange behaviour of the [[ operator

2006-09-30 Thread Robert Gentleman
True, re name matching, but I think we might want to consider a warning 
when names are supplied, as the user may not be getting what they 
expect, regardless of the documentation.


Peter Dalgaard wrote:
> Seth Falcon <[EMAIL PROTECTED]> writes:
> 
>>> Similar things happen in many similar circumstances.
>> Here's a similar thing:
> 
> Not really, no?
>  
>>> v <- 1:5
>>> v
>> [1] 1 2 3 4 5
>>> v[mustBeDocumentedSomewhere=3]
>> [1] 3
>>
>> And this can be confusing if one thinks that subsetting is really a
>> function and behaves like other R functions w.r.t. to treatment of
>> named arguments:
>>
>>> m <- matrix(1:4, nrow=2)
>>> m
>>  [,1] [,2]
>> [1,]13
>> [2,]24
>>> m[j=2]
>> [1] 2
> 
> Or even
>> m[j=2,i=]
> [1] 2 4
> 
> However, what would the argument names be in the >2-dim case? i, j are
> used only in help("[") and that page is quite specific about
> explaining that named matching doesn't work. 
> 
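To spell out the pitfall: argument names given to [ are effectively ignored and matching stays positional, e.g.:

```r
m <- matrix(1:4, nrow = 2)

m[j = 2]          # 2 -- the name is ignored; this is just m[2]
m[j = 2, i = ]    # 2 4 -- the "j" lands in the row position: m[2, ]
```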



Re: [Rd] Question about substitute() and function def

2006-09-14 Thread Robert Gentleman


Duncan Murdoch wrote:
> On 9/14/2006 3:01 PM, Seth Falcon wrote:
>> Hi,
>>
>> Can someone help me understand why
>>
>>   substitute(function(a) a + 1, list(a=quote(foo)))
>>
>> gives
>>
>>   function(a) foo + 1
>>
>> and not
>>
>>   function(foo) foo + 1
>>
>> The man page leads me to believe this is related to lazy evaluation of
>> function arguments, but I'm not getting the big picture.
> 
> I think it's the same reason that this happens:
> 
>  > substitute(c( a = 1, b = a), list(a = quote(foo)))
> c(a = 1, b = foo)
> 
> The "a" in function(a) is the name of the arg, it's not the arg itself 

Yes, but the logic seems to be broken. In Seth's case there seems to be 
no way to use substitute() to globally change an argument and all its 
uses throughout a function, which seems like a task that would be 
useful.

Even here, I would have expected all instances of a to change, not just some.
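One workaround sketch for the global-rename task (rename the formal and substitute into the extracted body as two separate steps; nothing here is an endorsed idiom):

```r
f <- function(a) a + 1

g <- f
# substitute() treats the formal name as a name, not a use site,
# so rename the formal explicitly:
names(formals(g)) <- "foo"
# then rewrite the body; do.call() forces body(f) to be evaluated
# before substitute() sees it:
body(g) <- do.call(substitute, list(body(f), list(a = quote(foo))))

g       # function(foo) foo + 1
g(2)    # 3
```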

> (which is missing).  Now a harder question to answer is why this happens:
> 
>  > substitute(function(a=a) 1, list(a=quote(foo)))
> function(a = a) 1

   a bug for sure


> I would have expected to get "function(a = foo) 1".
> 
> Duncan Murdoch
> 
> 



Re: [Rd] [R] Bug/problem reporting: Possible to modify posting guide FAQ?

2006-08-28 Thread Robert Gentleman
Hi,
   I guess the question often comes down to whether it is a bug report 
or a question. If you know it is a bug, and have a complete and correct 
example where the obviously incorrect behavior occurs, and you are 
positive that the problem is in the package, then sending it to the 
maintainer is appropriate. When I get these I try to deal with them. 
Real bug reports that go to the mailing list may be missed, so in my 
opinion it would be best to cc the maintainer, and we will amend the 
FAQ in that direction.
   If instead you are asking a question, of the form "is this a bug?" 
or "why is this happening?", then for BioC at least it is better to 
post directly to the list, as there are many folks who can help and you 
are more likely to get an answer. When I get one of these emails I 
always refer the person to the mailing lists. I see little problem with 
being redirected by a maintainer to the mailing list if they feel that 
the question is better asked there.

Bioconductor is different from R: clearly our mailing list has to be 
more about the constituent packages, since we will direct questions 
about R to the appropriate R mailing lists. The R mailing lists tend to 
be about R itself, so asking there about a specific package (among the 
1000 or so) often does not get you very far, but sometimes it does.


  best wishes
Robert
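The maintainer lookup that the posting guide describes can be done as follows (a sketch, using lm/stats purely as an example):

```r
# locate the package that provides a function, then read off its maintainer
find("lm")                                  # "package:stats"
packageDescription("stats")$Maintainer      # the address to contact
```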


Steven McKinney wrote:
> If users post a bug or problem issue to an R-based news group
> (R-devel, R-help, BioC - though BioC is far more forgiving)
> they get yelled at for not reading the posting guide
> and FAQ.
> 
> "Please *_do_* read the FAQ, the posting guide, ..."
> the yellers do say.  So I read the BioC FAQ and it says...
> 
> http://www.bioconductor.org/docs/faq/
> 
> "Bug reports on packages should perhaps be sent to the 
>  package maintainer rather than to r-bugs."
> 
> 
> So I send email to a maintainer, who I believe rightly points out
> 
>"best to send this kind of questions to the bioc mailing list, rather
> than to myself privately, because other people might (a) also have
> answers or (b) benefit from the questions & answers."
> 
> Could the FAQ possibly be revised to some sensible combination
> that generates less finger pointing, such as
> 
>"Bug reports on packages should be sent to the Bioconductor mailing list, 
> and sent or copied to the package maintainer, rather than to r-bugs."
> 
> or
> 
>"Bug reports on packages should be sent to the package maintainer, 
> and copied to the Bioconductor mailing list, rather than to r-bugs."
> 
> 
> Could the posting guides to R-help and R-devel do something
> similar?
> 
> 
> Sign me
> 
> 
> 
> http://www.r-project.org/posting-guide.html
> 
>  "If the question relates to a contributed package , e.g., one downloaded 
>   from CRAN, try contacting the package maintainer first. You can also 
>   use find("functionname") and packageDescription("packagename") to 
>   find this information. Only send such questions to R-help or R-devel if 
>   you get no reply or need further assistance. This applies to both 
>   requests for help and to bug reports."
> 
> 
> How about
> 
> If the question relates to a contributed package , e.g., one downloaded 
> from CRAN, email the list and be sure to additionally send to or copy to 
> the package maintainer as well. You can use find("functionname") 
> and packageDescription("packagename") to find this information. 
> Only send such questions to one of R-help or R-devel. This applies to both 
> requests for help and to bug reports.
> 
> ______
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



Re: [Rd] S4 'object is not subsettable' in prototype

2006-08-21 Thread Robert Gentleman


Prof Brian Ripley wrote:
> On Mon, 21 Aug 2006, Seth Falcon wrote:
> 
>> John Chambers <[EMAIL PROTECTED]> writes:
>>
>>> When I was introducing the special type for S4 objects, my first
>>> inclination was to have length(x) for those objects be either NA or an
>>> error, along the lines that intuitively length(x) means "the number of
>>> elements in the vector-style object x".  However, that change quickly
>>> was demonstrated to need MANY revisions to the current code.
>> Perhaps some details on the required changes will help me see the
>> light, but I would really like to see length(foo) be an error (no such
>> method) when foo is an arbitary S4 class.
> 
> According to the Blue Book p.96 every S object has a length and 'An 
> Introduction to R' repeats this.  So I believe an error is not an option.  
> Indeed, from the wording, I think code could legitimately assume length(x) 
> works and 0 <= length(x) and it is an integer (but not necessarily of type 
> 'integer').
> 
> Certainly functions and formulae have a length (different for functions in 
> S and R, as I recall), and they are not 'vector-style'.

   Yes, but that is because in S(-Plus), and not in R, virtually 
every object was an instance of a "generic vector", including functions 
(formulas were white book, not blue, and I'm still not sure that 
indexing them makes sense, but I am sure that indexing functions does 
not; it suggests, at least to me, that we want to emphasize 
implementation over semantics).

   Now, in R, since not everything is a generic vector, it is less clear 
what to do in some cases, and I am not going to argue too hard against 
everything having a length, but I think the number 1 is a much better 
choice than the number 0.  (the compromise solution of 0.5 has some 
charm :-)

   I am also scared that such reasoning will lead one to believe that 
indexing these things using [, or similar should work, and that leads to 
major problems, since I lost the argument about not indexing outside of 
array bounds some years ago. What would be sensible in that case? 
Certainly not what currently happens with S4 objects (in R release).

   best wishes
 Robert
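Whatever the default ends up being, a class author can always make length() well defined by supplying a method, e.g. (a sketch with an invented class):

```r
library(methods)

setClass("Foo", representation(x = "numeric"))
setMethod("length", "Foo", function(x) length(x@x))

f <- new("Foo", x = c(1, 2, 3))
length(f)   # 3, via the explicit method rather than any default
```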



> 
>> I have encountered bugs due to accidental dispatch -- functions
>> returning something other than an error because of the zero-length
>> list implementation of S4.  It would not be surprising if some of the
>> breakage caused by removing this "feature" identifies real bugs.
>>
>> I was thinking that one of the main advatnages of the new S4 type was
>> to get away from this sort of accidental dispatch.  Not trying to be
>> snide, but what is useful about getting a zero for length(foo)?  The
>> main use I can think of is in trying to identify S4 instances, but
>> happily, that is no longer needed.
>>
>> + seth
> 



Re: [Rd] configure on mac

2006-08-13 Thread Robert Gentleman
Hi,
  I think Simon and Stefano are both offline for a little while. I can 
confirm that an upgrade of Xcode to either 2.3 or the very recent 2.4 
is needed in most cases; either seems to work, so 2.4 is probably the 
better choice.

   best wishes
 Robert


Prof Brian Ripley wrote:
> I gather you need to update your Xtools: others have had similar problems.
> (If they are online you will no doubt get more complete information.)
> 
> On Sat, 12 Aug 2006, roger koenker wrote:
> 
>> I'm having trouble making yesterday's R-devel on my macs.
>>
>> ./configure seems fine, but eventually in make I get:
>>
>> gcc -dynamiclib -Wl,-macosx_version_min -Wl,10.3 -undefined  
>> dynamic_lookup -single_module -multiply_defined suppress -L/sw/lib -L/ 
>> usr/local/lib -install_name libR.dylib -compatibility_version 2.4.0  - 
>> current_version 2.4.0  -headerpad_max_install_names -o libR.dylib  
>> Rembedded.o CConverters.o CommandLineArgs.o Rdynload.o Renviron.o  
>> RNG.o apply.o arithmetic.o apse.o array.o attrib.o base.o bind.o  
>> builtin.o character.o coerce.o colors.o complex.o connections.o  
>> context.o cov.o cum.o dcf.o datetime.o debug.o deparse.o deriv.o  
>> dotcode.o dounzip.o dstruct.o duplicate.o engine.o envir.o errors.o  
>> eval.o format.o fourier.o gevents.o gram.o gram-ex.o graphics.o  
>> identical.o internet.o iosupport.o lapack.o list.o localecharset.o  
>> logic.o main.o mapply.o match.o memory.o model.o names.o objects.o  
>> optim.o optimize.o options.o par.o paste.o pcre.o platform.o plot.o  
>> plot3d.o plotmath.o print.o printarray.o printvector.o printutils.o  
>> qsort.o random.o regex.o registration.o relop.o rlocale.o saveload.o  
>> scan.o seq.o serialize.o size.o sort.o source.o split.o sprintf.o  
>> startup.o subassign.o subscript.o subset.o summary.o sysutils.o  
>> unique.o util.o version.o vfonts.o xxxpr.o   `ls ../appl/*.o ../nmath/ 
>> *.o ../unix/*.o  2>/dev/null|grep -v /ext-` -framework vecLib - 
>> lgfortran -lgcc_s -lSystemStubs -lmx -lSystem  ../extra/zlib/ 
>> libz.a ../extra/bzip2/libbz2.a ../extra/pcre/libpcre.a  -lintl - 
>> liconv -Wl,-framework -Wl,CoreFoundation -lreadline  -lm -liconv
>> /usr/bin/libtool: unknown option character `m' in: -macosx_version_min
>> Usage: /usr/bin/libtool -static [-] file [...] [-filelist listfile 
>> [,dirname]] [-arch_only arch] [-sacLT]
>> Usage: /usr/bin/libtool -dynamic [-] file [...] [-filelist listfile 
>> [,dirname]] [-arch_only arch] [-o output] [-install_name name] [- 
>> compatibility_version #] [-current_version #] [-seg1addr 0x#] [- 
>> segs_read_only_addr 0x#] [-segs_read_write_addr 0x#] [-seg_addr_table  
>> ] [-seg_addr_table_filename ] [-all_load]  
>> [-noall_load]
>> make[3]: *** [libR.dylib] Error 1
>> make[2]: *** [R] Error 2
>> make[1]: *** [R] Error 1
>> make: *** [R] Error 1
>>
>> This was ok  as of my last build which was:
>>
>>  > version
>> _
>> platform   powerpc-apple-darwin8.7.0
>> arch   powerpc
>> os darwin8.7.0
>> system powerpc, darwin8.7.0
>> status Under development (unstable)
>> major  2
>> minor  4.0
>> year   2006
>> month  07
>> day28
>> svn rev38710
>> language   R
>> version.string R version 2.4.0 Under development (unstable)  
>> (2006-07-28 r38710)
>>
>> url:www.econ.uiuc.edu/~roger    Roger Koenker
>> email   [EMAIL PROTECTED]   Department of Economics
>> vox:217-333-4558University of Illinois
>> fax:217-244-6678Champaign, IL 61820
>>
>>
> 



Re: [Rd] [R] HTTP User-Agent header

2006-07-31 Thread Robert Gentleman
should appear at an R-devel near you...
thanks Seth
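The option-based mechanism can be exercised roughly as follows (a sketch; the exact default string is built by the non-exported defaultUserAgent() described in Seth's message):

```r
# inspect and override the User-Agent header used for HTTP downloads
getOption("HTTPUserAgent")

old <- options(HTTPUserAgent = sprintf("R (%s.%s %s %s %s)",
                 R.version$major, R.version$minor, R.version$platform,
                 R.version$arch, R.version$os))
getOption("HTTPUserAgent")        # the custom string

options(HTTPUserAgent = NULL)     # NULL suppresses the header entirely
options(old)                      # restore the previous setting
```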


Seth Falcon wrote:
> Robert Gentleman <[EMAIL PROTECTED]> writes:
>> OK, that suggests setting at the options level would solve both of your 
>> problems and that seems like the best approach. I don't really want to 
>> pass this around as a parameter through the maze of functions that might 
>> actually download something if we don't have to.
> 
> I have an updated patch that adds an HTTPUserAgent option.  The
> default is a string like:
> 
> R (2.4.0 x86_64-unknown-linux-gnu x86_64 linux-gnu)
> 
> If the HTTPUserAgent option is NULL, no user agent header is added to
> HTTP requests (this is the current behavior).  This option allows R to
> use an arbitrary user agent header.
> 
> The patch adds two non-exported functions to utils: 
>1) defaultUserAgent - returns a string like above
>2) makeUserAgent - formats content of HTTPUserAgent option for use
>   as part of an HTTP request header.
> 
> I've tested on OSX and Linux, but not on Windows.  When USE_WININET is
> defined, a user agent string of "R" was already being used.  With this
> patch, the HTTPUserAgent options is used.  I'm unsure if NULL is
> allowed.
> 
> Also, in src/main/internet.c there is a comment:
>   "Next 6 are for use by libxml, only"
> and then a definition for R_HTTPOpen.  Not sure how/when these get
> used.  The user agent for these calls remains unspecified with this
> patch.
> 
> + seth
> 
> 
> Patch summary:
>  src/include/R_ext/R-ftp-http.h   |2 +-
>  src/include/Rmodules/Rinternet.h |2 +-
>  src/library/base/man/options.Rd  |5 +
>  src/library/utils/R/readhttp.R   |   25 +
>  src/library/utils/R/zzz.R|3 ++-
>  src/main/internet.c  |2 +-
>  src/modules/internet/internet.c  |   37 +
>  src/modules/internet/nanohttp.c  |8 ++--
>  8 files changed, 66 insertions(+), 18 deletions(-)
> 
> 
> 
> Index: src/include/R_ext/R-ftp-http.h
> ===
> --- src/include/R_ext/R-ftp-http.h(revision 38715)
> +++ src/include/R_ext/R-ftp-http.h(working copy)
> @@ -36,7 +36,7 @@
>  int   R_FTPRead(void *ctx, char *dest, int len);
>  void  R_FTPClose(void *ctx);
>  
> -void *   RxmlNanoHTTPOpen(const char *URL, char **contentType, int 
> cacheOK);
> +void *   RxmlNanoHTTPOpen(const char *URL, char **contentType, const 
> char *headers, int cacheOK);
>  int  RxmlNanoHTTPRead(void *ctx, void *dest, int len);
>  void RxmlNanoHTTPClose(void *ctx);
>  int  RxmlNanoHTTPReturnCode(void *ctx);
> Index: src/include/Rmodules/Rinternet.h
> ===
> --- src/include/Rmodules/Rinternet.h  (revision 38715)
> +++ src/include/Rmodules/Rinternet.h  (working copy)
> @@ -9,7 +9,7 @@
>  typedef Rconnection (*R_NewUrlRoutine)(char *description, char *mode);
>  typedef Rconnection (*R_NewSockRoutine)(char *host, int port, int server, 
> char *mode); 
>  
> -typedef void * (*R_HTTPOpenRoutine)(const char *url, const int cacheOK);
> +typedef void * (*R_HTTPOpenRoutine)(const char *url, const char *headers, 
> const int cacheOK);
>  typedef int(*R_HTTPReadRoutine)(void *ctx, char *dest, int len);
>  typedef void   (*R_HTTPCloseRoutine)(void *ctx);
> 
> Index: src/main/internet.c
> ===
> --- src/main/internet.c   (revision 38715)
> +++ src/main/internet.c   (working copy)
> @@ -129,7 +129,7 @@
>  {
>  if(!initialized) internet_Init();
>  if(initialized > 0)
> - return (*ptr->HTTPOpen)(url, 0);
> + return (*ptr->HTTPOpen)(url, NULL, 0);
>  else {
>   error(_("internet routines cannot be loaded"));
>   return NULL;
> Index: src/library/utils/R/zzz.R
> ===
> --- src/library/utils/R/zzz.R (revision 38715)
> +++ src/library/utils/R/zzz.R (working copy)
> @@ -9,7 +9,8 @@
>   internet.info = 2,
>   pkgType = .Platform$pkgType,
>   str = list(strict.width = "no"),
> - example.ask = "default")
> + example.ask = "default",
> + HTTPUserAgent = defaultUserAgent())
>  extra <-
>  if(.Platform$OS.type == "windows") {
>  list(mailer = "none",
> Index: src/library/utils/R/readhttp.R
> ===
> --- src/

Re: [Rd] [R] HTTP User-Agent header

2006-07-28 Thread Robert Gentleman
OK, that suggests that setting this at the options level would solve 
both of your problems, and that seems like the best approach. I don't 
really want to pass this around as a parameter through the maze of 
functions that might actually download something, if we don't have to.

I think we can provide something early next week on R-devel for folks 
to test. But I suspect, as Henrik also does, that the set of sites that 
will refuse us with a User-Agent header will be much larger than the 
set James has found that refuse us without it.

best wishes
   Robert


Henrik Bengtsson wrote:
> On 7/28/06, Robert Gentleman <[EMAIL PROTECTED]> wrote:
>> I wonder if it would not be better to make the user agent string
>> something that is configurable (at the time R is built) rather than at
>> run time. This would make Seth's patch about 1% as long. Or this could
>> be handled as an option. The patches are pretty extensive and allow for
>> setting the agent header by setting parameters in function calls (eg
>> download.files). I am not sure there is a good use case for that level
>> of flexibility and the additional code is substantial.
>>
>>
>> The issue that I think arises is that there are potentially other
>> systems that will be unhappy with R's identification of itself and so
>> some users may also need to turn it off.
>>
>> Any strong opinions?
> 
> Actually two:
> 
> 1) If you wish to pull down (read extract from HTML or similar) live
> data from the web, you might want to be able to "imitate" a certain
> browser.  For instance, if you tell some webserver you're a simple
> "mobile phone" or "lynx", you might be able get back very clean data.
> Some servers might also block unknown web browsers.
> 
> 2) If the webserver of a package repository decided to make use of
> the user-agent string to decide what version of the repository it
> should deliver, I would like to be able to trick the server.  Why?
> Many times I found myself working on a system where I do not have the
> rights to update to the latest or the developers version of R.
> However, although I have not the very latest version of R you can do
> work.  For instance, in Bioconductor the biocLite() & co gives you
> either the stable or the developers of Bioconductor depending on your
> R version, but looking into the biocLite() code and beyond, you find
> that you actually can install a Bioconductor v1.9 package in R v2.3.1.
>  It can be risky business, but if you know what you're doing, it can
> save your day (or week).
> 
> Cheers
> 
> Henrik
> 
>>
>> James P. Howard, II wrote:
>>> On 7/28/06, Seth Falcon <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have a rough draft patch, see below, that adds a User-Agent header
>>>> to HTTP requests made in R via download.file.  If there is interest, I
>>>> will polish it.
>>> It looks right, but I am running under Windows without a compiler.
>>>
> 



Re: [Rd] [R] HTTP User-Agent header

2006-07-28 Thread Robert Gentleman


Prof Brian Ripley wrote:
> On Fri, 28 Jul 2006, Robert Gentleman wrote:
> 
>> I wonder if it would not be better to make the user agent string 
>> something that is configurable (at the time R is built) rather than at 
>> run time. This would make Seth's patch about 1% as long. Or this could 
>> be handled as an option. The patches are pretty extensive and allow for 
>> setting the agent header by setting parameters in function calls (eg 
>> download.files). I am not sure there is a good use case for that level 
>> of flexibility and the additional code is substantial.
>>
>>
>> The issue that I think arises is that there are potentially other 
>> systems that will be unhappy with R's identification of itself and so 
>> some users may also need to turn it off.
> 
> I also thought that there was no need for this level of complexity. 
> (BTW, some of the patch is changes Seth has made for other purposes, e.g. 
> that to memory.c, so please no one apply all of it.)
> 
> I'd be happy for R to just identify itself as 'R', which seems allowed:
> (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html).  But I am a bit 
> concerned that sites may not just require the field but also require a 
> particular format (even though W3C does not).

   For those of us who want to monitor downloads and get an idea of 
the size of the user base on different platforms (which helps to 
allocate resources), I think that we should try to include a bit more 
information.
  I could probably live with as little as the R version, but would like 
to have the OS there as well...

   best wishes
Robert


> 
>> Any strong opinions?
>>
>>
>>
>> James P. Howard, II wrote:
>>> On 7/28/06, Seth Falcon <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have a rough draft patch, see below, that adds a User-Agent header
>>>> to HTTP requests made in R via download.file.  If there is interest, I
>>>> will polish it.
>>> It looks right, but I am running under Windows without a compiler.
>>>
>>
> 



Re: [Rd] [R] HTTP User-Agent header

2006-07-28 Thread Robert Gentleman
I wonder if it would not be better to make the user agent string 
something that is configurable (at the time R is built) rather than at 
run time. This would make Seth's patch about 1% as long. Or this could 
be handled as an option. The patches are pretty extensive and allow for 
setting the agent header by setting parameters in function calls (eg 
download.file). I am not sure there is a good use case for that level 
of flexibility and the additional code is substantial.
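
As a rough illustration, the option-based route could be as small as this
(the option name HTTPUserAgent matches what R eventually provided for this
purpose, but the sketch is illustrative, not the patch itself):

```r
## Sketch: one run-time option consulted by download.file() and friends,
## instead of per-call parameters in every function signature.
options(HTTPUserAgent =
        sprintf("R (%s.%s %s)", R.version$major, R.version$minor,
                R.version$platform))
getOption("HTTPUserAgent")
```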


The issue that I think arises is that there are potentially other 
systems that will be unhappy with R's identification of itself and so 
some users may also need to turn it off.

Any strong opinions?



James P. Howard, II wrote:
> On 7/28/06, Seth Falcon <[EMAIL PROTECTED]> wrote:
> 
>> I have a rough draft patch, see below, that adds a User-Agent header
>> to HTTP requests made in R via download.file.  If there is interest, I
>> will polish it.
> 
> It looks right, but I am running under Windows without a compiler.
> 



[Rd] proposed modifications to deprecated

2006-04-27 Thread Robert Gentleman
Hi,
   Over the past six months we have had a few problems with deprecation 
and Seth Falcon and I want to propose a few additions to the mechanism 
that will help deal with cases other than the deprecation of functions.

  In the last release one of the arguments to La.svd was deprecated, but 
the warning message was very unclear and suggested that in fact La.svd 
was deprecated.
   Adding a third argument to .Deprecated, say msg (to be consistent 
with the internal naming mechanism), that contains the message string 
would allow the La.svd issue to be handled in a more informative way. It 
is a strictly additive change, so no existing code is likely to be broken.
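
A sketch of how the proposed msg argument would read at a call site (the
function name mySvd is a stand-in; only the msg argument is new):

```r
## Deprecating a single argument without implying that the whole
## function is deprecated - the La.svd situation described above.
mySvd <- function(x, method) {
    if (!missing(method))
        .Deprecated(msg = "argument 'method' of mySvd() is deprecated and ignored")
    svd(x)
}
```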

   We also need to deprecate data from time to time. Since the field of 
genomics is moving fast, a good example from five years ago is often no 
longer a good example today. This one is a bit harder, but we can modify
   tools:::.make_file_exts("data")

   to first look for a ".DEP" extension (this does not seem to be a 
widely used extension). If such a file exists, i.e. NameofData.DEP, 
one of two things happens: if it contains a character string, we use 
that for the message (we could source it for the message?); if not, we 
print a standard message (just as .Deprecated does) and then continue 
the search using the other file extensions.

   Defunct could be handled similarly.

  Comments, alternative suggestions?

  thanks
    Robert



Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))

2006-04-21 Thread Robert Gentleman


Kurt Hornik wrote:
>>>>>>Simon Urbanek writes:
> 
> 
>>On Apr 20, 2006, at 1:23 PM, Henrik Bengtsson (max 7Mb) wrote:
>>
>>>Is it a general consensus on R-devel that *.tar.gz distributions  
>>>should only be treated as a distribution for *building* packages  
>>>and not for developing them?
> 
> 
> [Actually, distributing so that they can be installed and used.]
> 
> 
>>I don't know whether this is a general consensus, but it definitely  
>>an important distinction. Some authors put their own Makefiles in src  
>>although they are not needed and in fact harmful, preventing the  
>>package to build on other systems - only because they are too lazy to  
>>use R building mechanism for development and don't make the above  
>>distinction.
> 
> 
> Right :-)
> 
> Henrik, as I think I mentioned the last time you asked about this: of
> course you can basically do everything you want.  But it comes at a
> price.  For external sources, you need to write a Makefile of your own,
> so as to make it clear that you provide a mechanism which is different
> from the standard one.  And, as Simon said, the gain in flexibility
> comes at a price.
> 
> Personally and as one of the CRAN maintainers, I'd be very unhappy if
> package maintainers would start flooding their source .tar.gz packages
> with full development environment material.  (I am also rather unhappy
> about shipping large data sets which are only used for instructional
> purposes [rather than providing the data set "on its own"].)  It is
> simply not true that bandwidth does not matter.


   I can see the problem with large packages, but the current system 
does nothing about that, AFAICS. And as Simon indicated, his biggest 
problem is the one set of files that we are allowed - so the argument is 
that the current approach is neither necessary nor sufficient, and it 
imposes a structure on people that seems unnecessarily restrictive. 
I don't see how excluding README (or anything else that a package 
maintainer has put there) makes life better, but maybe I am missing 
something here. These are precisely the sorts of things that have helped 
me to figure out what was intended when it didn't work. So this approach 
is regressive, IMHO.

  If the size is not large, who cares what is in a package? Things 
related to source should be in src. I see that a similar approach is 
being taken with the R directory (and probably other directories).  This 
is, in my opinion, unfortunate: imposing restrictions that don't solve 
the stated problem in any general way is not useful.

  For BioC, we manually check the size etc and ask people to reduce and 
remove. You could easily do the same at CRAN (and even automate it). 
BioC packages can be enormous relative to those on CRAN and I don't 
think we have ever had a serious complaint about it. But then the data 
sets tend to be large, so maybe people are just more forgiving.

  As for the difference between source packages and built packages, yes 
it would be nice at some time to enter into a discussion on that topic. 
There are lots of things that can be done at build time (that are not 
currently being done) that would speed up package installation etc. But 
they come at the price that Henrik has mentioned. The built package is 
no longer suitable for development. And hence we may usefully consider 
another format (something between source and binary, .Rgz?)

  best wishes
Robert


> 
> If there is need, we could start having developer-package repositories.
> However, I'd prefer a different approach.  We're currently in the
> process of updating the CRAN server infrastructure, and should be able
> to start deploying an R-forge project hosting service "eventually"
> (hopefully, we can set things up during the summer).  This should
> provide us with an ideal infrastructure for sharing developer resources,
> in particular as we could add QC testing et al to the standard community
> services.
> 
> Best
> -k
> 



Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))

2006-04-20 Thread Robert Gentleman
Hi,

  Well, I guess if someone thinks they know how I am going to configure 
and build the sources needed to construct appropriate dynamic libraries 
so well that they can feel free to exclude files at their whim at 
install time, perhaps they could feel just as free to exclude them at 
build time?

This makes no sense to me and certainly does not solve the size problem 
mentioned by Brian. If there is a single example of something that was 
better this way, I would be interested to hear it. I can think of 
several things that are worse.

best wishes
   Robert


Roger Bivand wrote:
> On Thu, 20 Apr 2006, Robert Gentleman wrote:
> 
> 
>>I disagree, things like README files and other objects are important and 
>>should be included. I don't see the real advantage to such warnings, if 
>>someone wants them they could be turned on optionally.
> 
> 
> Isn't the point at least partly that all those files are lost on 
> installation? If the README is to be accessible after installation, it can 
> be placed under inst/, so that both users reading the source and installed 
> versions can access it. So maybe the warning could be re-phrased to 
> suggest use of the inst/ tree for files with important content?
> 
> Best wishes,
> 
> Roger
> 
> 
>>If size is an issue then authors should be warned that their package is 
>>large (in the top 1% at CRAN would be useful to some). I also find it 
>>helpful to know whose packages take forever to build, which we don't do.
>>
>>Just because someone put something in TFM doesn't mean it is either a 
>>good idea or sensible, in my experience.
>>
>>best wishes
>>   Robert
>>
>>
>>Prof Brian Ripley wrote:
>>
>>>On Wed, 19 Apr 2006, James Bullard wrote:
>>>
>>>
>>>
>>>>Hello, I am having an issue with R CMD check with the nightly build of
>>>>RC 2.3.0 (listed in the subject.)
>>>
>>>
>>>This is all explained in TFM, `Writing R Extensions'.
>>>
>>>
>>>
>>>>The problem is this warning:
>>>>
>>>>* checking if this is a source package ... WARNING
>>>>Subdirectory 'src' contains:
>>>> README _Makefile
>>>>These are unlikely file names for src files.
>>>>
>>>>In fact, they are not source files, but I do not see any reason why they
>>>>cannot be there, or why I need to be warned of their presence.
>>>>Potentially I could be informed of their presence, but that is another
>>>>matter.
>>>
>>>
>>>Having unnecessary files in other people's packages just waste space and 
>>>download bandwidth for each one of the users.
>>>
>>>
>>>
>>>>Now, I only get this warning when I do:
>>>>
>>>>R CMD build affxparser
>>>>R CMD check -l ~/R-packages/ affxparser_1.3.3.tar.gz
>>>>
>>>>If I do:
>>>>
>>>>R CMD check -l ~/R-packages affxparser
>>>>
>>>>I do not get the warning. Is this inconsistent, or is there rationale
>>>>behind this? I think the warning is inappropriate, or at the least a
>>>>little restrictive. It seems as if I should be able to put whatever I
>>>>want in there, especially the _Makefile as I like to build test programs
>>>>directly and I want to be able to build exactly what I check out from
>>>>my source code repository without having to copy files in and out.
>>>
>>>
>>>All described in TFM, including how to set defaults for what is checked.
>>>
>>>
>>>
>>>>The output from R CMD check is below. Any insight would be appreciated.
>>>>As always thanks for your patience.
>>>
>>>
>>>[...]
>>>
>>>
>>
>>
> 



Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))

2006-04-20 Thread Robert Gentleman
I disagree, things like README files and other objects are important and 
should be included. I don't see the real advantage to such warnings, if 
someone wants them they could be turned on optionally.

If size is an issue then authors should be warned that their package is 
large (in the top 1% at CRAN would be useful to some). I also find it 
helpful to know whose packages take forever to build, which we don't do.

Just because someone put something in TFM doesn't mean it is either a 
good idea or sensible, in my experience.

best wishes
   Robert


Prof Brian Ripley wrote:
> On Wed, 19 Apr 2006, James Bullard wrote:
> 
> 
>>Hello, I am having an issue with R CMD check with the nightly build of
>>RC 2.3.0 (listed in the subject.)
> 
> 
> This is all explained in TFM, `Writing R Extensions'.
> 
> 
>>The problem is this warning:
>>
>>* checking if this is a source package ... WARNING
>>Subdirectory 'src' contains:
>>  README _Makefile
>>These are unlikely file names for src files.
>>
>>In fact, they are not source files, but I do not see any reason why they
>>cannot be there, or why I need to be warned of their presence.
>>Potentially I could be informed of their presence, but that is another
>>matter.
> 
> 
> Having unnecessary files in other people's packages just waste space and 
> download bandwidth for each one of the users.
> 
> 
>>Now, I only get this warning when I do:
>>
>>R CMD build affxparser
>>R CMD check -l ~/R-packages/ affxparser_1.3.3.tar.gz
>>
>>If I do:
>>
>>R CMD check -l ~/R-packages affxparser
>>
>>I do not get the warning. Is this inconsistent, or is there rationale
>>behind this? I think the warning is inappropriate, or at the least a
>>little restrictive. It seems as if I should be able to put whatever I
>>want in there, especially the _Makefile as I like to build test programs
>>directly and I want to be able to build exactly what I check out from
>>my source code repository without having to copy files in and out.
> 
> 
> All described in TFM, including how to set defaults for what is checked.
> 
> 
>>The output from R CMD check is below. Any insight would be appreciated.
>>As always thanks for your patience.
> 
> 
> [...]
> 
> 



Re: [Rd] SaveImage, LazyLoad, S4 and all that {was "install.R ... files"}

2006-02-03 Thread Robert Gentleman
My understanding, and John or others may correct that, is that you need 
SaveImage if you want to have the class hierarchy and generic functions, 
plus associated methods all created and saved at build time. This is 
basically a sort of compilation step, and IMHO, should always be done 
since it only needs to be done once, rather than every time a package is 
loaded. Note that attaching your methods to other people's generics has 
to happen at load time, since you won't necessarily know where they are 
or even what they are until then (using an import directive may 
alleviate some of those issues but I have not tested just what does and 
does not work currently).

I hope that LazyLoad does what it says it does, that is dissociates the 
value from the symbol in such a way that the value lives on disk until 
it is wanted, but the symbol is available at package load time. I do not 
see how this relates to precomputing an image, and would not be very 
happy if the two ideas became one, they really are different and can be 
used to solve very different problems.

best wishes
  Robert




Prof Brian Ripley wrote:
> The short answer is that there are no known (i.e. documented) differences, 
> and no examples on CRAN which do not work with lazy-loading (except party, 
> which loads the saved image in a test).  And that includes examples of 
> packages which share S4 classes.  But my question was to tease things like 
> this out.
> 
> You do need either SaveImage or LazyLoad in a package that defines S4 
> classes and methods, since SetClass etc break the `rules' for R files in 
> packages in `Writing R Extensions'.
> 
> When I have time I will take a closer look at this example.
> 
> 
> On Fri, 3 Feb 2006, Martin Maechler wrote:
> 
> 
>>>>>>>"Seth" == Seth Falcon <[EMAIL PROTECTED]>
>>>>>>>on Thu, 02 Feb 2006 11:32:42 -0800 writes:
>>
>>   Seth> Thanks for the explaination of LazyLoad, that's very helpful.
>>   Seth> On  1 Feb 2006, [EMAIL PROTECTED] wrote:
>>   >> There is no intention to withdraw SaveImage: yes.  Rather, if
>>   >> lazy-loading is not doing a complete job, we could see if it could
>>   >> be improved.
>>
>>   Seth> It seems to me that LazyLoad does something different with respect to
>>   Seth> packages listed in Depends and/or how it interacts with namespaces.
>>
>>   Seth> I'm testing using the Bioconductor package graph and find that if I
>>   Seth> change SaveImage to LazyLoad I get the following:
>>
>>Interesting.
>>
>>I had also the vague feeling that  saveImage  was said to be
>>important when using  S4 classes and methods; particularly when
>>some methods are for generics from a different package/Namespace
>>and other methods for `base' classes (or other classes defined
>>elsewhere).
>>This is the case of 'Matrix', my primary experience here.
>>OTOH, we now only use 'LazyLoad: yes' , not (any more?)
>>'SaveImage: yes' -- and honestly I don't know / remember why.
>>
>>Martin
>>
>>
>>   Seth> ** preparing package for lazy loading
>>   Seth> Error in makeClassRepresentation(Class, properties, superClasses, 
>> prototype,  :
>>   Seth> couldn't find function "getuuid"
>>
>>   Seth> Looking at the NAMESPACE for the graph package, it looks like it is
>>   Seth> missing some imports.  I added lines:
>>   Seth> import(Ruuid)
>>   Seth> exportClasses(Ruuid)
>>
>>   Seth> Aside: am I correct in my reading of the extension manual that if one
>>   Seth> uses S4 classes from another package with a namespace, one
>>   Seth> must import the classes and *also* export them?
>>
>>   Seth> Now I see this:
>>
>>   Seth> ** preparing package for lazy loading
>>   Seth> Error in getClass("Ruuid") : "Ruuid" is not a defined class
>>   Seth> Error: unable to load R code in package 'graph'
>>   Seth> Execution halted
>>
>>   Seth> But Ruuid _is_ defined and exported in the Ruuid package.
>>
>>   Seth> Is there a known difference in how dependencies and imports are
>>   Seth> handled with LazyLoad as opposed to SaveImage?
>>
>>   Seth> Thanks,
>>
>>   Seth> + seth
>>
>>
> 
> 



Re: [Rd] Word boundaries and gregexpr in R 2.2.1 (PR#8547)

2006-02-01 Thread Robert Gentleman
Should be patched in R-devel, will be available shortly

[EMAIL PROTECTED] wrote:
> Full_Name: Stefan Th. Gries
> Version: 2.2.1
> OS: Windows XP (Home and Professional)
> Submission from: (NULL) (68.6.34.104)
> 
> 
> The problem is this: I have a vector of two character strings.
> 
> 
>>text<-c("This is a first example sentence.", "And this is a second example sentence.")
> 
> If I now look for word boundaries with regexpr, this is what I get:
> 
>>regexpr("\\b", text, perl=TRUE)
> 
> [1] 1 1
> attr(,"match.length")
> [1] 0 0
> 
> So far, so good. But with gregexpr I get:
> 
> 
>>gregexpr("\\b", text, perl=TRUE)
> 
> Error: cannot allocate vector of size 524288 Kb
> In addition: Warning messages:
> 1: Reached total allocation of 1015Mb: see help(memory.size)
> 2: Reached total allocation of 1015Mb: see help(memory.size)
> 
> Why don't I get the locations and extensions of all word boundaries?
> 
> I am using R 2.2.1 on a machine running Windows XP:
> 
>>R.version
> 
> _
> platform i386-pc-mingw32
> arch i386
> os   mingw32
> system   i386, mingw32
> status
> major2
> minor2.1
> year 2005
> month    12
> day  20
> svn rev  36812
> language R
> 



Re: [Rd] A patch for do_sample: check replace arg

2006-01-24 Thread Robert Gentleman
should be there now

Seth Falcon wrote:
> A colleague sent me the following:
> 
> If you specify probabilities in the 'sample' function and forget
> to type 'prob=...', then you get nonsense. E.g.
> 
> sample(1:10,1,c(0,0,0,0,1,0,0,0,0,0)) 
> 
> does not filter '5', while 
> 
> sample(1:10,1,prob=c(0,0,0,0,1,0,0,0,0,0)) 
> 
> does it correctly.  I wish this would return an error because the
> 'replace' argument should only take logical args. Anyway, it is
> easy to make this mistake and having it produce an error would be
> nice.
> 
> Assuming there is not a use-case for specifying a logical vector for
> the 'replace' argument, I like the idea of raising an error if replace
> is not length one.  The following patch provides an implementation.
> 
> + seth
> 
> 
> Diff is against svn Revision: 37141
> --- a/src/main/random.c Sat Jan 21 10:54:11 2006 -0800
> +++ b/src/main/random.c Sat Jan 21 11:17:20 2006 -0800
> @@ -453,15 +453,18 @@
>  /* with/without replacement according to r. */
>  SEXP attribute_hidden do_sample(SEXP call, SEXP op, SEXP args, SEXP rho)
>  {
> -SEXP x, y, prob;
> +SEXP x, y, prob, sreplace;
>  int k, n, replace;
>  double *p;
>  
>  checkArity(op, args);
>  n = asInteger(CAR(args)); args = CDR(args);
>  k = asInteger(CAR(args)); args = CDR(args);
> -replace = asLogical(CAR(args)); args = CDR(args);
> +sreplace = CAR(args); args = CDR(args);
>  prob = CAR(args);
> +if (length(sreplace) != 1)
> +errorcall(call, _("invalid '%s' argument"), "replace");
> +replace = asLogical(sreplace);
>  if (replace == NA_LOGICAL)
> errorcall(call, _("invalid '%s' argument"), "replace");
>  if (n == NA_INTEGER || n < 1)
> 



[Rd] clarification of library/require semantics

2005-11-04 Thread Robert Gentleman
Recently I have added a lib.loc argument to require, so that
it is more consistent with library. However, there are some oddities 
that folks have pointed out, and we do not have a documented description 
of the semantics for what should happen when the lib.loc parameter is 
provided.

   Proposal: the most common use case seems to be one where any other 
dependencies, or calls to library/require should also see the library 
specified in the lib.loc parameter for the duration of the initial call 
to library. Hence, we should modify the library search path for the 
duration of the call (via .libPaths).

  The alternative, is to not do that. Which is what happens now.

  Both have costs: automatically setting the library search path, of 
course, means that users who do not want that behavior have to manually 
remove entries from the search path. But almost no one does that, and 
most folks I have asked have said they want the lib.loc parameter to be 
used for the other loading as well.
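
The proposed semantics amount to something like the following (myLib and 
somePkg are placeholder names; this is an illustration of the intended 
behavior, not the implementation):

```r
## Make lib.loc visible to any nested library()/require() calls by
## prepending it to the search path for the duration of the call.
myLib <- "~/R/private-library"       # placeholder private library
old <- .libPaths()
.libPaths(c(myLib, old))             # dependencies now resolve against myLib too
ok <- require("somePkg", lib.loc = myLib)
.libPaths(old)                       # restore the previous search path
```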

   Comments?

  Robert



Re: [Rd] bug loading libraries with winXP and 2.2.0 but not 2.1.1 (PR#8200)

2005-10-10 Thread Robert Gentleman

Hi,
  thanks for the report - but it should have gone to the Bioconductor 
list, and the package maintainer, thanks.

  New versions of multtest and a few other packages will percolate up in 
the next few days to resolve the problem.

  Best wishes,
Robert

[EMAIL PROTECTED] wrote:
> Full_Name: Ken Kompass
> Version: 2.2.0
> OS: winXP pro (2002 version, SP2)
> Submission from: (NULL) (128.252.149.244)
> 
> 
> Using R2.2.0 I get this error msg when loading certain bioconductor libraries
> (depending on whether the library contains file named "all.rda" in R folder of
> library) :
> 
> 
>>library(multtest)
> 
> Error in open.connection(con, "rb") : unable to open connection
> In addition: Warning message:
> cannot open compressed file 
> 'C:/PROGRA~1/R/R-22~1.0/library/multtest/R/all.rda'
> 
> Error in library(multtest) : .First.lib failed for 'multtest'
> 
> 
> The error occurs with loading any library that doesn't have an "all.rda" file 
> in
> its R folder...
> 
> 
> ...With R2.1.1 the exact same version of multtest loads fine :
> 
> 
>>library(multtest)
> 
> Loading required package: survival
> Loading required package: splines
> 
> 



Re: [Rd] Shy Suggestion?

2005-09-20 Thread Robert Gentleman


Prof Brian Ripley wrote:
> On Tue, 20 Sep 2005, Jari Oksanen wrote:
> 
> 
>>On Tue, 2005-09-20 at 09:42 -0400, Roger D. Peng wrote:
>>
>>>I think this needs to fail because packages listed in 'Suggests:' may, for
>>>example, be needed in the examples.  How can 'R CMD check' run the examples 
>>>and
>>>verify that they are executable if those packages are not available?  I 
>>>suppose
>>>you could put the examples in a \dontrun{}.
>>>
>>
>>Yes, that's what I do, and exactly for that reason: if something is not
>>necessarily needed (= 'suggestion' in this culture), it should not be
>>required in tests. However, if I don't use \dontrun{} for a
>>non-recommended package, the check would fail and I would get the needed
>>information: so why should the check fail already when checking
>>DESCRIPTION?
> 
> 
> Because it is a `check', and it assembles all the information needed at 
> the beginning.  I'd certainly prefer to know at the beginning rather than 
> 20 minutes into running the tests.
> 
> R CMD check is not really for end users: it is for package writers, 
> repository maintainers and for people checking proposed R changes.  Those 
> people want all the checks possible to be done.
> 

   Some of us also want a mechanism similar to this proposal. There are 
situations where the usage is of a minimal nature, the package may not 
be available on all architectures, and the package developer is perfectly 
capable of setting up their tests to deal with its presence or lack 
thereof. What happens now is that in these sorts of situations 
developers tend to simply not list the dependency anywhere, and 
that is not a particularly good solution either. I would also point out, 
to those who believe that all dependencies should be declared and 
enforced, that name spaces provide a rather large hole.

My understanding of the original intent of Suggests was that it not be 
quite so rigid, but as that has not been how others interpreted it, it 
seems we should have another level of dependency (Uses has been bandied 
about).

  As I recall the discussion it was something like
  Depends:  major functionality in the package will not
  work without other packages listed here

  Suggests:  minor functionality (eg. some functions and or options will 
fail) if these packages are not available

  Uses: package is used for an example, or the current package provides 
an interface to the other package (where else do I put that code?) which
  will be used by anyone wanting to use both

  As I said above, and will try to emphasize, I really do not want R CMD 
check to do any checking of Uses (unless asked to do so). Developers 
who use Uses need to make sure that their package works and passes R 
CMD check whether the other package is there or not.
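
In practice that means guarding the optional code so the package behaves 
sensibly either way (somePkg is a placeholder name):

```r
## A 'Uses'-level dependency: degrade gracefully when the other package
## is absent, so R CMD check passes with or without it installed.
if (require("somePkg", quietly = TRUE)) {
    ## exercise the optional interface here
} else {
    message("somePkg not available; skipping the optional functionality")
}
```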
