Re: [Rd] The default behaviour of a missing entry in an environment

2009-11-16 Thread Robert Gentleman
Hi,

On Fri, Nov 13, 2009 at 4:55 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 13/11/2009 7:26 PM, Gabor Grothendieck wrote:

 On Fri, Nov 13, 2009 at 7:21 PM, Duncan Murdoch murd...@stats.uwo.ca
 wrote:

 On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:

 Note that one should use the inherits = FALSE argument to get and exists
 to avoid returning objects from the parent, the parent of the parent,
 etc.

 I disagree.  Normally you would want to receive those objects.  If you
 didn't, why didn't you set the parent of the environment to emptyenv()
 when
 you created it?


 $ does not look into the parent so if you are trying to get those
 semantics you must use inherits = FALSE.

 Whoops, yes.  That's another complaint about $ on environments.

 That was an intentional choice. AFAIR neither $ nor [[ on
environments was meant to mimic get, but rather to work on the
current environment as if it were a hash-like object. One can always
get the inherits semantics by simple programming, but under the model
you seem to be suggesting, preventing such behavior when you don't own
the environments in question is problematic.
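A small top-level illustration of the difference (the new environment's parent chain here reaches the global environment):

```r
x <- 10            # binding in the global environment
e <- new.env()     # parent defaults to the calling environment

is.null(e$x)                              # TRUE: $ looks only in e itself
get("x", envir = e)                       # 10: get() searches the parents by default
exists("x", envir = e, inherits = FALSE)  # FALSE: restricted to e itself
```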

  Robert


 Duncan Murdoch


 x <- 3
 e <- new.env()
 "x" %in% names(e)

 [1] FALSE

 get("x", e) # oops

 [1] 3






-- 
Robert Gentleman
rgent...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Non-GPL packages for R

2009-09-11 Thread Robert Gentleman

Hi,

Peter Dalgaard wrote:

Prof. John C Nash wrote:

The responses to my posting yesterday seem to indicate more consensus
than I expected:


 Umm, I had thought that it was well established that responders need 
not represent the population being surveyed.  I doubt that there is 
consensus at the level you are suggesting (certainly I don't agree) and 
as Peter indicates below the issue is: what is maintainable with the 
resources we have, not what is the best solution given unlimited resources.


  Personally, I would like to see something that was a bit easier to 
deal with programmatically that indicated when a package is GPL (or 
open-source, actually) compatible and when it is not.  This could then be 
used to write a decent function to identify suspect packages, so that 
users would know when they should be concerned.
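A sketch of the kind of check I mean, run over the License fields one gets from available.packages() (the pattern list is purely illustrative, not a legal classification):

```r
# hypothetical: 'licenses' stands in for the "License" column of
# available.packages(); flag entries that merit a closer look
licenses <- c(pkgA = "GPL (>= 2)",
              pkgB = "Non-commercial use only",
              pkgC = "file LICENSE",
              pkgD = "MIT + file LICENSE")
suspect <- grepl("non-commercial|^file LICEN[CS]E$", licenses,
                 ignore.case = TRUE)
names(licenses)[suspect]   # "pkgB" "pkgC"
```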


  It is also the case that things are not so simple, as dependencies 
can make a package unusable even if it is itself GPL-compatible.  This 
also makes the notion of some simple split into free and non-free (or 
what ever split you want) less trivial than is being suggested.


  Robert



1) CRAN should be restricted to GPL-equivalent licensed packages


GPL-_compatible_ would be the word. However, this is not what has been
done in the past. There are packages with non-commercial use licences,
and the survival package was among them for quite a while. As far as I
know, the CRAN policy has been to ensure only that redistribution is
legal and that whatever license is used is visible to the user. People
who have responded on the list do not necessarily speak for CRAN. In the
final analysis, the maintainers must decide what is maintainable.

The problem with Rdonlp2 seems to have been that the interface packages
claimed to be LGPL2 without the main copyright holder's consent (and it
seems that he cannot grant consent for reasons of TU-Darmstadt
policies). It is hard to safeguard against that sort of thing. CRAN
maintainers must assume that legalities have been cleared and accept the
license in good faith.

(Even within the Free Software world there are current issues with,
e.g., incompatibilities between GPL v.2 and v.3, and also with the
Eclipse license. Don't get me started...)


2) r-forge could be left buyer beware using DESCRIPTION information
3) We may want a specific repository for restricted packages (RANC?)

How to proceed? A short search on Rseek did not turn up a chain of
command for CRAN.

I'm prepared to help out with documentation etc. to move changes
forward. They are not, in my opinion, likely to cause a lot of trouble
for most users, and should simplify things over time.

JN






__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] debug

2009-07-27 Thread Robert Gentleman

Hi,
  I just committed a change to R-devel so that if debug is called on an 
S3 generic function, all methods will also automatically have debug 
turned on for them (if they are dispatched to from the generic).


  I hope to be able to extend this to S4 and a few other cases that are 
currently not being handled over the next few weeks.


  Please let me know if you have problems, or suggested improvements.
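A quick sketch of the new behaviour (the class and method names are made up; summary.myclass is only reached via dispatch from the generic):

```r
summary.myclass <- function(object, ...) object$x   # hypothetical S3 method
obj <- structure(list(x = 1:3), class = "myclass")

debug(summary)        # flag the S3 generic
isdebugged(summary)   # TRUE; with this change summary.myclass is also
                      # debugged when summary(obj) dispatches to it
undebug(summary)
```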

 Robert
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgent...@fhcrc.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tabulate can accept NA values?

2009-07-20 Thread Robert Gentleman
This should be in R-devel now: NAs are ignored (as are non-integers and 
values outside the range implied by the nbins argument)
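At the R level the effect can be checked directly:

```r
x <- c(1L, 2L, NA, 2L, 5L)
tabulate(x, nbins = 3)   # counts for 1..3; the NA and out-of-range 5 are dropped
# [1] 1 2 0
```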


Martin Morgan wrote:

tabulate has

.C("R_tabulate", as.integer(bin), as.integer(length(bin)),
   as.integer(nbins), ans = integer(nbins), PACKAGE = "base")$ans

The implementation of R_tabulate has

if(x[i] != R_NaInt && x[i] > 0 && x[i] <= *nbin)

and so copes with (silently drops) NA. Perhaps the .C could have
NAOK=TRUE? This is useful in apply'ing tabulate to the rows or columns
of a (large) matrix, where the work-around involves introducing some
artificial NA value (and consequently copying the matrix) outside the
range of tabulate's nbin argument.

Martin  


--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgent...@fhcrc.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] step by step debugger in R?

2009-05-23 Thread Robert Gentleman
Hi,
  I stripped the cc's as I believe that all read this list.

Romain Francois wrote:
 [moving this to r-devel]
 
 Robert Gentleman wrote:
 Hi,

 Romain Francois wrote:
  
 Duncan Murdoch wrote:

 On 5/22/2009 10:59 AM, Michael wrote:
  
 Really I think if there is a Visual Studio strength debugger, our
 collective time spent in developing R code will be greatly reduced.
 
 If someone who knows how to write a debugger plugin for Eclipse wants
 to help, we could have that fairly easily.  All the infrastructure is
 there; it's the UI part that's missing.

 Duncan Murdoch
   
 [I've copied Mark Bravington and Robert Gentleman to the list as they
 are likely to have views here, and I am not sure they monitor R-help]

 Hello,

 Making a front-end to debugging was one of the proposed google summer of
 code for this year [1], it was not retained eventually, but I am still
 motivated.

 Pretty much all infrastructure is there, and some work has been done
 __very recently__ in R's debugging internals (ability to step up). As I
 see it, the ability to call some sort of hook each time the debugger
 waits for input would make it much easier for someone to write
 

  I have still not come to an understanding of what this is supposed to
 do. When you have the browser prompt you can call any function or code
 you want to. There is no need for something special to allow you to do that.
   
 Sure. What I have in mind is something that gets __automatically__
 called, similar to the task callback but happening right before the user
 is given the browser prompt.

 I am trying to understand the scenario you have in mind. Is it that the user is
running R directly and your debugger is essentially a helper function that gets
updated etc as R runs?

 If so, then I don't think that works very well and given the constraints we
have with R I don't think it will be able to solve many of the problems that an
IDE should.  The hook you want will give you some functionality, but nowhere
near enough.

 Let me suggest instead that the IDE should be running the show. It should
initialize an instance of R, but it controls all communication and hence
controls what is rendered on the client side.  If that is what you mean by
embedding R, then yes that is what is needed. There is no way that I can see to
support most of the things that IDE type debuggers support without the IDE
controlling the communication with R.

 And if I am wrong about what your debugger will look like please let me know.

 best wishes
   Robert


 
 front-ends. A recent post of mine (patch included) [2] on R-devel
 suggested a custom prompt for browser which would do the trick, but I
 now think that a hook would be more appropriate. Without something
 similar to that, there is no way that I know of for making a front-end,
 unless maybe if you embed R ... (please let me know how I am wrong)
 

  I think you are wrong. I can't see why it is needed. The external
 debugger has
 lots of options for handling debugging. It can rewrite code (see
 examples in
 trace for how John Chambers has done this to support tracing at a
 location),
 which is AFAIK a pretty standard approach to writing debuggers. It can
 figure
 out where the break point is (made a bit easier by allowing it to put
 in pieces
 of text in the call to browser).  These are things the internal
 debugger can't do.

   
 Thanks. I'll have another look into that.
 
 There is also the debug package [3,4] which does __not__ work with R
 internals but rather works with instrumenting tricks at the R level.
 debug provides a tcl/tk front-end. It is my understanding that it does
 not work using R internals (do_browser, ...) because it was not possible
 at the time, and I believe this is still not possible today, but I might
 be wrong. I'd prefer to be wrong actually.
 

   I don't understand this statement. It has always been possible to
 work with
 the internal version - but one can also take the approach of rewriting
 code.
 There are some difficulties supporting all the operations that one
 would like by
 rewriting code and I think a combination of external controls and the
 internal
 debugger will get most of the functionality that anyone wants.

   There are some things that are hard and once I have a more complete
 list I will
 be adding this to the appropriate manual. I will also be documenting
 the changes
 that I have been making, but that project is in flux and won't be done
 until the
 end of August, so people who want to look at it are welcome (it is in
 R-devel),
 but it is in development and could change pretty much without notice.
   Romain noted that we now support stepping out from one place to another
 function.  We also have a debugonce flag that lets you get close to
 step in, but
 step in is very hard in R.

   I am mostly interested in writing tools in R that can be used by
 anyone that
 wants to write an external debugger and am not that interested in any

Re: [Rd] Can a function know what other function called it?

2009-05-23 Thread Robert Gentleman
Hi Kynn,


Kynn Jones wrote:
 Suppose function foo calls function bar.  Is there any way in which
 bar can find out the name of the function that called it, foo?

 essentially yes. You can find out about the call stack by using sys.calls and
sys.parents etc. The man page plus additional manuals should be sufficient, but
let us know if there are things that are not clear.

 
 There are two generalizations to this question that interest me.
 First, can this query go farther up the call stack?  I.e. if bar now
 calls baz, can baz find out the name of the function that called the
 function that called it, i.e. foo?  Second, what other information,

 yes - you can (at least currently) get access to the entire calling stack and
some manipulations can be performed.


 beside its name, can bar find about the environment where it was
 called?  E.g. can it find out the file name and line number of the

 there is no real concept of file and line number associated with a function
definition (nor need there even be a name - functions can be anonymous).

 If you want to map back to source files then I think that currently we do not
keep quite enough information when a function is sourced. Others may be able to
elaborate more (or correct my mistakes).  I think we currently store the actual
text for the body of the function so that it can be used for printing, but we
don't store a file name/location/line number or anything of that sort. It could
probably be added, but would be a lot of work, so it would need someone who
really wanted it to do that.

 However, you can find out lots of other things if you want.  Do note that while
 it is possible to determine which function initiated the call, it is not
necessarily possible to figure out which of the calls (if there is more than one
in the body of the function) is active.  R does not keep track of things in that
way. To be clear if foo looks like:

  foo <- function(x) {
    bar(x)
    x <- sqrt(x)
    bar(x)
  }
  and you have a breakpoint in bar, you could not (easily) distinguish which of
the two calls to bar was active. There is no line counter or anything of that
sort available.
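A minimal sketch of inspecting the call stack with sys.calls():

```r
baz <- function() sys.calls()   # list of calls currently on the stack
bar <- function() baz()
foo <- function() bar()

calls <- foo()
sapply(calls, function(cl) deparse(cl[[1]]))  # innermost three: "foo" "bar" "baz"
```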

 best wishes
   Robert

 function call?
 
 Thanks!
 
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgent...@fhcrc.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] a statement about package licenses

2009-05-01 Thread Robert Gentleman
We are writing on behalf of the R Foundation, to clarify our position on
the licenses under which developers may distribute R packages.
Readers should also see FAQ 2.11: this message is not legal advice,
which we never offer.  Readers should also be aware that besides
the R Foundation, R has many other copyright holders, listed in the
copyright notices in the source.  Each of those copyright holders may
have a different opinion on the issues discussed here.

We welcome packages that extend the capabilities of R, and believe
that their value to the community is increased if they can be offered
with open-source licenses.  At the same time, we have no desire to
discourage other license forms that developers feel are required. Of
course, such licenses as well as the contents of the package and the
way in which it is distributed must respect the rights of the copyright
holders and the terms of the R license.

When we think that a package is in violation of these rights, we
contact the author directly, and so far package authors have always agreed to
comply with our license (or convinced us that they are already in compliance).
We have no desire to be involved in legal actions---our interest is in providing
good software.  However, everyone should understand that there are conceivable
circumstances in which we would be obliged to take action. Our experience to
date and the assurances of some fine commercial developers make us optimistic
that these circumstances will not arise.

The R Foundation

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] install.packages and dependency version checking

2008-12-15 Thread Robert Gentleman
Hi,

Prof Brian Ripley wrote:
 I've started to implement checks for package versions on dependencies in
 install.packages().  However, this is revealing a number of
 problems/misconceptions.
 
 
 (A) We do not check versions when loading namespaces, and the namespace
 registry does not contain version information.  So that for example
 (rtracklayer)
 
 Depends: R (>= 2.7.0), Biobase, methods, RCurl
 Imports: XML (>= 1.98-0), IRanges, Biostrings
 
 will never check the version of namespace XML that is loaded, either
 already loaded or resulting from loading this package's namespace.  For
 this to be operational we would need to extend the syntax of the
 import() and importFrom() directives in a NAMESPACE file to allow
 version restrictions. I am not sure this is worth doing, as an
 alternative is to put the imported package in Depends.
 
 The version dependence will in a future release cause an update of XML
 when rtracklayer is installed, if needed (and available).
 
 

  I think we need to have this functionality in both Imports and Depends,
  see my response to another point for why.

 (B) Things like (package stam)
 
 Depends: R (>= 2.7.0), GO.db (>= 2.1.3), Biobase (>= 1.99.5), pamr (>=
 1.37.0), cluster (>= 1.11.10), annaffy (>= 1.11.5), methods (>=
 2.7.0), utils (>= 2.7.0)
 
 are redundant: the versions of methods and utils are always the same as
 that of R.
 
 And there is no point in having a package in both Depends: and Imports:,
 as Biostrings has.

  I don't think that is true.  There are cases where both Imports and Depends
are reasonable.  The purpose of importing is to ensure correct resolution of
symbols in the internal functions of a package. I would do that in almost all
cases.  In some instances I want users to see functionality from another package
- and I can then either a) (re)export those functions, or if there are lots of
them, then b) just put the package also in Depends.  Now, a) is a bit less
useful than it could be since R CMD check gets annoyed about these re-exported
functions (I don't think it should care, the man page exists and is findable).
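Concretely, the combination for a hypothetical package looks like this: entries in Depends are attached to the search path, so users see their functions, while entries in Imports are only resolved for the package's own code via the NAMESPACE file:

```
DESCRIPTION (sketch):
  Depends: R (>= 2.7.0), Biobase
  Imports: XML (>= 1.98-0)

NAMESPACE (sketch):
  import(XML)
```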

 
 
 (C) There is no check on the version of a package suggested by
 Suggests:, unless the package itself provides one (and I found no
 instances).

  It may be worthwhile, but this is a less frequent use case and I would
prioritize it lower than having that functionality in Imports.

 
 
 (D) We can really only handle >= dependencies on package versions (but
 then I can see no other ops in use).  install.packages() will find the
 latest version available on the repositories, and we possibly need to
 check version requirements on the same dependency many times.  Given
 that BioC has a penchant for having version dependencies on unavailable
 versions (e.g. last week on IRanges (>= 1.1.7) with 1.1.4 available), we
 may be able to satisfy the requirements of some packages and not others.
 (In that case the strategy used is to install the latest available
 version if the one installed does not suffice for those we can satisfy,
 and report the problem(s).)
 

  I suspect one needs <= as well (basically as Gabor pointed out, some packages have 
issues).
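The version comparisons themselves are easy to express (a small illustration using the IRanges numbers mentioned above):

```r
have <- package_version("1.1.4")    # installed
need <- package_version("1.1.7")    # required
have >= need                        # FALSE: the requirement cannot be satisfied
compareVersion("1.1.7", "1.1.4")    # 1: the first argument is the later version
```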

 
 (E) One of the arguments that has been used to do this version checking
 at install time is to avoid installing packages that cannot work. It
 would be possible to extend the approach to do so, but I am going to
 leave that to those who advocated it.
 
 
 The net effect of the current changes will be that if there is a
 dependence that is already installed but a later version is available
 and will help satisfy a >= dependence, it will be added to the list of
 packages to be installed.  As we have seen with Matrix this last week,
 that can have downsides in stopping previously functional packages working.
 
 This is work in progress: there is no way to write a test suite that
 will encapsulate all the possible scenarios so we need to get experience
 until 2.9.0 is released.  Please report any quirks to R-devel if they
 are completely reproducible (and preferably with the code change needed
 to fix them, since the chance of anyone else being able to reproduce
 them is fairly slim).
 
  thanks
Robert

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgent...@fhcrc.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] wish: exportClassPattern

2008-12-04 Thread Robert Gentleman
should be in the most recent devel,
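For reference, the new directive works like exportPattern(); e.g. in a NAMESPACE file (the class-name prefix here is hypothetical):

```
exportClassPattern("^My")   # export every S4 class whose name starts with "My"
```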


Prof Brian Ripley wrote:
 Michael,
 
 This seems a reasonable request, but most of us will not have enough
 classes in a package for it to make a difference.
 
 A patch to do this would certainly speed up implementation.
 
 On Fri, 21 Nov 2008, Michael Lawrence wrote:
 
 It would be nice to have a more convenient means of exporting multiple
 classes from a package namespace. Why not have something like
 exportClassPattern() that worked like exportPattern() except for classes?

 Thanks,
 Michael



 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion for the optimization code

2008-08-08 Thread Robert Gentleman



Duncan Murdoch wrote:

On 8/8/2008 8:56 AM, Mathieu Ribatet wrote:

Dear list,

Here's a suggestion about the different optimization code. There are 
several optimization procedures in the base package (optim, optimize, 
nlm, nlminb, ...). However, the outputs of these functions are slightly 
different. For instance,


   1. optim returns a list with components par (the estimates), value (the
      minimum (or maximum) of the objective function), and convergence (a
      convergence code)
   2. optimize returns a list with components minimum (or maximum), giving
      the estimate, and objective (the value of the obj. function)
   3. nlm returns a list with components minimum (the minimum of the obj.
      function), estimate (the estimates), and code (the optim.
      convergence code)

   4. nlminb returns a list with components par (the estimates),
      objective, convergence (conv. code), and evaluations

Furthermore, optim keeps the names of the parameters while nlm, nlminb 
don't.

I believe it would be nice if all these optimizers had a kind of 
homogenized output. This would help in writing functions that can call 
different optimizers. Obviously, we can write our own function that 
homogenizes the output after calling the optimizer, but I still 
believe this would be more user-friendly.


Unfortunately, changing the names within the return value would break a 
lot of existing uses of those functions.  Writing a wrapper to 
homogenize the output is probably the right thing to do.


  And potentially to harmonize inputs. The MLInterfaces package 
(Bioconductor) has done this for many machine learning algorithms, 
should you want an example to look at.
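A minimal sketch of such a wrapper (the name unified_optim and the chosen component names are mine, not an established interface):

```r
# map optim() and nlminb() results onto one shape: par/value/convergence
unified_optim <- function(par, fn, engine = c("optim", "nlminb"), ...) {
  engine <- match.arg(engine)
  if (engine == "optim") {
    r <- optim(par, fn, ...)
    list(par = r$par, value = r$value, convergence = r$convergence)
  } else {
    r <- nlminb(par, fn, ...)
    list(par = r$par, value = r$objective, convergence = r$convergence)
  }
}

f <- function(p) sum((p - 2)^2)
a <- unified_optim(c(0, 0), f, "optim")
b <- unified_optim(c(0, 0), f, "nlminb")
identical(names(a), names(b))   # TRUE: callers see the same shape either way
```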


  Robert




Duncan Murdoch

Do you think this is a reasonable feature to implement - even though it 
isn't an important point?

Best,
Mathieu

* BTW, if this is relevant, I could try to do it.





--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: What should ?foo do?

2008-04-25 Thread Robert Gentleman


Duncan Murdoch wrote:
 Currently ?foo does help(foo), which looks for a man page with alias 
 foo.  If foo happens to be a function call, it will do a bit more, so
 
 ?mean(something)
 
 will find the mean method for something if mean happens to be an S4 
 generic.  There are also the type?foo variations, e.g. methods?foo, or 
 package?foo.
 
 I think these are all too limited.
 
 The easiest search should be the most permissive.  Users should need to 
 do extra work to limit their search to man pages, with exact matches, as 
 ? does.

   While I like the idea, I don't really agree with the sentiment above. 
I think that the easiest search should be the one that you want the 
result of most often.
And at least for me that is the man page for the function, so I can 
check some detail; and it works pretty well.  I use site searches much 
less frequently and would be happy to type more for them.

 
 We don't currently have a general purpose search for foo, or something 
 like it.  We come close with RSiteSearch, and so possibly ?foo should 
 mean RSiteSearch("foo"), but
 there are problems with that: it can't limit itself to the current 
 version of R, and it doesn't work when you're offline (or when 
 search.r-project.org is down.)  We also have help.search("foo"), but it 
 is too limited. I'd like to have a local search that looks through the 
 man pages, manuals, FAQs, vignettes, DESCRIPTION files, etc., specific 
 to the current R installation, and I think ? should be attached to that 
 search.

  I think that would be very useful (although there will be some 
decisions on which tool to use to achieve this). But, it will also be 
problematic, as one will get tons of hits for some things, and then 
selecting the one you really want will be a pain.

  I would rather see that be one of the dyadic forms, say

   site?foo

  or
   all?foo

  one could even imagine refining that for different subsets of the docs 
you have mentioned;

   help?foo #only man pages
   guides?foo #the manuals, R Extensions etc

and so on.

   You did not make a suggestion as to how we would get the equivalent 
of ?foo now, if a decision to move were taken.


 
 Comments, please.
 
 Duncan Murdoch
 
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: What should ?foo do?

2008-04-25 Thread Robert Gentleman


Duncan Murdoch wrote:
 On 4/25/2008 10:16 AM, Robert Gentleman wrote:

 Duncan Murdoch wrote:
 Currently ?foo does help(foo), which looks for a man page with 
 alias foo.  If foo happens to be a function call, it will do a bit 
 more, so

 ?mean(something)

 will find the mean method for something if mean happens to be an S4 
 generic.  There are also the type?foo variations, e.g. methods?foo, 
 or package?foo.

 I think these are all too limited.

 The easiest search should be the most permissive.  Users should need 
 to do extra work to limit their search to man pages, with exact 
 matches, as ? does.

While I like the idea, I don't really agree with the sentiment 
 above. I think that the easiest search should be the one that you want 
 the result of most often.
 And at least for me that is the man page for the function, so I can 
 check some detail; and it works pretty well.  I use site searches much 
 less frequently and would be happy to type more for them.
 
 That's true.
 
 What's your feeling about what should happen when ?foo fails?

   present a list of man pages with spellings close to foo (we have the 
tools to do this in many places right now, and it would be a great help, 
IMHO, as spellings and capitalization behavior varies both between and 
within individuals), so the user can select one
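R's approximate matching would cover much of this already; e.g. (the topic vector is a stand-in for the installed alias database):

```r
topics <- c("tabulate", "table", "tapply")
agrep("tabualte", topics, max.distance = 2, value = TRUE)   # "tabulate"
```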

 
 


 We don't currently have a general purpose search for foo, or 
 something like it.  We come close with RSiteSearch, and so possibly 
  ?foo should mean RSiteSearch("foo"), but
 there are problems with that: it can't limit itself to the current 
 version of R, and it doesn't work when you're offline (or when 
  search.r-project.org is down.)  We also have help.search("foo"), but 
 it is too limited. I'd like to have a local search that looks through 
 the man pages, manuals, FAQs, vignettes, DESCRIPTION files, etc., 
 specific to the current R installation, and I think ? should be 
 attached to that search.

   I think that would be very useful (although there will be some 
 decisions on which tool to use to achieve this). But, it will also be 
 problematic, as one will get tons of hits for some things, and then 
 selecting the one you really want will be a pain.

   I would rather see that be one of the dyadic forms, say

site?foo

   or
all?foo

   one could even imagine refining that for different subsets of the 
 docs you have mentioned;

help?foo #only man pages
guides?foo #the manuals, R Extensions etc

 and so on.

    You did not make a suggestion as to how we would get the 
 equivalent of ?foo now, if a decision to move were taken.
 
 I didn't say, but I would assume there would be a way to do it, and it 
 shouldn't be hard to invoke.  Maybe help?foo as you suggested, or man?foo.

   If not then I would be strongly opposed -- I really think we want to 
make the most common thing the easiest to do.  And if we really think 
that might be different for different people, then disambiguating the 
short-cut (? in this case) from the command, so that users have some 
freedom to customize, would be my favored alternative.

   I also wonder if one could not also provide some mechanism to provide 
distinct information on what is local vs what is on the internet. 
Something that would make tools like spotlight much more valuable, IMHO, 
is to tell me what I have on my computer, and what I can get, if I want 
to; at least as some form of option.


   Robert

 
 Duncan Murdoch
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check should check date in description

2008-04-04 Thread Robert Gentleman


Kurt Hornik wrote:
 hadley wickham writes:
 
 I recently thought about this.  I see several issues.

 * How can we determine if it is old?  Relative to the time when the
 package was uploaded to a repository?

 * Some developers might actually want a different date for a variety of
 reasons ...

 * What we currently say in R-exts is

 The optional `Date' field gives the release date of the current
 version of the package.  It is strongly recommended to use the
 yyyy-mm-dd format conforming to the ISO standard.

 Many packages do not comply with the latter (but I have some code to
 sanitize most of these), and release date may be a moving target.

 The best that I could think of is to teach R CMD build to *add* a Date
 field if there was none.
 
 That sounds like a good solution to me.
 
 Ok.  However, 2.7.0 feature freeze soon ...

   Please no.  If people want one then they should add it manually. It 
is optional, and some of us have explicitly opted out and would like to 
continue to do so.


 
 Otherwise, maybe just a message from R CMD check?  i.e. just like
 failing the codetools checks, it might be perfectly ok, but you should
 be doing it consciously, not by mistake.
 
 I am working on that, too (e.g. a simple NOTE in case the date spec
 cannot be canonicalized, etc.).  If file time stamps were reliable, we
 could compare these to the given date.  This is I guess all we can do
 for e.g. CRAN's daily checking (where comparing to the date the check
 is run is not too useful) ...

   But definitely not a warning.

   Robert

 
 Best
 -k
 
 



Re: [Rd] R CMD check should check date in description

2008-04-04 Thread Robert Gentleman


hadley wickham wrote:
   Please no.  If people want one then they should add it manually. It is
 optional, and some of us have explicitly opted out and would like to
 continue to do so.
 
 To clarify, do you mean you have decided not to provide a date field
 in the DESCRIPTION file?  If so, would you mind elaborating why?

  Sure: The date of what?


 
 Hadley
 



Re: [Rd] Problem with new(externalptr)

2008-01-29 Thread Robert Gentleman
Hi,

Herve Pages wrote:
 Hi,
 
 It seems that new("externalptr") is always returning the same instance, and
 not a new one as one would expect from a call to new(). Of course this is hard
 to observe:
 
 > new("externalptr")
 <pointer: (nil)>
 > new("externalptr")
 <pointer: (nil)>
 
 since not a lot of details are displayed.
 
 For example, it's easy to see that 2 consecutive calls to new("environment")
 create different instances:
 
 > new("environment")
 <environment: 0xc89d10>
 > new("environment")
 <environment: 0xc51248>

   getMethod("initialize", "environment")
and

   getMethod("initialize", "externalptr")

  will give some hints about the difference.
 
 But for new("externalptr"), I had to use the following C routine:
 
   SEXP sexp_address(SEXP s)
   {
 SEXP ans;
 char buf[40];
 
  snprintf(buf, sizeof(buf), "%p", s);
 PROTECT(ans = NEW_CHARACTER(1));
 SET_STRING_ELT(ans, 0, mkChar(buf));
 UNPROTECT(1);
 return ans;
   }
 
 Then I get:
 
 > .Call("sexp_address", new("externalptr"))
   [1] "0xde2ce0"
 > .Call("sexp_address", new("externalptr"))
   [1] "0xde2ce0"
 
 Isn't that wrong?

   Not what you want, but not wrong. In the absence of an initialize 
method all calls to new are guaranteed to return the prototype; so I 
think it behaves as documented.

   new("environment") would also always return the same environment, 
were it not for the initialize method.  So you might want to contribute 
an initialize method for externalptr, but as you said, they are not 
useful at the R level so I don't know just what problem is being solved.

   This piece of code might be useful in such a construction:
.Call("R_externalptr_prototype_object", PACKAGE = "methods")

  which does what you would like.

  best wishes
Robert

 
 I worked around this problem by writing the following C routine:
 
   SEXP xp_new()
   {
 return R_MakeExternalPtr(NULL, R_NilValue, R_NilValue);
   }
 
 so I can create new externalptr instances from R with:
 
   .Call("xp_new")
 
 I understand that there is not much you can do from R with an externalptr
 instance and that you will have to manipulate them at the C level anyway.
 But since new("externalptr") exists and seems to work, wouldn't that be
 better if it was really creating a new instance at each call?
 
 Thanks!
 H.
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 



[Rd] course announcement

2008-01-07 Thread Robert Gentleman
Hi,
   We will be holding an advanced course in R programming at the FHCRC 
(Seattle), Feb 13-15. There will be some emphasis on bioinformatic 
applications, but not much.

   Sign up at:
https://secure.bioconductor.org/SeattleFeb08/index.php

   Please note space is very limited, so make sure you have a 
registration before making any travel plans. Also, this is definitely 
not a course for beginners.

   Best wishes
 Robert



Re: [Rd] Evaluating R expressions from C

2008-01-04 Thread Robert Gentleman
Hi Terry,

Terry Therneau wrote:
 I am currently puzzled by a passage in the R Extensions manual, section 5.10:
 
  SEXP lapply(SEXP list, SEXP expr, SEXP rho)
   {
     R_len_t i, n = length(list);
     SEXP ans;
   
     if(!isNewList(list)) error("`list' must be a list");
     if(!isEnvironment(rho)) error("`rho' should be an environment");
     PROTECT(ans = allocVector(VECSXP, n));
     for(i = 0; i < n; i++) {
       defineVar(install("x"), VECTOR_ELT(list, i), rho);
       SET_VECTOR_ELT(ans, i, eval(expr, rho));
     }
 
 I'm trying to understand this code beyond just copying it, and don't find 
 definitions for many of the calls.  PROTECT and SEXP have been well discussed 
 previously in the document, but what exactly are
   R_len_t
   defineVar

this function defines the variable (a SYMSXP; one type of SEXP) given by 
its first argument, to have the value given by its second argument, in 
the environment given by its third argument. There are lots of 
variants; these are largely in envir.c.


   install

   all symbols in R are unique (there is only one symbol named "x", even 
though it might have bindings in many different environments). So to get 
the unique thing (a SYMSXP) you call install (line 1067 in names.c has 
a pretty brief comment to this effect). This makes it efficient to do 
variable lookup, as we only need to compare pointers (within an 
environment), not compare names.

   VECTOR_ELT

 access the indicated element (2nd arg) of the vector (1st arg)

   SET_VECTOR_ELT

  set the indicated element (2nd arg) of the vector (1st arg) to 
the value (3rd arg)

   
 The last I also found in 5.7.4, but it's not defined there either.  
 
 So:
   What do these macros do?  Some I could guess, like is.Environment; and I'm 
 fairly confident of R_len_t.  Others I need some help with.
   Perhaps they are elsewhere in the document?  (My version of Acrobat can't 
 do searches.)  Is there another document that I should look at first?
   Why isNewList?  I would have guessed isList.  What's the difference?

old lists are of the CAR-CDR variant, and are largely only used 
internally these days.  New lists are generic vectors, and are what 
users will almost always encounter (even users that program internals; 
you pretty much need to be messing with the language itself to run into 
the CAR-CDR variety).

best wishes
 Robert


   Thanks for any help,
   Terry Therneau
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 



Re: [Rd] install.packages() and configure.args

2007-10-22 Thread Robert Gentleman
Since, in Herve's example, only one package was named, it would be nice 
to either make sure the configure args are associated with it, or to force 
only named configure.args parameters, and possibly check the names?


Duncan Temple Lang wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Hi Herve
 
  The best way to specify configure.args when there are multiple
 packages (either directly or via dependencies) is to use names on
 the character vector, i.e.
 
install.packages("Rgraphviz",
 rep="http://bioconductor.org/packages/2.1/bioc",
 configure.args=c(Rgraphviz="--with-graphviz=/some/non/standard/place"))
 
 
 This allows one to specify command line arguments for many packages
 simultaneously and unambiguously.
 
 install.packages() only uses configure.args when there are no names
 if there is only one package being installed.  It could be made
 smarter to apply this to the first of the pkgs only, or
 to identify the packages as direct and dependent.  But it is not
 obvious it is worth the effort as using names on configure.args
 provides a complete solution and is more informative.
 
 Thanks for pointing this out.
 
  D.
 
 Herve Pages wrote:
 Hi,

 In the case where install.packages("packageA") also needs to install
 required package "packageB", then what is passed thru the 'configure.args'
 argument seems to be lost when it's the turn of "packageA" to be installed
 (the last package to get installed).

 This is not easy to reproduce but let's say you have the graphviz libraries
 installed on your system, but you don't have the graph package installed yet.
 Then this

   install.packages("Rgraphviz",
rep="http://bioconductor.org/packages/2.1/bioc",
configure.args="--with-graphviz=/some/non/standard/place")

 will fail because --with-graphviz=/some/non/standard/place doesn't seem to be
 passed to Rgraphviz's configure script. But if you already have the graph 
 package,
 then it will work.

 Cheers,
 H.

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (Darwin)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
 iD8DBQFHGnBg9p/Jzwa2QP4RAp0NAJ9Qe/thxdrX8CpFVcRP2UoHk1txFACeL9uM
 twmID5hsclilHhIfPsuFt7A=
 =vCz1
 -END PGP SIGNATURE-
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 



Re: [Rd] gregexpr (PR#9965)

2007-10-11 Thread Robert Gentleman
Yes, we had originally wanted it to find all matches, but user 
complaints that it did not perform as Perl does were taken to prevail. 
There are different ways to do this, but the notion that one does not 
start looking for the next match until after the previous one ends 
seems more common.  I did consciously decide not to have a switch; instead 
we wrote something that does what we wanted it to do and put it in the 
Biostrings package (from Bioconductor) as gregexpr2 (sorry, but only 
fixed = TRUE is supported, since that is all we needed).

best wishes
   Robert


Prof Brian Ripley wrote:
 This was a deliberate change for R 2.4.0 with SVN log:
 
 r38145 | rgentlem | 2006-05-20 23:58:14 +0100 (Sat, 20 May 2006) | 2 lines
 fixing gregexpr infelicity
 
 So it seems the author of gregexpr believed that the bug was in 2.3.1, not 
 2.5.1.
 
 On Wed, 10 Oct 2007, [EMAIL PROTECTED] wrote:
 
 Full_Name: Peter Dolan
 Version: 2.5.1
 OS: Windows
 Submission from: (NULL) (128.193.227.43)


 gregexpr does not find all matching substrings if the substrings overlap:

 gregexpr("abab","ababab")
 [[1]]
 [1] 1
 attr(,"match.length")
 [1] 4

 It does work correctly in Version 2.3.1 under linux.
 
 'correctly' is a matter of definition, I believe: this could be considered 
 to be vaguely worded in the help.
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

 



Re: [Rd] 'load' does not properly add 'show' methods for classes extending 'list'

2007-09-25 Thread Robert Gentleman
I think that it would be best, then, not to load the package, as loading 
it in this way means that it is almost impossible to get the methods 
registered correctly. That does seem to be a bug, or at least a major 
inconvenience.  And one might wonder at the purpose of attaching if not 
to make methods available.

That said, the documentation indeed does not state that anything good 
will happen. It also does not state that something bad will happen, either.

best wishes
   Robert


Prof Brian Ripley wrote:
 I am not sure why you expected this to work: I did not expect it to and 
 could not find relevant documentation to suggest it should.
 
 Loading an object created from a non-attached package does not in general 
 attach that package and make the methods for the class of 'x' available. 
 We have talked about attaching the package defining the class when an S4 
 object is loaded, and that is probably possible now S4 objects can be 
 unambiguously distinguished (although I still worry about multiple 
 packages with the same generic and their order on the search path).
 
 In your example there is no specific 'show' method on the search path when 
 'show' is called via autoprinting in the second session, so 'showDefault' 
 is called.  Package GSEABase gets attached as an (undocumented) side 
 effect of calling 'getClassDef' from 'showDefault'.  I can see no 
 documentation (and in particular not in ?showDefault) that 'showDefault' 
 is supposed to attach the package defining the class and re-dispatch to a 
 'show' method that package contains.  Since attaching packages behind the 
 user's back can have nasty side effects (the order of the search path does 
 matter), I think the pros and cons need careful consideration: a warning 
 along the lines of
 
'object 'x' is of class GeneSetCollection from package 'GSEABase'
which is not on the search path
 
 might be more appropriate.  Things would potentially be a lot smoother if 
 namespaces could be assumed, as loading a namespace has few side effects 
 (and if loading a namespace registered methods for visible S4 generics 
 smoothly).
 
 Until I see documentation otherwise, I will continue to assume that I do 
 need to attach the class-defining package(s) for things to work correctly.
 
 
 On Mon, 24 Sep 2007, Martin Morgan wrote:
 
 The GeneSetCollection class in the Bioconductor package GSEABase
 extends 'list'

 library(GSEABase)
 showClass("GeneSetCollection")
 Slots:

 Name:  .Data
 Class:  list

 Extends:
 Class "list", from data part
 Class "vector", by class "list", distance 2
 Class "AssayData", by class "list", distance 2

 If I create an instance of this class and serialize it

 x <- GeneSetCollection(GeneSet("X"))
 x
 GeneSetCollection
  names: NA (1 total)
 save(x, file="/tmp/x.rda")
 and then start a new R session and load the data object (without first
 library(GSEABase)), the 'show' method is not added to the appropriate
 method table.

 load("/tmp/x.Rda")
 x
 Loading required package: GSEABase
 Loading required package: Biobase
 Loading required package: tools

 Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation("pkgname")'.

 Loading required package: AnnotationDbi
 Loading required package: DBI
 Loading required package: RSQLite
 An object of class "GeneSetCollection"
 [[1]]
 setName: NA
 geneIds: X (total: 1)
 geneIdType: Null
 collectionType: Null
 details: use 'details(object)'

 Actually, the behavior is more complicated than it appears; in a new R
 session after loading /tmp/x.Rda, if I immediately do x[[1]] I get the
 show,GeneSetCollection-method but not show,GeneSet-method.

 Sorry for the somewhat obscure example.

 Martin

 



[Rd] Jobs in Seattle

2007-09-13 Thread Robert Gentleman
Hi,
   As many of you will realize, Seth is going to be leaving us (pretty 
much immediately), so we will be looking to replace him. In addition,
Martin Morgan is going to be moving into another role as well, one that 
will require an assistant. Finally, I am looking for at least one 
post-doc (preferably with an interest in sequence-related work).

   If any of these interest you, please check out the job descriptions
at:
http://www.fhcrc.org/about/jobs/

  and you can get some idea of salary level as well.

  You can feel free to ask me about either the lead programmer job, or 
the post-doc, and should probably direct questions about the 
bioinformatics position to Martin.

   All applications must go through the FHCRC web site.

  thanks
Robert





Re: [Rd] HTML vignette browser

2007-06-04 Thread Robert Gentleman


Deepayan Sarkar wrote:
 On 6/4/07, Seth Falcon [EMAIL PROTECTED] wrote:
 Friedrich Leisch [EMAIL PROTECTED] writes:
 Looks good to me, and certainly something worth being added to R.

 2 quick (related) comments:

 1) I am not sure if we want to include links to the Latex-Sources by
default, those might confuse unsuspecting novices a lot. Perhaps
make those optional using an argument to browseVignettes(), which
is FALSE by default?
 I agree that the Rnw could confuse folks.  But I'm not sure it needs
 to be hidden or turned off by default...  If the .R file was also
 included then it would be less confusing I suspect as the curious
 could deduce what Rnw is about by triangulation.

 2) Instead links to .Rnw files we may want to include links to the R
code - should we R CMD INSTALL a tangled version of each vignette
such that we can link to it? Of course it is redundant information
given the .Rnw, but we also have the help pages in several formats
ready.
 Including, by default, links to the tangled .R code seems like a
 really nice idea.  I think a lot of users who find vignettes don't
 realize that all of the code used to generate the entire document is
 available to them -- I just had a question from someone who wanted to
 know how to make a plot that appeared in a vignette, for example.
 
 I agree that having a Stangled .R file would be a great idea (among
 other things, it would have the complete code, which many PDFs will
 not).
 
  I don't have a strong opinion either way about linking to the .Rnw
  file. It should definitely be there if the PDF file is absent (e.g.
  for grid, and other packages installed with --no-vignettes, which I
  always do for local installation). Maybe we can keep them, but change
  the name to something more scary than "source", e.g. "LaTeX/Noweb
  source".

   I would very much prefer to keep the source, with some name, scary or 
not...

 
 -Deepayan
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 



Re: [Rd] Calling R_PolledEvents from R_CheckUserInterrupt

2007-05-31 Thread Robert Gentleman
Should be there shortly - I have no way of testing Windows (right now, 
at least), so hopefully Duncan M will have time to take a look.


Deepayan Sarkar wrote:
 On 5/5/07, Luke Tierney [EMAIL PROTECTED] wrote:
 
 [...]
 
 However, R_PolledEvents is only called from a limited set of places
 now (including the socket reading code to keep things responsive
 during blocking reads).  But it is not called from the interupt
 checking code, which means if a user does something equivalent to

 while (TRUE) {}

 there is not point where events get looked at to see a user interrupt
 action. The current definition of R_CheckUserInterrupt is

 void R_CheckUserInterrupt(void)
 {
  R_CheckStack();
  /* This is the point where GUI systems need to do enough event
 processing to determine whether there is a user interrupt event
 pending.  Need to be careful not to do too much event
 processing though: if event handlers written in R are allowed
 to run at this point then we end up with concurrent R
 evaluations and that can cause problems until we have proper
 concurrency support. LT */
 #if  ( defined(HAVE_AQUA) || defined(Win32) )
  R_ProcessEvents();
 #else
  if (R_interrupts_pending)
  onintr();
 #endif /* Win32 */
 }

 So only on Windows or Mac do we do event processing.  We could add a
 R_PolledEvents() call in the #else bit to support this, though the
 cautions in the comment do need to be kept in mind.
 
 I have been using the following patch to src/main/errors.c for a while
 without any obvious ill effects. Could we add this to r-devel (with
 necessary changes for Windows, if any)?
 
 -Deepayan
 
 Index: errors.c
 ===
 --- errors.c(revision 41764)
 +++ errors.c(working copy)
 @@ -39,6 +39,8 @@
  #include R_ext/GraphicsEngine.h /* for GEonExit */
  #include Rmath.h /* for imax2 */
 
 +#include R_ext/eventloop.h
 +
  #ifndef min
  #define min(a, b) (a<b?a:b)
  #endif
 @@ -117,6 +119,8 @@
  #if  ( defined(HAVE_AQUA) || defined(Win32) )
  R_ProcessEvents();
  #else
 +R_PolledEvents();
  if (R_interrupts_pending)
 onintr();
  #endif /* Win32 */
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 



Re: [Rd] Native implementation of rowMedians()

2007-05-14 Thread Robert Gentleman
We did think about this a lot, and decided that something like rowQ, 
which really returns requested order statistics, 
letting the user manipulate them on return for their own version of 
median, or other quantiles, was a better approach. I would be happy to 
have this in R itself, if there is sufficient interest, and we can remove 
the one in Biobase (without the need for deprecation/defunct as long as 
the args are compatible). But, if the decision is to return a particular 
estimate of a quantile, then we would probably want to keep our function 
around, with its current name.

best wishes
   Robert


Martin Maechler wrote:
 BDR == Prof Brian Ripley [EMAIL PROTECTED]
 on Mon, 14 May 2007 11:39:18 +0100 (BST) writes:
 
 BDR On Mon, 14 May 2007, Henrik Bengtsson wrote:
  On 5/14/07, Prof Brian Ripley [EMAIL PROTECTED] wrote:
  
   Hi Henrik,
   HenrikB == Henrik Bengtsson [EMAIL PROTECTED]
   on Sun, 13 May 2007 21:14:24 -0700 writes:
  
  HenrikB Hi,
  HenrikB I've got a version of rowMedians(x, na.rm=FALSE) for 
  matrices that
  HenrikB handles missing values implemented in C.  It has been
 
 BDR [...]
 
  Also, the 'a version of rowMedians' made me wonder what other version
  there was, and it seems there is one in Biobase which looks a more
  natural home.
  
  The rowMedians() in Biobase utilizes rowQ() in ditto.  I actually
  started of by adding support for missing values to rowQ() resulting in
  the method rowQuantiles(), for which there are also internal functions
  for both integer and double matrices.  rowQuantiles() is in R.native
  too, but since it has much less CPU mileage I wanted to wait with that.
  The rowMedians() is developed from my rowQuantiles() optimized for
  the 50% quantile.
  
  Why do you think it is more natural to host rowMedians() in Biobase
  than in one of the core R packages?  Biobase comes with a lot of
  overhead for people not in the Bio-world.
 
 BDR Because that is where there seems to be a need for it, and having 
 multiple 
 BDR functions of the same name in different packages is not ideal (and 
 even 
 BDR with namespaces can cause confusion).
 
 That's correct, of course.
 However, I still think that quantiles (and statistics derived
 from them) in general and medians in particular are under-used
 by many user groups. For some useRs, speed can be an important
 reason and for that I had made a big effort to provide runmed()
 in R, and I think it would be worthwhile to provide fast rowwise
 medians and quantiles, here as well.
 
 Also, BTW, I think it will be worthwhile to provide (R-C) API
 versions of median() and quantile() {with less options than the
 R functions, most probably!!}, 
 such that we'd hopefully see less re-invention of the wheel
 happening in every package that needs such quantiles in its C code.
 
 Biobase is in quite active maintenance, and I'd assume its
 maintainers will remove rowMedians() from there (or first
 replace it with a wrapper in order to deal with the namespace
 issue you mentioned) as soon as R has its own function
 with the same (or better) functionality.  
  In order to facilitate the transition, we'd have to make sure
  that such a 'stats' function does behave >= to the Biobase
  one. 
 
 Martin
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 



Re: [Rd] One for the wish list - var.default etc

2007-05-09 Thread Robert Gentleman


Jeffrey J. Hallman wrote:
 Prof Brian Ripley [EMAIL PROTECTED] writes:
 
 On Wed, 9 May 2007, S Ellison wrote:

 Brian,

 If we make functions generic, we rely on package writers implementing 
 the documented semantics (and that is not easy to check).  That was 
 deemed to be too easy to get wrong for var().
 Hard to argue with a considered decision, but the alternative facing 
 increasing numbers of package developers seems to me to be pretty bad 
 too ...

 There are two ways a package developer can currently get a function 
 tailored to their own new class. One is to rely on a generic function to 
 launch their class-specific instance, and write only the class-specific 
 instance. That may indeed be hard to check, though I would be inclined 
 to think that is the package developer's problem, not the core team's. 
 But it has (as far as I know today ...?) no wider impact.
 But it does: it gives the method privileged access, in this case to the 
 stats namespace, even allowing a user to change the default method
 which namespaces to a very large extent protect against.

 If var is not generic, we can be sure that all uses within the stats 
 namespace and any namespace that imports it are of stats::var.  That is 
 not something to give up lightly.
 
 No, but neither is the flexibility afforded by generics. What we have here is
 a false tradeoff between flexibility vs. the safety of locking stuff down. 

   Yes, that is precisely one of the points, and as some of us recently 
experienced, a reasonably dedicated programmer can override any base 
function through an add-on package. It is, in my opinion, a bad idea to 
become the police here.

   AFAIK, Brian's considered decision was his; I am aware of no 
discussion of that particular point of view about var (and as noted 
above, it simply doesn't work). It also, AFAICS, confuses what happens 
(implementation) with what should happen (which is easy to do, because 
with most of the methods, either S3 or S4, there is very little written 
about what should happen).

   That said, there has been some relatively open discussion on one 
solution to this problem, and I am hopeful that we will have something 
in place before the end of July.

   A big problem with S4 generics is who owns them, and what seems to be 
a reasonable medium-term solution is to provide a package that lives 
slightly above base in the search path that will hold generic functions 
for any base functions that do not have them. Authors of add-on packages 
can then at least share a common generic when that is appropriate. But 
do realize that there are lots of reasons to have generics with the same 
name, in different packages, that are not compatible, and normal scoping 
rules apply. For example, the XML package has a generic function addNode, 
as does the graph package, and they are not compatible, nor should they 
be. Anyone wanting to use both packages (and I often do) needs to manage 
the name conflicts (and that is where namespaces are essential).

best wishes
   Robert



 
 The tradeoff is false because unit tests are a better way to assure safety.
 If the major packages (like stats) had a suite of tests, a package developer
 could load his own package, run all the unit tests, and see if he broke
 something.  If it turns out that he broke something that wasn't covered by the
 tests, he could create a new test for that and submit it somewhere, perhaps
 on the R Wiki. 
 



[Rd] buglet in terms calculations

2007-04-08 Thread Robert Gentleman
Hi,
   Vince and I have noticed a problem with non-syntactic names in data 
frames and some modeling code (but not all modeling code).

   The following, while almost surely as documented, could be a bit more 
helpful:

  m = matrix(rnorm(100), nc=10)
  colnames(m) = paste(1:10, letters[1:10], sep="_")

  d = data.frame(m, check.names=FALSE)

  f = formula(`1_a` ~ ., data=d)

  tm = terms(f, data=d)

  ##failure here, as somehow back-ticks have become part of the name,
  ##not a quoting mechanism
  d[attr(tm, "term.labels")]

   The variables attribute, in the terms object, keeps them quoted, so 
modeling code that uses that attribute seems fine, but code that uses 
the term.labels fails. In particular, it seems (of those tested) that 
glm, lda, and randomForest work fine, while nnet and rpart can't 
handle non-syntactic names in formulae.

   In particular, rpart contains this code:

  lapply(m[attr(Terms, "term.labels")], tfun)

   which fails for the reasons given.


  One way to get around this might be to modify the do_termsform code;
right now we have:
    PROTECT(varnames = allocVector(STRSXP, nvar));
    for (v = CDR(varlist), i = 0; v != R_NilValue; v = CDR(v))
        SET_STRING_ELT(varnames, i++, STRING_ELT(deparse1line(CAR(v), 0), 0));

  and then for term.labels, we copy over the varnames (with ":", as 
needed) and perhaps we need to save the unquoted names somewhere?

  Or is there some other approach that will get us there? Certainly 
cleaning up the names via
   cleanTick = function(x) gsub("`", "", x)

  works, but it seems a bit ugly, and it might be better if the modeling 
code was modified.

   best wishes





[Rd] boundary case anomaly

2007-04-08 Thread Robert Gentleman
Hi,
  Any reason these should be different?

  x=matrix(0, nr=0, nc=3)
  colnames(x) = letters[1:3]
  data.frame(x)
#[1] a b c
#0 rows (or 0-length row.names)
  y=vector("list", length=3)
  names(y) = letters[1:3]
  data.frame(y)
#NULL data frame with 0 rows


Both should have names (the second one does not), and why print something 
different for y?

  Two of the last three examples refer to a NULL data frame, e.g.
   (d00 <- d0[FALSE,])  # NULL data frame with 0 rows

  but there is no description of what a NULL data frame should be (zero 
rows or zero columns, or either or both - and why a special name?)


  best wishes
Robert


-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] possible bug in model.frame.default

2007-03-07 Thread Robert Gentleman
Please update to the latest snapshot
R version 2.5.0 Under development (unstable) (2007-03-05 r40816)
where all is well,

Thibaut Jombart wrote:
 Dear list,
 
 I may have found a bug in model.frame.default (called by the lm function).
 The problem arises in my R dev version but not in my R 2.4.0.
 Here is my config :
 
   version
 
 platform       x86_64-unknown-linux-gnu
 arch           x86_64
 os             linux-gnu
 system         x86_64, linux-gnu
 status         Under development (unstable)
 major          2
 minor          5.0
 year           2007
 month          03
 day            04
 svn rev        40813
 language       R
 version.string R version 2.5.0 Under development (unstable) (2007-03-04 r40813)
 
 Now a simple example to (hopefully) reproduce the bug (after a 
 rm(list=ls())):
 
   dat=data.frame(y=rnorm(10),x1=runif(10),x2=runif(10))
   weights=1:10/(sum(1:10))
  form <- as.formula(y~x1+x2)
 # here is the error
   lm(form,data=dat,weights=weights)
 Erreur dans model.frame(formula, rownames, variables, varnames, extras, 
 extranames,  :
 type (closure) incorrect pour la variable '(weights)'
 
 (sorry, error message is in French)
 
 As I said, these commands work using R 2.4.0 (same machine, same OS).
 Moreover, the following commands work:
   temp=weights
   lm(form,data=dat,weights=temp)
 
 This currently seems to cause a check fail in the ade4 package. I tried 
 to find out where the bug came from: all I found is the (potential) bug 
 comes from model.frame.default, and more precisely:
 debug: data <- .Internal(model.frame(formula, rownames, variables, 
 varnames,
 extras, extranames, subset, na.action))
 Browse[1]
 Erreur dans model.frame(formula, rownames, variables, varnames, extras, 
 extranames,  :
 type (closure) incorrect pour la variable '(weights)'
 
 I couldn't go further because of the .Internal. I tried to googlise 
 this, but I found no such problem reported recently.
 
 Can anyone tell if this is actually a bug? (If not, please tell me 
 where I went wrong).
 
 Regards,
 
 Thibaut.
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish list

2007-01-01 Thread Robert Gentleman
 programs.
 A simpler fix for this would be for you to define a wrapper for R CMD
 that installed the R tools path before executing, and uninstalls it
 afterwards.  But this is unnecessary for most people, because
 Microsoft's find.exe is pretty rarely used.

 
 Anyone who uses batch files will use it quite a bit.  It certainly causes
 me problems on an ongoing basis and is an unacceptable conflict in
 my opinion.
 
 I realize that it's not entirely of R's doing but it would be best if R did not
 make it worse by requiring the use of find.
 
 13. Make upper/lower case of simplify/SIMPLIFY consistent on all
 apply commands and add a simplify= arg to by.
 It would have been good not to introduce the inconsistency years ago,
 but it's too late to change now.
 
 It's not too late to add it to by().
 
 Also note that the gsubfn package does have a workaround for this.  In gsubfn
 one can preface any R function with fn$ and if that is done then the function
 can have a simplify= argument which fn$ intercepts and processes.  e.g.
 
 library(gsubfn)
 fn$by(CO2[4:5], CO2[2], x ~ coef(lm(uptake ~ ., x)), simplify = rbind)
 
 fn$ can also interpret formulas as functions (and does quasi perl 
 interpolation
 in strings) so the formula in the third argument is regarded to be the same
 as the anonymous function:  function(x) coef(lm(uptake ~., x)) .
 
 More examples are in the gsubfn vignette.
 
 14. better reporting of location of errors and warnings in R CMD check.
 This is in the works, but probably not for 2.5.x.
 
 Great.  This will be very welcome.
 
 15. tcl tile library (needs tcl 8.5 or to be compiled in with 8.4)

 16. extend aggregate to allow vector valued functions:
 aggregate(CO2[4:5], CO2[1:2], function(x) c(mean = mean(x), sd = sd(x)))
 [summaryBy in doBy package and cast in reshape package can already
 do similar things but this seems sufficiently fundamental that it
 ought to be in the base of R]

 17. All OSes should support input= arg of system.

 My previous New Year wishlists are here:

 https://www.stat.math.ethz.ch/pipermail/r-devel/2006-January/035949.html
 https://www.stat.math.ethz.ch/pipermail/r-help/2005-January/061984.html
 https://www.stat.math.ethz.ch/pipermail/r-devel/2004-January/028465.html
 To anyone still reading:

 Many of the suggestions above would improve R, but they're unlikely to
 happen unless someone volunteers to do them.  I'd suggest picking
 whichever one of these or some other list that you think is the highest
 priority, and post a specific proposal to this list about how to do it.
  If you get a negative response or no response, move on to the next
 one, or put it into a contributed package instead.

 
 I think it works best when contributors develop their software in
 contributed packages since it avoids squabbles with the core group.
 
 The core group can then integrate these into R itself if it seems warranted.
 
 When you make the proposal, consider how much work you're asking other
 people to do, and how much you're volunteering to do yourself.  If
 you're asking others to do a lot, then the suggestion had better be
 really valuable to *them*.

 
 The implementation effort should not be a significant consideration in
 generating wish lists.What should be considered is what is really needed.
 It's better to know what you need and then later decide whether to implement
 it or not than to suppress articulating the need.  Otherwise the development
 is driven by what is easy to do rather than what is needed.
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] A possible improvement to apropos

2006-12-14 Thread Robert Gentleman
I would vastly prefer apropos to be case insensitive by default. The 
point of it is to find things similar to a string, not the same as, and 
given that capitalization in R is somewhat erratic (due to many authors, 
and some of those changing their minds over the years), I find the 
current apropos of little use.

I would also, personally, prefer some sort of approximate matching since 
there are different ways to spell some words, and some folks abbreviate 
parts of words.
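A rough sketch of such a variant, using agrep() for case-insensitive, 
approximate matching (apropos2 and its default tolerance are 
illustrative, not a concrete proposal):

```r
## Hypothetical case-insensitive, fuzzy apropos built on agrep().
apropos2 <- function(what, max.distance = 0.1) {
  found <- character(0)
  for (i in seq_along(search())) {
    contents <- ls(pos = i, all.names = TRUE)
    found <- c(found, agrep(what, contents, ignore.case = TRUE,
                            max.distance = max.distance, value = TRUE))
  }
  unique(found)
}
```

With this, apropos2("Data.Frame") also finds data.frame and 
near-miss spellings.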


Martin Maechler wrote:
 Hi Seth,
 
 Seth == Seth Falcon [EMAIL PROTECTED]
 on Wed, 13 Dec 2006 16:38:02 -0800 writes:
 
 Seth Hello all, I've had the following apropos alternative
 Seth in my ~/.Rprofile for some time, and have found it
 Seth more useful than the current version.  Basically, my
 Seth version ignores case when searching.
 
 Seth If others find this useful, perhaps apropos could be
 Seth suitably patched (and I'd be willing to create such a
 Seth patch).
 
 Could you live with typing 'i=T' (i.e.  ignore.case=TRUE)?
 
 In principle, I'd like to keep the default  as ignore.case=FALSE,
 since we'd really should teach the users that R 
 *is* case sensitive.
 Ignoring case is the exception in the S/R/C world, not the rule
 
 I have a patch ready which implements your suggestion
 (but not quite with the code below), but as said, not as
 default.
 
 Martin
 
 Seth + seth
 
 Seth Here is my version of apropos:
 
 APROPOS <- function (what, where = FALSE, mode = "any") 
 {
 if (!is.character(what))
   stop("argument ", sQuote(what), " must be a character vector")
 x <- character(0)
 check.mode <- mode != "any"
 for (i in seq(search())) {
 contents <- ls(pos = i, all.names = TRUE)
 found <- grep(what, contents, ignore.case = TRUE, value = TRUE)
 if (length(found)) {
 if (check.mode) {
 found <- found[sapply(found, function(x) {
 exists(x, where = i, mode = mode, inherits = FALSE)
 })]
 }
 numFound <- length(found)
 x <- c(x, if (where)
structure(found, names = rep.int(i, numFound)) else 
 found)
 }
 }
 x
 }
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] data frame subset patch, take 2

2006-12-13 Thread Robert Gentleman
Hi,
   We had the names discussion and, AFAIR, the idea that someone might 
misinterpret the output as suggesting that one could index by number, 
seemed to kill it. A more reasonable argument against is that names<- is 
problematic.

You can use $, [[ (with character subscripts), and yes ls does sort of 
do what you want (but sorts the values, not sure if that is good). I 
think it is also inefficient in that I believe it copies the CHARSXP's 
(not sure we really need to do that, but I have not had time to sort out 
the issues). And there is an eapply as well, so ls() is not always needed.

mget can be used to retrieve multiple values (and should be much more 
efficient than multiple calls to get). There is no massign (no one seems 
to have asked for it), and better design choice might be to vectorize 
assign.
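For instance, the mget() call mentioned above looks like this (a 
minimal sketch):

```r
## mget() retrieves several bindings from an environment in one call.
e <- new.env(hash = TRUE, parent = emptyenv())
assign("a", 1, envir = e)
assign("b", 2, envir = e)
vals <- mget(c("a", "b"), envir = e)   # list(a = 1, b = 2)
```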

best wishes
   Robert





Vladimir Dergachev wrote:
 On Wednesday 13 December 2006 1:23 pm, Marcus G. Daniels wrote:
 Vladimir Dergachev wrote:
 2. It would be nice to have true hashed arrays in R (i.e. O(1) access
 times). So far I have used named lists for this, but they are O(n):
 new.env(hash=TRUE) with get/assign/exists works ok.  But I suspect its
 just too easy to use named lists because it is easy, and that has bad
 performance ramifications for user code (perhaps the R developers are
 more vigilant about this for the R code itself).
 
 Cool, thank you ! 
 
 I wonder whether environments could be extended to allow names() to work 
  (although I see that ls() does the same function) and to allow for(i in E) 
 loops.
 
thank you
 
Vladimir Dergachev
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] data frame subset patch, take 2

2006-12-13 Thread Robert Gentleman


Robert Gentleman wrote:
 Hi,
We had the names discussion and, AFAIR, the idea that someone might 
 misinterpret the output as suggesting that one could index by number, 
 seemed to kill it. A more reasonable argument against is that names<- is 
 problematic.
 
 You can use $, [[ (with character subscripts), and yes ls does sort of 
 do what you want (but sorts the values, not sure if that is good). I 
 think it is also inefficient in that I believe it copies the CHARSXP's 
 (not sure we really need to do that, but I have not had time to sort out 

  I misremembered - it does not copy CHARSXPs.

 the issues). And there is an eapply as well, so ls() is not always needed.
 
 mget can be used to retrieve multiple values (and should be much more 
 efficient than multiple calls to get). There is no massign (no one seems 
 to have asked for it), and better design choice might be to vectorize 
 assign.
 
 best wishes
Robert
 
 
 
 
 
 Vladimir Dergachev wrote:
 On Wednesday 13 December 2006 1:23 pm, Marcus G. Daniels wrote:
 Vladimir Dergachev wrote:
 2. It would be nice to have true hashed arrays in R (i.e. O(1) access
 times). So far I have used named lists for this, but they are O(n):
 new.env(hash=TRUE) with get/assign/exists works ok.  But I suspect its
 just too easy to use named lists because it is easy, and that has bad
 performance ramifications for user code (perhaps the R developers are
 more vigilant about this for the R code itself).
 Cool, thank you ! 

 I wonder whether environments could be extended to allow names() to work 
  (although I see that ls() does the same function) and to allow for(i in E) 
 loops.

thank you

Vladimir Dergachev

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] caching frequently used values

2006-12-13 Thread Robert Gentleman
the idea you are considering is also, at times, referred to as 
memoizing. I would not use a list, but rather an environment, and 
basically you implement something that first looks to see if there is a 
value, and if not, compute and store. It can speed things up a lot in 
some examples (and slow them down a lot in others).

Wikipedia amongst other sources:
  http://en.wikipedia.org/wiki/Memoization

Environments have advantages over lists here (if there are lots of 
matrices the lookup can be faster - make sure you use hash=TRUE), and 
reference semantics, which you probably want.
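A minimal memoizing sketch along those lines (memoize is an 
illustrative name; error handling in f is omitted):

```r
## Memoization with an environment cache, keyed by the character
## form of the argument.
memoize <- function(f) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE))
      assign(key, f(x), envir = cache)
    get(key, envir = cache, inherits = FALSE)
  }
}
square <- memoize(function(x) x^2)
square(4)   # computed and cached
square(4)   # cache hit, not recomputed
```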

Tamas K Papp wrote:
 Hi,
 
 I am trying to find an elegant way to compute and store some
 frequently used matrices on demand.  The Matrix package already uses
 something like this for storing decompositions, but I don't know how
 to do it.
 
 The actual context is the following:
 
 A list has information about a basis of a B-spline space (nodes,
 order) and gridpoints at which the basis functions would be evaluated
 (not necessarily the nodes).  Something like this:
 
 bsplinegrid <- list(nodes=1:8,order=4,grid=seq(2,5,by=.2))
 
 I need the design matrix (computed by splineDesign) for various
 derivatives (not necessarily known in advance), to be calculated by
 the function
 
 bsplinematrix <- function(bsplinegrid, deriv=0) {
   x <- bsplinegrid$grid
   Matrix(splineDesign(bsplinegrid$nodes, x, ord=bsplinegrid$order,
   derivs = rep(deriv, length(x))))
 }
 
 However, I don't want to call splineDesign all the time.  A smart way
 would be storing the calculated matrices in a list inside bsplinegrid.
 Pseudocode would look like this:
 
 bsplinematrix <- function(bsplinegrid, deriv=0) {
   if (is.null(bsplinegrid$matrices[[deriv+1]])) {
 ## compute the matrix and put it in the list bsplinegrid$matrices,
 ## but not of the local copy
   }
   bsplinegrid$matrices[[deriv+1]]
 }
 
 My problem is that I don't know how to modify bsplinegrid$matrices
 outside the function -- assignment inside would only modify the local
 copy.
 
 Any help would be appreciated -- I wanted to learn how Matrix does it,
 but don't know how to display the source with s3 methods (getAnywhere
 doesn't work).
 
 Tamas
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] caching frequently used values

2006-12-13 Thread Robert Gentleman
e1 = new.env(hash=TRUE)

e1[["1"]] = whateveryouwant

ie. just transform to characters, but I don't see why you want to do 
that - surely there are more informative names to be used -
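That is, something along these lines (a sketch; the names are 
arbitrary):

```r
## A numeric key becomes a character name in an environment.
cache <- new.env(hash = TRUE)
deriv <- 2
cache[[as.character(deriv)]] <- diag(3)   # stored under the name "2"
hit <- exists("2", envir = cache, inherits = FALSE)
```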



Tamas K Papp wrote:
 Hi Robert,
 
 Thanks for your answer.  I would create and environment with
 new.env(), but how can I assign and retrieve values based on a
 numerical index (the derivative)?  The example of the help page of
  assign explicitly shows that assign("a[1]") does not work for this
 purpose.
 
 Thanks,
 
 Tamas
 
 On Wed, Dec 13, 2006 at 01:54:28PM -0800, Robert Gentleman wrote:
 
 the idea you are considering is also, at times, referred to as 
 memoizing. I would not use a list, but rather an environment, and 
 basically you implement something that first looks to see if there is a 
 value, and if not, compute and store. It can speed things up a lot in 
 some examples (and slow them down a lot in others).

 Wikipedia amongst other sources:
  http://en.wikipedia.org/wiki/Memoization

 Environments have advantages over lists here (if there are lots of 
 matrices the lookup can be faster - make sure you use hash=TRUE), and 
 reference semantics, which you probably want.

 Tamas K Papp wrote:
 Hi,

 I am trying to find an elegant way to compute and store some
 frequently used matrices on demand.  The Matrix package already uses
 something like this for storing decompositions, but I don't know how
 to do it.

 The actual context is the following:

 A list has information about a basis of a B-spline space (nodes,
 order) and gridpoints at which the basis functions would be evaluated
 (not necessarily the nodes).  Something like this:

  bsplinegrid <- list(nodes=1:8,order=4,grid=seq(2,5,by=.2))

 I need the design matrix (computed by splineDesign) for various
 derivatives (not necessarily known in advance), to be calculated by
 the function

  bsplinematrix <- function(bsplinegrid, deriv=0) {
   x <- bsplinegrid$grid
   Matrix(splineDesign(bsplinegrid$nodes, x, ord=bsplinegrid$order,
   derivs = rep(deriv, length(x))))
  }

 However, I don't want to call splineDesign all the time.  A smart way
 would be storing the calculated matrices in a list inside bsplinegrid.
 Pseudocode would look like this:

  bsplinematrix <- function(bsplinegrid, deriv=0) {
   if (is.null(bsplinegrid$matrices[[deriv+1]])) {
 ## compute the matrix and put it in the list bsplinegrid$matrices,
 ## but not of the local copy
   }
   bsplinegrid$matrices[[deriv+1]]
  }

 My problem is that I don't know how to modify bsplinegrid$matrices
 outside the function -- assignment inside would only modify the local
 copy.

 Any help would be appreciated -- I wanted to learn how Matrix does it,
 but don't know how to display the source with s3 methods (getAnywhere
 doesn't work).

 Tamas

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

 -- 
 Robert Gentleman, PhD
 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M2-B876
 PO Box 19024
 Seattle, Washington 98109-1024
 206-667-7700
 [EMAIL PROTECTED]
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] data frame subset patch, take 2

2006-12-12 Thread Robert Gentleman
Hi,
  I tried take 1, and it failed. I have been traveling (and with 
Martin's changes also waiting for things to stabilize) before trying 
take 2, probably later this week and I will send an email if it goes in. 
Anyone wanting to try it and run R through check and check-all is 
welcome to do so and report success or failure.

  best wishes
Robert


Martin Maechler wrote:
 Marcus == Marcus G Daniels [EMAIL PROTECTED]
 on Tue, 12 Dec 2006 09:05:15 -0700 writes:
 
 Marcus Vladimir Dergachev wrote:
  Here is the second iteration of data frame subset patch.
  It now passes make check on both 2.4.0 and 2.5.0 (svn as
  of a few days ago).  Same speedup as before.
  
 Marcus Hi,
 
 Marcus I was wondering if this patch would make it into the
 Marcus next release.  I don't see it in SVN, but it's hard
 Marcus to be sure because the mailing list apparently
 Marcus strips attachments.  If it isn't in, or going to be
 Marcus in, is this patch available somewhere else?
 
 I was wondering too.
   http://www.r-project.org/mail.html
 explains what kind of attachments are allowed on R-devel.
 
 I'm particularly interested, since during the last several days
 I've made (somewhat experimental) changes to R-devel,
 which makes some dealings with large data frames that have
 trivial rownames (those represented as  1:nrow(.))
 much more efficient.
 
 Notably, as.matrix() of such data frames now no longer produces
 huge row names, and e.g.  dim(.) of such data frames has become
 lightning fast [compared to what it was].
 
 Some measurements:
 
 N <- 1e6
 set.seed(1)
 ## we round (for later dump().. reasons)
 x <- round(rnorm(N),2)
 y <- round(rnorm(N),2)
 mOrig <- cbind(x = x, y = y)
 df <- data.frame(x = x, y = y)
 mNew <- as.matrix(df)
 (sizes <- sapply(list(mOrig=mOrig, df=df, mNew=mNew), object.size))
 ## R-2.4.0 (64-bit):
 ##mOrig   df mNew
 ## 16000520 16000776 72000560
 
 ## R-2.4.1 beta (32-bit):
 ##mOrig   df mNew
 ## 16000296 16000448 52000320
 
 ## R-pre-2.5.0 (32-bit):
 ##mOrig   df mNew
 ## 16000296 16000448 16000296
 
 ##
 
 N <- 1e6
 df <- data.frame(x = 0+ 1:N, y = 1+ 1:N)
 system.time(for(i in 1:1000) d <- dim(df))
 
 ## R-2.4.1 beta (32-bit) [deb1]:
 ## [1] 1.920 3.748 7.810 0.000 0.000
 
 ## R-pre-2.5.0 (32-bit) [deb1]:
 ##user  system elapsed
 ##   0.012   0.000   0.011
 
 
 --- --- --- --- --- --- --- --- --- --- 
 
 However, currently
 
   df[2,] ## still internally produces the  character(1e6)  row names!
 
 something I think we should eliminate as well,
 i.e., at least make sure that only  seq_len(1e6) is internally
 produced and not the character vector.
 
 Note however that some of these changes are backward
 incompatible. I do hope that the changes gaining efficiency
 for such large data frames are worth some adaption of
 current/old R source code..
 
 Feedback on this topic is very welcome!
 
 Martin
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error condition in evaluating a promise

2006-10-18 Thread Robert Gentleman


Simon Urbanek wrote:
 Seth,
 
 thanks for the suggestions.
 
 On Oct 18, 2006, at 11:23 AM, Seth Falcon wrote:
 
 Simon Urbanek [EMAIL PROTECTED] writes:
 thanks, but this is not what I want (the symbols in the environment
 are invisible outside) and it has nothing to do with the question I
 posed: as I was saying in the previous e-mail the point is to have
 exported variables in a namespace, but their value is known only
 after the namespace was attached (to be precise I'm talking about
 rJava here and many variables are valid only after the VM was
 initialized - using them before is an error).
 We have a similar use case and here is one workaround:

 Define an environment in your name space and use it to store the
 information that you get after VM-init.

 There are a number of ways to expose this:

 * Export the env and use vmEnv$foo

 * Provide accessor functions, getVmFooInfo()

 * Or you can take the accessor function approach a bit further to make
   things look like a regular variable by using active bindings.  I can
   give more details if you want.  We are using this in the BSgenome
   package in BioC.

 
 I'm aware of all three solutions and I've tested all three of them  
 (there is in fact a fourth one I'm actually using, but I won't go  
 into detail on that one ;)). Active bindings are the closest you can  
 get, but then the value is retrieved each time which I would like to  
 avoid.
 
 The solution with promises is very elegant, because it guarantees  
 that on success the final value will be locked. It also makes sense  
 semantically, because the value is determined by code bound to the  
 variable and premature evaluation is an error - just perfect.
 
 Probably I should have been more clear in my original e-mail - the  
 question was not to find a work-around, I have plenty of them ;), the  
 question was whether the behavior of promises under error conditions  
 is desirable or not (see subject ;)). For the internal use of  
 promises it is irrelevant, because promises as function arguments are  
 discarded when an error condition arises. However, if used in the  
 wild, the behavior as described would be IMHO more useful.
 

   Promises were never intended for use at the user level, and I don't 
think that they can easily be made useful at that level without exposing 
a lot of stuff that cannot easily be explained/made bullet proof.  As 
Brian said, you have not told us what you want, and I am pretty sure 
that there are good solutions available at the R level for most problems.

   Although the discussion has not really started, things like dispatch 
in the S4 system are likely to make lazy evaluation a thing of the past 
since it is pretty hard to dispatch on class without knowing what the 
class is. That means, that as we move to more S4 methods/dispatch we 
will be doing more evaluation of arguments.
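At the R level, delayedAssign() already gives promise-like, run-once 
initialization without exposing promise internals; a hedged sketch 
(vmConst and initVM are invented names for illustration, not rJava 
API):

```r
## delayedAssign() binds a promise: the expression is evaluated once,
## on first access.  initVM stands in for an expensive VM startup.
initVM <- function() 42
delayedAssign("vmConst", initVM())
vmConst   # forced here, on first use
vmConst   # now an ordinary value
```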

best wishes
  Robert


 Cheers,
 Simon
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Feature request: names(someEnv) same as ls(someEnv)

2006-10-15 Thread Robert Gentleman


Duncan Murdoch wrote:
 On 10/15/2006 2:48 PM, Seth Falcon wrote:
 Hi,

 I would be nice if names() returned the equivalent of ls() for
 environments.
 
 Wouldn't that just confuse people into thinking that environments are 
 vectors?  Wouldn't it then be reasonable to assume that 
 env[[which(names(env) == "foo")]] would be a synonym for env$foo?

  absolutely not - environments can only be subscripted by name, not by 
logicals or integer subscripts - so I hope that most users would figure 
that one out


 
 I don't see why this would be nice:  why not just use ls()?

   why? environments do get used, by many as vectors (well hash tables), 
modulo the restrictions on subscripting and the analogy is quite useful 
and should be encouraged IMHO.
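The hash-table analogy, with the subscripting restriction, in a small 
sketch:

```r
## Environments as hash tables: character keys only, no positional
## or logical subscripting.
e <- new.env(hash = TRUE, parent = emptyenv())
e$foo <- 1
e[["bar"]] <- 2
keys <- ls(e)   # what a names() method would return: "bar" "foo"
e[["foo"]]      # works; e[[1]] would be an error
```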

  Robert

 
 Duncan Murdoch
 
 --- a/src/main/attrib.c
 +++ b/src/main/attrib.c
 @@ -687,6 +687,8 @@ SEXP attribute_hidden do_names(SEXP call
  s = CAR(args);
  if (isVector(s) || isList(s) || isLanguage(s))
 return getAttrib(s, R_NamesSymbol);
 +if (isEnvironment(s))
 +return R_lsInternal(s, 0);
  return R_NilValue;
  }


 + seth

 --
 Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
 http://bioconductor.org

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Strange behaviour of the [[ operator

2006-09-30 Thread Robert Gentleman
True, re name matching, but I think we might want to consider a warning 
if they are supplied, as the user may not be getting what they expect, 
regardless of the documentation.


Peter Dalgaard wrote:
 Seth Falcon [EMAIL PROTECTED] writes:
 
 Similar things happen in many similar circumstances.
 Here's a similar thing:
 
 Not really, no?
  
 v <- 1:5
 v
 [1] 1 2 3 4 5
 v[mustBeDocumentedSomewhere=3]
 [1] 3

 And this can be confusing if one thinks that subsetting is really a
 function and behaves like other R functions w.r.t. to treatment of
 named arguments:

 m <- matrix(1:4, nrow=2)
 m
      [,1] [,2]
 [1,]    1    3
 [2,]    2    4
 m[j=2]
 [1] 2
 
 Or even
 m[j=2,i=]
 [1] 2 4
 
 However, what would the argument names be in the 2-dim case? i, j are
 used only in help([) and that page is quite specific about
 explaining that named matching doesn't work. 
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question about substitute() and function def

2006-09-14 Thread Robert Gentleman


Duncan Murdoch wrote:
 On 9/14/2006 3:01 PM, Seth Falcon wrote:
 Hi,

 Can someone help me understand why

   substitute(function(a) a + 1, list(a=quote(foo)))

 gives

   function(a) foo + 1

 and not

   function(foo) foo + 1

 The man page leads me to believe this is related to lazy evaluation of
 function arguments, but I'm not getting the big picture.
 
 I think it's the same reason that this happens:
 
   substitute(c( a = 1, b = a), list(a = quote(foo)))
 c(a = 1, b = foo)
 
 The a in function(a) is the name of the arg, it's not the arg itself 

yes, but the logic seems to be broken. In Seth's case there seems to be 
no way to use substitute to globally change an argument and all 
instances throughout a function, which seems like a task that would be 
useful.

even here, I would have expected all instances of a to change, not some

 (which is missing).  Now a harder question to answer is why this happens:
 
   substitute(function(a=a) 1, list(a=quote(foo)))
 function(a = a) 1

   a bug for sure


 I would have expected to get function(a = foo) 1.
 
 Duncan Murdoch
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Bug/problem reporting: Possible to modify posting guide FAQ?

2006-08-28 Thread Robert Gentleman
Hi,
   I guess the question often comes down to whether it is a bug report, 
or a question. If you know it is a bug, and have a complete and correct 
example where the obviously incorrect behavior occurs, and you are 
positive that the problem is in the package, then sending it to the 
maintainer is appropriate.  When I get these I try to deal with them. 
Real bug reports that go to the mailing list may be missed, so in my 
opinion it would be best to cc the maintainer, and we will amend the FAQ 
in that direction. If instead you are asking a question, of the form is 
this a bug, or why is this happening, then for BioC at least it is 
better to post directly to the list, as there are many folks who can 
help and you are more likely to get an answer.  When I get one of these 
emails I always refer the person to the mailing lists.  I see little 
problem with being redirected by a maintainer to the mailing list if 
they feel that the question is better asked there.

Bioconductor is different from R, clearly our mailing list has to be 
more about the constituent packages, since we will direct questions 
about R to the appropriate R mailing lists.  R mailing lists tend to be 
about R, so asking about a specific package there (among the 1000 or so) 
often does not get you very far, but sometimes it does.


  best wishes
Robert


Steven McKinney wrote:
 If users post a bug or problem issue to an R-based news group
 (R-devel, R-help, BioC - though BioC is far more forgiving)
 they get yelled at for not reading the posting guide
 and FAQ.
 
 Please *_do_* read the FAQ, the posting guide, ...
 the yellers do say.  So I read the BioC FAQ and it says...
 
 http://www.bioconductor.org/docs/faq/
 
 Bug reports on packages should perhaps be sent to the 
  package maintainer rather than to r-bugs.
 
 
 So I send email to a maintainer, who I believe rightly points out
 
best to send this kind of questions to the bioc mailing list, rather
 than to myself privately, because other people might (a) also have
 answers or (b) benefit from the questions & answers.
 
 Could the FAQ possibly be revised to some sensible combination
 that generates less finger pointing, such as
 
Bug reports on packages should be sent to the Bioconductor mailing list, 
 and sent or copied to the package maintainer, rather than to r-bugs.
 
 or
 
Bug reports on packages should be sent to the package maintainer, 
 and copied to the Bioconductor mailing list, rather than to r-bugs.
 
 
 Could the posting guides to R-help and R-devel do something
 similar?
 
 
 Sign me
 Tired of all the finger pointing
 
 
 http://www.r-project.org/posting-guide.html
 
  If the question relates to a contributed package, e.g., one downloaded 
   from CRAN, try contacting the package maintainer first. You can also 
   use find("functionname") and packageDescription("packagename") to 
   find this information. Only send such questions to R-help or R-devel if 
   you get no reply or need further assistance. This applies to both 
   requests for help and to bug reports.
 
 
 How about
 
  If the question relates to a contributed package, e.g., one downloaded 
  from CRAN, email the list and be sure to additionally send to or copy to 
  the package maintainer as well. You can use find("functionname") 
  and packageDescription("packagename") to find this information. 
 Only send such questions to one of R-help or R-devel. This applies to both 
 requests for help and to bug reports.
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] S4 'object is not subsettable' in prototype

2006-08-21 Thread Robert Gentleman


Prof Brian Ripley wrote:
 On Mon, 21 Aug 2006, Seth Falcon wrote:
 
 John Chambers [EMAIL PROTECTED] writes:

 When I was introducing the special type for S4 objects, my first
 inclination was to have length(x) for those objects be either NA or an
 error, along the lines that intuitively length(x) means the number of
 elements in the vector-style object x.  However, that change quickly
 was demonstrated to need MANY revisions to the current code.
 Perhaps some details on the required changes will help me see the
 light, but I would really like to see length(foo) be an error (no such
  method) when foo is an arbitrary S4 class.
 
 According to the Blue Book p.96 every S object has a length and 'An 
 Introduction to R' repeats this.  So I believe an error is not an option.  
 Indeed, from the wording, I think code could legitimately assume length(x) 
  works and 0 <= length(x) and it is an integer (but not necessarily of type 
 'integer').
 
 Certainly functions and formulae have a length (different for functions in 
 S and R, as I recall), and they are not 'vector-style'.

   Yes, but that is because in S(-Plus), and not in R, virtually 
every object was an instance of a generic vector, including functions 
(formulas were white book, not blue, and I'm still not sure that 
indexing of them makes sense, but I am sure that indexing functions does 
not; it suggests, at least to me, that we want to emphasize 
implementation over semantics).

   Now, in R, since not everything is a generic vector, it is less clear 
what to do in some cases, and I am not going to argue too hard against 
everything having a length, but I think the number 1 is a much better 
choice than the number 0.  (the compromise solution of 0.5 has some 
charm :-)

   I am also scared that such reasoning will lead one to believe that 
indexing these things using [, or similar should work, and that leads to 
major problems, since I lost the argument about not indexing outside of 
array bounds some years ago. What would be sensible in that case? 
Certainly not what currently happens with S4 objects (in R release).
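
For concreteness, the behavior under discussion can be seen directly (a 
sketch only; the values returned by length() and out-of-bounds subsetting 
of S4 objects changed across R versions, so no particular output is claimed):

```r
library(methods)
setClass("Foo", representation(x = "numeric"))
f <- new("Foo", x = c(1, 2, 3))

length(f)   # 0, 1, NA, or an error?  the point under debate
## f[2]     # should subsetting an arbitrary S4 object work at all?
```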

   best wishes
 Robert



 
 I have encountered bugs due to accidental dispatch -- functions
 returning something other than an error because of the zero-length
 list implementation of S4.  It would not be surprising if some of the
 breakage caused by removing this feature identifies real bugs.

  I was thinking that one of the main advantages of the new S4 type was
 to get away from this sort of accidental dispatch.  Not trying to be
 snide, but what is useful about getting a zero for length(foo)?  The
 main use I can think of is in trying to identify S4 instances, but
 happily, that is no longer needed.

 + seth
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] configure on mac

2006-08-13 Thread Robert Gentleman
Hi,
  I think Simon and Stefano are both offline for a little while. I can 
confirm that an upgrade of Xcode to either 2.3 or the very recent 2.4 is 
needed in most cases; either seems to work, so 2.4 is probably the better 
choice.

   best wishes
 Robert


Prof Brian Ripley wrote:
 I gather you need to update your Xtools: others have had similar problems.
 (If they are online you will no doubt get more complete information.)
 
 On Sat, 12 Aug 2006, roger koenker wrote:
 
 I'm having trouble making yesterday's R-devel on my macs.

 ./configure seems fine, but eventually in make I get:

 gcc -dynamiclib -Wl,-macosx_version_min -Wl,10.3 -undefined  
 dynamic_lookup -single_module -multiply_defined suppress -L/sw/lib -L/ 
 usr/local/lib -install_name libR.dylib -compatibility_version 2.4.0  - 
 current_version 2.4.0  -headerpad_max_install_names -o libR.dylib  
 Rembedded.o CConverters.o CommandLineArgs.o Rdynload.o Renviron.o  
 RNG.o apply.o arithmetic.o apse.o array.o attrib.o base.o bind.o  
 builtin.o character.o coerce.o colors.o complex.o connections.o  
 context.o cov.o cum.o dcf.o datetime.o debug.o deparse.o deriv.o  
 dotcode.o dounzip.o dstruct.o duplicate.o engine.o envir.o errors.o  
 eval.o format.o fourier.o gevents.o gram.o gram-ex.o graphics.o  
 identical.o internet.o iosupport.o lapack.o list.o localecharset.o  
 logic.o main.o mapply.o match.o memory.o model.o names.o objects.o  
 optim.o optimize.o options.o par.o paste.o pcre.o platform.o plot.o  
 plot3d.o plotmath.o print.o printarray.o printvector.o printutils.o  
 qsort.o random.o regex.o registration.o relop.o rlocale.o saveload.o  
 scan.o seq.o serialize.o size.o sort.o source.o split.o sprintf.o  
 startup.o subassign.o subscript.o subset.o summary.o sysutils.o  
 unique.o util.o version.o vfonts.o xxxpr.o   `ls ../appl/*.o ../nmath/ 
 *.o ../unix/*.o  2>/dev/null | grep -v /ext-` -framework vecLib - 
 lgfortran -lgcc_s -lSystemStubs -lmx -lSystem  ../extra/zlib/ 
 libz.a ../extra/bzip2/libbz2.a ../extra/pcre/libpcre.a  -lintl - 
 liconv -Wl,-framework -Wl,CoreFoundation -lreadline  -lm -liconv
 /usr/bin/libtool: unknown option character `m' in: -macosx_version_min
 Usage: /usr/bin/libtool -static [-] file [...] [-filelist listfile 
 [,dirname]] [-arch_only arch] [-sacLT]
 Usage: /usr/bin/libtool -dynamic [-] file [...] [-filelist listfile 
 [,dirname]] [-arch_only arch] [-o output] [-install_name name] [- 
 compatibility_version #] [-current_version #] [-seg1addr 0x#] [- 
 segs_read_only_addr 0x#] [-segs_read_write_addr 0x#] [-seg_addr_table  
 filename] [-seg_addr_table_filename file_system_path] [-all_load]  
 [-noall_load]
 make[3]: *** [libR.dylib] Error 1
 make[2]: *** [R] Error 2
 make[1]: *** [R] Error 1
 make: *** [R] Error 1

 This was ok  as of my last build which was:

   version
 _
 platform   powerpc-apple-darwin8.7.0
 arch   powerpc
 os darwin8.7.0
 system powerpc, darwin8.7.0
 status Under development (unstable)
 major  2
 minor  4.0
 year   2006
 month  07
 day28
 svn rev38710
 language   R
 version.string R version 2.4.0 Under development (unstable)  
 (2006-07-28 r38710)

 url:    www.econ.uiuc.edu/~roger    Roger Koenker
 email   [EMAIL PROTECTED]   Department of Economics
 vox:    217-333-4558    University of Illinois
 fax:    217-244-6678    Champaign, IL 61820

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] HTTP User-Agent header

2006-07-31 Thread Robert Gentleman
should appear at an R-devel near you...
thanks Seth


Seth Falcon wrote:
 Robert Gentleman [EMAIL PROTECTED] writes:
 OK, that suggests setting at the options level would solve both of your 
 problems and that seems like the best approach. I don't really want to 
 pass this around as a parameter through the maze of functions that might 
 actually download something if we don't have to.
 
 I have an updated patch that adds an HTTPUserAgent option.  The
 default is a string like:
 
 R (2.4.0 x86_64-unknown-linux-gnu x86_64 linux-gnu)
 
 If the HTTPUserAgent option is NULL, no user agent header is added to
 HTTP requests (this is the current behavior).  This option allows R to
 use an arbitrary user agent header.
 
 The patch adds two non-exported functions to utils: 
1) defaultUserAgent - returns a string like above
2) makeUserAgent - formats content of HTTPUserAgent option for use
   as part of an HTTP request header.
 
 I've tested on OSX and Linux, but not on Windows.  When USE_WININET is
 defined, a user agent string of R was already being used.  With this
 patch, the HTTPUserAgent options is used.  I'm unsure if NULL is
 allowed.
 
 Also, in src/main/internet.c there is a comment:
   Next 6 are for use by libxml, only
 and then a definition for R_HTTPOpen.  Not sure how/when these get
 used.  The user agent for these calls remains unspecified with this
 patch.
 
 + seth
 
 
 Patch summary:
  src/include/R_ext/R-ftp-http.h   |2 +-
  src/include/Rmodules/Rinternet.h |2 +-
  src/library/base/man/options.Rd  |5 +
  src/library/utils/R/readhttp.R   |   25 +
  src/library/utils/R/zzz.R|3 ++-
  src/main/internet.c  |2 +-
  src/modules/internet/internet.c  |   37 +
  src/modules/internet/nanohttp.c  |8 ++--
  8 files changed, 66 insertions(+), 18 deletions(-)
 
 
 
 Index: src/include/R_ext/R-ftp-http.h
 ===
 --- src/include/R_ext/R-ftp-http.h(revision 38715)
 +++ src/include/R_ext/R-ftp-http.h(working copy)
 @@ -36,7 +36,7 @@
  int   R_FTPRead(void *ctx, char *dest, int len);
  void  R_FTPClose(void *ctx);
  
 -void *   RxmlNanoHTTPOpen(const char *URL, char **contentType, int 
 cacheOK);
 +void *   RxmlNanoHTTPOpen(const char *URL, char **contentType, const 
 char *headers, int cacheOK);
  int  RxmlNanoHTTPRead(void *ctx, void *dest, int len);
  void RxmlNanoHTTPClose(void *ctx);
  int  RxmlNanoHTTPReturnCode(void *ctx);
 Index: src/include/Rmodules/Rinternet.h
 ===
 --- src/include/Rmodules/Rinternet.h  (revision 38715)
 +++ src/include/Rmodules/Rinternet.h  (working copy)
 @@ -9,7 +9,7 @@
  typedef Rconnection (*R_NewUrlRoutine)(char *description, char *mode);
  typedef Rconnection (*R_NewSockRoutine)(char *host, int port, int server, 
 char *mode); 
  
 -typedef void * (*R_HTTPOpenRoutine)(const char *url, const int cacheOK);
 +typedef void * (*R_HTTPOpenRoutine)(const char *url, const char *headers, 
 const int cacheOK);
  typedef int(*R_HTTPReadRoutine)(void *ctx, char *dest, int len);
  typedef void   (*R_HTTPCloseRoutine)(void *ctx);
 
 Index: src/main/internet.c
 ===
 --- src/main/internet.c   (revision 38715)
 +++ src/main/internet.c   (working copy)
 @@ -129,7 +129,7 @@
  {
  if(!initialized) internet_Init();
  if(initialized  0)
 - return (*ptr-HTTPOpen)(url, 0);
 + return (*ptr-HTTPOpen)(url, NULL, 0);
  else {
   error(_(internet routines cannot be loaded));
   return NULL;
 Index: src/library/utils/R/zzz.R
 ===
 --- src/library/utils/R/zzz.R (revision 38715)
 +++ src/library/utils/R/zzz.R (working copy)
 @@ -9,7 +9,8 @@
   internet.info = 2,
   pkgType = .Platform$pkgType,
   str = list(strict.width = "no"),
  -  example.ask = "default")
  +  example.ask = "default",
  +  HTTPUserAgent = defaultUserAgent())
   extra <-
   if(.Platform$OS.type == "windows") {
   list(mailer = "none",
 Index: src/library/utils/R/readhttp.R
 ===
 --- src/library/utils/R/readhttp.R(revision 38715)
 +++ src/library/utils/R/readhttp.R(working copy)
 @@ -6,3 +6,28 @@
   stop("transfer failure")
  file.show(file, delete.file = delete.file, title = title, ...)
  }
 +
 +
 +
  +defaultUserAgent <- function()
  +{
  +    Rver <- paste(R.version$major, R.version$minor, sep=".")
  +    Rdetails <- paste(Rver, R.version$platform, R.version$arch,
  +                      R.version$os)
  +    paste("R (", Rdetails, ")", sep="")
  +}
 +
 +
  +makeUserAgent <- function(format = TRUE) {
  +    agent <- getOption("HTTPUserAgent")
 +if (is.null(agent

Re: [Rd] [R] HTTP User-Agent header

2006-07-28 Thread Robert Gentleman
I wonder if it would not be better to make the user agent string 
something that is configurable (at the time R is built) rather than at 
run time. This would make Seth's patch about 1% as long. Or this could 
be handled as an option. The patches are pretty extensive and allow for 
setting the agent header by setting parameters in function calls (e.g. 
download.file). I am not sure there is a good use case for that level 
of flexibility and the additional code is substantial.
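
If it is handled as an option, usage would look roughly like this (a 
sketch only; `HTTPUserAgent` is the option name discussed in this thread, 
and the strings are purely illustrative):

```r
## Inspect whatever default the build provides
getOption("HTTPUserAgent")

## Override it for the session, e.g. to identify a script
options(HTTPUserAgent = "my-script/0.1")

## Turn the header off entirely (the pre-patch behavior)
options(HTTPUserAgent = NULL)
```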


The issue that I think arises is that there are potentially other 
systems that will be unhappy with R's identification of itself and so 
some users may also need to turn it off.

Any strong opinions?



James P. Howard, II wrote:
 On 7/28/06, Seth Falcon [EMAIL PROTECTED] wrote:
 
 I have a rough draft patch, see below, that adds a User-Agent header
 to HTTP requests made in R via download.file.  If there is interest, I
 will polish it.
 
 It looks right, but I am running under Windows without a compiler.
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] HTTP User-Agent header

2006-07-28 Thread Robert Gentleman
OK, that suggests setting at the options level would solve both of your 
problems and that seems like the best approach. I don't really want to 
pass this around as a parameter through the maze of functions that might 
actually download something if we don't have to.

I think we can provide something early next week on R-devel for folks to 
test. But I suspect, as Henrik also does, that the set of sites that will 
refuse us with a User-Agent header will be much larger than those that 
James has found that refuse us without one.

best wishes
   Robert


Henrik Bengtsson wrote:
 On 7/28/06, Robert Gentleman [EMAIL PROTECTED] wrote:
 I wonder if it would not be better to make the user agent string
 something that is configurable (at the time R is built) rather than at
 run time. This would make Seth's patch about 1% as long. Or this could
 be handled as an option. The patches are pretty extensive and allow for
  setting the agent header by setting parameters in function calls (e.g.
  download.file). I am not sure there is a good use case for that level
 of flexibility and the additional code is substantial.


 The issue that I think arises is that there are potentially other
 systems that will be unhappy with R's identification of itself and so
 some users may also need to turn it off.

 Any strong opinions?
 
 Actually two:
 
  1) If you wish to pull down (read: extract from HTML or similar) live
  data from the web, you might want to be able to imitate a certain
  browser.  For instance, if you tell some webserver you're a simple
  mobile phone or lynx, you might be able to get back very clean data.
  Some servers might also block unknown web browsers.
  
  2) If the webserver of a package repository decided to make use of
  the user-agent string to decide what version of the repository it
  should deliver, I would like to be able to trick the server.  Why?
  Many times I found myself working on a system where I do not have the
  rights to update to the latest or the developer version of R.
  However, even without the very latest version of R you can get work
  done.  For instance, in Bioconductor biocLite() & co gives you
  either the stable or the developer version of Bioconductor depending
  on your R version, but looking into the biocLite() code and beyond, you find
  that you actually can install a Bioconductor v1.9 package in R v2.3.1.
   It can be risky business, but if you know what you're doing, it can
  save your day (or week).
 
 Cheers
 
 Henrik
 

 James P. Howard, II wrote:
 On 7/28/06, Seth Falcon [EMAIL PROTECTED] wrote:

 I have a rough draft patch, see below, that adds a User-Agent header
 to HTTP requests made in R via download.file.  If there is interest, I
 will polish it.
 It looks right, but I am running under Windows without a compiler.

 --
 Robert Gentleman, PhD
 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M2-B876
 PO Box 19024
 Seattle, Washington 98109-1024
 206-667-7700
 [EMAIL PROTECTED]

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] proposed modifications to deprecated

2006-04-27 Thread Robert Gentleman
Hi,
   Over the past six months we have had a few problems with deprecation 
and Seth Falcon and I want to propose a few additions to the mechanism 
that will help deal with cases other than the deprecation of functions.

  In the last release one of the arguments to La.svd was deprecated, but 
the warning message was very unclear and suggested that in fact La.svd 
was deprecated.
   Adding a third argument to .Deprecated, say msg (to be consistent 
with the internal naming mechanism), that contains the message string 
would allow handling the La.svd issue in a more informative way. It is 
a strict addition, so no existing code is likely to be broken.
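
The proposed call would look roughly like this (a sketch of the interface 
only; the final signature in base R may differ, and the La.svd body is 
elided here):

```r
## Hypothetical use of the proposed 'msg' argument: warn about a
## deprecated *argument* without implying the function itself is gone.
La.svd <- function(x, nu, nv, method) {
    if (!missing(method))
        .Deprecated(msg = paste("the 'method' argument of La.svd()",
                                "is deprecated and will be ignored"))
    ## ... rest of La.svd unchanged ...
}
```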

   We also need to deprecate data from time to time. Since the field of 
genomics is moving fast, a good example from five years ago is often no 
longer a good example today. This one is a bit harder, but we can modify
   tools:::.make_file_exts("data")

   to first look for a .DEP extension (this does not seem to be a 
widely used extension), and if such a file exists, i.e. NameofData.DEP, 
one of two things happens: if it contains a character string we use 
that for the message (we could source it for the message?), if not we print 
a standard message (just as .Deprecated does) and then continue with the 
search using the other file extensions.

   Defunct could be handled similarly.

  Comments, alternative suggestions?

  thanks
Robert

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))

2006-04-21 Thread Robert Gentleman


Kurt Hornik wrote:
Simon Urbanek writes:
 
 
On Apr 20, 2006, at 1:23 PM, Henrik Bengtsson (max 7Mb) wrote:

Is it a general consensus on R-devel that *.tar.gz distributions  
should only be treated as a distribution for *building* packages  
and not for developing them?
 
 
 [Actually, distributing so that they can be installed and used.]
 
 
I don't know whether this is a general consensus, but it is definitely  
an important distinction. Some authors put their own Makefiles in src  
although they are not needed and in fact harmful, preventing the  
package from building on other systems - only because they are too lazy to  
use the R build mechanism for development and don't make the above  
distinction.
 
 
 Right :-)
 
 Henrik, as I think I mentioned the last time you asked about this: of
 course you can basically do everything you want.  But it comes at a
 price.  For external sources, you need to write a Makefile of your own,
 so as to make it clear that you provide a mechanism which is different
 from the standard one.  And, as Simon said, the gain in flexibility
 comes at a price.
 
 Personally and as one of the CRAN maintainers, I'd be very unhappy if
 package maintainers would start flooding their source .tar.gz packages
 with full development environment material.  (I am also rather unhappy
 about shipping large data sets which are only used for instructional
 purposes [rather than providing the data set on its own].)  It is
 simply not true that bandwidth does not matter.


   I can see the problem with large packages, but the current system 
does nothing about that AFAIC. And as Simon indicated, his biggest 
problem is the one set of files that we are allowed - so the argument is 
that the current approach is neither necessary nor sufficient, and it 
imposes a structure on people that seems to be unnecessarily restrictive. 
I don't see how excluding README (or anything else that a package 
maintainer has put there) makes life better, but maybe I am missing 
something here. These are precisely the sorts of things that have helped 
me to figure out what was intended when it didn't work. So this approach 
is regressive, IMHO.

  If the size is not large, who cares what is in a package, and things 
related to source should be in src. I see that a similar approach is 
being taken with the R directory (and probably other directories).  This 
is, in my opinion, unfortunate; imposing restrictions that don't solve 
the problem mentioned in some general way is not useful.

  For BioC, we manually check the size etc and ask people to reduce and 
remove. You could easily do the same at CRAN (and even automate it). 
BioC packages can be enormous relative to those on CRAN and I don't 
think we have ever had a serious complaint about it. But then the data 
sets tend to be large, so maybe people are just more forgiving.

  As for the difference between source packages and built packages, yes 
it would be nice at some time to enter into a discussion on that topic. 
There are lots of things that can be done at build time (that are not 
currently being done) that would speed up package installation etc. But 
they come at the price that Henrik has mentioned. The built package is 
no longer suitable for development. And hence we may usefully consider 
another format (something between source and binary, .Rgz?)

  best wishes
Robert


 
 If there is need, we could start having developer-package repositories.
 However, I'd prefer a different approach.  We're currently in the
 process of updating the CRAN server infrastructure, and should be able
 to start deploying an R-forge project hosting service eventually
 (hopefully, we can set things up during the summer).  This should
 provide us with an ideal infrastructure for sharing developer resources,
 in particular as we could add QC testing et al to the standard community
 services.
 
 Best
 -k
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))

2006-04-20 Thread Robert Gentleman
I disagree: things like README files and other objects are important and 
should be included. I don't see the real advantage to such warnings; if 
someone wants them they could be turned on optionally.

If size is an issue then authors should be warned that their package is 
large (in the top 1% at CRAN would be useful to some). I also find it 
helpful to know whose packages take forever to build, which we don't do.

Just because someone put something in TFM doesn't mean it is either a 
good idea or sensible, in my experience.

best wishes
   Robert


Prof Brian Ripley wrote:
 On Wed, 19 Apr 2006, James Bullard wrote:
 
 
Hello, I am having an issue with R CMD check with the nightly build of
RC 2.3.0 (listed in the subject.)
 
 
 This is all explained in TFM, `Writing R Extensions'.
 
 
The problem is this warning:

* checking if this is a source package ... WARNING
Subdirectory 'src' contains:
  README _Makefile
These are unlikely file names for src files.

In fact, they are not source files, but I do not see any reason why they
cannot be there, or why I need to be warned of their presence.
Potentially I could be informed of their presence, but that is another
matter.
 
 
  Having unnecessary files in other people's packages just wastes space and 
  download bandwidth for each one of the users.
 
 
Now, I only get this warning when I do:

R CMD build affxparser
R CMD check -l ~/R-packages/ affxparser_1.3.3.tar.gz

If I do:

R CMD check -l ~/R-packages affxparser

I do not get the warning. Is this inconsistent, or is there rationale
behind this? I think the warning is inappropriate, or at the least a
little restrictive. It seems as if I should be able to put whatever I
want in there, especially the _Makefile as I like to build test programs
directly and I want to be able to build exactly what I check out from
my source code repository without having to copy files in and out.
 
 
 All described in TFM, including how to set defaults for what is checked.
 
 
The output from R CMD check is below. Any insight would be appreciated.
As always thanks for your patience.
 
 
 [...]
 
 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))

2006-04-20 Thread Robert Gentleman
Hi,

  Well, I guess if someone thinks they know how I am going to configure 
and build the sources needed to construct appropriate dynamic libraries 
so well that they can feel free to exclude files at their whim at 
install time, perhaps they could feel just as free to exclude them at 
build time?

This makes no sense to me and certainly does not solve the size problem 
mentioned by Brian. If there is a single example of something that was 
better this way, I would be interested to hear it. I can think of 
several things that are worse.

best wishes
   Robert


Roger Bivand wrote:
 On Thu, 20 Apr 2006, Robert Gentleman wrote:
 
 
I disagree, things like README files and other objects are important and 
should be included. I don't see the real advantage to such warnings, if 
someone wants them they could be turned on optionally.
 
 
 Isn't the point at least partly that all those files are lost on 
 installation? If the README is to be accessible after installation, it can 
 be placed under inst/, so that both users reading the source and installed 
 versions can access it. So maybe the warning could be re-phrased to 
 suggest use of the inst/ tree for files with important content?
 
 Best wishes,
 
 Roger
 
 
If size is an issue then authors should be warned that their package is 
large (in the top 1% at CRAN would be useful to some). I also find it 
helpful to know whose packages take forever to build, which we don't do.

Just because someone put something in TFM doesn't mean it is either a 
good idea or sensible, in my experience.

best wishes
   Robert


Prof Brian Ripley wrote:

On Wed, 19 Apr 2006, James Bullard wrote:



Hello, I am having an issue with R CMD check with the nightly build of
RC 2.3.0 (listed in the subject.)


This is all explained in TFM, `Writing R Extensions'.



The problem is this warning:

* checking if this is a source package ... WARNING
Subdirectory 'src' contains:
 README _Makefile
These are unlikely file names for src files.

In fact, they are not source files, but I do not see any reason why they
cannot be there, or why I need to be warned of their presence.
Potentially I could be informed of their presence, but that is another
matter.


Having unnecessary files in other people's packages just waste space and 
download bandwidth for each one of the users.



Now, I only get this warning when I do:

R CMD build affxparser
R CMD check -l ~/R-packages/ affxparser_1.3.3.tar.gz

If I do:

R CMD check -l ~/R-packages affxparser

I do not get the warning. Is this inconsistent, or is there rationale
behind this? I think the warning is inappropriate, or at the least a
little restrictive. It seems as if I should be able to put whatever I
want in there, especially the _Makefile as I like to build test programs
directly and I want to be able to build exactly what I check out from
my source code repository without having to copy files in and out.


All described in TFM, including how to set defaults for what is checked.



The output from R CMD check is below. Any insight would be appreciated.
As always thanks for your patience.


[...]




 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Word boundaries and gregexpr in R 2.2.1 (PR#8547)

2006-02-01 Thread Robert Gentleman
Should be patched in R-devel, will be available shortly

[EMAIL PROTECTED] wrote:
 Full_Name: Stefan Th. Gries
 Version: 2.2.1
 OS: Windows XP (Home and Professional)
 Submission from: (NULL) (68.6.34.104)
 
 
 The problem is this: I have a vector of two character strings.
 
 
text <- c("This is a first example sentence.",
          "And this is a second example sentence.")
 
 If I now look for word boundaries with regexpr, this is what I get:
 
regexpr("\\b", text, perl=TRUE)
 
 [1] 1 1
 attr(,"match.length")
 [1] 0 0
 
 So far, so good. But with gregexpr I get:
 
 
gregexpr("\\b", text, perl=TRUE)
 
 Error: cannot allocate vector of size 524288 Kb
 In addition: Warning messages:
 1: Reached total allocation of 1015Mb: see help(memory.size)
 2: Reached total allocation of 1015Mb: see help(memory.size)
 
 Why don't I get the locations and extensions of all word boundaries?
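 For reference, with the zero-length matches handled correctly, the same
 call should return one position per word boundary, each with
 match.length 0; a small sketch (the positions are my reading of where
 PCRE's \b matches, not output copied from a patched build):

```r
text <- c("This is a first example sentence.",
          "And this is a second example sentence.")

## regexpr() reports only the first zero-length match per string:
regexpr("\\b", text, perl = TRUE)

## Once fixed, gregexpr() should report every boundary; in "ab cd" the
## boundaries fall at positions 1, 3, 4 and 6, each with match.length 0.
m <- gregexpr("\\b", "ab cd", perl = TRUE)[[1]]
as.integer(m)
attr(m, "match.length")
```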
 
 I am using R 2.2.1 on a machine running Windows XP:
 
R.version
 
 _
 platform i386-pc-mingw32
 arch i386
 os   mingw32
 system   i386, mingw32
 status
 major2
 minor2.1
 year 2005
 month12
 day  20
 svn rev  36812
 language R
 
 



[Rd] clarification of library/require semantics

2005-11-04 Thread Robert Gentleman
Recently I have added a lib.loc argument to require, so that
it is more consistent with library. However, there are some oddities 
that folks have pointed out, and we do not have a documented description 
of the semantics for what should happen when the lib.loc parameter is 
provided.

   Proposal: the most common use case seems to be one where any other 
dependencies, or calls to library/require should also see the library 
specified in the lib.loc parameter for the duration of the initial call 
to library. Hence, we should modify the library search path for the 
duration of the call (via .libPaths).

  The alternative is not to do that, which is what happens now.

  Both have costs. Automatically setting the library search path, of 
course, means that users who do not want that behavior have to manually 
remove entries afterwards. But it seems almost no one would want to do 
that, and most folks I have asked have said they want the lib.loc 
parameter to be used for the other loading as well.

   Comments?
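  The proposed semantics can be sketched as follows (library_with is a
hypothetical helper written for illustration, not the actual
implementation inside library()):

```r
## Hypothetical helper illustrating the proposal: prepend lib.loc to the
## search path for the duration of the call, so that any dependencies
## pulled in along the way are looked up there as well.
library_with <- function(package, lib.loc) {
  old <- .libPaths()
  on.exit(.libPaths(old))        # restore the previous search path
  .libPaths(c(lib.loc, old))     # dependencies now see lib.loc too
  library(package, character.only = TRUE)
}
```

Under the current behaviour, by contrast, only the top-level call
consults lib.loc; recursive loads of dependencies search only the usual
.libPaths().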

  Robert
