Re: [Rd] The default behaviour of a missing entry in an environment
Hi,

On Fri, Nov 13, 2009 at 4:55 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:

> On 13/11/2009 7:26 PM, Gabor Grothendieck wrote:
>> On Fri, Nov 13, 2009 at 7:21 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
>>> On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:
>>>> Note that one should use the inherits = FALSE argument to get and exists
>>>> to avoid returning objects from the parent, the parent of the parent, etc.
>>> I disagree. Normally you would want to receive those objects. If you
>>> didn't, why didn't you set the parent of the environment to emptyenv()
>>> when you created it?
>> $ does not look into the parent, so if you are trying to get those
>> semantics you must use inherits = FALSE.
> Whoops, yes. That's another complaint about $ on environments.

That was an intentional choice. AFAIR, neither $ nor [[ on environments was meant to mimic get, but rather to work on the current environment as if it were a hash-like object. One can always get the inherits semantics by simple programming, but under the model you seem to be suggesting, preventing such behavior when you don't own the environments in question is problematic.

Robert

Duncan Murdoch's example:

    x <- 3
    e <- new.env()
    "x" %in% names(e)
    [1] FALSE
    get("x", e)   # oops
    [1] 3

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel

--
Robert Gentleman
rgent...@gmail.com
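The distinction being argued over can be checked directly; a small sketch (variable names illustrative):

```r
# The default get()/exists() search parent environments; with
# inherits = FALSE they behave like $ and [[, looking only in e itself.
e <- new.env()                            # parent is the enclosing environment
x <- 3                                    # 'x' lives in that parent, not in e
exists("x", envir = e)                    # TRUE: found by inheritance
exists("x", envir = e, inherits = FALSE)  # FALSE: hash-like semantics
h <- new.env(parent = emptyenv())         # Duncan's alternative: cut the chain
exists("x", envir = h)                    # FALSE even with the default inherits
```

Setting the parent to emptyenv() at creation time gives the hash-like behavior without having to remember inherits = FALSE at every call site.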
Re: [Rd] Non-GPL packages for R
Hi,

Peter Dalgaard wrote:
> Prof. John C Nash wrote:
>> The responses to my posting yesterday seem to indicate more consensus than I expected:

Umm, I had thought that it was well established that responders need not represent the population being surveyed. I doubt that there is consensus at the level you are suggesting (certainly I don't agree) and, as Peter indicates below, the issue is: what is maintainable with the resources we have, not what is the best solution given unlimited resources.

Personally, I would like to see something a bit easier to deal with programmatically that indicated when a package was GPL-compatible (or open-source compatible, actually) and when it is not. This could then be used to write a decent function to identify suspect packages, so that users would know when they should be concerned. It is also the case that things are not so simple, as dependencies can make a package unusable even if it is itself GPL-compatible. This also makes the notion of some simple split into free and non-free (or whatever split you want) less trivial than is being suggested.

Robert

>> 1) CRAN should be restricted to GPL-equivalent licensed packages

> GPL-_compatible_ would be the word. However, this is not what has been done in the past. There are packages with non-commercial use licences, and the survival package was among them for quite a while. As far as I know, the CRAN policy has been to ensure only that redistribution is legal and that whatever license is used is visible to the user. People who have responded on the list do not necessarily speak for CRAN. In the final analysis, the maintainers must decide what is maintainable. The problem with Rdonlp2 seems to have been that the interface packages claimed to be LGPL2 without the main copyright holder's consent (and it seems that he cannot grant consent for reasons of TU-Darmstadt policies). It is hard to safeguard against that sort of thing.
> CRAN maintainers must assume that legalities have been cleared and accept the license in good faith. (Even within the Free Software world there are current issues with, e.g., incompatibilities between GPL v.2 and v.3, and also with the Eclipse license. Don't get me started...)

>> 2) r-forge could be left "buyer beware" using DESCRIPTION information
>> 3) We may want a specific repository for restricted packages (RANC?)
>> How to proceed? A short search on Rseek did not turn up a chain of command for CRAN. I'm prepared to help out with documentation etc. to move changes forward. They are not, in my opinion, likely to cause a lot of trouble for most users, and should simplify things over time.
>> JN
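A sketch of the kind of license-screening helper described above. The function name and the pattern list are purely illustrative, and real license strings would need much more careful parsing; this is a heuristic, not legal advice:

```r
# Flag installed packages whose License field does not mention one of a
# few well-known open-source license families.
suspectLicenses <- function(open = c("GPL", "LGPL", "MIT", "BSD",
                                     "Artistic", "Apache")) {
    lic <- installed.packages()[, "License"]
    # keep the entries that match none of the expected license names
    lic[!grepl(paste(open, collapse = "|"), lic)]
}
```

As noted above, a real check would also have to walk Depends/Imports, since a restrictively licensed dependency can make a package unusable even when the package itself is GPL-compatible.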
[Rd] debug
Hi,

I just committed a change to R-devel so that if debug is called on an S3 generic function, all methods will also automatically have debug turned on for them (if they are dispatched to from the generic). I hope to be able to extend this to S4 and a few other cases that are currently not being handled over the next few weeks. Please let me know if you have problems, or suggested improvements.

Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgent...@fhcrc.org
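A minimal illustration of the flag this change hooks into (the choice of generic is arbitrary):

```r
# debug() sets a flag on the closure; with the change described above,
# methods dispatched from a flagged S3 generic are debugged as well.
debug(summary)
isdebugged(summary)   # TRUE while the flag is set
undebug(summary)
isdebugged(summary)   # FALSE again
```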
Re: [Rd] tabulate can accept NA values?
Should be in devel now; NAs are ignored (as are non-integers and things outside the nbins range).

Martin Morgan wrote:
> tabulate has
>
>     .C("R_tabulate", as.integer(bin), as.integer(length(bin)),
>        as.integer(nbins), ans = integer(nbins), PACKAGE = "base")$ans
>
> The implementation of R_tabulate has
>
>     if (x[i] != R_NaInt && x[i] > 0 && x[i] <= *nbin)
>
> and so copes with (silently drops) NA. Perhaps the .C could have NAOK = TRUE? This is useful in apply'ing tabulate to the rows or columns of a (large) matrix, where the work-around involves introducing some artificial NA value (and consequently copying the matrix) outside the range of tabulate's nbins argument.
>
> Martin
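The behaviour described can be checked directly:

```r
# NAs, non-positive values, and values above nbins are silently dropped.
x <- c(1L, 2L, 2L, NA, 5L, -1L)
tabulate(x, nbins = 3)   # counts for bins 1..3: 1 2 0
```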
Re: [Rd] [R] step by step debugger in R?
Hi,

I stripped the cc's as I believe that all read this list.

Romain Francois wrote:
> [moving this to r-devel]
> Robert Gentleman wrote:
>> Romain Francois wrote:
>>> Duncan Murdoch wrote:
>>>> On 5/22/2009 10:59 AM, Michael wrote:
>>>>> Really I think if there is a Visual Studio strength debugger, our collective time spent in developing R code will be greatly reduced.
>>>> If someone who knows how to write a debugger plugin for Eclipse wants to help, we could have that fairly easily. All the infrastructure is there; it's the UI part that's missing.
>>>> Duncan Murdoch
>>> [I've copied Mark Bravington and Robert Gentleman to the list as they are likely to have views here, and I am not sure they monitor R-help]
>>> Hello, Making a front-end to debugging was one of the proposed Google Summer of Code projects for this year [1]; it was not retained eventually, but I am still motivated. Pretty much all the infrastructure is there, and some work has been done __very recently__ in R's debugging internals (ability to step up). As I see it, the ability to call some sort of hook each time the debugger waits for input would make it much easier for someone to write
>> I have still not come to an understanding of what this is supposed to do. When you have the browser prompt you can call any function or code you want to. There is no need for something special to allow you to do that.
> Sure. What I have in mind is something that gets __automatically__ called, similar to the task callback but happening right before the user is given the browser prompt.

I am trying to understand the scenario you have in mind. Is it that the user is running R directly and your debugger is essentially a helper function that gets updated etc. as R runs? If so, then I don't think that works very well, and given the constraints we have with R I don't think it will be able to solve many of the problems that an IDE should. The hook you want will give you some functionality, but nowhere near enough.
Let me suggest instead that the IDE should be running the show. It should initialize an instance of R, but it controls all communication and hence controls what is rendered on the client side. If that is what you mean by embedding R, then yes, that is what is needed. There is no way that I can see to support most of the things that IDE-type debuggers support without the IDE controlling the communication with R. And if I am wrong about what your debugger will look like, please let me know.

best wishes
Robert

> front-ends. A recent post of mine (patch included) [2] on R-devel suggested a custom prompt for browser which would do the trick, but I now think that a hook would be more appropriate. Without something similar to that, there is no way that I know of for making a front-end, unless maybe if you embed R ... (please let me know how I am wrong)

I think you are wrong. I can't see why it is needed. The external debugger has lots of options for handling debugging. It can rewrite code (see the examples in trace for how John Chambers has done this to support tracing at a location), which is AFAIK a pretty standard approach to writing debuggers. It can figure out where the break point is (made a bit easier by allowing it to put pieces of text in the call to browser). These are things the internal debugger can't do.

> Thanks. I'll have another look into that.
> There is also the debug package [3,4] which does __not__ work with R internals but rather works with instrumenting tricks at the R level. debug provides a tcl/tk front-end. It is my understanding that it does not work using R internals (do_browser, ...) because it was not possible at the time, and I believe this is still not possible today, but I might be wrong. I'd prefer to be wrong actually.

I don't understand this statement. It has always been possible to work with the internal version - but one can also take the approach of rewriting code.
There are some difficulties supporting all the operations that one would like by rewriting code, and I think a combination of external controls and the internal debugger will get most of the functionality that anyone wants. There are some things that are hard, and once I have a more complete list I will be adding this to the appropriate manual. I will also be documenting the changes that I have been making, but that project is in flux and won't be done until the end of August, so people who want to look at it are welcome (it is in R-devel), but it is in development and could change pretty much without notice.

Romain noted that we now support stepping out from one place to another function. We also have a debugonce flag that lets you get close to step in, but step in is very hard in R. I am mostly interested in writing tools in R that can be used by anyone that wants to write an external debugger and am not that interested in any
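The code-rewriting approach mentioned above is what trace() does; a small sketch (the function and the tracer message are illustrative):

```r
f <- function(x) {
    y <- x + 1
    y * 2
}
# trace() instruments a copy of f rather than setting the internal debug
# flag -- the same technique an external debugger can use for breakpoints.
trace(f, tracer = quote(cat("entering f\n")))
f(1)        # prints the tracer message; the result is unchanged
untrace(f)
```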
Re: [Rd] Can a function know what other function called it?
Hi Kynn,

Kynn Jones wrote:
> Suppose function foo calls function bar. Is there any way in which bar can find out the name of the function that called it, foo?

Essentially yes. You can find out about the call stack by using sys.calls and sys.parents etc. The man page plus additional manuals should be sufficient, but let us know if there are things that are not clear.

> There are two generalizations to this question that interest me. First, can this query go farther up the call stack? I.e. if bar now calls baz, can baz find out the name of the function that called the function that called it, i.e. foo?

Yes - you can (at least currently) get access to the entire calling stack, and some manipulations can be performed.

> Second, what other information, besides its name, can bar find about the environment where it was called? E.g. can it find out the file name and line number of the

There is no real concept of a file and line number associated with a function definition (nor need there even be a name - functions can be anonymous). If you want to map back to source files then I think that currently we do not keep quite enough information when a function is sourced. Others may be able to elaborate more (or correct my mistakes). I think we currently store the actual text for the body of the function so that it can be used for printing, but we don't store a file name/location/line number or anything of that sort. It could probably be added, but would be a lot of work, so it would need someone who really wanted it to do that. However, you can find out lots of other things if you want.

Do note that while it is possible to determine which function initiated the call, it is not necessarily possible to figure out which of the calls (if there is more than one in the body of the function) is active. R does not keep track of things in that way.
To be clear, if foo looks like:

    foo <- function(x) {
        bar(x)
        x <- sqrt(x)
        bar(x)
    }

and you have a breakpoint in bar, you could not (easily) distinguish which of the two calls to bar was active. There is no line counter or anything of that sort available.

best wishes
Robert

> function call? Thanks!
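A sketch of the call-stack access described above (function names illustrative):

```r
# bar() recovers the call one frame up the stack via sys.call(-1).
bar <- function() deparse(sys.call(-1))
foo <- function(x) bar()
foo(2)   # the character string "foo(2)"
```

Note that this recovers which function made the call, not which occurrence of bar() within foo's body is active, as explained above.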
[Rd] a statement about package licenses
We are writing on behalf of the R Foundation, to clarify our position on the licenses under which developers may distribute R packages. Readers should also see FAQ 2.11: this message is not legal advice, which we never offer. Readers should also be aware that besides the R Foundation, R has many other copyright holders, listed in the copyright notices in the source. Each of those copyright holders may have a different opinion on the issues discussed here.

We welcome packages that extend the capabilities of R, and believe that their value to the community is increased if they can be offered with open-source licenses. At the same time, we have no desire to discourage other license forms that developers feel are required. Of course, such licenses as well as the contents of the package and the way in which it is distributed must respect the rights of the copyright holders and the terms of the R license.

When we think that a package is in violation of these rights, we contact the author directly, and so far package authors have always agreed to comply with our license (or convinced us that they are already in compliance). We have no desire to be involved in legal actions---our interest is in providing good software. However, everyone should understand that there are conceivable circumstances in which we would be obliged to take action. Our experience to date and the assurances of some fine commercial developers make us optimistic that these circumstances will not arise.

The R Foundation
Re: [Rd] install.packages and dependency version checking
Hi,

Prof Brian Ripley wrote:
> I've started to implement checks for package versions on dependencies in install.packages(). However, this is revealing a number of problems/misconceptions.
>
> (A) We do not check versions when loading namespaces, and the namespace registry does not contain version information. So, for example (rtracklayer),
>
>     Depends: R (>= 2.7.0), Biobase, methods, RCurl
>     Imports: XML (>= 1.98-0), IRanges, Biostrings
>
> will never check the version of namespace XML that is loaded, either already loaded or resulting from loading this package's namespace. For this to be operational we would need to extend the syntax of the import() and importFrom() directives in a NAMESPACE file to allow version restrictions. I am not sure this is worth doing, as an alternative is to put the imported package in Depends. The version dependence will in a future release cause an update of XML when rtracklayer is installed, if needed (and available).

I think we need to have this functionality in both Imports and Depends; see my response to another point for why.

> (B) Things like (package stam)
>
>     Depends: R (>= 2.7.0), GO.db (>= 2.1.3), Biobase (>= 1.99.5), pamr (>= 1.37.0), cluster (>= 1.11.10), annaffy (>= 1.11.5), methods (>= 2.7.0), utils (>= 2.7.0)
>
> are redundant: the versions of methods and utils are always the same as that of R. And there is no point in having a package in both Depends: and Imports:, as Biostrings has.

I don't think that is true. There are cases where both Imports and Depends are reasonable. The purpose of importing is to ensure correct resolution of symbols in the internal functions of a package. I would do that in almost all cases. In some instances I want users to see functionality from another package - and I can then either a) (re)export those functions, or, if there are lots of them, b) just put the package also in Depends.
Now, a) is a bit less useful than it could be, since R CMD check gets annoyed about these re-exported functions (I don't think it should care; the man page exists and is findable).

> (C) There is no check on the version of a package suggested by Suggests:, unless the package itself provides one (and I found no instances).

It may be worthwhile, but this is a less frequent use case and I would prioritize it lower than having that functionality in Imports.

> (D) We can really only handle >= dependencies on package versions (but then I can see no other ops in use). install.packages() will find the latest version available on the repositories, and we possibly need to check version requirements on the same dependency many times. Given that BioC has a penchant for having version dependencies on unavailable versions (e.g. last week on IRanges (>= 1.1.7) with 1.1.4 available), we may be able to satisfy the requirements of some packages and not others. (In that case the strategy used is to install the latest available version if the one installed does not suffice for those we can satisfy, and report the problem(s).)

I suspect one needs more than >= (basically, as Gabor pointed out, some packages have issues).

> (E) One of the arguments that has been used to do this version checking at install time is to avoid installing packages that cannot work. It would be possible to extend the approach to do so, but I am going to leave that to those who advocated it. The net effect of the current changes will be that if there is a dependence that is already installed but a later version is available and will help satisfy a >= dependence, it will be added to the list of packages to be installed. As we have seen with Matrix this last week, that can have downsides in stopping previously functional packages working. This is work in progress: there is no way to write a test suite that will encapsulate all the possible scenarios, so we need to get experience until 2.9.0 is released.
> Please report any quirks to R-devel if they are completely reproducible (and preferably with the code change needed to fix them, since the chances of anyone else being able to reproduce them are fairly slim).

thanks
Robert
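The Depends/Imports split argued for above might look like this in practice (the package and function names are made up):

```
## DESCRIPTION
Depends: R (>= 2.7.0), pkgUserFacing
Imports: pkgInternal (>= 1.0)

## NAMESPACE
import(pkgInternal)        # resolve symbols used by this package's own code
export(myFunction)         # users see pkgUserFacing's functions via Depends
```

Here pkgInternal is imported so the package's internal code resolves correctly, while pkgUserFacing sits in Depends so its functionality is visible to users without re-exporting it.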
Re: [Rd] wish: exportClassPattern
Should be in the most recent devel.

Prof Brian Ripley wrote:
> Michael, This seems a reasonable request, but most of us will not have enough classes in a package for it to make a difference. A patch to do this would certainly speed up implementation.
>
> On Fri, 21 Nov 2008, Michael Lawrence wrote:
>> It would be nice to have a more convenient means of exporting multiple classes from a package namespace. Why not have something like exportClassPattern() that worked like exportPattern() except for classes?
>> Thanks, Michael
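The new directive is used in a NAMESPACE file just like exportPattern(); the regular expressions here are illustrative:

```
exportPattern("^do\\.")       # export functions whose names start with "do."
exportClassPattern("^My")     # export classes whose names start with "My"
```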
Re: [Rd] Suggestion for the optimization code
Duncan Murdoch wrote:
> On 8/8/2008 8:56 AM, Mathieu Ribatet wrote:
>> Dear list,
>> Here's a suggestion about the different optimization code. There are several optimization procedures in the base package (optim, optimize, nlm, nlminb, ...). However, the outputs of these functions are slightly different. For instance:
>>
>> 1. optim returns a list with components par (the estimates), value (the minimum (maximum) of the objective function) and convergence (the convergence code).
>> 2. optimize returns a list with components minimum (or maximum), giving the estimates, and objective, the value of the objective function.
>> 3. nlm returns a list with components minimum, giving the minimum of the objective function, estimate (the estimates) and code (the convergence code).
>> 4. nlminb returns a list with components par (the estimates), objective, convergence (the convergence code) and evaluations.
>>
>> Furthermore, optim keeps the names of the parameters while nlm and nlminb don't. I believe it would be nice if all these optimizers had a kind of homogenized output. This will help in writing functions that can call different optimizers. Obviously, we can write our own function that homogenizes the output after calling the optimizer, but I still believe this would be more user-friendly.
>
> Unfortunately, changing the names within the return value would break a lot of existing uses of those functions. Writing a wrapper to homogenize the output is probably the right thing to do.

And potentially to harmonize inputs. The MLInterfaces package (Bioconductor) has done this for many machine learning algorithms, should you want an example to look at.

Robert

> Duncan Murdoch

>> Do you think this is a reasonable feature to implement - despite it isn't an important point?
>> Best,
>> Mathieu
>> * BTW, if this is relevant, I could try to do it.
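A minimal sketch of such a wrapper, homogenizing two of the optimizers onto optim()'s component names. The function name and the choice of fields are mine, not a proposal for base R:

```r
# Wrap optim() and nlminb() behind one interface; always return the
# components par, value and convergence (optim's convention).
uniOptim <- function(fn, start, method = c("optim", "nlminb"), ...) {
    method <- match.arg(method)
    fit <- switch(method,
        optim  = optim(start, fn, ...),
        nlminb = nlminb(start, fn, ...))
    list(par         = fit$par,
         value       = if (method == "optim") fit$value else fit$objective,
         convergence = fit$convergence)
}

# minimize a simple quadratic with its optimum at (2, 2)
res <- uniOptim(function(p) sum((p - 2)^2), c(0, 0), method = "nlminb")
```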
Re: [Rd] RFC: What should ?foo do?
Duncan Murdoch wrote:
> Currently ?foo does help(foo), which looks for a man page with alias foo. If foo happens to be a function call, it will do a bit more, so ?mean(something) will find the mean method for something if mean happens to be an S4 generic. There are also the type?foo variations, e.g. methods?foo, or package?foo.
>
> I think these are all too limited. The easiest search should be the most permissive. Users should need to do extra work to limit their search to man pages, with exact matches, as ? does.

While I like the idea, I don't really agree with the sentiment above. I think that the easiest search should be the one whose result you want most often. And at least for me that is the man page for the function, so I can check some detail; and it works pretty well. I use site searches much less frequently and would be happy to type more for them.

> We don't currently have a general purpose search for foo, or something like it. We come close with RSiteSearch, and so possibly ?foo should mean RSiteSearch("foo"), but there are problems with that: it can't limit itself to the current version of R, and it doesn't work when you're offline (or when search.r-project.org is down.) We also have help.search("foo"), but it is too limited. I'd like to have a local search that looks through the man pages, manuals, FAQs, vignettes, DESCRIPTION files, etc., specific to the current R installation, and I think ? should be attached to that search.

I think that would be very useful (although there will be some decisions on which tool to use to achieve this). But it will also be problematic, as one will get tons of hits for some things, and then selecting the one you really want will be a pain. I would rather see that be one of the dyadic forms, say site?foo or all?foo; one could even imagine refining that for different subsets of the docs you have mentioned:

    help?foo     # only man pages
    guides?foo   # the manuals, R Extensions, etc.

and so on.
You did not make a suggestion as to how we would get the equivalent of ?foo now, if a decision to move were taken.

> Comments, please.
> Duncan Murdoch
Re: [Rd] RFC: What should ?foo do?
Duncan Murdoch wrote:
> On 4/25/2008 10:16 AM, Robert Gentleman wrote:
>> Duncan Murdoch wrote:
>>> Currently ?foo does help(foo), which looks for a man page with alias foo. If foo happens to be a function call, it will do a bit more, so ?mean(something) will find the mean method for something if mean happens to be an S4 generic. There are also the type?foo variations, e.g. methods?foo, or package?foo. I think these are all too limited. The easiest search should be the most permissive. Users should need to do extra work to limit their search to man pages, with exact matches, as ? does.
>> While I like the idea, I don't really agree with the sentiment above. I think that the easiest search should be the one whose result you want most often. And at least for me that is the man page for the function, so I can check some detail; and it works pretty well. I use site searches much less frequently and would be happy to type more for them.
> That's true. What's your feeling about what should happen when ?foo fails?

Present a list of man pages with spellings close to foo (we have the tools to do this in many places right now, and it would be a great help, IMHO, as spellings and capitalization behavior varies both between and within individuals), so the user can select one.

>>> We don't currently have a general purpose search for foo, or something like it. We come close with RSiteSearch, and so possibly ?foo should mean RSiteSearch("foo"), but there are problems with that: it can't limit itself to the current version of R, and it doesn't work when you're offline (or when search.r-project.org is down.) We also have help.search("foo"), but it is too limited. I'd like to have a local search that looks through the man pages, manuals, FAQs, vignettes, DESCRIPTION files, etc., specific to the current R installation, and I think ? should be attached to that search.
>> I think that would be very useful (although there will be some decisions on which tool to use to achieve this).
>> But it will also be problematic, as one will get tons of hits for some things, and then selecting the one you really want will be a pain. I would rather see that be one of the dyadic forms, say site?foo or all?foo; one could even imagine refining that for different subsets of the docs you have mentioned: help?foo (only man pages), guides?foo (the manuals, R Extensions, etc.) and so on.
>> You did not make a suggestion as to how we would get the equivalent of ?foo now, if a decision to move were taken.
> I didn't say, but I would assume there would be a way to do it, and it shouldn't be hard to invoke. Maybe help?foo as you suggested, or man?foo.

If not, then I would be strongly opposed -- I really think we want to make the most common thing the easiest to do. And if we really think that might be different for different people, then disambiguating the short-cut (? in this case) from the command, so that users have some freedom to customize, would be my favored alternative.

I also wonder if one could not also provide some mechanism to provide distinct information on what is local vs what is on the internet. Something that would make tools like Spotlight much more valuable, IMHO, is to tell me what I have on my computer, and what I can get if I want to; at least as some form of option.

Robert

> Duncan Murdoch
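For comparison, the local search tools that already exist, and that the proposed ? semantics would extend:

```r
# apropos() matches object names on the search path; help.search()
# (shorthand: ??) does a fuzzy search over installed man pages.
apropos("mean")                    # e.g. "mean", "weighted.mean", ...
hits <- help.search("regression")  # an object of class "hsearch"
```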
Re: [Rd] R CMD check should check date in description
Kurt Hornik wrote:
> hadley wickham writes:
>>> I recently thought about this. I see several issues.
>>> * How can we determine if it is old? Relative to the time when the package was uploaded to a repository?
>>> * Some developers might actually want a different date for a variety of reasons ...
>>> * What we currently say in R-exts is: The optional `Date' field gives the release date of the current version of the package. It is strongly recommended to use the yyyy-mm-dd format conforming to the ISO standard.
>>> Many packages do not comply with the latter (but I have some code to sanitize most of these), and release date may be a moving target. The best that I could think of is to teach R CMD build to *add* a Date field if there was none.
>> That sounds like a good solution to me.
> Ok. However, 2.7.0 feature freeze soon ...

Please no. If people want one then they should add it manually. It is optional, and some of us have explicitly opted out and would like to continue to do so.

>> Otherwise, maybe just a message from R CMD check? i.e. just like failing the codetools checks, it might be perfectly ok, but you should be doing it consciously, not by mistake.
> I am working on that, too (e.g. a simple NOTE in case the date spec cannot be canonicalized, etc.). If file time stamps were reliable, we could compare these to the given date. This is, I guess, all we can do for e.g. CRAN's daily checking (where comparing to the date the check is run is not too useful) ...

But definitely not a warning.

Robert

> Best
> -k
Re: [Rd] R CMD check should check date in description
hadley wickham wrote:
>> Please no. If people want one then they should add it manually. It is optional, and some of us have explicitly opted out and would like to continue to do so.
> To clarify, do you mean you have decided not to provide a date field in the DESCRIPTION file? If so, would you mind elaborating why?

Sure: the date of what?

> Hadley
Re: [Rd] Problem with new(externalptr)
Hi,

Herve Pages wrote:
> Hi,
> It seems that new("externalptr") is always returning the same instance, and not a new one as one would expect from a call to new(). Of course this is hard to observe:
>
>     > new("externalptr")
>     <pointer: (nil)>
>     > new("externalptr")
>     <pointer: (nil)>
>
> since not a lot of details are displayed. For example, it's easy to see that 2 consecutive calls to new("environment") create different instances:
>
>     > new("environment")
>     <environment: 0xc89d10>
>     > new("environment")
>     <environment: 0xc51248>
>
> getMethod("initialize", "environment") and getMethod("initialize", "externalptr") will give some hints about the difference. But for new("externalptr"), I had to use the following C routine:
>
>     SEXP sexp_address(SEXP s)
>     {
>         SEXP ans;
>         char buf[40];
>
>         snprintf(buf, sizeof(buf), "%p", s);
>         PROTECT(ans = NEW_CHARACTER(1));
>         SET_STRING_ELT(ans, 0, mkChar(buf));
>         UNPROTECT(1);
>         return ans;
>     }
>
> Then I get:
>
>     > .Call("sexp_address", new("externalptr"))
>     [1] "0xde2ce0"
>     > .Call("sexp_address", new("externalptr"))
>     [1] "0xde2ce0"
>
> Isn't that wrong?

Not what you want, but not wrong. In the absence of an initialize method, all calls to new are guaranteed to return the prototype; so I think it behaves as documented. new("environment") would also always return the same environment, were it not for the initialize method. So you might want to contribute an initialize method for externalptr, but as you said, they are not useful at the R level so I don't know just what problem is being solved. This piece of code might be useful in such a construction:

    .Call("R_externalptr_prototype_object", PACKAGE = "methods")

which does what you would like.

best wishes
Robert

> I worked around this problem by writing the following C routine:
>
>     SEXP xp_new()
>     {
>         return R_MakeExternalPtr(NULL, R_NilValue, R_NilValue);
>     }
>
> so I can create new externalptr instances from R with:
>
>     .Call("xp_new")
>
> I understand that there is not much you can do from R with an externalptr instance and that you will have to manipulate them at the C level anyway.
But since new("externalptr") exists and seems to work, wouldn't it be better if it really created a new instance at each call? Thanks! H. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
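The prototype-sharing behaviour can be seen from R itself; the comparison with environments below is a sketch (xp_new refers to Herve's C workaround above):

```r
## "environment" has an initialize method, so each new() call really
## constructs a fresh object:
identical(new.env(), new.env())   # FALSE: two distinct environments
## "externalptr" has no initialize method, so (in the R version discussed
## here) every new("externalptr") hands back the same prototype SEXP.
## A fresh pointer requires a C-level constructor such as
##   .Call("xp_new")
## built on R_MakeExternalPtr(), as shown above.
```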
[Rd] course announcement
Hi, We will be holding an advanced course in R programming at the FHCRC (Seattle), Feb 13-15. There will be some emphasis on bioinformatic applications, but not much. Sign up at: https://secure.bioconductor.org/SeattleFeb08/index.php Please note that space is very limited, so make sure you have a registration before making any travel plans. Also, this is definitely not a course for beginners. Best wishes Robert __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Evaluating R expressions from C
Hi Terry, Terry Therneau wrote: I am currently puzzled by a passage in the R Extensions manual, section 5.10: SEXP lapply(SEXP list, SEXP expr, SEXP rho) { R_len_t i, n = length(list); SEXP ans; if(!isNewList(list)) error("'list' must be a list"); if(!isEnvironment(rho)) error("'rho' should be an environment"); PROTECT(ans = allocVector(VECSXP, n)); for(i = 0; i < n; i++) { defineVar(install("x"), VECTOR_ELT(list, i), rho); SET_VECTOR_ELT(ans, i, eval(expr, rho)); } I'm trying to understand this code beyond just copying it, and don't find definitions for many of the calls. PROTECT and SEXP have been well discussed previously in the document, but what exactly are: R_len_t? defineVar: this function binds the symbol (a SYMSXP, one type of SEXP) given by its first argument to the value given by its second argument, in the environment given by its third argument. There are lots of variants; these are largely in envir.c. install: all symbols in R are unique (there is only one symbol named x, even though it might have bindings in many different environments). So to get the unique thing (a SYMSXP) you call install (line 1067 in names.c has a pretty brief comment to this effect). This makes it efficient to do variable lookup, as we only need to compare pointers (within an environment), not compare names. VECTOR_ELT: access the indicated element (2nd arg) of the vector (1st arg). SET_VECTOR_ELT: set the indicated element (2nd arg) of the vector (1st arg) to the value (3rd arg). The last I also found in 5.7.4, but it's not defined there either. So: What do these macros do? Some I could guess, like is.Environment; and I'm fairly confident of R_len_t. Others I need some help. Perhaps they are elsewhere in the document? (My version of acrobat can't do searches.) Is there another document that I should look at first? Why isNewList? I would have guessed isList. What's the difference? Old lists are of the CAR-CDR variant, and are largely only used internally these days.
New lists are generic vectors, and are what users will almost always encounter (even users who program internals; you pretty much need to be messing with the language itself to run into the CAR-CDR variety). best wishes Robert Thanks for any help, Terry Therneau __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
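An R-level sketch of what the C lapply above does may make the defineVar/eval pattern clearer; the helper name lapplyExpr is made up for illustration:

```r
## Sketch: the C code binds "x" in the environment rho, then evaluates
## the expression there -- the same pattern expressed in R.
lapplyExpr <- function(lst, expr, rho) {
  ans <- vector("list", length(lst))
  for (i in seq_along(lst)) {
    assign("x", lst[[i]], envir = rho)  # like defineVar(install("x"), ..., rho)
    ans[[i]] <- eval(expr, rho)         # like eval(expr, rho)
  }
  ans
}
e <- new.env()
lapplyExpr(list(1, 2, 3), quote(x * 10), e)  # a list of 10, 20, 30
```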
Re: [Rd] install.packages() and configure.args
Since, in Herve's example, only one package was named, it would be nice either to make sure the configure args are associated with it, or to force only named configure.args parameters, and possibly check the names? Duncan Temple Lang wrote: Hi Herve The best way to specify configure.args when there are multiple packages (either directly or via dependencies) is to use names on the character vector, i.e. install.packages("Rgraphviz", rep="http://bioconductor.org/packages/2.1/bioc", configure.args=c(Rgraphviz="--with-graphviz=/some/non/standard/place")) This allows one to specify command line arguments for many packages simultaneously and unambiguously. When there are no names, install.packages() only uses configure.args if there is only one package being installed. It could be made smarter to apply this to the first of the pkgs only, or to identify the packages as direct and dependent. But it is not obvious it is worth the effort, as using names on configure.args provides a complete solution and is more informative. Thanks for pointing this out. D. Herve Pages wrote: Hi, In the case where install.packages("packageA") also needs to install the required package packageB, what is passed through the 'configure.args' argument seems to be lost when it's the turn of packageA to be installed (the last package to get installed). This is not easy to reproduce, but let's say you have the graphviz libraries installed on your system, but you don't have the graph package installed yet. Then this install.packages("Rgraphviz", rep="http://bioconductor.org/packages/2.1/bioc", configure.args="--with-graphviz=/some/non/standard/place") will fail because --with-graphviz=/some/non/standard/place doesn't seem to be passed to Rgraphviz's configure script. But if you already have the graph package, then it will work. Cheers, H.
__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
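A sketch of the named-vector form with more than one package; the configure flag values and paths here are purely illustrative:

```r
## Named configure.args are matched per package, so the arguments are not
## lost when dependencies get installed first (flags/paths illustrative):
install.packages(c("Rgraphviz", "XML"),
                 repos = "http://bioconductor.org/packages/2.1/bioc",
                 configure.args = c(Rgraphviz = "--with-graphviz=/opt/graphviz",
                                    XML = "--with-libxml2=/opt/libxml2"))
```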
Re: [Rd] gregexpr (PR#9965)
Yes, we had originally wanted it to find all matches, but user complaints that it did not perform as Perl does were taken to prevail. There are different ways to do this, but it seems the notion that one does not start looking for the next match until after the end of the previous one is the more common one. I consciously decided not to have a switch, and instead we wrote something that does what we wanted it to do and put it in the Biostrings package (from Bioconductor) as gregexpr2 (sorry, but only fixed = TRUE is supported, since that is all we needed). best wishes Robert Prof Brian Ripley wrote: This was a deliberate change for R 2.4.0 with SVN log: r38145 | rgentlem | 2006-05-20 23:58:14 +0100 (Sat, 20 May 2006) | 2 lines fixing gregexpr infelicity So it seems the author of gregexpr believed that the bug was in 2.3.1, not 2.5.1. On Wed, 10 Oct 2007, [EMAIL PROTECTED] wrote: Full_Name: Peter Dolan Version: 2.5.1 OS: Windows Submission from: (NULL) (128.193.227.43) gregexpr does not find all matching substrings if the substrings overlap: gregexpr("abab", "ababab") [[1]] [1] 1 attr(,"match.length") [1] 4 It does work correctly in Version 2.3.1 under linux. 'correctly' is a matter of definition, I believe: this could be considered to be vaguely worded in the help. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
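To make the overlap question concrete, here is a small sketch; the zero-width lookahead idiom is a generic PCRE trick, not the Biostrings gregexpr2 implementation:

```r
## Default matching resumes after the end of each match, so the second,
## overlapping occurrence of "abab" (at position 3) is not reported:
gregexpr("abab", "ababab")[[1]]                               # 1
## A zero-width lookahead reports every starting position instead:
as.integer(gregexpr("(?=abab)", "ababab", perl = TRUE)[[1]])  # 1 3
```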
Re: [Rd] 'load' does not properly add 'show' methods for classes extending 'list'
I think that it would be best then not to load the package, as loading it in this way means that it is almost impossible to get the methods registered correctly. That does seem to be a bug, or at least a major inconvenience. And one might wonder at the purpose of attaching if not to make methods available. That said, the documentation indeed does not state that anything good will happen. It also does not state that something bad will happen. best wishes Robert Prof Brian Ripley wrote: I am not sure why you expected this to work: I did not expect it to and could not find relevant documentation to suggest it should. Loading an object created from a non-attached package does not in general attach that package and make the methods for the class of 'x' available. We have talked about attaching the package defining the class when an S4 object is loaded, and that is probably possible now that S4 objects can be unambiguously distinguished (although I still worry about multiple packages with the same generic and their order on the search path). In your example there is no specific 'show' method on the search path when 'show' is called via autoprinting in the second session, so 'showDefault' is called. Package GSEABase gets attached as an (undocumented) side effect of calling 'getClassDef' from 'showDefault'. I can see no documentation (and in particular not in ?showDefault) that 'showDefault' is supposed to attach the package defining the class and re-dispatch to a 'show' method that package contains. Since attaching packages behind the user's back can have nasty side effects (the order of the search path does matter), I think the pros and cons need careful consideration: a warning along the lines of "object 'x' is of class GeneSetCollection from package 'GSEABase', which is not on the search path" might be more appropriate.
Things would potentially be a lot smoother if namespaces could be assumed, as loading a namespace has few side effects (and if loading a namespace registered methods for visible S4 generics smoothly). Until I see documentation otherwise, I will continue to assume that I do need to attach the class-defining package(s) for things to work correctly. On Mon, 24 Sep 2007, Martin Morgan wrote: The GeneSetCollection class in the Bioconductor package GSEABase extends 'list': library(GSEABase) showClass("GeneSetCollection") Slots: Name: .Data Class: list Extends: Class list, from data part Class vector, by class list, distance 2 Class AssayData, by class list, distance 2 If I create an instance of this class and serialize it: x <- GeneSetCollection(GeneSet("X")) x GeneSetCollection names: NA (1 total) save(x, file="/tmp/x.rda") and then start a new R session and load the data object (without first library(GSEABase)), the 'show' method is not added to the appropriate method table: load("/tmp/x.rda") x Loading required package: GSEABase Loading required package: Biobase Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, type 'openVignette()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation("pkgname")'. Loading required package: AnnotationDbi Loading required package: DBI Loading required package: RSQLite An object of class GeneSetCollection [[1]] setName: NA geneIds: X (total: 1) geneIdType: Null collectionType: Null details: use 'details(object)' Actually, the behavior is more complicated than it appears; in a new R session after loading /tmp/x.rda, if I immediately do x[[1]] I get the show,GeneSetCollection-method but not the show,GeneSet-method. Sorry for the somewhat obscure example. Martin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
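A sketch of the namespace-loading idea discussed above, using the class and file names from Martin's example (whether this registers methods smoothly in the R version under discussion is exactly the open question):

```r
## Find the class-defining package of a loaded S4 object, then load only
## its namespace: methods get registered without altering the search path.
load("/tmp/x.rda")
pkg <- attr(class(x), "package")  # "GSEABase" in the example above
loadNamespace(pkg)                # register methods; do not attach
x                                 # autoprinting can now dispatch show()
```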
[Rd] Jobs in Seattle
Hi, As many of you will realize, Seth is going to be leaving us (pretty much immediately), so we will be looking to replace him. In addition, Martin Morgan is going to be moving into another role as well, one that will require an assistant. In addition, I am looking for at least one post-doc (preferably with an interest in sequence-related work). If any of these interest you, please check out the job descriptions at: http://www.fhcrc.org/about/jobs/ where you can get some idea of salary level as well. You can feel free to ask me about either the lead programmer job or the post-doc, and should probably direct questions about the bioinformatics position to Martin. All applications must go through the FHCRC web site. thanks Robert __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] HTML vignette browser
Deepayan Sarkar wrote: On 6/4/07, Seth Falcon [EMAIL PROTECTED] wrote: Friedrich Leisch [EMAIL PROTECTED] writes: Looks good to me, and certainly something worth being added to R. 2 quick (related) comments: 1) I am not sure if we want to include links to the LaTeX sources by default; those might confuse unsuspecting novices a lot. Perhaps make those optional using an argument to browseVignettes(), which is FALSE by default? I agree that the Rnw could confuse folks. But I'm not sure it needs to be hidden or turned off by default... If the .R file was also included, then it would be less confusing, I suspect, as the curious could deduce what the Rnw is about by triangulation. 2) Instead of links to .Rnw files, we may want to include links to the R code - should we R CMD INSTALL a tangled version of each vignette such that we can link to it? Of course it is redundant information given the .Rnw, but we also have the help pages in several formats ready. Including, by default, links to the tangled .R code seems like a really nice idea. I think a lot of users who find vignettes don't realize that all of the code used to generate the entire document is available to them -- I just had a question from someone who wanted to know how to make a plot that appeared in a vignette, for example. I agree that having a Stangled .R file would be a great idea (among other things, it would have the complete code, which many PDFs will not). I don't have a strong opinion either way about linking to the .Rnw file. It should definitely be there if the PDF file is absent (e.g. for grid, and other packages installed with --no-vignettes, which I always do for local installation). Maybe we can keep them, but change the name to something more scary than source, e.g. LaTeX/Noweb source. I would very much prefer to keep the source, with some name, scary or not...
-Deepayan __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
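For reference, this is roughly how the facility looks from the user side; the package and vignette file names are illustrative, and Stangle() is the standard way to extract a vignette's code by hand:

```r
## List a package's vignettes with links to PDF, source and R code:
browseVignettes(package = "grid")
## Extracting the tangled R code of one vignette manually
## (the file name here is illustrative):
Stangle(system.file("doc", "grid.Rnw", package = "grid"))
```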
Re: [Rd] Calling R_PolledEvents from R_CheckUserInterrupt
should be there shortly - I have no way of testing Windows (right now, at least), so hopefully Duncan M will have time to take a look. Deepayan Sarkar wrote: On 5/5/07, Luke Tierney [EMAIL PROTECTED] wrote: [...] However, R_PolledEvents is only called from a limited set of places now (including the socket reading code, to keep things responsive during blocking reads). But it is not called from the interrupt checking code, which means that if a user does something equivalent to while (TRUE) {} there is no point where events get looked at to see a user interrupt action. The current definition of R_CheckUserInterrupt is: void R_CheckUserInterrupt(void) { R_CheckStack(); /* This is the point where GUI systems need to do enough event processing to determine whether there is a user interrupt event pending. Need to be careful not to do too much event processing though: if event handlers written in R are allowed to run at this point then we end up with concurrent R evaluations and that can cause problems until we have proper concurrency support. LT */ #if ( defined(HAVE_AQUA) || defined(Win32) ) R_ProcessEvents(); #else if (R_interrupts_pending) onintr(); #endif /* Win32 */ } So only on Windows or Mac do we do event processing. We could add an R_PolledEvents() call in the #else bit to support this, though the cautions in the comment do need to be kept in mind. I have been using the following patch to src/main/errors.c for a while without any obvious ill effects. Could we add this to r-devel (with necessary changes for Windows, if any)?
-Deepayan

Index: errors.c
===================================================================
--- errors.c (revision 41764)
+++ errors.c (working copy)
@@ -39,6 +39,8 @@
 #include <R_ext/GraphicsEngine.h> /* for GEonExit */
 #include <Rmath.h> /* for imax2 */
+#include <R_ext/eventloop.h>
+
 #ifndef min
 #define min(a, b) (a<b?a:b)
 #endif
@@ -117,6 +119,8 @@
 #if ( defined(HAVE_AQUA) || defined(Win32) )
     R_ProcessEvents();
 #else
+    R_PolledEvents();
     if (R_interrupts_pending) onintr();
 #endif /* Win32 */

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Native implementation of rowMedians()
We did think about this a lot, and decided that something like rowQ, which really returns the requested order statistics, letting the user construct their own version of the median, or other quantiles, from the return value, was a better approach. I would be happy to have this in R itself, if there is sufficient interest and we can remove the one in Biobase (without the need for deprecation/defunct, as long as the args are compatible). But, if the decision is to return a particular estimate of a quantile, then we would probably want to keep our function around, with its current name. best wishes Robert Martin Maechler wrote: BDR == Prof Brian Ripley [EMAIL PROTECTED] on Mon, 14 May 2007 11:39:18 +0100 (BST) writes: BDR On Mon, 14 May 2007, Henrik Bengtsson wrote: On 5/14/07, Prof Brian Ripley [EMAIL PROTECTED] wrote: Hi Henrik, HenrikB == Henrik Bengtsson [EMAIL PROTECTED] on Sun, 13 May 2007 21:14:24 -0700 writes: HenrikB Hi, HenrikB I've got a version of rowMedians(x, na.rm=FALSE) for matrices that HenrikB handles missing values implemented in C. It has been BDR [...] Also, the 'a version of rowMedians' made me wonder what other version there was, and it seems there is one in Biobase, which looks a more natural home. The rowMedians() in Biobase utilizes rowQ() in ditto. I actually started off by adding support for missing values to rowQ(), resulting in the method rowQuantiles(), for which there are also internal functions for both integer and double matrices. rowQuantiles() is in R.native too, but since it has much less CPU mileage I wanted to wait with that. The rowMedians() is developed from my rowQuantiles() optimized for the 50% quantile. Why do you think it is more natural to host rowMedians() in Biobase than in one of the core R packages? Biobase comes with a lot of overhead for people not in the Bio-world.
BDR Because that is where there seems to be a need for it, and having multiple BDR functions of the same name in different packages is not ideal (and even BDR with namespaces can cause confusion). That's correct, of course. However, I still think that quantiles (and statistics derived from them) in general, and medians in particular, are under-used by many user groups. For some useRs, speed can be an important reason, and for that I had made a big effort to provide runmed() in R; I think it would be worthwhile to provide fast row-wise medians and quantiles here as well. Also, BTW, I think it will be worthwhile to provide (R-C) API versions of median() and quantile() {with fewer options than the R functions, most probably!!}, such that we'd hopefully see less re-invention of the wheel happening in every package that needs such quantiles in its C code. Biobase is in quite active maintenance, and I'd assume its maintainers will remove rowMedians() from there (or first replace it with a wrapper in order to deal with the namespace issue you mentioned) as soon as R has its own function with the same (or better) functionality. In order to facilitate the transition, we'd have to make sure that such a 'stats' function does behave equivalently to the Biobase one. Martin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] One for the wish list - var.default etc
Jeffrey J. Hallman wrote: Prof Brian Ripley [EMAIL PROTECTED] writes: On Wed, 9 May 2007, S Ellison wrote: Brian, If we make functions generic, we rely on package writers implementing the documented semantics (and that is not easy to check). That was deemed to be too easy to get wrong for var(). Hard to argue with a considered decision, but the alternative facing increasing numbers of package developers seems to me to be pretty bad too ... There are two ways a package developer can currently get a function tailored to their own new class. One is to rely on a generic function to launch their class-specific instance, and write only the class-specific instance. That may indeed be hard to check, though I would be inclined to think that is the package developer's problem, not the core team's. But it has (as far as I know today ...?) no wider impact. But it does: it gives the method privileged access, in this case to the stats namespace, even allowing a user to change the default method, which namespaces to a very large extent protect against. If var is not generic, we can be sure that all uses within the stats namespace, and any namespace that imports it, are of stats::var. That is not something to give up lightly. No, but neither is the flexibility afforded by generics. What we have here is a false tradeoff between flexibility and the safety of locking stuff down. Yes, that is precisely one of the points, and as some of us recently experienced, a reasonably dedicated programmer can override any base function through an add-on package. It is, in my opinion, a bad idea to become the police here. AFAIK, Brian's considered decision was his own; I am aware of no discussion of that particular point of view about var (and, as noted above, it simply doesn't work). It also, AFAICS, confuses what happens (implementation) with what should happen (which is easy to do, because with most of the methods, either S3 or S4, there is very little written about what should happen).
That said, there has been some relatively open discussion on one solution to this problem, and I am hopeful that we will have something in place before the end of July. A big problem with S4 generics is who owns them, and what seems to be a reasonable medium-term solution is to provide a package that lives slightly above base in the search path that will hold generic functions for any base functions that do not have them. Authors of add-on packages can then at least share a common generic when that is appropriate. But do realize that there are lots of reasons to have generics with the same name, in different packages, that are not compatible, and normal scoping rules apply. For example, the XML package has a generic function addNode, as does the graph package, and they are not compatible, nor should they be. Anyone wanting to use both packages (and I often do) needs to manage the name conflicts (and that is where namespaces are essential). best wishes Robert The tradeoff is false because unit tests are a better way to assure safety. If the major packages (like stats) had a suite of tests, a package developer could load his own package, run all the unit tests, and see if he broke something. If it turns out that he broke something that wasn't covered by the tests, he could create a new test for that and submit it somewhere, perhaps on the R Wiki. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
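Managing such a clash mostly amounts to explicit qualification; a sketch (it assumes the CRAN XML package and the Bioconductor graph package are both installed):

```r
## Two unrelated addNode() generics can coexist; the :: operator picks
## the intended one regardless of search-path order.
library(XML)
library(graph)                              # one addNode now masks the other
g  <- graph::addNode("a", new("graphNEL"))  # graph's generic
nd <- XML::xmlNode("top")                   # XML's tree constructors
```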
[Rd] buglet in terms calculations
Hi, Vince and I have noticed a problem with non-syntactic names in data frames and some modeling code (but not all modeling code). The following, while almost surely as documented, could be a bit more helpful: m = matrix(rnorm(100), nc=10) colnames(m) = paste(1:10, letters[1:10], sep="_") d = data.frame(m, check.names=FALSE) f = formula(`1_a` ~ ., data=d) tm = terms(f, data=d) ##failure here, as somehow back-ticks have become part of the name ##not a quoting mechanism d[attr(tm, "term.labels")] The variables attribute, in the terms object, keeps them as quotes, so modeling code that uses that attribute seems fine, but code that uses the term.labels fails. In particular, it seems (of those tested) that glm, lda, randomForest work fine, while nnet, rpart can't handle non-syntactic names in formulae as such. In particular, rpart contains this code: lapply(m[attr(Terms, "term.labels")], tfun) which fails for the reasons given. One way to get around this might be to modify the do_termsform code. Right now we have: PROTECT(varnames = allocVector(STRSXP, nvar)); for (v = CDR(varlist), i = 0; v != R_NilValue; v = CDR(v)) SET_STRING_ELT(varnames, i++, STRING_ELT(deparse1line(CAR(v), 0), 0)); and then for term.labels we copy over the varnames (with ":", as needed), and perhaps we need to save the unquoted names somewhere? Or is there some other approach that will get us there? Certainly cleaning up the names via cleanTick = function(x) gsub("`", "", x) works, but it seems a bit ugly, and it might be better if the modeling code was modified. best wishes Robert __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] boundary case anomaly
Hi, Any reason these should be different? x=matrix(0, nr=0, nc=3) colnames(x) = letters[1:3] data.frame(x) # [1] a b c # <0 rows> (or 0-length row.names) y=vector("list", length=3) names(y) = letters[1:3] data.frame(y) # NULL data frame with 0 rows Both should have names (the second one does not), and why print something different for y? Two of the last three examples refer to a NULL data frame, e.g. (d00 <- d0[FALSE,]) # NULL data frame with 0 rows but there is no description of what a NULL data frame should be (zero rows or zero columns, or either or both - and why a special name?) best wishes Robert __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] possible bug in model.frame.default
Please update to the latest snapshot, R version 2.5.0 Under development (unstable) (2007-03-05 r40816), where all is well. Thibaut Jombart wrote: Dear list, I may have found a bug in model.frame.default (called by the lm function). The problem arises in my R devel version but not in my R 2.4.0. Here is my config: version _ platform x86_64-unknown-linux-gnu arch x86_64 os linux-gnu system x86_64, linux-gnu status Under development (unstable) major 2 minor 5.0 year 2007 month 03 day 04 svn rev 40813 language R version.string R version 2.5.0 Under development (unstable) (2007-03-04 r40813) Now a simple example to (hopefully) reproduce the bug (after a rm(list=ls())): dat=data.frame(y=rnorm(10),x1=runif(10),x2=runif(10)) weights=1:10/(sum(1:10)) form <- as.formula(y~x1+x2) # here is the error lm(form,data=dat,weights=weights) Erreur dans model.frame(formula, rownames, variables, varnames, extras, extranames, : type (closure) incorrect pour la variable '(weights)' (sorry, the error message is in French; it says: invalid type (closure) for the variable '(weights)') As I said, these commands work using R 2.4.0 (same machine, same OS). Moreover, the following commands work: temp=weights lm(form,data=dat,weights=temp) This currently seems to cause a check failure in the ade4 package. I tried to find out where the bug came from: all I found is that the (potential) bug comes from model.frame.default, and more precisely: debug: data <- .Internal(model.frame(formula, rownames, variables, varnames, extras, extranames, subset, na.action)) Browse[1] Erreur dans model.frame(formula, rownames, variables, varnames, extras, extranames, : type (closure) incorrect pour la variable '(weights)' I couldn't go further because of the .Internal. I tried to google this, but I found no such problem reported recently. Can anyone tell if this is actually a bug? (If not, please tell me where I went wrong.) Regards, Thibaut.
__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Wish list
programs. A simpler fix for this would be for you to define a wrapper for R CMD that installed the R tools path before executing and uninstalled it afterwards. But this is unnecessary for most people, because Microsoft's find.exe is pretty rarely used. Anyone who uses batch files will use it quite a bit. It certainly causes me problems on an ongoing basis and is an unacceptable conflict, in my opinion. I realize that it's not entirely of R's doing, but it would be best if R did not make it worse by requiring the use of find. 13. Make upper/lower case of simplify/SIMPLIFY consistent on all apply commands and add a simplify= arg to by. It would have been good not to introduce the inconsistency years ago, but it's too late to change now. It's not too late to add it to by(). Also note that the gsubfn package does have a workaround for this. In gsubfn one can preface any R function with fn$ and, if that is done, then the function can have a simplify= argument which fn$ intercepts and processes, e.g. library(gsubfn) fn$by(CO2[4:5], CO2[2], x ~ coef(lm(uptake ~ ., x)), simplify = rbind) fn$ can also interpret formulas as functions (and does quasi-Perl interpolation in strings), so the formula in the third argument is regarded to be the same as the anonymous function function(x) coef(lm(uptake ~ ., x)). More examples are in the gsubfn vignette. 14. Better reporting of the location of errors and warnings in R CMD check. This is in the works, but probably not for 2.5.x. Great. This will be very welcome. 15. Tcl tile library (needs Tcl 8.5, or to be compiled in with 8.4). 16. Extend aggregate to allow vector-valued functions: aggregate(CO2[4:5], CO2[1:2], function(x) c(mean = mean(x), sd = sd(x))) [summaryBy in the doBy package and cast in the reshape package can already do similar things, but this seems sufficiently fundamental that it ought to be in the base of R] 17. All OSes should support the input= arg of system().
My previous New Year wishlists are here: https://www.stat.math.ethz.ch/pipermail/r-devel/2006-January/035949.html https://www.stat.math.ethz.ch/pipermail/r-help/2005-January/061984.html https://www.stat.math.ethz.ch/pipermail/r-devel/2004-January/028465.html To anyone still reading: Many of the suggestions above would improve R, but they're unlikely to happen unless someone volunteers to do them. I'd suggest picking whichever one of these or some other list that you think is the highest priority, and posting a specific proposal to this list about how to do it. If you get a negative response or no response, move on to the next one, or put it into a contributed package instead. I think it works best when contributors develop their software in contributed packages, since it avoids squabbles with the core group. The core group can then integrate these into R itself if it seems warranted. When you make the proposal, consider how much work you're asking other people to do, and how much you're volunteering to do yourself. If you're asking others to do a lot, then the suggestion had better be really valuable to *them*. The implementation effort should not be a significant consideration in generating wish lists. What should be considered is what is really needed. It's better to know what you need and then later decide whether to implement it or not than to suppress articulating the need. Otherwise the development is driven by what is easy to do rather than what is needed.
Re: [Rd] A possible improvement to apropos
I would vastly prefer apropos to be case insensitive by default. The point of it is to find things similar to a string, not the same as, and given that capitalization in R is somewhat erratic (due to many authors, and some of them changing their minds over the years), I find the current apropos of little use. I would also, personally, prefer some sort of approximate matching, since there are different ways to spell some words, and some folks abbreviate parts of words. Martin Maechler wrote: Hi Seth, Seth == Seth Falcon [EMAIL PROTECTED] on Wed, 13 Dec 2006 16:38:02 -0800 writes: Seth Hello all, I've had the following apropos alternative Seth in my ~/.Rprofile for some time, and have found it Seth more useful than the current version. Basically, my Seth version ignores case when searching. Seth If others find this useful, perhaps apropos could be Seth suitably patched (and I'd be willing to create such a Seth patch). Could you live with typing 'i=T' (i.e. ignore.case=TRUE)? In principle, I'd like to keep the default as ignore.case=FALSE, since we really should teach the users that R *is* case sensitive. Ignoring case is the exception in the S/R/C world, not the rule. I have a patch ready which implements your suggestion (but not quite with the code below), but as said, not as default.
Martin Seth + seth Seth Here is my version of apropos:

APROPOS <- function (what, where = FALSE, mode = "any") {
    if (!is.character(what))
        stop("argument ", sQuote("what"), " must be a character vector")
    x <- character(0)
    check.mode <- mode != "any"
    for (i in seq(search())) {
        contents <- ls(pos = i, all.names = TRUE)
        found <- grep(what, contents, ignore.case = TRUE, value = TRUE)
        if (length(found)) {
            if (check.mode) {
                found <- found[sapply(found, function(x) {
                    exists(x, where = i, mode = mode, inherits = FALSE)
                })]
            }
            numFound <- length(found)
            x <- c(x, if (where) structure(found, names = rep.int(i, numFound))
                      else found)
        }
    }
    x
}
Re: [Rd] data frame subset patch, take 2
Hi, We had the names discussion and, AFAIR, the idea that someone might misinterpret the output as suggesting that one could index by number seemed to kill it. A more reasonable argument against is that names<- is problematic. You can use $, [[ (with character subscripts), and yes, ls does sort of do what you want (but it sorts the values; not sure if that is good). I think it is also inefficient in that I believe it copies the CHARSXPs (not sure we really need to do that, but I have not had time to sort out the issues). And there is an eapply as well, so ls() is not always needed. mget can be used to retrieve multiple values (and should be much more efficient than multiple calls to get). There is no massign (no one seems to have asked for it), and a better design choice might be to vectorize assign. best wishes Robert Vladimir Dergachev wrote: On Wednesday 13 December 2006 1:23 pm, Marcus G. Daniels wrote: Vladimir Dergachev wrote: 2. It would be nice to have true hashed arrays in R (i.e. O(1) access times). So far I have used named lists for this, but they are O(n): new.env(hash=TRUE) with get/assign/exists works ok. But I suspect it's just too easy to use named lists because it is easy, and that has bad performance ramifications for user code (perhaps the R developers are more vigilant about this for the R code itself). Cool, thank you! I wonder whether environments could be extended to allow names() to work (although I see that ls() does the same function) and to allow for(i in E) loops. thank you Vladimir Dergachev
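The environment-as-hash-table idiom under discussion, sketched minimally (the names e, alpha, and beta are illustrative); note mget retrieving several values in one call, as Robert suggests:

```r
e <- new.env(hash = TRUE)             # hashed environment: name-based lookup
assign("alpha", 1, envir = e)
assign("beta", 2, envir = e)
exists("alpha", envir = e, inherits = FALSE)
mget(c("alpha", "beta"), envir = e)   # several values in one call
ls(e)                                 # sorted names, as noted above
```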
Re: [Rd] data frame subset patch, take 2
Robert Gentleman wrote: Hi, We had the names discussion and, AFAIR, the idea that someone might misinterpret the output as suggesting that one could index by number seemed to kill it. A more reasonable argument against is that names<- is problematic. You can use $, [[ (with character subscripts), and yes, ls does sort of do what you want (but it sorts the values; not sure if that is good). I think it is also inefficient in that I believe it copies the CHARSXPs (not sure we really need to do that, but I have not had time to sort out the issues). I misremembered - it does not copy CHARSXPs. And there is an eapply as well, so ls() is not always needed. mget can be used to retrieve multiple values (and should be much more efficient than multiple calls to get). There is no massign (no one seems to have asked for it), and a better design choice might be to vectorize assign. best wishes Robert Vladimir Dergachev wrote: On Wednesday 13 December 2006 1:23 pm, Marcus G. Daniels wrote: Vladimir Dergachev wrote: 2. It would be nice to have true hashed arrays in R (i.e. O(1) access times). So far I have used named lists for this, but they are O(n): new.env(hash=TRUE) with get/assign/exists works ok. But I suspect it's just too easy to use named lists because it is easy, and that has bad performance ramifications for user code (perhaps the R developers are more vigilant about this for the R code itself). Cool, thank you! I wonder whether environments could be extended to allow names() to work (although I see that ls() does the same function) and to allow for(i in E) loops. thank you Vladimir Dergachev
Re: [Rd] caching frequently used values
the idea you are considering is also, at times, referred to as memoizing. I would not use a list, but rather an environment, and basically you implement something that first looks to see if there is a value, and if not, computes and stores it. It can speed things up a lot in some examples (and slow them down a lot in others). Wikipedia amongst other sources: http://en.wikipedia.org/wiki/Memoization Environments have advantages over lists here (if there are lots of matrices the lookup can be faster - make sure you use hash=TRUE), and reference semantics, which you probably want. Tamas K Papp wrote: Hi, I am trying to find an elegant way to compute and store some frequently used matrices on demand. The Matrix package already uses something like this for storing decompositions, but I don't know how to do it. The actual context is the following: A list has information about a basis of a B-spline space (nodes, order) and gridpoints at which the basis functions would be evaluated (not necessarily the nodes). Something like this:

bsplinegrid <- list(nodes = 1:8, order = 4, grid = seq(2, 5, by = .2))

I need the design matrix (computed by splineDesign) for various derivatives (not necessarily known in advance), to be calculated by the function

bsplinematrix <- function(bsplinegrid, deriv = 0) {
    x <- bsplinegrid$grid
    Matrix(splineDesign(bsplinegrid$nodes, x, ord = bsplinegrid$order,
                        derivs = rep(deriv, length(x))))
}

However, I don't want to call splineDesign all the time. A smart way would be storing the calculated matrices in a list inside bsplinegrid. Pseudocode would look like this:

bsplinematrix <- function(bsplinegrid, deriv = 0) {
    if (is.null(bsplinegrid$matrices[[deriv + 1]])) {
        ## compute the matrix and put it in the list bsplinegrid$matrices,
        ## but not only in the local copy
    }
    bsplinegrid$matrices[[deriv + 1]]
}

My problem is that I don't know how to modify bsplinegrid$matrices outside the function -- assignment inside would only modify the local copy.
Any help would be appreciated -- I wanted to learn how Matrix does it, but don't know how to display the source with S3 methods (getAnywhere doesn't work). Tamas
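The memoizing pattern Robert recommends, as a hedged sketch: an environment (reference semantics, hash = TRUE) caches results keyed by character names, so the computation runs only on the first request. All names here (make_cached, slow_square) are illustrative, with a cheap stand-in for the expensive splineDesign call:

```r
make_cached <- function(compute) {
    cache <- new.env(hash = TRUE, parent = emptyenv())
    function(key) {
        k <- as.character(key)   # environment keys must be character
        if (!exists(k, envir = cache, inherits = FALSE))
            assign(k, compute(key), envir = cache)
        get(k, envir = cache, inherits = FALSE)
    }
}
slow_square <- function(n) n * n   # stand-in for an expensive computation
sq <- make_cached(slow_square)
sq(4)   # computed and stored
sq(4)   # served from the cache
```

Because the closure captures cache by reference, the stored values survive across calls - exactly what the list-based pseudocode above cannot do.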
Re: [Rd] caching frequently used values
e1 <- new.env(hash = TRUE); e1[["1"]] <- whateveryouwant i.e. just transform the index to characters, but I don't see why you want to do that - surely there are more informative names to be used - Tamas K Papp wrote: Hi Robert, Thanks for your answer. I would create an environment with new.env(), but how can I assign and retrieve values based on a numerical index (the derivative)? The example of the help page of assign explicitly shows that assign(a[1]) does not work for this purpose. Thanks, Tamas On Wed, Dec 13, 2006 at 01:54:28PM -0800, Robert Gentleman wrote: the idea you are considering is also, at times, referred to as memoizing. I would not use a list, but rather an environment, and basically you implement something that first looks to see if there is a value, and if not, computes and stores it. It can speed things up a lot in some examples (and slow them down a lot in others). Wikipedia amongst other sources: http://en.wikipedia.org/wiki/Memoization Environments have advantages over lists here (if there are lots of matrices the lookup can be faster - make sure you use hash=TRUE), and reference semantics, which you probably want. Tamas K Papp wrote: Hi, I am trying to find an elegant way to compute and store some frequently used matrices on demand. The Matrix package already uses something like this for storing decompositions, but I don't know how to do it. The actual context is the following: A list has information about a basis of a B-spline space (nodes, order) and gridpoints at which the basis functions would be evaluated (not necessarily the nodes).
Something like this:

bsplinegrid <- list(nodes = 1:8, order = 4, grid = seq(2, 5, by = .2))

I need the design matrix (computed by splineDesign) for various derivatives (not necessarily known in advance), to be calculated by the function

bsplinematrix <- function(bsplinegrid, deriv = 0) {
    x <- bsplinegrid$grid
    Matrix(splineDesign(bsplinegrid$nodes, x, ord = bsplinegrid$order,
                        derivs = rep(deriv, length(x))))
}

However, I don't want to call splineDesign all the time. A smart way would be storing the calculated matrices in a list inside bsplinegrid. Pseudocode would look like this:

bsplinematrix <- function(bsplinegrid, deriv = 0) {
    if (is.null(bsplinegrid$matrices[[deriv + 1]])) {
        ## compute the matrix and put it in the list bsplinegrid$matrices,
        ## but not only in the local copy
    }
    bsplinegrid$matrices[[deriv + 1]]
}

My problem is that I don't know how to modify bsplinegrid$matrices outside the function -- assignment inside would only modify the local copy. Any help would be appreciated -- I wanted to learn how Matrix does it, but don't know how to display the source with S3 methods (getAnywhere doesn't work). Tamas
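Robert's point about transforming to characters, as a minimal hedged illustration (e1 and the stored matrix are arbitrary): environments are indexed by name, so a numeric index such as a derivative order becomes a character key:

```r
e1 <- new.env(hash = TRUE)
deriv <- 1
e1[[as.character(deriv)]] <- diag(2)   # stored under the key "1"
m <- e1[[as.character(deriv)]]         # retrieved by the same key
```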
Re: [Rd] data frame subset patch, take 2
Hi, I tried take 1, and it failed. I have been traveling (and, with Martin's changes, also waiting for things to stabilize) before trying take 2, probably later this week, and I will send an email if it goes in. Anyone wanting to try it and run R through check and check-all is welcome to do so and report success or failure. best wishes Robert Martin Maechler wrote: Marcus == Marcus G Daniels [EMAIL PROTECTED] on Tue, 12 Dec 2006 09:05:15 -0700 writes: Marcus Vladimir Dergachev wrote: Here is the second iteration of the data frame subset patch. It now passes make check on both 2.4.0 and 2.5.0 (svn as of a few days ago). Same speedup as before. Marcus Hi, Marcus I was wondering if this patch would make it into the Marcus next release. I don't see it in SVN, but it's hard Marcus to be sure because the mailing list apparently Marcus strips attachments. If it isn't in, or going to be Marcus in, is this patch available somewhere else? I was wondering too. http://www.r-project.org/mail.html explains what kind of attachments are allowed on R-devel. I'm particularly interested, since during the last several days I've made (somewhat experimental) changes to R-devel, which make some dealings with large data frames that have trivial rownames (those represented as 1:nrow(.)) much more efficient. Notably, as.matrix() of such data frames now no longer produces huge row names, and e.g. dim(.) of such data frames has become lightning fast [compared to what it was]. Some measurements:

N <- 1e6
set.seed(1)
## we round (for later dump().. reasons)
x <- round(rnorm(N), 2)
y <- round(rnorm(N), 2)
mOrig <- cbind(x = x, y = y)
df <- data.frame(x = x, y = y)
mNew <- as.matrix(df)
(sizes <- sapply(list(mOrig = mOrig, df = df, mNew = mNew), object.size))
## R-2.4.0 (64-bit):
##    mOrig       df     mNew
## 16000520 16000776 72000560
## R-2.4.1 beta (32-bit):
##    mOrig       df     mNew
## 16000296 16000448 52000320
## R-pre-2.5.0 (32-bit):
##    mOrig       df     mNew
## 16000296 16000448 16000296

N <- 1e6
df <- data.frame(x = 0 + 1:N, y = 1 + 1:N)
system.time(for(i in 1:1000) d <- dim(df))
## R-2.4.1 beta (32-bit) [deb1]:
## [1] 1.920 3.748 7.810 0.000 0.000
## R-pre-2.5.0 (32-bit) [deb1]:
##    user  system elapsed
##   0.012   0.000   0.011

However, currently df[2,] still internally produces the character(1e6) row names! something I think we should eliminate as well, i.e., at least make sure that only seq_len(1e6) is internally produced and not the character vector. Note however that some of these changes are backward incompatible. I do hope that the changes gaining efficiency for such large data frames are worth some adaptation of current/old R source code. Feedback on this topic is very welcome! Martin
Re: [Rd] Error condition in evaluating a promise
Simon Urbanek wrote: Seth, thanks for the suggestions. On Oct 18, 2006, at 11:23 AM, Seth Falcon wrote: Simon Urbanek [EMAIL PROTECTED] writes: thanks, but this is not what I want (the symbols in the environment are invisible outside) and it has nothing to do with the question I posed: as I was saying in the previous e-mail the point is to have exported variables in a namespace, but their value is known only after the namespace was attached (to be precise I'm talking about rJava here and many variables are valid only after the VM was initialized - using them before is an error). We have a similar use case and here is one workaround: Define an environment in your name space and use it to store the information that you get after VM-init. There are a number of ways to expose this: * Export the env and use vmEnv$foo * Provide accessor functions, getVmFooInfo() * Or you can take the accessor function approach a bit further to make things look like a regular variable by using active bindings. I can give more details if you want. We are using this in the BSgenome package in BioC. I'm aware of all three solutions and I've tested all three of them (there is in fact a fourth one I'm actually using, but I won't go into detail on that one ;)). Active bindings are the closest you can get, but then the value is retrieved each time which I would like to avoid. The solution with promises is very elegant, because it guarantees that on success the final value will be locked. It also makes sense semantically, because the value is determined by code bound to the variable and premature evaluation is an error - just perfect. Probably I should have been more clear in my original e-mail - the question was not to find a work-around, I have plenty of them ;), the question was whether the behavior of promises under error conditions is desirable or not (see subject ;)). 
For the internal use of promises it is irrelevant, because promises as function arguments are discarded when an error condition arises. However, if used in the wild, the behavior as described would IMHO be more useful. Promises were never intended for use at the user level, and I don't think that they can easily be made useful at that level without exposing a lot of stuff that cannot easily be explained/made bullet proof. As Brian said, you have not told us what you want, and I am pretty sure that there are good solutions available at the R level for most problems. Although the discussion has not really started, things like dispatch in the S4 system are likely to make lazy evaluation a thing of the past, since it is pretty hard to dispatch on class without knowing what the class is. That means that as we move to more S4 methods/dispatch we will be doing more evaluation of arguments. best wishes Robert Cheers, Simon
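For completeness, delayedAssign() is the documented user-level route to the promise-like behavior Simon describes: the value is computed on first use and cached afterwards. A hedged sketch (vmInfo and the counter are illustrative, not rJava's actual mechanism):

```r
e <- new.env()
inits <- 0
delayedAssign("vmInfo", { inits <<- inits + 1; "vm-ready" }, assign.env = e)
## nothing has run yet: inits is still 0
e$vmInfo   # first access forces the promise and caches "vm-ready"
e$vmInfo   # cached: the initialization does not run again
```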
Re: [Rd] Feature request: names(someEnv) same as ls(someEnv)
Duncan Murdoch wrote: On 10/15/2006 2:48 PM, Seth Falcon wrote: Hi, It would be nice if names() returned the equivalent of ls() for environments. Wouldn't that just confuse people into thinking that environments are vectors? Wouldn't it then be reasonable to assume that env[[which(names(env) == "foo")]] would be a synonym for env$foo? absolutely not - environments can only be subscripted by name, not by logicals or integer subscripts - so I hope that most users would figure that one out. I don't see why this would be nice: why not just use ls()? why? environments do get used, by many, as vectors (well, hash tables), modulo the restrictions on subscripting, and the analogy is quite useful and should be encouraged IMHO. Robert Duncan Murdoch

--- a/src/main/attrib.c
+++ b/src/main/attrib.c
@@ -687,6 +687,8 @@ SEXP attribute_hidden do_names(SEXP call
     s = CAR(args);
     if (isVector(s) || isList(s) || isLanguage(s))
         return getAttrib(s, R_NamesSymbol);
+    if (isEnvironment(s))
+        return R_lsInternal(s, 0);
     return R_NilValue;
 }

+ seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org
Re: [Rd] Strange behaviour of the [[ operator
True, re name matching, but I think we might want to consider a warning if they are supplied, as the user may not be getting what they expect, regardless of the documentation. Peter Dalgaard wrote: Seth Falcon [EMAIL PROTECTED] writes: Similar things happen in many similar circumstances. Here's a similar thing: Not really, no?

v <- 1:5
v
[1] 1 2 3 4 5
v[mustBeDocumentedSomewhere = 3]
[1] 3

And this can be confusing if one thinks that subsetting is really a function and behaves like other R functions w.r.t. treatment of named arguments:

m <- matrix(1:4, nrow = 2)
m
     [,1] [,2]
[1,]    1    3
[2,]    2    4
m[j = 2]
[1] 2

Or even

m[j = 2, i = ]
[1] 2 4

However, what would the argument names be in the 2-dim case? i, j are used only in help("[") and that page is quite specific about explaining that named matching doesn't work.
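Peter's examples above can be checked directly; matching for `[` is purely positional, so a supplied name is simply ignored:

```r
m <- matrix(1:4, nrow = 2)
m[j = 2, i = 1]   # behaves as m[2, 1]: the names i and j are ignored
m[2, 1]           # the same element
```

which is why a warning when names are supplied, as suggested above, could save users from silently getting the wrong element.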
Re: [Rd] Question about substitute() and function def
Duncan Murdoch wrote: On 9/14/2006 3:01 PM, Seth Falcon wrote: Hi, Can someone help me understand why substitute(function(a) a + 1, list(a=quote(foo))) gives function(a) foo + 1 and not function(foo) foo + 1 The man page leads me to believe this is related to lazy evaluation of function arguments, but I'm not getting the big picture. I think it's the same reason that this happens: substitute(c(a = 1, b = a), list(a = quote(foo))) gives c(a = 1, b = foo). The a in function(a) is the name of the arg, it's not the arg itself. Yes, but the logic seems to be broken. In Seth's case there seems to be no way to use substitute to globally change an argument and all instances throughout a function, which seems like a task that would be useful. Even here, I would have expected all instances of a to change, not just some (the occurrence as the argument name is missed). Now a harder question to answer is why this happens: substitute(function(a=a) 1, list(a=quote(foo))) gives function(a = a) 1. A bug for sure. I would have expected to get function(a = foo) 1. Duncan Murdoch
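The asymmetry under discussion can be seen by pulling the substituted call apart: only the body is rewritten, while the formal argument list keeps its name. A small hedged check:

```r
f <- substitute(function(a) a + 1, list(a = quote(foo)))
f[[3]]          # the body: foo + 1 (substituted)
names(f[[2]])   # the formal is still named "a" (not substituted)
```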
Re: [Rd] [R] Bug/problem reporting: Possible to modify posting guide FAQ?
Hi, I guess the question often comes down to whether it is a bug report or a question. If you know it is a bug, and have a complete and correct example where the obviously incorrect behavior occurs, and you are positive that the problem is in the package, then sending it to the maintainer is appropriate. When I get these I try to deal with them. Real bug reports that go to the mailing list may be missed, so in my opinion it would be best to cc the maintainer, and we will amend the FAQ in that direction. If instead you are asking a question, of the form is this a bug, or why is this happening, then for BioC at least it is better to post directly to the list, as there are many folks who can help and you are more likely to get an answer. When I get one of these emails I always refer the person to the mailing lists. I see little problem with being redirected by a maintainer to the mailing list if they feel that the question is better asked there. Bioconductor is different from R; clearly our mailing list has to be more about the constituent packages, since we will direct questions about R to the appropriate R mailing lists. R mailing lists tend to be about R, so asking about a specific package there (among the 1000 or so) often does not get you very far, but sometimes it does. best wishes Robert Steven McKinney wrote: If users post a bug or problem issue to an R-based news group (R-devel, R-help, BioC - though BioC is far more forgiving) they get yelled at for not reading the posting guide and FAQ. Please *_do_* read the FAQ, the posting guide, ... the yellers do say. So I read the BioC FAQ and it says... http://www.bioconductor.org/docs/faq/ Bug reports on packages should perhaps be sent to the package maintainer rather than to r-bugs.
So I send email to a maintainer, who I believe rightly points out best to send this kind of question to the bioc mailing list, rather than to myself privately, because other people might (a) also have answers or (b) benefit from the questions and answers. Could the FAQ possibly be revised to some sensible combination that generates less finger pointing, such as: Bug reports on packages should be sent to the Bioconductor mailing list, and sent or copied to the package maintainer, rather than to r-bugs. or: Bug reports on packages should be sent to the package maintainer, and copied to the Bioconductor mailing list, rather than to r-bugs. Could the posting guides to R-help and R-devel do something similar? Sign me, Tired of all the finger pointing. http://www.r-project.org/posting-guide.html If the question relates to a contributed package, e.g., one downloaded from CRAN, try contacting the package maintainer first. You can also use find("functionname") and packageDescription("packagename") to find this information. Only send such questions to R-help or R-devel if you get no reply or need further assistance. This applies to both requests for help and to bug reports. How about: If the question relates to a contributed package, e.g., one downloaded from CRAN, email the list and be sure to additionally send to or copy to the package maintainer as well. You can use find("functionname") and packageDescription("packagename") to find this information. Only send such questions to one of R-help or R-devel. This applies to both requests for help and to bug reports. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [Rd] S4 'object is not subsettable' in prototype
Prof Brian Ripley wrote: On Mon, 21 Aug 2006, Seth Falcon wrote: John Chambers [EMAIL PROTECTED] writes: When I was introducing the special type for S4 objects, my first inclination was to have length(x) for those objects be either NA or an error, along the lines that intuitively length(x) means the number of elements in the vector-style object x. However, that change quickly was demonstrated to need MANY revisions to the current code. Perhaps some details on the required changes will help me see the light, but I would really like to see length(foo) be an error (no such method) when foo is an arbitrary S4 class. According to the Blue Book p.96 every S object has a length and 'An Introduction to R' repeats this. So I believe an error is not an option. Indeed, from the wording, I think code could legitimately assume length(x) works and 0 <= length(x) and it is an integer (but not necessarily of type 'integer'). Certainly functions and formulae have a length (different for functions in S and R, as I recall), and they are not 'vector-style'. Yes, but that is because in S(-PLUS), and not in R, virtually every object was an instance of a generic vector, including functions (formulas were white book, not blue, and I'm still not sure that indexing of them makes sense, but I am sure that indexing functions does not; it suggests, at least to me, that we want to emphasize implementation over semantics). Now, in R, since not everything is a generic vector, it is less clear what to do in some cases, and I am not going to argue too hard against everything having a length, but I think the number 1 is a much better choice than the number 0. (The compromise solution of 0.5 has some charm :-) I am also scared that such reasoning will lead one to believe that indexing these things using [, or similar, should work, and that leads to major problems, since I lost the argument about not indexing outside of array bounds some years ago. What would be sensible in that case?
Certainly not what currently happens with S4 objects (in R release). best wishes Robert I have encountered bugs due to accidental dispatch -- functions returning something other than an error because of the zero-length list implementation of S4. It would not be surprising if some of the breakage caused by removing this feature identifies real bugs. I was thinking that one of the main advantages of the new S4 type was to get away from this sort of accidental dispatch. Not trying to be snide, but what is useful about getting a zero for length(foo)? The main use I can think of is in trying to identify S4 instances, but happily, that is no longer needed. + seth
Re: [Rd] configure on mac
Hi, I think Simon and Stefano are both offline for a little while. I can confirm that an upgrade of Xcode to either 2.3 or the very recent 2.4 is needed in most cases; either seems to work, so probably 2.4 is the better choice. best wishes Robert Prof Brian Ripley wrote: I gather you need to update your Xtools: others have had similar problems. (If they are online you will no doubt get more complete information.) On Sat, 12 Aug 2006, roger koenker wrote: I'm having trouble making yesterday's R-devel on my macs. ./configure seems fine, but eventually in make I get: gcc -dynamiclib -Wl,-macosx_version_min -Wl,10.3 -undefined dynamic_lookup -single_module -multiply_defined suppress -L/sw/lib -L/usr/local/lib -install_name libR.dylib -compatibility_version 2.4.0 -current_version 2.4.0 -headerpad_max_install_names -o libR.dylib Rembedded.o CConverters.o CommandLineArgs.o Rdynload.o Renviron.o RNG.o apply.o arithmetic.o apse.o array.o attrib.o base.o bind.o builtin.o character.o coerce.o colors.o complex.o connections.o context.o cov.o cum.o dcf.o datetime.o debug.o deparse.o deriv.o dotcode.o dounzip.o dstruct.o duplicate.o engine.o envir.o errors.o eval.o format.o fourier.o gevents.o gram.o gram-ex.o graphics.o identical.o internet.o iosupport.o lapack.o list.o localecharset.o logic.o main.o mapply.o match.o memory.o model.o names.o objects.o optim.o optimize.o options.o par.o paste.o pcre.o platform.o plot.o plot3d.o plotmath.o print.o printarray.o printvector.o printutils.o qsort.o random.o regex.o registration.o relop.o rlocale.o saveload.o scan.o seq.o serialize.o size.o sort.o source.o split.o sprintf.o startup.o subassign.o subscript.o subset.o summary.o sysutils.o unique.o util.o version.o vfonts.o xxxpr.o `ls ../appl/*.o ../nmath/*.o ../unix/*.o 2>/dev/null | grep -v /ext-` -framework vecLib -lgfortran -lgcc_s -lSystemStubs -lmx -lSystem ../extra/zlib/libz.a ../extra/bzip2/libbz2.a ../extra/pcre/libpcre.a -lintl -liconv -Wl,-framework -Wl,CoreFoundation
-lreadline -lm -liconv /usr/bin/libtool: unknown option character `m' in: -macosx_version_min Usage: /usr/bin/libtool -static [-] file [...] [-filelist listfile [,dirname]] [-arch_only arch] [-sacLT] Usage: /usr/bin/libtool -dynamic [-] file [...] [-filelist listfile [,dirname]] [-arch_only arch] [-o output] [-install_name name] [- compatibility_version #] [-current_version #] [-seg1addr 0x#] [- segs_read_only_addr 0x#] [-segs_read_write_addr 0x#] [-seg_addr_table filename] [-seg_addr_table_filename file_system_path] [-all_load] [-noall_load] make[3]: *** [libR.dylib] Error 1 make[2]: *** [R] Error 2 make[1]: *** [R] Error 1 make: *** [R] Error 1 This was ok as of my last build which was: version _ platform powerpc-apple-darwin8.7.0 arch powerpc os darwin8.7.0 system powerpc, darwin8.7.0 status Under development (unstable) major 2 minor 4.0 year 2006 month 07 day28 svn rev38710 language R version.string R version 2.4.0 Under development (unstable) (2006-07-28 r38710) url:www.econ.uiuc.edu/~rogerRoger Koenker email [EMAIL PROTECTED] Department of Economics vox:217-333-4558University of Illinois fax:217-244-6678Champaign, IL 61820 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Robert Gentleman, PhD Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-7700 [EMAIL PROTECTED] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [R] HTTP User-Agent header
should appear at an R-devel near you... thanks Seth

Seth Falcon [EMAIL PROTECTED] writes: Robert Gentleman [EMAIL PROTECTED] writes: OK, that suggests setting at the options level would solve both of your problems, and that seems like the best approach. I don't really want to pass this around as a parameter through the maze of functions that might actually download something if we don't have to.

I have an updated patch that adds an HTTPUserAgent option. The default is a string like: R (2.4.0 x86_64-unknown-linux-gnu x86_64 linux-gnu) If the HTTPUserAgent option is NULL, no user agent header is added to HTTP requests (this is the current behavior). This option allows R to use an arbitrary user agent header. The patch adds two non-exported functions to utils: 1) defaultUserAgent - returns a string like the above 2) makeUserAgent - formats the content of the HTTPUserAgent option for use as part of an HTTP request header. I've tested on OS X and Linux, but not on Windows. When USE_WININET is defined, a user agent string of "R" was already being used. With this patch, the HTTPUserAgent option is used. I'm unsure if NULL is allowed. Also, in src/main/internet.c there is a comment: "Next 6 are for use by libxml, only" and then a definition for R_HTTPOpen. Not sure how/when these get used. The user agent for these calls remains unspecified with this patch. 
+ seth

Patch summary:

 src/include/R_ext/R-ftp-http.h   |  2 +-
 src/include/Rmodules/Rinternet.h |  2 +-
 src/library/base/man/options.Rd  |  5 +
 src/library/utils/R/readhttp.R   | 25 +
 src/library/utils/R/zzz.R        |  3 ++-
 src/main/internet.c              |  2 +-
 src/modules/internet/internet.c  | 37 +
 src/modules/internet/nanohttp.c  |  8 ++--
 8 files changed, 66 insertions(+), 18 deletions(-)

Index: src/include/R_ext/R-ftp-http.h
===
--- src/include/R_ext/R-ftp-http.h    (revision 38715)
+++ src/include/R_ext/R-ftp-http.h    (working copy)
@@ -36,7 +36,7 @@
 int R_FTPRead(void *ctx, char *dest, int len);
 void R_FTPClose(void *ctx);
-void * RxmlNanoHTTPOpen(const char *URL, char **contentType, int cacheOK);
+void * RxmlNanoHTTPOpen(const char *URL, char **contentType, const char *headers, int cacheOK);
 int RxmlNanoHTTPRead(void *ctx, void *dest, int len);
 void RxmlNanoHTTPClose(void *ctx);
 int RxmlNanoHTTPReturnCode(void *ctx);

Index: src/include/Rmodules/Rinternet.h
===
--- src/include/Rmodules/Rinternet.h    (revision 38715)
+++ src/include/Rmodules/Rinternet.h    (working copy)
@@ -9,7 +9,7 @@
 typedef Rconnection (*R_NewUrlRoutine)(char *description, char *mode);
 typedef Rconnection (*R_NewSockRoutine)(char *host, int port, int server, char *mode);
-typedef void * (*R_HTTPOpenRoutine)(const char *url, const int cacheOK);
+typedef void * (*R_HTTPOpenRoutine)(const char *url, const char *headers, const int cacheOK);
 typedef int (*R_HTTPReadRoutine)(void *ctx, char *dest, int len);
 typedef void (*R_HTTPCloseRoutine)(void *ctx);

Index: src/main/internet.c
===
--- src/main/internet.c    (revision 38715)
+++ src/main/internet.c    (working copy)
@@ -129,7 +129,7 @@
 {
     if(!initialized) internet_Init();
     if(initialized > 0)
-        return (*ptr->HTTPOpen)(url, 0);
+        return (*ptr->HTTPOpen)(url, NULL, 0);
     else {
         error(_("internet routines cannot be loaded"));
         return NULL;

Index: src/library/utils/R/zzz.R
===
--- src/library/utils/R/zzz.R    (revision 38715)
+++ src/library/utils/R/zzz.R    (working copy)
@@ -9,7 +9,8 @@
             internet.info = 2,
             pkgType = .Platform$pkgType,
             str = list(strict.width = "no"),
-            example.ask = "default")
+            example.ask = "default",
+            HTTPUserAgent = defaultUserAgent())
     extra <- if(.Platform$OS.type == "windows") {
         list(mailer = "none",

Index: src/library/utils/R/readhttp.R
===
--- src/library/utils/R/readhttp.R    (revision 38715)
+++ src/library/utils/R/readhttp.R    (working copy)
@@ -6,3 +6,28 @@
         stop("transfer failure")
     file.show(file, delete.file = delete.file, title = title, ...)
 }
+
+defaultUserAgent <- function()
+{
+    Rver <- paste(R.version$major, R.version$minor, sep=".")
+    Rdetails <- paste(Rver, R.version$platform, R.version$arch,
+                      R.version$os)
+    paste("R (", Rdetails, ")", sep="")
+}
+
+makeUserAgent <- function(format = TRUE) {
+    agent <- getOption("HTTPUserAgent")
+    if (is.null(agent
Re: [Rd] [R] HTTP User-Agent header
I wonder if it would not be better to make the user agent string something that is configurable at the time R is built, rather than at run time. This would make Seth's patch about 1% as long. Or this could be handled as an option. The patches are pretty extensive and allow for setting the agent header by setting parameters in function calls (e.g. download.file). I am not sure there is a good use case for that level of flexibility, and the additional code is substantial. The issue that I think arises is that there are potentially other systems that will be unhappy with R's identification of itself, and so some users may also need to turn it off. Any strong opinions?

James P. Howard, II wrote: On 7/28/06, Seth Falcon [EMAIL PROTECTED] wrote: I have a rough draft patch, see below, that adds a User-Agent header to HTTP requests made in R via download.file. If there is interest, I will polish it. It looks right, but I am running under Windows without a compiler.
Re: [Rd] [R] HTTP User-Agent header
OK, that suggests setting at the options level would solve both of your problems, and that seems like the best approach. I don't really want to pass this around as a parameter through the maze of functions that might actually download something if we don't have to. I think we can provide something early next week on R-devel for folks to test. But I suspect, as Henrik also does, that the set of sites that will refuse us with a User-Agent header will be much larger than those that James has found that refuse us without it. best wishes Robert

Henrik Bengtsson wrote: On 7/28/06, Robert Gentleman [EMAIL PROTECTED] wrote: I wonder if it would not be better to make the user agent string something that is configurable (at the time R is built) rather than at run time. This would make Seth's patch about 1% as long. Or this could be handled as an option. The patches are pretty extensive and allow for setting the agent header by setting parameters in function calls (e.g. download.file). I am not sure there is a good use case for that level of flexibility, and the additional code is substantial. The issue that I think arises is that there are potentially other systems that will be unhappy with R's identification of itself, and so some users may also need to turn it off. Any strong opinions?

Actually two: 1) If you wish to pull down (read: extract from HTML or similar) live data from the web, you might want to be able to imitate a certain browser. For instance, if you tell some webserver you're a simple mobile phone or lynx, you might be able to get back very clean data. Some servers might also block unknown web browsers. 2) If the webserver of a package repository decided to make use of the user-agent string to decide what version of the repository it should deliver, I would like to be able to trick the server. Why? Many times I have found myself working on a system where I do not have the rights to update to the latest or the developer version of R. 
However, although I do not have the very latest version of R, I can still get work done. For instance, in Bioconductor the biocLite() code gives you either the stable or the developer version of Bioconductor depending on your R version, but looking into the biocLite() code and beyond, you find that you actually can install a Bioconductor v1.9 package in R v2.3.1. It can be risky business, but if you know what you're doing, it can save your day (or week). Cheers Henrik

James P. Howard, II wrote: On 7/28/06, Seth Falcon [EMAIL PROTECTED] wrote: I have a rough draft patch, see below, that adds a User-Agent header to HTTP requests made in R via download.file. If there is interest, I will polish it. It looks right, but I am running under Windows without a compiler.
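The option-based approach discussed in this thread can be illustrated with a short sketch. This is illustrative only: the option name HTTPUserAgent matches Seth's patch, but the example browser string is made up and the exact default format depends on the R version.

```r
## Inspect the default user agent string (set at startup under the patch):
getOption("HTTPUserAgent")

## Imitate a specific browser, e.g. to appease a picky server
## (the string below is just an example value):
options(HTTPUserAgent = "Mozilla/5.0 (compatible)")

## Suppress the header entirely -- the pre-patch behavior:
options(HTTPUserAgent = NULL)
```

Subsequent calls to download.file() and friends would then pick the value up via getOption(), which is what makes the options-level approach attractive: no extra parameter has to be threaded through the call chain.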
[Rd] proposed modifications to deprecated
Hi, Over the past six months we have had a few problems with deprecation, and Seth Falcon and I want to propose a few additions to the mechanism that will help deal with cases other than the deprecation of functions.

In the last release one of the arguments to La.svd was deprecated, but the warning message was very unclear and suggested that in fact La.svd itself was deprecated. Adding a third argument to .Deprecated, say msg (to be consistent with the internal naming mechanism), that contains the message string would allow for handling the La.svd issue in a more informative way. It is a strict addition, so no existing code is likely to be broken.

We also need to deprecate data from time to time. Since the field of genomics is moving fast, a good example from five years ago is often no longer a good example today. This one is a bit harder, but we can modify tools:::.make_file_exts("data") to first look for a .DEP extension (this does not seem to be a widely used extension), and if such a file exists, i.e. NameofData.DEP, one of two things happens: if it contains a character string we use that for the message (we could source it for the message?), if not we print a standard message (just as .Deprecated does) and then continue with the search using the other file extensions. Defunct could be handled similarly. Comments, alternative suggestions? thanks Robert
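For the function case, the proposed msg argument might look roughly like the following. This is a sketch of the idea only, not the actual base-R implementation (the real .Deprecated has additional internals for finding the caller's name):

```r
## Sketch: .Deprecated with an optional msg argument. When msg is
## supplied (e.g. for a deprecated *argument* such as La.svd's), it
## replaces the standard "'old' is deprecated" text entirely.
.Deprecated <- function(new, package = NULL, msg = NULL) {
    if (is.null(msg)) {
        old <- as.character(sys.call(sys.parent())[[1L]])
        msg <- gettextf("'%s' is deprecated.", old)
        if (!missing(new))
            msg <- paste(msg, gettextf("Use '%s' instead.", new))
    }
    warning(msg, call. = FALSE)
}
```

A caller could then write something like .Deprecated(msg = "the 'method' argument of La.svd is deprecated"), which would have avoided the misleading message in the last release.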
Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))
Kurt Hornik wrote: Simon Urbanek writes: On Apr 20, 2006, at 1:23 PM, Henrik Bengtsson (max 7Mb) wrote: Is it a general consensus on R-devel that *.tar.gz distributions should only be treated as a distribution for *building* packages and not for developing them? [Actually, distributing so that they can be installed and used.] I don't know whether this is a general consensus, but it is definitely an important distinction. Some authors put their own Makefiles in src although they are not needed and are in fact harmful, preventing the package from building on other systems - only because they are too lazy to use the R build mechanism for development and don't make the above distinction. Right :-) Henrik, as I think I mentioned the last time you asked about this: of course you can basically do everything you want. But it comes at a price. For external sources, you need to write a Makefile of your own, so as to make it clear that you provide a mechanism which is different from the standard one. And, as Simon said, the gain in flexibility comes at a price. Personally, and as one of the CRAN maintainers, I'd be very unhappy if package maintainers started flooding their source .tar.gz packages with full development environment material. (I am also rather unhappy about shipping large data sets which are only used for instructional purposes [rather than providing the data set on its own].) It is simply not true that bandwidth does not matter.

I can see the problem with large packages, but the current system does nothing about that AFAIC. And as Simon indicated, his biggest problem is the one set of files that we are allowed - so the argument is that the current approach is neither necessary nor sufficient, and it imposes a structure on people that seems to be unnecessarily restrictive. I don't see how excluding README (or anything else that a package maintainer has put there) makes life better, but maybe I am missing something here. 
These are precisely the sorts of things that have helped me to figure out what was intended when it didn't work. So this approach is regressive, IMHO. If the size is not large, who cares what is in a package, and things related to source should be in src. I see that a similar approach is being taken with the R directory (and probably other directories). This is, in my opinion, unfortunate; imposing restrictions that don't solve the stated problem in any general way is not useful. For BioC, we manually check the size etc. and ask people to reduce and remove. You could easily do the same at CRAN (and even automate it). BioC packages can be enormous relative to those on CRAN and I don't think we have ever had a serious complaint about it. But then the data sets tend to be large, so maybe people are just more forgiving.

As for the difference between source packages and built packages, yes, it would be nice at some time to enter into a discussion on that topic. There are lots of things that can be done at build time (that are not currently being done) that would speed up package installation etc. But they come at the price that Henrik has mentioned: the built package is no longer suitable for development. And hence we may usefully consider another format (something between source and binary, .Rgz?) best wishes Robert

If there is need, we could start having developer-package repositories. However, I'd prefer a different approach. We're currently in the process of updating the CRAN server infrastructure, and should be able to start deploying an R-forge project hosting service eventually (hopefully, we can set things up during the summer). This should provide us with an ideal infrastructure for sharing developer resources, in particular as we could add QC testing et al. to the standard community services. 
Best -k
Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))
I disagree; things like README files and other objects are important and should be included. I don't see the real advantage to such warnings; if someone wants them, they could be turned on optionally. If size is an issue, then authors should be warned that their package is large (being in the top 1% at CRAN would be useful to some). I would also find it helpful to know whose packages take forever to build, which we don't do. Just because someone put something in TFM doesn't mean it is either a good idea or sensible, in my experience. best wishes Robert

Prof Brian Ripley wrote: On Wed, 19 Apr 2006, James Bullard wrote: Hello, I am having an issue with R CMD check with the nightly build of RC 2.3.0 (listed in the subject). This is all explained in TFM, `Writing R Extensions'. The problem is this warning: * checking if this is a source package ... WARNING Subdirectory 'src' contains: README _Makefile These are unlikely file names for src files. In fact, they are not source files, but I do not see any reason why they cannot be there, or why I need to be warned of their presence. Potentially I could be informed of their presence, but that is another matter. Having unnecessary files in other people's packages just wastes space and download bandwidth for each one of the users. Now, I only get this warning when I do: R CMD build affxparser R CMD check -l ~/R-packages/ affxparser_1.3.3.tar.gz If I do: R CMD check -l ~/R-packages affxparser I do not get the warning. Is this inconsistent, or is there rationale behind this? I think the warning is inappropriate, or at the least a little restrictive. It seems as if I should be able to put whatever I want in there, especially the _Makefile, as I like to build test programs directly and I want to be able to build exactly what I check out from my source code repository without having to copy files in and out. All described in TFM, including how to set defaults for what is checked. The output from R CMD check is below. 
Any insight would be appreciated. As always thanks for your patience. [...]
Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))
Hi, Well, I guess if someone thinks they know how I am going to configure and build the sources needed to construct appropriate dynamic libraries so well that they can feel free to exclude files at their whim at install time, perhaps they could feel just as free to exclude them at build time? This makes no sense to me and certainly does not solve the size problem mentioned by Brian. If there is a single example of something that was better this way, I would be interested to hear it. I can think of several things that are worse. best wishes Robert Roger Bivand wrote: On Thu, 20 Apr 2006, Robert Gentleman wrote: I disagree, things like README files and other objects are important and should be included. I don't see the real advantage to such warnings, if someone wants them they could be turned on optionally. Isn't the point at least partly that all those files are lost on installation? If the README is to be accessible after installation, it can be placed under inst/, so that both users reading the source and installed versions can access it. So maybe the warning could be re-phrased to suggest use of the inst/ tree for files with important content? Best wishes, Roger If size is an issue then authors should be warned that their package is large (in the top 1% at CRAN would be useful to some). I also find it helpful to know whose packages take forever to build, which we don't do. Just because someone put something in TFM doesn't mean it is either a good idea or sensible, in my experience. best wishes Robert Prof Brian Ripley wrote: On Wed, 19 Apr 2006, James Bullard wrote: Hello, I am having an issue with R CMD check with the nightly build of RC 2.3.0 (listed in the subject.) This is all explained in TFM, `Writing R Extensions'. The problem is this warning: * checking if this is a source package ... WARNING Subdirectory 'src' contains: README _Makefile These are unlikely file names for src files. 
In fact, they are not source files, but I do not see any reason why they cannot be there, or why I need to be warned of their presence. Potentially I could be informed of their presence, but that is another matter. Having unnecessary files in other people's packages just wastes space and download bandwidth for each one of the users. Now, I only get this warning when I do: R CMD build affxparser R CMD check -l ~/R-packages/ affxparser_1.3.3.tar.gz If I do: R CMD check -l ~/R-packages affxparser I do not get the warning. Is this inconsistent, or is there rationale behind this? I think the warning is inappropriate, or at the least a little restrictive. It seems as if I should be able to put whatever I want in there, especially the _Makefile, as I like to build test programs directly and I want to be able to build exactly what I check out from my source code repository without having to copy files in and out. All described in TFM, including how to set defaults for what is checked. The output from R CMD check is below. Any insight would be appreciated. As always thanks for your patience. [...]
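Roger's suggestion above amounts to a source-package layout like the following (hypothetical package name; files placed under inst/ are copied to the top level of the installed package, so a README there survives installation, while anything left in src/ does not):

```
mypkg/
    DESCRIPTION
    R/              # R sources
    src/            # compiled-code sources only (what R CMD check expects)
    inst/
        README      # installed as mypkg/README, visible after install
```

This keeps the src/ warning quiet and makes the README available both to people reading the source tarball and to users browsing the installed package.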
Re: [Rd] Word boundaries and gregexpr in R 2.2.1 (PR#8547)
Should be patched in R-devel; the fix will be available shortly.

[EMAIL PROTECTED] wrote: Full_Name: Stefan Th. Gries Version: 2.2.1 OS: Windows XP (Home and Professional) Submission from: (NULL) (68.6.34.104)

The problem is this: I have a vector of two character strings.

text <- c("This is a first example sentence.", "And this is a second example sentence.")

If I now look for word boundaries with regexpr, this is what I get:

regexpr("\\b", text, perl=TRUE)
[1] 1 1
attr(,"match.length")
[1] 0 0

So far, so good. But with gregexpr I get:

gregexpr("\\b", text, perl=TRUE)
Error: cannot allocate vector of size 524288 Kb
In addition: Warning messages:
1: Reached total allocation of 1015Mb: see help(memory.size)
2: Reached total allocation of 1015Mb: see help(memory.size)

Why don't I get the locations and extensions of all word boundaries? I am using R 2.2.1 on a machine running Windows XP:

R.version
platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 2.1 year 2005 month 12 day 20 svn rev 36812 language R
[Rd] clarification of library/require semantics
Recently I have added a lib.loc argument to require, so that it is more consistent with library. However, there are some oddities that folks have pointed out, and we do not have a documented description of the semantics for what should happen when the lib.loc parameter is provided.

Proposal: the most common use case seems to be one where any other dependencies, or nested calls to library/require, should also see the library specified in the lib.loc parameter for the duration of the initial call to library. Hence, we should modify the library search path for the duration of the call (via .libPaths). The alternative is to not do that, which is what happens now. Both have costs: automatically setting the library search path means that users who do not want that behavior have to manually remove things from the path. But almost no one seems to want that, and most folks I have asked have said they want the lib.loc parameter to be used for the other loading as well. Comments? Robert
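The manual workaround available today, which is in effect what the proposal would do automatically for the duration of the require() call, can be sketched as follows (withLib is a hypothetical helper; the library path and package name are made up for illustration):

```r
## Sketch of the proposed semantics: make lib visible to any nested
## library()/require() calls, then restore the previous search path.
withLib <- function(lib, expr) {
    old <- .libPaths()
    .libPaths(c(lib, old))       # prepend lib for the duration of the call
    on.exit(.libPaths(old))      # restore the old path on the way out
    expr                         # forces the promise, e.g. a require() call
}

## Dependencies of somePkg installed under /home/me/mylib are now found:
withLib("/home/me/mylib", require("somePkg"))
```

Under the proposal, require("somePkg", lib.loc = "/home/me/mylib") would behave like the withLib() call above, without the user having to manage .libPaths() by hand.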