Re: [Rd] SVN vs DVCS

2010-05-27 Thread Roger Peng
I think the main advantage of a DVCS is that it allows many many
people to make changes to a project and to integrate those changes in
a non-insane way. Given that R has a very restricted list of people who
actually make changes to the source, it doesn't seem that something
like git or Hg would provide a major advantage. If the people on that
list are happy with SVN then there's not much else to say. However, if
it were thought that maybe we want more people submitting
patches/making changes, then perhaps it might make more sense to move
to a DVCS.

I use git for everything mainly because it's *fast* and it has much
better tools for viewing changes/patches and revision history. For
example, 'git bisect' has allowed me to track down bugs that would have
been very painful for me because I'm not intimately familiar with the
entire R source code.

-roger

On Wed, May 26, 2010 at 1:14 PM, Seth Falcon s...@userprimary.net wrote:
 On 5/26/10 4:16 AM, Gabor Grothendieck wrote:

 Note that one can also use any of the dvcs systems without actually
 moving from svn by using the dvcs (or associated extension/addon) as
 an svn client or by using it on an svn checkout.

 FWIW, I have been using git for several years now as my VCS of choice and
 use it for all svn-backed projects (R included) via git-svn.

 Some of the things I like:

 - Being able to organize changes in local commits that can be revised,
 reordered, rebased prior to publishing.  Once I got in the habit of working
 this way, I simply can't imagine going back.

 - Having quick access to full repository history without network
 access/delay.  Features for searching change history are more powerful (or
 easier for me to use) and I have found that useful as well.

 - This may not be true any longer with more recent svn servers/clients, but
 aside from the initial repo clone, working via git-svn was noticeably faster
 than straight svn client (!) -- I think related to how the tools organize
 the working copy and how many fstat calls they make.

 - I find the log reviewing functionality much better suited to reviewing
 changes.


 + seth

 --
 Seth Falcon | @sfalcon | http://userprimary.net/

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel




-- 
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/



Re: [Rd] Sweave Feature Requests and Questions

2010-05-11 Thread Roger Peng
Hi, see below.

On Sat, May 8, 2010 at 7:35 PM, Charlotte Maia mai...@gmail.com wrote:

snip

 Furthermore, any help appreciated here:
 1. Does anyone know how to build Sweave documents, using Make, without
 starting a new instance (or multiple instances) of R, every time Make
 is called?

If you have code sections that don't change at all, you can work around
this by putting them in a separate file and importing them into the main
LaTeX document with '\input'. Then use the Makefile to specify the
dependencies between the LaTeX files. That way, the first time you run
make everything will run, but on subsequent calls to make, Sweave will
not be called as long as the code in the separate file hasn't changed.
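The same timestamp rule can also be applied from inside a single R session, which touches on the first question as well, since no new R process is started per file. This is a minimal sketch under assumptions not in the thread: `sweave_if_stale()` is a hypothetical helper, and it relies on `Sweave()` passing `output` and `quiet` through to the RweaveLatex driver.

```r
# Hypothetical helper (not part of R): re-run Sweave only when the .Rnw
# source is newer than its .tex output, the same timestamp test make uses.
sweave_if_stale <- function(rnw, tex = sub("\\.Rnw$", ".tex", rnw)) {
  stale <- !file.exists(tex) || file.mtime(rnw) > file.mtime(tex)
  if (stale) {
    # 'output' and 'quiet' are handed on to the RweaveLatex driver setup
    utils::Sweave(rnw, output = tex, quiet = TRUE)
  }
  invisible(stale)  # TRUE if Sweave was run
}
```

Looping over the chunk files from one session, e.g. `for (f in c("setup.Rnw", "analysis.Rnw")) sweave_if_stale(f)` with placeholder names, re-weaves only the files that changed.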

 2. Does anyone know a simple workaround to problem 3, without killing
 the entire Sweave.sty file?

 Furthermore, I'm tempted to stop Sweave from generating all the EPS
 files (I saw an option for this in the Sweave documentation), however
 I'm concerned it may stop others from building the document, if they
 use postscript.

 Then again, do enough people use PostScript to warrant such consideration?

I don't think I've used postscript in about 8 years.

-roger



Re: [Rd] Generate NA of specified type

2010-05-05 Thread Roger Peng
What about NAof <- function(x) as(NA, class(x))?
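A quick check of the one-liner against Hadley's four cases. `as()` lives in the methods package, so it is called as `methods::as` here in case methods is not attached. The second version is an alternative of my own, not something proposed in the thread: indexing an atomic vector with `NA_integer_` returns a length-one `NA` of the same type.

```r
# The suggestion above, with as() taken explicitly from the methods package
NAof <- function(x) methods::as(NA, class(x))

# Alternative sketch (not from the thread): subsetting with NA_integer_
# keeps the vector's type but yields a single NA element
NAof2 <- function(x) unname(x[NA_integer_])

NAof(1:3)   # integer NA
NAof(pi)    # double NA
NAof("a")   # character NA
NAof(TRUE)  # logical NA
```

The subsetting version does not depend on `class()`, which returns more than one string for some objects.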

-roger

On Wed, May 5, 2010 at 11:57 AM, Hadley Wickham had...@rice.edu wrote:
 Hi all,

 Is there an existing function that provides the correct type of
 missing value for a given vector? e.g.

 NAof(1:3) # NA_integer_
 NAof(pi) # NA_real_
 NAof("a") # NA_character_
 NAof(T) # NA

 ?

 Thanks,

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/



Re: [Rd] tcltk and R

2010-03-15 Thread Roger Peng
R can be built without tcl/tk and so I think it would still be good to
check at runtime.
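A generic sketch of such a runtime check, assuming tcltk is moved from Depends to Suggests. This is not gsubfn's actual code: `has_tcltk()` and the engine labels are hypothetical. `capabilities("tcltk")` reports whether this R build has Tcl/Tk support, and `requireNamespace()` confirms that the package can actually be loaded.

```r
# Hypothetical helper: TRUE only if this R build has Tcl/Tk support and
# the tcltk package can be loaded in the current session
has_tcltk <- function() {
  isTRUE(unname(capabilities("tcltk"))) &&
    requireNamespace("tcltk", quietly = TRUE)
}

# Choose the fast Tcl-based path when possible, else the all-R one
# (the labels are placeholders for the two strapply implementations)
engine <- if (has_tcltk()) "tcl" else "R"
```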

-roger

On Mon, Mar 15, 2010 at 11:21 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 I have had some comments on sqldf regarding its dependence on tcltk
 such as the second last sentence on this blog post:

 http://translate.google.com/translate?hl=ensl=zh-CNu=http://www.wentrue.net/blog/%3Fp%3D453prev=http://blogsearch.google.com/blogsearch%3Fhl%3Den%26ie%3DUTF-8%26q%3Dsqldf%26lr%3D%26sa%3DN%26start%3D10

 sqldf does not directly use tcltk but it does use strapply in gsubfn
 for its parsing, and strapply uses Tcl from the tcltk package to speed
 it up. There is also an all-R version of strapply, but the gsubfn
 package as a whole still depends on tcltk whether or not the user uses
 tcltk.  I was thinking of changing the Depends: tcltk of
 gsubfn to Suggests:tcltk and then checking for tcltk availability at
 run time so if not available it would use the slower all R version;
 however, I was under the impression that all R platforms have a
 distribution of R that includes tcltk so in principle this should not
 be necessary.

 Is that right regarding tcltk availability on various platforms?  What
 is the situation here?



Re: [Rd] Associative array?

2010-03-15 Thread Roger Peng
If I recall correctly, indexing a vector/list with a
character vector uses hashing if the vector is over a certain length
(I can't remember the cutoff). Otherwise, it's a linear operation.
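As the quoted reply points out, environments are R's hash tables. Here is a minimal sketch of an environment used as a string-keyed associative array, with a named vector for comparison. Keys must be character strings, so other key types have to be converted first, which is the caveat Ben raises.

```r
# An environment as a string-keyed hash table
dict <- new.env(hash = TRUE)

dict[["apple"]] <- 3                 # insert / update
assign("banana", 5, envir = dict)    # equivalent functional form

dict[["apple"]]                                   # lookup: 3
exists("cherry", envir = dict, inherits = FALSE)  # membership: FALSE
sort(ls(dict))                                    # keys: "apple" "banana"
rm("banana", envir = dict)                        # delete a key

# Bulk lookup; absent keys come back as NA via 'ifnotfound'
unlist(mget(c("apple", "cherry"), envir = dict, ifnotfound = NA))

# Named-vector lookup for comparison: vectors have copy-on-modify (value)
# semantics, whereas an environment has reference semantics
v <- c(apple = 3, banana = 5)
v[["banana"]]
```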

-roger

On Thu, Mar 11, 2010 at 8:09 PM, Ben mi...@emerose.org wrote:
 lists are generic vectors with names so lookup is O(n). Environments
 in R are true hash tables for that purpose:

 Ahh, thanks for the information!  A function I wrote before indexing
 on a data frame was slower than I expected, and now I know why.

 I don't quite understand - characters are (after raw vectors) the
 most expressive data type, so I'm not quite sure why that would be a
 limitation... You can cast anything (except raw vectors with nulls)
 to a character.

 It's no big problem, it's just that if the solution is to convert to
 character type, then there are some implementation details to worry
 about.  For instance, I assume that as.character(x) is a reversible
 1-1 mapping if x is an integer (and not NA or NULL, etc).  But
 apparently that isn't exactly true for floats, and it would get more
 complicated for other data types.  So that's why I said it would not
 be elegant, but that is a very subjective statement.

 On a deeper level, it seems counterintuitive to me that indexing in R
 is O(n).  Furthermore, associative arrays are a fundamental data type,
 so I think it's weird that I can read the R tutorial, the R language
 definition, and even the manual page for new.env() and still not have
 enough information to build a decent one.  So IMHO things would be
 better if R had a built-in easy-to-use general purpose associative
 array.

 I don't see a problem thus I'm not surprised it didn't come up
 ;). But maybe I'm just missing your point ...

 Nope, this has come up before---I think R and I are just on different
 wavelengths.  Various things that I think are a problem with R are
 apparently not, and it's fine the way it is.

 Anyway, sorry for getting off topic ;-) You posted everything I need to know 
 and I really appreciate your help.


 --
 Ben Escoto



Re: [Rd] silly SVN question

2009-12-17 Thread Roger Peng
Are you searching for ./tools/rsync-recommended?

-roger

On Tue, Dec 15, 2009 at 11:14 AM, Ben Bolker bol...@ufl.edu wrote:

  Yes, but ... on my system at least the Recommended folder has a recent
 version of the Makefile, but the packages are old tarballs.  I have a
 fuzzy memory that I needed to download the packages from somewhere else
 to build a complete/up-to-date version, but I have forgotten where I
 read that.  And
 https://svn.r-project.org/R/trunk/src/library/Recommended/ shows that
 only Makefile.in and Makefile.win live here.

  Does your  src/library/Recommended have up-to-date source code for all
 the packages ... ?

  cheers
   Ben

 Kasper Daniel Hansen wrote:
 The obvious: the recommended packages are inside
   src/library

 Kasper

 On Dec 15, 2009, at 10:57 AM, Ben Bolker wrote:

  I followed the suggestions at
 http://developer.r-project.org/SVNtips.html to check out an anonymous
 copy of the development branch of R, but so far I have been unable to
 figure out an analogous way to track the development branch of the
 recommended packages. (I'm assuming they actually live somewhere on the
 same SVN server, which might not be true ...) Any ideas (including
 pointing out the obvious, or the obvious-in-hindsight)?

  thanks
    Ben Bolker





 --
 Ben Bolker
 Associate professor, Biology Dep't, Univ. of Florida
 bol...@ufl.edu / people.biology.ufl.edu/bolker
 GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc




Re: [Rd] force identity

2009-09-03 Thread Roger Peng
Given that the two functions (although identical) serve different
purposes and are in a sense unrelated, it's not clear that a See Also
is needed (and in fact might be confusing).
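To make the distinction concrete: the two definitions are the same, but `force()` is meant to be called for its side effect of evaluating a lazily evaluated argument. The closure factories here are hypothetical and differ only in whether they call `force()`; without it, each closure's promise for `n` is evaluated late, after the loop variable has moved on.

```r
# Same definition, different purpose
identical(formals(force), formals(identity))  # TRUE
identical(body(force), body(identity))        # TRUE

# Hypothetical closure factories, with and without force()
make_adder_lazy  <- function(n) function(x) x + n
make_adder_eager <- function(n) { force(n); function(x) x + n }

lazy <- eager <- vector("list", 3)
for (i in 1:3) {
  lazy[[i]]  <- make_adder_lazy(i)   # promise for n still refers to i
  eager[[i]] <- make_adder_eager(i)  # n is evaluated now, while i is current
}

lazy[[1]](10)   # 13: n is first evaluated here, when i is already 3
eager[[1]](10)  # 11
```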

-roger

On Thu, Sep 3, 2009 at 9:17 AM, Hadley Wickham had...@rice.edu wrote:
 Similarly, force and identity are identical (although with different
 semantic connotations).  It would be nice to have a See Also from each.

 Hadley

 --
 http://had.co.nz/



Re: [Rd] Speed up code, profiling, optimization, lapply vs. loops

2009-07-07 Thread Roger Peng
That's a good point---I've found that skipping a lot of the setup that
'glm' does and calling 'glm.fit' directly can save a lot of time.
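A sketch of what that can look like. The simulated Poisson data and the two link functions are assumptions for illustration, standing in for Thorn's theta-indexed families. The model frame, design matrix, and response are built once, and only `glm.fit()` runs inside the loop. Note that `glm.fit()` skips the checks `glm()` performs, so the inputs must already be clean.

```r
set.seed(1)
n   <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rpois(n, exp(0.5 + 0.3 * dat$x1 - 0.2 * dat$x2))

# Do glm()'s setup work once: model frame, design matrix, response
mf <- model.frame(y ~ x1 + x2, data = dat)
X  <- model.matrix(attr(mf, "terms"), mf)
y  <- model.response(mf)

# Reuse X and y for every fit; only the family changes
families <- list(poisson(link = "log"), poisson(link = "sqrt"))
fits <- lapply(families, function(fam) glm.fit(X, y, family = fam))

# Each result is a plain list rather than a "glm" object
sapply(fits, function(f) f$deviance)
```

With the log link, the first fit should agree with `glm(y ~ x1 + x2, family = poisson, data = dat)`. Any time saved comes only from skipping the repeated setup, so it is worth profiling (e.g. with `Rprof()`) before and after.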

-roger

On Tue, Jul 7, 2009 at 12:53 AM, Kasper Daniel
Hansen khan...@stat.berkeley.edu wrote:
 Aside from the advice from other people, you seem to be doing many glm
 calls. A big part of a call to a model function involves setting up the
 design matrix, checking for missing values, etc. If I understand your
 description correctly, you may only need to do this once. This will require some poking
 around in glm, but might save you a lot of time.

 Kasper

 On Jul 6, 2009, at 1:26 , Thorn Thaler wrote:

 Hi everybody,

 Currently I'm writing a package that, for a given family of variance
 functions depending on a parameter theta, say, computes the extended quasi
 likelihood (eql) function for different values of theta.

 The computation involves a couple of calls of the 'glm' routine. What I'm
 doing now is to call 'lapply' for a list of theta values and a function,
 that constructs a family object for the particular choice of theta, computes
 the glm and uses the results to get the eql. Not surprisingly the function
 is not very fast. Depending on the size of the parameter space under
 consideration it takes a couple of minutes until the function finishes.
 Testing ~1000 Parameters takes about 5 minutes on my machine.

 I know that loops in R are slow more often than not, so I thought using
 'lapply' would be better. But in the end it is just another kind of loop,
 and it adds the overhead of a function call, so I'm not sure whether
 using 'lapply' is really the better choice.

 What I would like to know is where the bottleneck lies. Vectorization
 would help, but I don't think there is a vectorized 'glm' function that
 can handle a vector of family objects, so I'm not aware of any choice
 aside from using a loop.

 So my questions:
 - how can I figure out where the bottleneck lies?
 - is 'lapply' always superior to a loop in terms of execution time?
 - are there any 'evil' commands that should be avoided in a loop, for they
 slow down the computation?
 - are there any good books, tutorials about how to profile R code
 efficiently?

 TIA 4 ur help,

 Thorn



Re: [Rd] what is the preferred method to create a package local variable?

2009-04-02 Thread Roger Peng
The use of an environment gets around the fact that package namespaces
are locked, so their bindings can't be changed once the package is
loaded. However, the contents of an environment bound there can still be changed.
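The difference can be shown outside a package, with an ordinary environment standing in for the namespace. This is an approximation and an assumption: real namespaces are created and sealed by R's package loader, and `lockEnvironment()` only mimics the locking step. Rebinding a locked variable fails, while a function can still change the contents of an environment bound in that namespace.

```r
# Stand-in for a package namespace
ns <- new.env()
local({
  conn <- NULL                 # ordinary variable: its binding gets locked
  .localstuff <- new.env()     # environment: its contents stay mutable

  set_conn_plain <- function(value) conn <<- value
  set_conn_env   <- function(value) .localstuff$conn <- value
  get_conn_env   <- function() .localstuff$conn
}, envir = ns)

# Roughly what happens to a namespace once the package is loaded
lockEnvironment(ns, bindings = TRUE)

# Fails with "cannot change value of locked binding for 'conn'"
plain_ok <- tryCatch({ ns$set_conn_plain("connected"); TRUE },
                     error = function(e) FALSE)

# Works: only the environment's contents change, not the binding
ns$set_conn_env("connected")
ns$get_conn_env()  # "connected"
```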

-roger

On Thu, Apr 2, 2009 at 9:41 AM, Whit Armstrong armstrong.w...@gmail.com wrote:
 Thanks to everyone for the suggestions.

 The package local environment (per Roger Peng) works well.
 .localstuff <- new.env()
 .localstuff$bbg.db.conn <- dbConnect(...)

 However, there is one thing that I'm confused about.

 Why must the .localstuff variable be an environment?

 I've tried the following, but the variable conn stays null during the
 whole R session.  Despite the database connection succeeding (I can
 see the constructor printing to the console):

 conn <- NULL

 .onAttach <- function(libname, pkgname) {
    conn <- dbConnect(dbDriver("PostgreSQL"), user=...)
 }

 .onUnload <- function(libpath) {
    dbDisconnect(conn)
 }

 output from R session:

 [warmstr...@linuxsvr R.packages]$ R
 > library(KLS)
 Loading required package: fts
 Loading required package: RCommodity
 Loading required package: unifiedDBI
 Loading required package: RFincad
 Loading required package: RLIM
 Loading required package: RBoostDateTime
 PostgresConnection::PostgresConnection()
 > KLS:::conn
 NULL
 > x <- get.bbg("EURUSD Curncy")
 Error in get.bbg("EURUSD Curncy") : Database connection not initialized
 > q()
 PostgresConnection::~PostgresConnection()
 [warmstr...@linuxsvr R.packages]$


 Thanks,
 Whit






 On Tue, Mar 31, 2009 at 3:51 PM, Philippe Grosjean
 phgrosj...@sciviews.org wrote:
 The best way is to have those variables hidden in the package's workspace, as
 explained by Roger Peng.

 However, if you would like a mechanism that very easily manages an environment
 specifically dedicated to temporary variables, look at assignTemp() and
 getTemp() from the svMisc package. The advantage is easier sharing of such
 variables between different packages (plus the bonus of easy management of
 default values, overwriting or not of current content if the variable
 already exists, ...). The temporary environment (TempEnv) is always located
 in the second-to-last position, just before 'base'.

 In any case, avoid using .GlobalEnv and the ugly <<- for that purpose.
 Best,

 Philippe Grosjean


 Roger Peng wrote:

 I usually use environments for this. So, in one of the R files for the
 package, just do

 .localstuff <- new.env()

 Then, in functions you can do things like

 .localstuff$bbg.db.conn <- dbConnect(...)

 -roger

 On Tue, Mar 31, 2009 at 11:45 AM, Whit Armstrong
 armstrong.w...@gmail.com wrote:

 for the moment, I'm using:

 .onAttach <- function(libname, pkgname) {
   .bbg.db.conn <<- dbConnect(dbDriver("PostgreSQL"), user=blah,blah)
 }

 .onUnload <- function(libpath) {
   dbDisconnect(.bbg.db.conn)
 }


 which results in a hidden global variable in the global environment.

 I would prefer to make the assignment only in the package namespace.
 I've looked at assignInNamespace, but I can't seem to make it work.

 Is there a preferred method for doing this?

 When I try adding an assignment directly in the source file, I get the
 "cannot change value of locked binding" error.

 What am I missing?

 Thanks,
 Whit



Re: [Rd] what is the preferred method to create a package local variable?

2009-03-31 Thread Roger Peng
I usually use environments for this. So, in one of the R files for the
package, just do

.localstuff <- new.env()

Then, in functions you can do things like

.localstuff$bbg.db.conn <- dbConnect(...)
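A slightly fuller sketch of the same pattern as it might appear in a package's R files, with the connection opened in `.onLoad()` and closed in `.onUnload()`. The helpers `open_conn()`, `close_conn()` and `get_conn()` are hypothetical placeholders; in Whit's case the first two would wrap `dbConnect()` and `dbDisconnect()`.

```r
# Top-level package code: the environment is created once, with the package
.localstuff <- new.env()

# Hypothetical stand-ins for dbConnect()/dbDisconnect()
open_conn  <- function() list(opened = Sys.time())
close_conn <- function(conn) invisible(NULL)

.onLoad <- function(libname, pkgname) {
  # Store the connection inside the environment rather than in a top-level
  # variable; the same assignment also works from .onAttach() or any later
  # function, because the binding '.localstuff' itself never changes
  .localstuff$conn <- open_conn()
}

.onUnload <- function(libpath) {
  if (!is.null(.localstuff$conn)) close_conn(.localstuff$conn)
  .localstuff$conn <- NULL   # drop the stored connection
}

# Accessor for the package's other functions
get_conn <- function() {
  conn <- .localstuff$conn
  if (is.null(conn)) stop("Database connection not initialized")
  conn
}
```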

-roger

On Tue, Mar 31, 2009 at 11:45 AM, Whit Armstrong
armstrong.w...@gmail.com wrote:
 for the moment, I'm using:

 .onAttach <- function(libname, pkgname) {
    .bbg.db.conn <<- dbConnect(dbDriver("PostgreSQL"), user=blah,blah)
 }

 .onUnload <- function(libpath) {
    dbDisconnect(.bbg.db.conn)
 }


 which results in a hidden global variable in the global environment.

 I would prefer to make the assignment only in the package namespace.
 I've looked at assignInNamespace, but I can't seem to make it work.

 Is there a preferred method for doing this?

 When I try adding an assignment directly in the source file, I get the
 "cannot change value of locked binding" error.

 What am I missing?

 Thanks,
 Whit



Re: [Rd] Fontconfig warning with X11() on MAC OS X 10.4

2009-03-04 Thread Roger Peng
I realize this doesn't directly answer your question, but seeing as
you're on a Mac, have you tried using the quartz device?

-roger

On Wed, Feb 25, 2009 at 10:27 AM, Merlise Clyde cl...@stat.duke.edu wrote:

 I posted previously about problems with X11() on my Mac using R 2.8.1.
 After installing the security update for Tiger this morning, X11() now
 works from an xterm :-)

 However, I receive the following warnings with any plotting command using
 the default X11 settings.
 Fontconfig warning: no <cachedir> elements found. Check configuration.
 Fontconfig warning: adding
 <cachedir>/Library/Frameworks/R.framework/Resources/fontconfig/cache</cachedir>
 Fontconfig warning: adding <cachedir>~/.fontconfig</cachedir>

 Everything works fine with the Xlib option:

 > X11(type="Xlib")
 > plot(1:10)  # no problems!
 > X11(type="cairo")
 > plot(1:10)
 Fontconfig warning: no <cachedir> elements found. Check configuration.
 Fontconfig warning: adding
 <cachedir>/Library/Frameworks/R.framework/Resources/fontconfig/cache</cachedir>
 Fontconfig warning: adding <cachedir>~/.fontconfig</cachedir>

 subsequent commands with X11/cairo plot with no errors.

 If I quit R, and start a new session I continue to receive the Fontconfig
 warning.

 Any suggestions on what is wrong with my font configuration?
 (mainly annoying :-)

 Thanks!
 Merlise
 Version:
  platform = i386-apple-darwin8.11.1
  arch = i386
  os = darwin8.11.1
  system = i386, darwin8.11.1
  status =
  major = 2
  minor = 8.1
  year = 2008
  month = 12
  day = 22
  svn rev = 47281
  language = R
  version.string = R version 2.8.1 (2008-12-22)




Re: [Rd] New version of X11, png and jpeg

2008-02-25 Thread Roger Peng
(Apologies, I meant to 'Reply to all' the first time but forgot).

I built r44608 of R-devel with (I think) cairo support.   At least,
that's what the configure script told me.  In addition,
'capabilities("cairo")' is TRUE.  Calling X11(type = "Cairo") gives me
the error:

Error in X11() : X11 module cannot be loaded
In addition: Warning message:
In X11() :
 unable to load shared library
'/home/rpeng/install/R-devel/lib64/R/modules//R_X11.so':
 /home/rpeng/install/R-devel/lib64/R/modules//R_X11.so: undefined
symbol: cairo_image_surface_get_data


I figured I must be missing a library somewhere, but I'm not sure how
to track down which one.  Any thoughts here?

I'm on a FC5 system with:

cairo-devel-1.0.4-1
cairo-1.0.4-1
cairo-1.0.4-1

and

pango-1.12.4-4
pango-devel-1.12.4-4
pango-1.12.4-4

Also, I have

[EMAIL PROTECTED] R-source]$ pkg-config --modversion pango
1.12.4
[EMAIL PROTECTED] R-source]$ pkg-config --modversion cairo
1.0.4

-roger

On Mon, Feb 25, 2008 at 11:56 AM, Prof Brian Ripley
[EMAIL PROTECTED] wrote:
 R-devel has new versions of the X11(), png() and jpeg() devices on
  Unix-alikes.  The intention is that these are used identically to the
  previous versions (which remain available) but will produce higher-quality
  output with more features.

  Pros:

  Antialiasing of text and lines (can be turned off) but no blurring of
  fills.

  Buffering of the X11 display and fast repainting from a backing image.
  (The intention is to emulate the timer-based buffering of the windows()
  device in due course, but not for 2.7.0.)

  Ability to use translucent colours, including backgrounds, and produce
  partially transparent PNG files.

  Scalable text, including to sizes like 4.5 pt. This allows more accurate
  sizing on non-standard screen sizes (e.g. my home machine has a 90dpi
  1650x1024 display whereas standard X11 fonts are set up for 75 or 100
  dpi).

  Full support for UTF-8, so on systems with suitable fonts you can plot in
  many languages on a single figure (and this will work even in non-UTF-8
  locales).  The output should be locale-independent (unlike the current
  devices where even English text is rendered slightly differently in
  Latin-1 and UTF-8 locales).

  A utility function savePlot() to make a PNG/JPEG/TIFF copy of the current
  plot.

  The new png() and jpeg() devices do not require an X server to be running.

  Cons:

  Needs more software installed - cairo, pango and support packages (which
  on all the systems we have looked at are pulled in by the packages checked
  for).  You will see something like

  Additional capabilities:   PNG, JPEG, iconv, MBCS, NLS, cairo
                                                          ^^^^^
  if configure finds the software we are looking for.

  Slower under some circumstances (although on the test systems much faster
  than packages Cairo and cairoDevice).  This will be particularly true for
  X11() with a slow connection between the machine running R and the X
  server.

  The additional software might not work correctly.


  The new versions are not currently the default, but can be made so by
  setting X11.options(type="Cairo"), e.g. as a load hook for package
  grDevices.  I am using

  setHook(packageEvent("grDevices", "onLoad"),
          function(...) {
              grDevices::ps.options(horizontal=FALSE)
              if(getRversion() >= "2.7.0") grDevices::X11.options(type="Cairo")
          })


  Please try these out and let us know how you get on.  As a check, try the
  TestChars() examples in ?points - on one Solaris 10 system a few of the
  symbol font characters were incorrect.  It worked on an FC5 system with

  auk% pkg-config --modversion pango
  1.12.4
  auk% pkg-config --modversion cairo
  1.0.4

  so the versions required are not all recent.

  Although these devices would in principle work on Mac OS X, neither cairo
  nor pango is readily available.  We are working on other versions for
  Mac OS (X11 based on cairo/freetype, png/jpeg based on Quartz).

  There are also new svg() and tiff() devices.

  --
  Brian D. Ripley,  [EMAIL PROTECTED]
  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
  University of Oxford, Tel:  +44 1865 272861 (self)
  1 South Parks Road, +44 1865 272866 (PA)
  Oxford OX1 3TG, UKFax:  +44 1865 272595



[Rd] Fwd: Digest package - make digest generic?

2007-10-16 Thread Roger Peng
Sorry, I forgot the 'reply-all'.

-roger

-- Forwarded message --
From: Roger Peng [EMAIL PROTECTED]
Date: Oct 16, 2007 8:24 AM
Subject: Re: [Rd] Digest package - make digest generic?
To: Henrik Bengtsson [EMAIL PROTECTED]


Would it be possible to instead create a function with a name like
'digest0' which is the current function, and then create a generic
function with the name 'digest'?  In this case 'digest0' always
returns the digest of the raw object.

My one concern is that my current expectation is that 'digest' takes
an object and hashes the entire object, regardless of class.  So if
two objects are different (even in their internal representation),
they should return different digests.  I would be a little worried if
'digest' had a different (and perhaps unpredictable) behavior
depending on the class of the object where two objects that were in
fact different could lead to the same digest.

I can see why one might want class-specific behavior, but what a class
author wants from 'digest' may not be the same as what other users
of 'digest' on that object want.

A simple approach might be

digest0 <- function(x, ...) digest(unclass(x), ...)

although I don't think this works for S4 objects.

-roger

On 10/15/07, Henrik Bengtsson [EMAIL PROTECTED] wrote:
 On 10/15/07, hadley wickham [EMAIL PROTECTED] wrote:
  On 10/15/07, Henrik Bengtsson [EMAIL PROTECTED] wrote:
   [As agreed, CC:ing r-devel since others might be interested in this as well.]
  
   Hi.
  
   On 10/15/07, Dirk Eddelbuettel [EMAIL PROTECTED] wrote:
   
Hi Hadley,
   
On 15 October 2007 at 09:51, hadley wickham wrote:
| Would you consider making digest a generic function?  That way I could
| (e.g.) make a generic method for ggplot objects which didn't depend
| (so much) on their internal representation.
   
Well, generally speaking, I always take patches :)
  
   I see no problems in doing this.  The patch would be:
   
   digest <- function(...) UseMethod("digest");
   digest.default <- current digest function.
  
   I think that should do, and I don't think it has any surprising side
   effects so it could be added in the next release.  Dirk, can you do
   that?
  
   
I have to admit that I am fairly weak on these aspects of the S language.
One question is:  how do the current users of digest (i.e. Henrik's and
Seth's caching mechanism, for example) use it on arbitrary objects _without_
it being generic?
  
   I basically put everything I want into a list() and pass that to
   digest::digest().
 
  Yes, that's what I'm doing too.
 
   
| The reason I ask is that I'm using digest as a way of coming up with a
| unique file name for each example graphic.  I want to be able to
| easily compare the appearance of examples between versions, but
| currently the digest depends on internal details, so it's hard to
| match up graphics between versions.
  
   See loadCache(key) and saveCache(object, key) in R.cache, which
   basically loads and saves results from and to a file cache based on a
   key object - no need to specify paths or filenames.  You can specify
   paths etc if you want to, but by default it is just transparent.
 
  The problem is I need to refer to the image from the documentation, so
  I do need to know its path.  I also want to be able to look at the
  image, so if the digests are different I can see what the difference
  is (I'm planning to automate this with the imagemagick compare command
  line tool).

 See ?findCache.  That will give you the pathname given a key.  It is
 on purpose that I do not list this function in the HTML help index - I
 want to keep the public API to a minimum.

 /Henrik

 
   However, I think Hadley is referring to a different problem.
   Basically, he got an object containing a lot of fields, but for his
   purposes it is only a subset of the fields that he wants to use to
   generate a consistent hashcode.  If he passes any other field, that
 
  Yes, exactly.
 
   will break the consistency.  In that case, the designer of the class
   has to identify the fields that uniquely identify the state of
   the object.  I do that for many of my objects and pass them down in a
   list() structure to digest().  I agree, by making digest() generic,
   one can make the code nicer.  [If there is a need to dispatch on
   multiple arguments, we have to go for S4, but otherwise S3 gives the
   minimal modification].
  
   Side comment: This basically comes down to how for instance Java deals
   with hashCode() and equals() etc.  By default the object as is used to
   generate the hashcode (and can be used by equals() compare objects).
 
  Yes, that's the model I was thinking of too.
 
  Hadley
 
  --
  http://had.co.nz/
 

Re: [Rd] Fwd: Digest package - make digest generic?

2007-10-16 Thread Roger Peng
Calling 'digest.default' directly would not be possible if the method
were hidden in a namespace (without resorting to some maneuvering).
To force the default method I think you'd need to 'unclass' the
object.

I'm not against making 'digest' generic, but I'd prefer it if there
were a guaranteed way to compute the digest of the raw/full object
without having to wonder about class-specific behavior.  Something
like:

digest0 <- [[the current 'digest' function]]
digest <- function(object, ...) UseMethod("digest")
digest.default <- function(object, ...) digest0(object, ...)

As I think we've seen in this discussion already, what is surprising
to one person may not be surprising to another (and vice versa) so
having something like 'digest0' which is consistent across all R
objects would be useful.
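A self-contained sketch of that proposal. To avoid depending on the digest package, `digest0()` here is a stand-in that takes the MD5 sum of the serialized object via `tools::md5sum()`; this is an assumption and not how the digest package computes its hashes. The `foo` method follows Hadley's example of hashing only the relevant field.

```r
# Stand-in for the current digest::digest(): hash the whole object
digest0 <- function(object, ...) {
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(serialize(object, connection = NULL), f)
  unname(tools::md5sum(f))
}

# Generic front end; the default method hashes the full object
digest <- function(object, ...) UseMethod("digest")
digest.default <- function(object, ...) digest0(object, ...)

# Class-specific method: only the field that matters is hashed
digest.foo <- function(object, ...) digest(object$y, ...)

a <- structure(list(x = 1, y = 2), class = "foo")
b <- structure(list(x = 2342342, y = 2), class = "foo")

identical(digest(a), digest(b))    # TRUE: the method looks only at y
identical(digest0(a), digest0(b))  # FALSE: the raw hash sees x as well

# unclass() reaches the default method, but the object it hashes no longer
# carries the class attribute, so the result differs from digest0(a)
identical(digest(unclass(a)), digest0(a))  # FALSE
```

The last comparison is the reason for keeping a separate `digest0`: forcing the default method with `unclass()` changes the object being hashed.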

-roger

On 10/16/07, hadley wickham [EMAIL PROTECTED] wrote:
 On 10/16/07, Roger Peng [EMAIL PROTECTED] wrote:
  My understanding was that Hadley wanted 'digest' to operate on part of
  an object rather than on the entire object, which might contain uninteresting
  or irrelevant details.  For example, if we had
 
  a <- structure(list(x = 1, y = 2), class = "foo")
  b <- structure(list(x = 2342342, y = 2), class = "foo")
  
  digest.foo <- function(object, ...) digest(object$y)

 Yes, that's exactly what I want, except in my case my objects contain
 about 20 or 30 bits of information that are irrelevant (in my case
 documentation about the class and other functions), so it would be
 surprising if p1 and p2 which produced identical plots gave different
 digests.

 If you want the default behaviour, you could always call
 digest.default to digest the entire object.

 Hadley





[Rd] Lack of final newline in write.dcf changes append usage

2007-09-26 Thread Roger Peng
The change in r42731 eliminating the final blank line when writing DCF
files changes the way 'append' can be used in 'write.dcf' and I was
wondering if this is intentional.  Basically, I want to write a data
frame to DCF format one row at a time, so I make use of repeated calls
to 'write.dcf(append = TRUE)'. However, in R 2.6.0RC the resulting DCF
file is not formatted properly because there are no blank lines
between the entries.

It seems the only way to use 'write.dcf(append = TRUE)' is to
explicitly write a blank line to the file before writing the entry.
Perhaps 'write.dcf' could be changed to write a blank line in the
beginning if append = TRUE?  I've attached a patch for this
possibility.
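Below is a minimal sketch of the workaround described above. Each row is written with `write.dcf(append = TRUE)`, and every entry after the first is preceded by a blank line. The data frame and temporary file are made up for illustration.

```r
# Illustrative data; any data frame with character columns would do.
df <- data.frame(Package = c("a", "b", "c"),
                 Version = c("1.0", "2.0", "0.1"),
                 stringsAsFactors = FALSE)
f <- tempfile(fileext = ".dcf")

for (i in seq_len(nrow(df))) {
  if (i > 1) {
    # Separate this record from the previous one with a blank line,
    # since (after the change discussed here) write.dcf() no longer
    # ends its output with one.
    cat("\n", file = f, append = TRUE)
  }
  write.dcf(df[i, , drop = FALSE], file = f, append = i > 1)
}

res <- read.dcf(f)  # one row per record if the separators are right
res
```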

-roger


Re: [Rd] saving objects with embedded environments

2007-06-29 Thread Roger Peng
I believe this is intentional.  See ?serialize.  When lm() is called
in a function, the environment is saved in case the resulting fitted
model object needs to be updated, for example, with update().

If you don't need the linear model object itself, you might try saving
just the relevant pieces to a separate list rather than trying to delete
everything that is irrelevant from the 'lm' object.
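The following is a minimal sketch of that suggestion. It assumes that only the coefficients, the residual standard error and the call are needed later, so adjust the list as needed. The helper `fitModel` and the large local vector `big` are hypothetical. They stand in for whatever lives in the calling function's frame, which the model formula's environment keeps alive.

```r
fitModel <- function(B) {
  big <- rnorm(1e6)  # a large local object we never intend to save
  lm(y ~ x1 + x2 + x3, data = B, model = FALSE)
}

set.seed(1)
N <- 90
B <- data.frame(y = rnorm(N) + 1:N, x1 = rnorm(N) + 1:N,
                x2 = rnorm(N) + 1:N, x3 = rnorm(N) + 1:N)
fit <- fitModel(B)

# attr(fit$terms, ".Environment") is the evaluation frame of fitModel(),
# so serializing 'fit' also serializes 'big' and 'B'.
full_size <- length(serialize(fit, NULL))

# Keep only the pieces that are needed, in a plain list.
keep <- list(coefficients = coef(fit),
             sigma = summary(fit)$sigma,
             call = fit$call)
small_size <- length(serialize(keep, NULL))

c(full = full_size, kept = small_size)
```

Saving `keep` with `save()` or `saveRDS()` then writes only those pieces to disk.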

-roger

On 6/28/07, McGehee, Robert [EMAIL PROTECTED] wrote:
 Hello,
 I have been running linear regressions on large data sets. As 'lm' saves
 a great deal of extraneous (for me) data including the residuals,
 fitted.values, model frame, etc., I generally set these to NULL within
 the object before saving off the model to a file.

 In the example below, however, I have found that depending on whether or
 not I run 'lm' within another function, the entire function
 environment is saved along with the object. So, even though object.size and
 all.equal report that both 'lm' objects are equal and small, one saves
 as a 24MB file and the other as 646 bytes. This seems to be because in
 the first example the function environment is saved in
 attr(x1$terms, ".Environment") and takes up all 24MB of space.
 
 Anyway, I think this is a bug, or if nothing else very undesirable (that
 an object reported to be about 5 KB takes up 24MB). There also seems to be
 some inconsistency in how environments are saved depending on whether it is
 the global environment or not, though I'm not familiar enough with
 environments to know if this was intentional. Comments are appreciated.

 Thanks,
 Robert

 ##
 testEq <- function(B) {
 x <- lm(y ~ x1+x2+x3, data=B, model=FALSE)
 x$residuals <- x$effects <- x$fitted.values <- x$qr$qr <- NULL
 x
 }

 N <- 90
 B <- data.frame(y=rnorm(N)+1:N, x1=rnorm(N)+1:N, x2=rnorm(N)+1:N,
 x3=rnorm(N)+1:N)
 x1 <- testEq(B)
 x2 <- lm(y ~ x1+x2+x3, data=B, model=FALSE)
 x2$residuals <- x2$effects <- x2$fitted.values <- x2$qr$qr <- NULL

 all.equal(x1, x2) ## TRUE
 object.size(x1)  ## 5112
 object.size(x2)  ## 5112
 save(x1, file="x1.RData")
 save(x2, file="x2.RData")
 file.info("x1.RData")$size ## 24063852 bytes
 file.info("x2.RData")$size ## 646 bytes

 > R.version
                _
 platform       i686-pc-linux-gnu
 arch           i686
 os             linux-gnu
 system         i686, linux-gnu
 status
 major          2
 minor          5.0
 year           2007
 month          04
 day            23
 svn rev        41293
 language       R
 version.string R version 2.5.0 (2007-04-23)


 Robert McGehee, CFA
 Quantitative Analyst
 Geode Capital Management, LLC
 One Post Office Square, 28th Floor | Boston, MA | 02109
 Tel: 617/392-8396Fax:617/476-6389
 mailto:[EMAIL PROTECTED]









Re: [Rd] HTML vignette browser

2007-06-05 Thread Roger Peng
This is very nice work and I like it a lot.  I think it would be quite
useful for teaching.  My only thought was that when I saw 'source' I
was expecting R code, but instead got LaTeX/Noweb.  It does seem that
'source' is a little ambiguous here.

-roger

On 6/1/07, Deepayan Sarkar [EMAIL PROTECTED] wrote:
 Hi,

 this is tangentially related to the recent discussion on vignettes.
 vignette() currently produces a listing of available vignettes, but
 these are not clickable. Since R has a browseURL() function, it seems
 natural to have a version that produces HTML with clickable links.
 Here's an attempt at that:

 source("http://dsarkar.fhcrc.org/R/vignette-browser.R")
 browseVignettes()
 browseVignettes(package = "grid")

 etc. Perhaps some variant of this could be added to R.

 Comments welcome.

 -Deepayan






Re: [Rd] Possible changes to connections

2007-05-30 Thread Roger Peng
In a previous version of the 'filehash' package, the 'filehashDB1'
class had a slot for an open connection corresponding to the database
file.  I quickly learned that if the R object ever got removed or
reassigned I was left hanging with an open file connection.

If I remember correctly, I resorted to creating an environment in the
R object which stored the connection number for the database file
connection.  Then I registered a finalizer for that environment which
grabbed the connection via 'getConnection' and then closed the
connection.

I eventually abandoned this approach since it was error-prone and I
often ran into strange, difficult-to-reproduce situations where the R
object representing the database had been removed but the file
connection was still open because garbage collection had not yet
occurred.  I would have very much preferred a system where the file
connection was automatically closed once any references to it were
gone.
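Until connections are reclaimed automatically, one way to avoid leaking them is to tie each connection's lifetime to a function call with `on.exit()`. The sketch below is minimal. `read_gz_lines()` is a hypothetical helper name, and the gzipped file is created on the fly.

```r
read_gz_lines <- function(path) {
  con <- gzfile(path, "rt")
  on.exit(close(con))  # runs on normal return and on error
  readLines(con)
}

# Create a small gzipped file to read back.
f <- tempfile(fileext = ".gz")
zz <- gzfile(f, "wt")
writeLines(c("first line", "second line"), zz)
close(zz)

n_before <- length(getAllConnections())
x <- read_gz_lines(f)
n_after <- length(getAllConnections())  # no connection left open
```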

-roger

On 5/30/07, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 When I originally implemented connections in R 1.2.0, I followed the model
 in the 'Green Book' closely.  There were a number of features that forced
 a particular implementation, and one was getConnection() that allows one
 to recreate a connection object from a number.

 I am wondering if anyone makes use of this, and if so for what?

 It would seem closer to the R philosophy to have connection objects that
 get garbage collected when no R object refers to them.  This would allow
 for example

 readLines(gzfile("foo.gz"))

 which currently leaks a connection slot as the connection cannot be closed
 (except via closeAllConnections() or getConnection()) without an R object
 being returned.

 The correct usage currently is

 readLines(con <- gzfile("foo.gz")); close(con)

 which is a little awkward but more importantly seems little understood.

 Another issue is that the current connection objects can be saved and
 restored but refer to a global table that is session-specific so they lose
 their meaning (and perhaps gain an unintended one).

 What I suspect is that very few users are aware of the Green Book
 description and so we have freedom to make some substantial changes
 to the implementation.  Both issues suggest that connection objects should
 be based on external pointers (which did not exist way back in 1.2.0).

 [I know there is a call to getConnection in package gtools, but the return
 value is unused!]

 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UK            Fax:  +44 1865 272595






Re: [Rd] Display list redraw incomplete

2007-05-10 Thread Roger Peng
Just to follow up with a few more details on this problem (which
persists on my PowerBook Mac but nowhere else).  I ran a bisection
search on the sources and found that SVN revision 40634 introduces
this problem.  This revision introduced quite a few changes to
'src/main/graphics.c' which I am just beginning to go through.
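The bisection search mentioned above can be sketched as a plain binary search over revision numbers. The sketch assumes that the bug persists in every revision after it first appears. `is_bad()` is a hypothetical predicate that stands in for "build R at this revision and check for the warning", and the known-good starting revision is made up.

```r
# Return the first revision in (good, bad] for which is_bad() is TRUE,
# given that 'good' is known-good and 'bad' is known-bad.
first_bad_revision <- function(good, bad, is_bad) {
  while (bad - good > 1) {
    mid <- (good + bad) %/% 2
    if (is_bad(mid)) bad <- mid else good <- mid
  }
  bad
}

# Hypothetical predicate; in practice each call means a build and a test run.
is_bad <- function(rev) rev >= 40634

first_bad_revision(40000, 41438, is_bad)  # 40634
```

Each step halves the range, so about 11 builds are enough to search roughly 1400 revisions.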

Given that I don't see this problem on my Linux box and only on the
Mac, I'm tempted to think this is a compiler issue or perhaps some
weird problem with my setup.  I noticed that I don't have the exact
same gfortran compiler as recommended on
http://r.research.att.com/tools/ but I don't see how that would cause
a problem here.
Otherwise, I am using gcc 4.0.1 which comes with Xcode.

I will keep digging but any pointers would be appreciated.

Thanks,
-roger

On 5/4/07, Roger Peng [EMAIL PROTECTED] wrote:
 Since compiling R 2.5.0 from source on my Mac (PowerBook) I've noticed
 some strange behavior when plotting.  I'm not sure if it's a problem
 with my setup/compilation because I feel like a problem as basic as
 this one would have been reported already.  I'm running R with X11 and
 R was built with gcc 4.0.1.

 Basically, I run

 > plot(0, 0)
 > dev.off()
 X11
   4
 Warning message:
 Display list redraw incomplete

 I don't think I've ever seen that warning before.  In addition to the
 warning, two extra x11 devices are launched.  Same thing happens in
 R-devel (r41438).  Any thoughts on what might be going on?

 Thanks,
 -roger


[Rd] directory/filename completion

2007-04-13 Thread Roger Peng
I've noticed something recently in R-beta that has changed since R 2.4.1 
and I'm not sure if it's a readline problem or an R problem.  I am on a 
Linux FC5 system and in R 2.4.1 I could do

load("my-directory/

and then hit TAB and it would list all of the files in 'my-directory', 
after which I could start typing the beginning of the file, hit TAB 
again, and it would complete the file name along with adding the closing 
double quote.

In R-beta, when I try the same thing, nothing happens (i.e. no directory 
listing and no file name completion).  Specifically, if I type

load("my

and then hit TAB, it will complete 'my-directory' but it will not
look further into the files inside of 'my-directory'.

Also, I've noticed that if I rename 'my-directory' to 'mydirectory' (so 
no dash), everything works fine (just like in R 2.4.1).

Has anyone else noticed this?

-roger


Re: [Rd] directory/filename completion

2007-04-13 Thread Roger Peng
Okay, I wasn't sure if it was definitely related to rcompgen or was a 
result of something else.  At least I can turn it off then.

Thanks,
-roger

Prof Brian Ripley wrote:
 On Fri, 13 Apr 2007, hadley wickham wrote:
 
 I've noticed that if I have the rcompgen library loaded - perhaps
 you've installed that recently?
 
 
 Loading rcompgen is the default under 2.5.0 beta, but you can turn it
 off. This is in the USER-VISIBLE CHANGES in NEWS: just how prominent 
 does it need to be?
 
 It is a feature of the rcompletion interface to rcompgen, now part of 
 base R.  From the source code comment
 
These break line into tokens.  Unfortunately, this also breaks
file names, so a path with a - in it will not be completed.
 
 We decided to make this the default in 2.5.0 to get user feedback: if it 
 was not switched on few people would use it.
 



[Rd] bug in Stangle(split=TRUE)

2007-04-07 Thread Roger Peng
[I originally emailed this to Friedrich Leisch but got no response and I 
just wanted to make sure it made it in before release.]


While working with Stangle(), I noticed a problem when using 'split =
TRUE'.  Specifically, when one chunk's name is a prefix of another
chunk's name, the two chunks are written to a single file rather than
two separate files (if the chunk whose name is the prefix comes after
the other chunk).  Running 'Stangle(split=TRUE)' on the attached
'test1.Rnw' file should reproduce the problem.


I think it boils down to a partial matching problem in 'RtangleRuncode'.
I've attached a patch against R-alpha (r41020) which I think fixes
this problem, but I'm not sure it's necessarily the best approach (there
are other instances of this construction in the code).
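The kind of partial matching at issue can be shown on a plain named list keyed by chunk name. In the minimal sketch below, the list `chunkout` is only an illustration of `object$chunkout`. The behaviour shown is that of current R: `$` and `[[` with `exact = FALSE` match names partially, while the `[chunkprefix][[1]]` form used in the patch only ever matches exactly.

```r
# Stand-in for object$chunkout: open outputs keyed by chunk name.
chunkout <- list(abcde = "file for chunk 'abcde'")

chunkout$abcd                      # partial match: the 'abcde' entry
chunkout[["abcd", exact = FALSE]]  # partial match as well
chunkout["abcd"][[1]]              # exact matching only: NULL
```

With partial matching, the lookup for chunk `abcd` finds the output already opened for `abcde`, which is the single-file symptom described above.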


-roger
Index: src/library/utils/R/Sweave.R
===
--- src/library/utils/R/Sweave.R (revision 41020)
+++ src/library/utils/R/Sweave.R (working copy)
@@ -854,7 +854,7 @@
 outfile <- paste(chunkprefix, options$engine, sep=".")
 if(!object$quiet)
 cat(options$chunknr, ":", outfile,"\n")
-chunkout <- object$chunkout[[chunkprefix]]
+chunkout <- object$chunkout[chunkprefix][[1]]
 if(is.null(chunkout)){
 chunkout <- file(outfile, "w")
 if(!is.null(options$label))
<<abcde,echo=true>>=
y <- 2
@

<<abcd,echo=true>>=
x <- 1
@



Re: [Rd] [R] Aggregate?

2006-05-03 Thread Roger Peng
This was fixed fairly recently in 2.3.0 patched.  It works in SVN 
revision 37953.

-roger

Gabor Grothendieck wrote:
 I am moving this from r-help to r-devel.
 
 The poster pointed out to me that my solution works in 2.2.1 but not
 in 2.3.0 patched.  Does anyone know what the problem is?
 
 
# 2.3.0 patched -- gives error
DF <- data.frame(A = gl(2,2), B = gl(2,2), C = 1:4)  # test data
out.by <- by(DF, DF$A, function(x) replace(x[1,], "C", sum(x$C)))
do.call(rbind, out.by)
 
 Error in data.frame(A = c(1, 2), B = c(1, 2), C = c(3, 7),  :
 row names contain missing values
 
R.version.string # Windows XP
 
 [1] "Version 2.3.0 Patched (2006-04-28 r37936)"
 
 
# 2.2.1 -- works ok
DF <- data.frame(A = gl(2,2), B = gl(2,2), C = 1:4)  # test data
out.by <- by(DF, DF$A, function(x) replace(x[1,], "C", sum(x$C)))
do.call(rbind, out.by)
 
   A B C
 1 1 1 3
 2 2 2 7
 
R.version.string # Windows XP
 
 [1] "R version 2.2.1, 2005-12-20"
 
 On 5/3/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 
Suppose we want to sum C over levels of A and that B is constant
within levels of A.  Then:

DF <- data.frame(A = gl(2,2), B = gl(2,2), C = 1:4)  # test data
do.call(rbind, by(DF, DF$A, function(x) replace(x[1,], "C", sum(x$C))))



On 5/3/06, Guenther, Cameron [EMAIL PROTECTED] wrote:

Hello,

I have a data set with a grouping variable (TRIPID) and several other
variables.  TRIPID is repeated in places, and I would like to use a
function like aggregate to sum the variable UNITS by TRIPID.
However, I would also like to retain the other variables as they are in
the data set, alongside the new summed UNITS.

So what I have is something like this:

YEAR MONTH DAY CONTINUE SPL       AREA COUNTY DEPTH DEPUNIT GEAR GEAR2 TRAPS SOAKTIME UNITS FACTOR DISPOSIT NUMSETS TRIPST TRIPID
1992 1     26  1        SP0073928 8    25     4     NA      100  NA    NA    NA       161   1      NA       NA      NA     02163399054
1992 1     26  1        SP0073928 8    25     4     NA      100  NA    NA    NA       8     1      NA       NA      NA     02163399054
1992 1     26  2        SP0004228 8    25     4     NA      100  NA    NA    NA       161   1      NA       NA      NA     02163399054
1992 1     26  2        SP0004228 8    25     4     NA      100  NA    NA    NA       8     1      NA       NA      NA     02163399054
1992 1     25  NA       SP0052652 8    25     4     NA      100  NA    NA    NA       85    1      NA       NA      NA     02163399057
1992 1     26  NA       SP0037940 8    25     4     NA      100  NA    NA    NA       70    1      NA       NA      NA     02163399058
1992 1     27  NA       SP0072357 8    25     4     NA      100  NA    NA    NA       15    1      NA       NA      NA     02163399059
1992 1     27  NA       SP0072357 8    25     4     NA      100  NA    NA    NA       20    1      NA       NA      NA     02163399059
1992 1     27  NA       SP0026324 8    25     4     NA      100  NA    NA    NA       8     1      NA       NA      NA     02163399060
1992 1     28  1        SP0072357 8    25     4     NA      100  NA    NA    NA       200   1      NA       NA      NA     02163399062

And what I want is this:

YEAR MONTH DAY CONTINUE SPL       AREA COUNTY DEPTH DEPUNIT GEAR GEAR2 TRAPS SOAKTIME UNITS FACTOR DISPOSIT NUMSETS TRIPST TRIPID
1992 1     26  1        SP0073928 8    25     4     NA      100  NA    NA    NA       338   1      NA       NA      NA     02163399054
1992 1     25  NA       SP0052652 8    25     4     NA      100  NA    NA    NA       85    1      NA       NA      NA     02163399057
1992 1     26  NA       SP0037940 8    25     4     NA      100  NA    NA    NA       70    1      NA       NA      NA     02163399058
1992 1     27  NA       SP0072357 8    25     4     NA      100  NA
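Here is an alternative sketch of the same "sum within groups, keep the first row" idea. It uses `ave()` and `duplicated()` instead of `by()`, which is a swapped-in technique and not the one proposed in the thread. The example runs on the small test data frame used above. For the poster's data, `A` and `C` would correspond to TRIPID and UNITS.

```r
DF <- data.frame(A = gl(2, 2), B = gl(2, 2), C = 1:4)  # test data

out <- DF
out$C <- ave(out$C, out$A, FUN = sum)  # replace C by its total within each A
out <- out[!duplicated(out$A), ]       # keep the first row of each group
out
```

All other columns keep the values from the first row of each group. That matches the `replace(x[1,], ...)` approach when those columns are constant within a group.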

Re: [Rd] Sweaving in png

2006-03-26 Thread Roger Peng
For what it's worth, I would find such an adaptation useful.

-roger

Thibaut Jombart wrote:
 Hello list, 
 
 although I already posted a message on this topic to R-help, I guess this list
 may be more appropriate.
 I'll make it shorter this time. Sorry for posting twice.
 
 I found that using pixmap pictures in a Sweave document was sometimes almost 
 impossible, due to the huge size of the pdf pictures produced.
 
 The first solution I found was to save pictures as png when they were too
 heavy in pdf. Here is an example:
 
 ### in a .rnw document ###
 
 % here is an invisible chunk to create a picture
 <<fig=FALSE,echo=FALSE>>=
 png(filename='figs/myPic.png')
 @
 
 % next, R code to generate picture
 <<fig=FALSE,echo=TRUE>>=
 ...[code to produce the figure]
 @
 
 % then, close the device. Hidden, again
 <<fig=FALSE,echo=FALSE>>=
 dev.off()
 @
 
 % and then, include it as a picture
 \includegraphics{figs/myPic.png}
 
 ### end of the example ###
 
 This is quite long, and I would have preferred to write simply:
 
 <<fig=TRUE,pdf=FALSE,png=TRUE>>=
 ...[code to produce the figure]
 @
 
 So I tried to adapt the Sweave driver 'RweaveLatex' to do so. It
 worked.
 
 The not-so-new driver is only a slight modification of RweaveLatex, and can
 generate ps, pdf or png figures; it was tested on Ubuntu64, Debian,
 several Windows systems and Mac OS X platforms with no detected problems.
 
 Does anyone find this useful, and/or is there a better solution I missed?
 
 Regards,
 
 Thibaut Jombart .
 



Re: [Rd] a generic 'attach'?

2006-02-05 Thread Roger Peng
I think having a generic attach might be useful in the end.  But I agree
that some more thought needs to go into how such a generic would behave.
I've always avoided using `attach()' precisely because I didn't fully
understand the semantics.

One related possibility would be to create a method for `with()' (which 
is already generic) which would work on (in my case) filehash 
databases.  It still wouldn't be quite as nice as `attach()' for 
interactive work but it could serve some purposes.
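Because `with()` is already generic, a method for a database-like class only has to evaluate the expression against the stored values. The sketch below is minimal. The class `kvdb` and its constructor are hypothetical stand-ins and are not filehash's actual representation.

```r
# Hypothetical database-like object: values stored in an environment.
kvdb <- function(...) {
  env <- list2env(list(...))
  structure(list(env = env), class = "kvdb")
}

# Evaluate 'expr' with the stored values visible as variables, falling
# back to the caller's environment for anything else.
with.kvdb <- function(data, expr, ...) {
  eval(substitute(expr), as.list(data$env), enclos = parent.frame())
}

db <- kvdb(x = 1:10, y = 2)
with(db, sum(x) * y)  # 110
```

Unlike `attach()`, nothing is added to the search path, so there is no detach step and no stale copy to worry about.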

-roger

[EMAIL PROTECTED] wrote:
 What have I started?  I had nothing anywhere near as radical as that
 in mind, Peter...
 
 One argument against making 'attach' generic might be that such a
 move would slow it down a bit, but I can't really see why speed would
 be much of an issue with 'attach'.
 
 I've noticed that David Brahm's package, g.data, for example, really
 has a method for attach as part of it (well, almost), but he has to
 call it g.data.attach.
 
 Another package that has an obvious application for a method for
 attach is the filehash package of Roger Peng.
 
 And as it happens I have another, but for now I call it 'Attach',
 which is pretty unsatisfying from an aesthetic point of view.
 
 I think I'll just sow the seed for now.  The thing about generic
 functions is that if they exist people sometimes find quite
 innovative uses for them, and if they come at minimal cost, and break
 no existing code, I suggest we think about implementing them.
 
 (Notice I have had no need to use a 'compatibility with another
 system' argument at any stage...)
 
 ---
 
 Another, even more minor issue I've wondered about is giving rm() the
 return value the object, or list of objects, that are removed.  Thus
 
 newName <- rm(object)
 
 would become essentially a renaming of an object in memory.
 
 For some reason I seem to recall that this was indeed a feature of a
 very early version of the S language, but dropped out (I think) when
 S3 was introduced.  Have I got that completely wrong?  (I seem to
 recall a lot of code had to be scrapped at that stage, including
 something rather reminiscent of the R with(), but I digress...)
 
 Bill.
 
 
 -Original Message- From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Peter Dalgaard Sent: Sunday,
 5 February 2006 8:35 PM To: Venables, Bill (CMIS, Cleveland) Cc:
 [EMAIL PROTECTED] Subject: Re: [Rd] a generic 'attach'?
 
 
 [EMAIL PROTECTED] writes:
 
 
 Is there any reason why 'attach' is not generic in R?
 
 I notice that it is in another system, for example,
 
 
 I wonder which one? ;-)
 
 
 and I can see some applications if it were so in R.
 
 
 I suppose there is no particular reason, except that it was probably 
 good enough for now at some point in time.
 
 Apropos attach(), and apologies in advance for the lengthy rant that 
 follows:
 
 There are a couple of other annoyances with the attach/detach 
 mechanism that could do with a review. In particular, detach() is not
 behaving according to documentation (return value is really NULL). I
 feel that sensible semantics for editing an attached database and
 storing it back would be useful. The current semantics tend to get 
 people in trouble, and some of the stuff you need to explain really 
 feels quite odd:
 
 attach(airquality)
 airquality$Month <- factor(airquality$Month)
 # oops, that's not going to work. You need:
 detach(airquality)
 attach(airquality)
 
 (notice in particular that this tends to keep two copies of the data 
 in memory at a time).
 
 You can actually modify a database after attaching it (I'm
 deliberately not saying data frame, because it will not be one at
 that stage), but it leads to contortions like
 
 assign("Month", factor(Month), "airquality")
 
 or
 
 with(pos.to.env(2), Month <- factor(Month))
 
 (or even with(pos.to.env(match("airquality", search())), ...))
 
 I've been thinking on and off about these matters. It is a bit tricky
 because we'd rather not break code (and books!) all over the place,
 but suppose we
 
 (a) allowed with() to have its first argument interpreted like the
 3rd argument in assign()
 
 (b) made detach() do what it claims: return the (possibly modified) 
 database. This requires that more metadata are kept around than 
 currently. Also, the semantics of
 
 attach(airquality)
 assign("foo", function(bar) baz, "airquality")
 aq <- detach(airquality)
 
 would need to be sorted out. Presumably foo needs to be dropped 
 with a warning.
 
 Potentially, one could then also devise mechanisms for load/store 
 directly to/from the search path.
 
 Alternative ideas include changing the search path itself to be an 
 actual list of objects (rather than a nesting of environments), but 
 that leads to the same sort of issues.
 
 



[Rd] Building R-devel with ACML

2005-12-14 Thread Roger Peng
I'm trying to build R-devel with AMD's ACML.  I downloaded version 3.0.0 
64bit for gfortran (acml-3-0-0-gfortran-64bit.tgz) and copied the 
libraries to /usr/local/lib.  When I configure R to build against the 
ACML library, how do I know if the library has been detected and will be 
used?

I run 'configure' with the '--with-blas=-lacml' flag and am using gcc 
4.0.2 (with gfortran) on FC4.

Thanks,
-roger


Re: [Rd] import of Namespaces

2005-12-01 Thread Roger Peng
My understanding of your questions is below:

Matthias Kohl wrote:
 Dear R devels,
 
 let's say I have three packages pkg1, pkg2 and pkg3 which all 
 contain new S4 classes and methods. Where pkg3 depends on pkg2 and 
 pkg2 depends on pkg1. Moreover, all three packages have namespaces.
 
 1) I use .onLoad <- function(lib, pkg) require(methods). Do I also
 have to import the namespace of the methods package?

No.

 
 2) If I use import(pkg1) in the namespace of pkg2, does this also 
 (correctly) import the S4 classes and methods of pkg1? Or do I 
 explicitly have to use importClassesFrom resp. importMethodsFrom?

Importing an entire package namespace will import all of the exported 
classes/methods from pkg1.

 
 3) If I import the Namespace of pkg2 in pkg3, where the namespace of 
 pkg2 has import(pkg1) (or maybe importClassesFrom, 
 importMethodsFrom) and I also want to use S4 classes and methods of 
 pkg1 in pkg3. Is it sufficient to have import(pkg2) in the 
 Namespace of pkg3 or do I need import(pkg1) and import(pkg2)?

I believe you need to import each separately, since the S4
classes/methods from pkg1 will not be available to you simply because
you imported pkg2 (i.e., I don't think the chain rule applies here).
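Under that reading, pkg3's NAMESPACE file would list both packages explicitly. The sketch below uses the package names from the question. Finer-grained directives such as `importClassesFrom()` and `importMethodsFrom()` could replace the whole-package imports.

```r
# NAMESPACE of pkg3 (sketch)
import(pkg1)  # S4 classes/methods of pkg1 that pkg3 uses directly
import(pkg2)
```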

 
 Many thanks for your help and advice
 Matthias
 



Re: [Rd] write.csv

2005-11-24 Thread Roger Peng
If you don't want the row names, as 'write.csv()' writes out by default, 
try

write.table(object, file = "myfile.csv", sep = ",", row.names = FALSE)
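As a quick check of that suggestion, the sketch below writes a small made-up data frame to a temporary file and reads it back as plain text.

```r
df <- data.frame(id = 1:2, value = c(3.5, 4.5))
f <- tempfile(fileext = ".csv")
write.table(df, file = f, sep = ",", row.names = FALSE)
lines <- readLines(f)
lines
```

The header is `"id","value"` and the data lines carry no leading index column.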

-roger

Sven Schaltenbrand wrote:
 Hello,
  
 I have a problem writing a csv file:
 the first column is filled with index numbers from 1 to n.
 I have to combine two csv files once a week, and one file is always the
 same.
 Can anybody tell me how to write the dataset into a csv file without the
 first column of index numbers?
 x[,-1] does not work, as it eliminates the first interesting column.
 col.names is not accepted by R (do I have to load a package first? Which
 one?)
  
 thx
  
 sven
 


[Rd] write.csv ignores 'row.names'

2005-11-24 Thread Roger Peng
Upon replying to this email, I took a look at 'write.csv()' and noticed 
something interesting.  I remember there being a discussion sometime in 
the past about letting 'write.csv()' accept the 'row.names' argument. 
However, I get the following error:

> write.csv(airquality, file = "myfile.csv", row.names = F)
Error in write.table(airquality, file = "myfile.csv", row.names = F,
col.names = NA,  :
        col.names = NA makes no sense when row.names = FALSE
 

In 'write.csv()' there is

 rn <- Call$row.names
 Call$col.names <- if (is.logical(rn) && !rn)
 TRUE

but is.logical(rn) is always FALSE because even if 'row.names' is
specified (non-NULL), it is of class "name".  Perhaps something like

rn <- eval(Call$row.names)

would suffice?  I can't tell if that would break anything.
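Here is a minimal sketch of why `is.logical(rn)` fails for `row.names = F`. It assumes that `Call` comes from `match.call()`, as the `Call$row.names` access suggests. A matched call records arguments unevaluated, so `F` is kept as the symbol `F` (class "name"), while a literal `FALSE` is stored as a logical constant. The throwaway function `f` is hypothetical and stands in for `write.csv()`.

```r
f <- function(...) match.call()

c1 <- f(row.names = F)
c2 <- f(row.names = FALSE)

class(c1$row.names)             # "name": the symbol F, not a logical
class(c2$row.names)             # "logical"
is.logical(eval(c1$row.names))  # TRUE once evaluated
```

If the argument could also be a variable, then inside the function `eval(Call$row.names, parent.frame())`, which evaluates in the caller's frame, would probably be safer than a bare `eval()`.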

-roger



Re: [Rd] write.csv ignores 'row.names'

2005-11-24 Thread Roger Peng
Okay, upon further examination, it appears that it works fine if you set 
'row.names = FALSE' as opposed to 'row.names = F'.

-roger



Re: [Rd] Packages that require other packages - How?

2005-11-09 Thread Roger Peng
If I'm not mistaken, when you put the other package in the Depends: 
field of DESCRIPTION, the other package will be loaded first, before 
your package is loaded.  So you shouldn't have to put require/library 
anywhere else.

-roger

Gavin Simpson wrote:
 Dear list,
 
 The help page for library/require contains the following paragraph in
 the section Packages that require other packages:
 
  The source code for a package that requires one or more other
  packages should have a call to 'require', preferably near the
  beginning of the source, and of course before any code that uses
  functions, classes or methods from the other package. 
 
 Now, I'm being very dense today, but I don't know where to put such a
 call to require.
 
 My package has added methods for a generic function supplied by another
 package. I have listed this package in the Depends field in my
 DESCRIPTION.
 
 What do I need to do to have the package that my package depends on be
 attached when I call library or require to attach my package?
 
 Apologies for being dense...
 
 Thanks,
 
 G
