Re: [Rd] delayedAssign

2007-09-27 Thread Luke Tierney
On Wed, 26 Sep 2007, Gabor Grothendieck wrote:

> I thought that perhaps the behavior in the previous post,
> while inconsistent with the documentation, was not all that
> harmful, but I think it's related to the following, which is a potentially
> serious bug.

The previous discussion already established that as.list of an
environment should not return a list with promises in it, as promises
should not be visible at the R level.  (Another loophole that needs
closing is $ for environments.)  So the behavior of results that should
not exist is undefined, and I cannot see how any such behavior is a
further bug, serious or otherwise.

> z is a list with a single numeric component,
> as the dput output verifies,

Except it isn't, as print or str verify.  That might be a problem if z
were an input these functions should expect, but it isn't.

> yet we cannot compare its first element
> to 7 without getting an error message.
>
> Later on we see that it's because it thinks that z[[1]] is of type "promise"

As z[[1]] is in fact of type promise that would seem a fairly
reasonable thing to think at this point ...

> and even force(z[[1]]) is of type "promise".

which is consistent with what force is documented to do. The
documentation is quite explicit that force does not do what you seem
to be expecting.  That documentation is from a time when delay()
existed to produce promises at the R level, which was a nightmare
because of all the peculiarities it introduced, which is why it was
removed.

force is intended for one thing only -- replacing code like this:

   # I know the following line looks really stupid and you will be
   # tempted to remove it for efficiency but DO NOT: it is needed
   # to make sure that the formal argument y is evaluated at this
   # point.
   y <- y

with this:

  force(y)

which seems much clearer -- at least it suggests you look at the help
page for force to see what it does.
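
A minimal sketch of the failure mode this guards against (make_adder is
a made-up example, not something from the thread):

    make_adder <- function(n) {
        force(n)  # evaluate the argument now; without this line every
                  # closure built in the loop below would see i == 3
        function(x) x + n
    }
    adders <- list()
    for (i in 1:3) adders[[i]] <- make_adder(i)
    adders[[1]](10)  # 11 with force(); 13 without it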

At this point promises should only ever exist in bindings in
environments. If we wanted lazy evaluation constructs more widely
there are really only two sensible options:

 The Scheme option where a special function delay creates a deferred
 evaluation and another, called force in Scheme, forces the evaluation
 but there is no implicit forcing

or

 The Haskell option where data structures are created lazily so

 z <- list(f(x))

 would create a list with a deferred evaluation, but any attempt to
 access the value of z would force the evaluation. So printing z,
 for example, would force the evaluation but

 y <- z[[1]]

 would not.

It is easy enough to create a Delay/Force pair that behaves like
Scheme's with the tools available in R if that is what you want.
Haskell, and other fully lazy functional languages, are very
interesting but very different animals from R. For some reason you
seem to be expecting some combination of Scheme and Haskell behavior.
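
For what it's worth, a minimal sketch of such a Scheme-style pair, built
from substitute() and a closure (memoizing the result, as Scheme's force
does):

    Delay <- function(expr) {
        expr <- substitute(expr)   # capture the expression unevaluated
        env <- parent.frame()      # remember where to evaluate it later
        value <- NULL
        forced <- FALSE
        function() {
            if (!forced) {
                value <<- eval(expr, env)
                forced <<- TRUE
            }
            value
        }
    }
    Force <- function(p) p()

    d <- Delay({cat("evaluating\n"); 1 + 2})
    Force(d)  # prints "evaluating", returns 3
    Force(d)  # cached: returns 3 silently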

Best,

luke

>
>> f <- function(x) environment()
>> z <- as.list(f(7))
>> dput(z)
> structure(list(x = 7), .Names = "x")
>> z[[1]] == 7
> Error in z[[1]] == 7 :
>  comparison (1) is possible only for atomic and list types
>> force(z[[1]]) == 7
> Error in force(z[[1]]) == 7 :
>  comparison (1) is possible only for atomic and list types
>>
>> typeof(z)
> [1] "list"
>> typeof(z[[1]])
> [1] "promise"
>> typeof(force(z[[1]]))
> [1] "promise"
>> R.version.string # Vista
> [1] "R version 2.6.0 beta (2007-09-23 r42958)"
>
>
> On 9/19/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
>> The last two lines of example(delayedAssign) give this:
>>
>>> e <- (function(x, y = 1, z) environment())(1+2, "y", {cat(" HO! "); pi+2})
>>> (le <- as.list(e)) # evaluates the promises
>> $x
>> <promise: 0x...>
>>
>> $y
>> <promise: 0x...>
>>
>> $z
>> <promise: 0x...>
>>
>> which contrary to the comment appears unevaluated.  Is the comment
>> wrong or is it supposed to return an evaluated result but doesn't?
>>
>>> R.version.string # Vista
>> [1] "R version 2.6.0 alpha (2007-09-06 r42791)"
>>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and    Fax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:  [EMAIL PROTECTED]
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] delayedAssign

2007-09-27 Thread Gabor Grothendieck
Thanks for the explanation.

For lists, either (a) promises should be evaluated as they enter the
list, or (b) promises should be evaluated as they exit the list (i.e.
as they are compared, inspected, etc.).  I gather the intent was (a),
but it does not happen that way due to a bug in R.  Originally I
thought (b) would then occur, but to my surprise it does not occur
either, which is why I feel it's more serious than I had originally
thought.

I think it's ok if promises only exist in environments and not in
lists.  Items on my wishlist would be the ability to do at the R level
the two things mentioned previously

https://stat.ethz.ch/pipermail/r-devel/2007-September/046943.html

and thirdly an ability to get the evaluation environment, not just the
expression, associated with a promise -- substitute only gets the
expression.  Originally I thought I would need some or all of these
wish items, then thought not, but am back to the original situation
again as I use them more and realize that they are at least important
for debugging (it's very difficult to debug situations involving
promises, as there is no way to inspect the evaluation environment, so
you are never sure which environment a given promise is evaluated in)
and possibly for writing programs as well.

On 9/27/07, Luke Tierney <[EMAIL PROTECTED]> wrote:
> On Wed, 26 Sep 2007, Gabor Grothendieck wrote:
>
> > I thought that perhaps the behavior in the previous post,
> > while inconsistent with the documentation, was not all that
> > harmful, but I think it's related to the following, which is a potentially
> > serious bug.
>
> The previous discussion already established that as.list of an
> environment should not return a list with promises in it, as promises
> should not be visible at the R level.  (Another loophole that needs
> closing is $ for environments.)  So the behavior of results that should
> not exist is undefined, and I cannot see how any such behavior is a
> further bug, serious or otherwise.
>
> > z is a list with a single numeric component,
> > as the dput output verifies,
>
> Except it isn't, as print or str verify.  That might be a problem if z
> were an input these functions should expect, but it isn't.
>
> > yet we cannot compare its first element
> > to 7 without getting an error message.
> >
> > Later on we see that it's because it thinks that z[[1]] is of type "promise"
>
> As z[[1]] is in fact of type promise that would seem a fairly
> reasonable thing to think at this point ...
>
> > and even force(z[[1]]) is of type "promise".
>
> which is consistent with what force is documented to do. The
> documentation is quite explicit that force does not do what you seem
> to be expecting.  That documentation is from a time when delay()
> existed to produce promises at the R level, which was a nightmare
> because of all the peculiarities it introduced, which is why it was
> removed.
>
> force is intended for one thing only -- replacing code like this:
>
>   # I know the following line looks really stupid and you will be
>   # tempted to remove it for efficiency but DO NOT: it is needed
>   # to make sure that the formal argument y is evaluated at this
>   # point.
>   y <- y
>
> with this:
>
>  force(y)
>
> which seems much clearer -- at least it suggests you look at the help
> page for force to see what it does.
>
> At this point promises should only ever exist in bindings in
> environments. If we wanted lazy evaluation constructs more widely
> there are really only two sensible options:
>
> The Scheme option where a special function delay creates a deferred
> evaluation and another, called force in Scheme, forces the evaluation
> but there is no implicit forcing
>
> or
>
> The Haskell option where data structures are created lazily so
>
> z <- list(f(x))
>
> would create a list with a deferred evaluation, but any attempt to
> access the value of z would force the evaluation. So printing z,
> for example, would force the evaluation but
>
> y <- z[[1]]
>
> would not.
>
> It is easy enough to create a Delay/Force pair that behaves like
> Scheme's with the tools available in R if that is what you want.
> Haskell, and other fully lazy functional languages, are very
> interesting but very different animals from R. For some reason you
> seem to be expecting some combination of Scheme and Haskell behavior.
>
> Best,
>
> luke
>
> >
> >> f <- function(x) environment()
> >> z <- as.list(f(7))
> >> dput(z)
> > structure(list(x = 7), .Names = "x")
> >> z[[1]] == 7
> > Error in z[[1]] == 7 :
> >  comparison (1) is possible only for atomic and list types
> >> force(z[[1]]) == 7
> > Error in force(z[[1]]) == 7 :
> >  comparison (1) is possible only for atomic and list types
> >>
> >> typeof(z)
> > [1] "list"
> >> typeof(z[[1]])
> > [1] "promise"
> >> typeof(force(z[[1]]))
> > [1] "promise"
> >> R.version.string # Vista
> > [1] "R version 2.6.0 beta (2007-09-23 r42958)"
> >
> >
> > On 9/19/07, Gabor Grothendi

Re: [Rd] rJava and RJDBC

2007-09-27 Thread Simon Urbanek
Joe,

which version of R and RJDBC are you using? The behavior you describe  
should have been fixed in RJDBC 0.1-4. Please try the latest version  
from rforge
install.packages("RJDBC",,"http://rforge.net/")
and please let me know if that solves your problem.

Cheers,
Simon


On Sep 26, 2007, at 10:03 PM, Joe W. Byers wrote:

> I am desperate for help.
>
> I am trying to get RJDBC and rJava 0.5 to work on both my Windows XP
> and Linux Redhat EL5 Server machines.  On both I get a
> java.lang.ClassNotFoundException when calling JDBC().
>
> My example is
> require(RJDBC)
> classPath='C:\\libraries\\mysql-connector-java-5.1.3-rc\\mysql-connector-java-5.1.3-rc-bin.jar'
> driverClass=c("com.mysql.jdbc.Driver")
> drv <- JDBC(c("com.mysql.jdbc.Driver"),classPath,"`")
>
>
> This returns a NULL value and a java exception.
>> .jgetEx()
> [1] "Java-Object{java.lang.ClassNotFoundException:  
> com.mysql.jdbc.Driver}"
> my java version is
>> .jcall('java.lang.System','S','getProperty','java.version')
> [1] "1.6.0_02"
> jre
>
>
> When I use java 1.5.0_11 jre I have the same problem but the .jgetEx()
> is
>> .jgetEx()
> [1] "Java-Object{}
>
> my class path is
>> .jclassPath()
>   [1] "C:\\PROGRA~1\\R\\library\\rJava\\java"
>
>   [2] "."
>
>   [3] "C:\\libraries\\mysql-connector-java-5.1.3-rc\\mysql-connector-java-5.1.3-rc-bin.jar"
>   [4] "C:\\libraries\\xmlbeans-2.0.0-beta1\\lib\\xbean.jar"
>
>   [5] "C:\\libraries\\POI\\poi-2.5.1-final-20040804.jar"
>
>   [6] "C:\\libraries\\POI\\poi-contrib-2.5.1-final-20040804.jar"
>
>   [7] "C:\\libraries\\POI\\poi-scratchpad-2.5.1-final-20040804.jar"
>
>   [8] "C:\\Libraries\\PJM\\eDataFeed.jar"
>
>   [9] "C:\\Libraries\\PJM\\webserviceclient.jar"
>
> [10] "C:\\Java\\Libraries\\QTJava.zip"
>
> My java_Home is
>> .jcall('java.lang.System','S','getProperty','java.home')
> [1] "C:\\Java\\jre1.6.0_02"
>
>
> I have tried breaking down the JDBC as
> .jinit() or .jinit(classPath)
> v<-.jcall("java/lang/ClassLoader","Ljava/lang/ClassLoader;",
> "getSystemClassLoader")
> .jcall("java/lang/Class", "Ljava/lang/Class;",
>  "forName", as.character(driverClass)[1], TRUE, v)
>   to no avail.
>
> I have tried different versions of the mysql jar.
>
> I do not know if my java version not compatible, my java settings are
> wrong, or I am just blind to the problem.  This is the same for  
> both my
> Windows XP and Redhat EL5 Server.
>
> I really appreciate any and all assistance.
>
> Thank you
> Joe
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Petr Savicky
On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
> For the most part, doing anything to an R object results in its
> duplication.  You generally have to do a lot of work to NOT copy an R
> object.

Thank you for your response. Unfortunately, you are right. For example,
the allocated memory reported by the top command on Linux may change during
a session as follows:
  a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
  a[1,1] <- 0 # 3.0g
  gc() # 1.5g

In the current application, I modify the matrix only using my own C code
and only read it at the R level.  So the above is not a big problem for me
(at least not now).

However, there is a related thing, which could be a bug. The following
code determines the value of NAMED field in SEXP header of an object:

  SEXP getnamed(SEXP a)
  {
  SEXP out;
  PROTECT(out = allocVector(INTSXP, 1));
  INTEGER(out)[0] = NAMED(a);
  UNPROTECT(1);
  return(out);
  }
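
A sketch of how this helper would be compiled and loaded before running
the sessions below (assuming the C code is saved as getnamed.c):

    ## in a shell:  R CMD SHLIB getnamed.c
    dyn.load(paste("getnamed", .Platform$dynlib.ext, sep=""))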

Now, consider the following session

  u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
  .Call("getnamed",u) # 1 (OK)

  length(u)
  .Call("getnamed",u) # 1 (OK)

  dim(u)
  .Call("getnamed",u) # 1 (OK)

  nrow(u)
  .Call("getnamed",u) # 2 (why?)

  u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
  .Call("getnamed",u) # 1 (OK)
  ncol(u)
  .Call("getnamed",u) # 2 (so, ncol does the same)

Is this a bug?

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Petr Savicky
In my previous email, I sent the example:
>   a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
>   a[1,1] <- 0 # 3.0g
>   gc() # 1.5g

This is misleading. The correct version is
   a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
   a[1,1] <- as.integer(0) # 1.5g
   gc() # 774m

So, the object duplicates, but nothing more.

The main part of my previous email (question concerning
a possible bug in the behavior of nrow(a) and ncol(a))
remains open.

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] delayedAssign

2007-09-27 Thread Luke Tierney
On Thu, 27 Sep 2007, Gabor Grothendieck wrote:

> Thanks for the explanation.
>
> For lists either: (a) promises should be evaluated as they
> enter the list or (b) promises evaluated as they exit the
> list (i.e. as they are compared, inspected, etc.).

What makes you conclude that this is what "should" happen?

Again, promises are internal.  We could, and maybe will, eliminate
promises in favor of a mark on bindings in environments that indicates
that they need to be evaluated.  At the R level this would produce the
same behavior as we currently (intend to) have.

If we allowed lazy structures outside of bindings then I still don't
see how (b) "should" happen.  With Scheme-like semantics we would
definitely NOT want this to happen; with Haskell-like semantics any
attempt to look at the value (including printing) would result in
evaluation (and replacing the promise/thunk/whatever by its value).

> I gather
> the intent was (a), but it does not happen that way due to
> a bug in R.  Originally I thought (b) would then occur, but to my
> surprise it does not occur either, which is why
> I feel it's more serious than I had originally thought.
>
> I think it's ok if promises only exist in environments and not in
> lists.  Items on my wishlist would be the ability to do at the R level
> the two things mentioned previously
>
> https://stat.ethz.ch/pipermail/r-devel/2007-September/046943.html

I am still not persuaded that tools for inspecting environments are
worth the time and effort required but I am prepared to be.

Best,

luke

>
> and thirdly an ability to get the evaluation environment, not just the
> expression, associated with a promise -- substitute only gets the
> expression.  Originally I thought I would need some or all of these
> wish items, then thought not, but am back to the original situation
> again as I use them more and realize that they are at least important
> for debugging (it's very difficult to debug situations involving
> promises, as there is no way to inspect the evaluation environment, so
> you are never sure which environment a given promise is evaluated in)
> and possibly for writing programs as well.
>
> On 9/27/07, Luke Tierney <[EMAIL PROTECTED]> wrote:
>> On Wed, 26 Sep 2007, Gabor Grothendieck wrote:
>>
>>> I thought that perhaps the behavior in the previous post,
>>> while inconsistent with the documentation, was not all that
>>> harmful, but I think it's related to the following, which is a potentially
>>> serious bug.
>>
>> The previous discussion already established that as.list of an
>> environment should not return a list with promises in it, as promises
>> should not be visible at the R level.  (Another loophole that needs
>> closing is $ for environments.)  So the behavior of results that should
>> not exist is undefined, and I cannot see how any such behavior is a
>> further bug, serious or otherwise.
>>
>>> z is a list with a single numeric component,
>>> as the dput output verifies,
>>
>> Except it isn't, as print or str verify.  That might be a problem if z
>> were an input these functions should expect, but it isn't.
>>
>>> yet we cannot compare its first element
>>> to 7 without getting an error message.
>>>
>>> Later on we see that it's because it thinks that z[[1]] is of type "promise"
>>
>> As z[[1]] is in fact of type promise that would seem a fairly
>> reasonable thing to think at this point ...
>>
>>> and even force(z[[1]]) is of type "promise".
>>
>> which is consistent with what force is documented to do. The
>> documentation is quite explicit that force does not do what you seem
>> to be expecting.  That documentation is from a time when delay()
>> existed to produce promises at the R level, which was a nightmare
>> because of all the peculiarities it introduced, which is why it was
>> removed.
>>
>> force is intended for one thing only -- replacing code like this:
>>
>>   # I know the following line looks really stupid and you will be
>>   # tempted to remove it for efficiency but DO NOT: it is needed
>>   # to make sure that the formal argument y is evaluated at this
>>   # point.
>>   y <- y
>>
>> with this:
>>
>>  force(y)
>>
>> which seems much clearer -- at least it suggests you look at the help
>> page for force to see what it does.
>>
>> At this point promises should only ever exist in bindings in
>> environments. If we wanted lazy evaluation constructs more widely
>> there are really only two sensible options:
>>
>> The Scheme option where a special function delay creates a deferred
>> evaluation and another, called force in Scheme, forces the evaluation
>> but there is no implicit forcing
>>
>> or
>>
>> The Haskell option where data structures are created lazily so
>>
>> z <- list(f(x))
>>
>> would create a list with a deferred evaluation, but any attempt to
>> access the value of z would force the evaluation. So printing z,
>> for example, would force the evaluation but
>>
>> y <- z[[1]]
>>
>> 

Re: [Rd] delayedAssign

2007-09-27 Thread Gabor Grothendieck
You or Peter stated that promises are internal, so clearly they
should be evaluated going into or out of lists.  Otherwise you get
the current situation.

If you had just wasted as much time as I have trying to debug a
program with promises, you would immediately understand why it is
necessary to be able to query a promise for its evaluation
environment.  I think this is an obvious requirement for anyone
dealing with promises, since you are never quite sure what environment
the promises are being evaluated with respect to.  If you knew this,
you would immediately understand whether it's doing what you think.
It's like trying to debug a program without any way of finding out
what the variables are at any point in the program.

For the other features see #1 in these release notes (for the
next version of proto) which features lazy evaluation:

http://r-proto.googlecode.com/svn/trunk/inst/ReleaseNotes.txt

The examples there already seem to work with the devel version of proto
(available on the Source tab of r-proto.googlecode.com), but not
necessarily in a desirable way.  In the current implementation, cloned
objects have their promises forced in the specified environment,
defaulting to the current environment rather than the original
environment.  We don't know what the original environment is, since
it's not available at the R level; i.e., substitute will tell us the
expression but not the evaluation environment.  We could redundantly
pass the evaluation environment around, but then it would not be
garbage collected, causing a large amount of inefficiency; and anyway,
it seems pointless when the information is there and just needs to be
made accessible.  I think the default should be that cloned promises
are evaluated in the original environment, since that one is known to
hold the variables that are being evaluated, whereas the current
environment may or may not have variables of those names.  In order to
try this and variations on it, we need the three wish list items.

On 9/27/07, Luke Tierney <[EMAIL PROTECTED]> wrote:
> On Thu, 27 Sep 2007, Gabor Grothendieck wrote:
>
> > Thanks for the explanation.
> >
> > For lists either: (a) promises should be evaluated as they
> > enter the list or (b) promises evaluated as they exit the
> > list (i.e. as they are compared, inspected, etc.).
>
> What makes you conclude that this is what "should" happen?
>
> Again, promises are internal.  We could, and maybe will, eliminate
> promises in favor of a mark on bindings in environments that indicates
> that they need to be evaluated.  At the R level this would produce the
> same behavior as we currently (intend to) have.
>
> If we allowed lazy structures outside of bindings then I still don't
> see how (b) "should" happen.  With Scheme-like semantics we would
> definitely NOT want this to happen; with Haskell-like semantics any
> attempt to look at the value (including printing) would result in
> evaluation (and replacing the promise/thunk/whatever by its value).
>
> > I gather
> > the intent was (a), but it does not happen that way due to
> > a bug in R.  Originally I thought (b) would then occur, but to my
> > surprise it does not occur either, which is why
> > I feel it's more serious than I had originally thought.
> >
> > I think it's ok if promises only exist in environments and not in
> > lists.  Items on my wishlist would be the ability to do at the R level
> > the two things mentioned previously
> >
> > https://stat.ethz.ch/pipermail/r-devel/2007-September/046943.html
>
> I am still not persuaded that tools for inspecting environments are
> worth the time and effort required but I am prepared to be.
>
> Best,
>
> luke
>
> >
> > and thirdly an ability to get the evaluation environment, not just the
> > expression, associated with a promise -- substitute only gets the
> > expression.  Originally I thought I would need some or all of these
> > wish items, then thought not, but am back to the original situation
> > again as I use them more and realize that they are at least important
> > for debugging (it's very difficult to debug situations involving
> > promises, as there is no way to inspect the evaluation environment, so
> > you are never sure which environment a given promise is evaluated in)
> > and possibly for writing programs as well.
> >
> > On 9/27/07, Luke Tierney <[EMAIL PROTECTED]> wrote:
> >> On Wed, 26 Sep 2007, Gabor Grothendieck wrote:
> >>
> >>> I thought that perhaps the behavior in the previous post,
> >>> while inconsistent with the documentation, was not all that
> >>> harmful, but I think it's related to the following, which is a potentially
> >>> serious bug.
> >>
> >> The previous discussion already established that as.list of an
> >> environment should not return a list with promises in it, as promises
> >> should not be visible at the R level.  (Another loophole that needs
> >> closing is $ for environments.)  So the behavior of results that should not
> >> exist is undefined, and I cannot see how any such behavior is a f

Re: [Rd] rJava and RJDBC

2007-09-27 Thread Joe W Byers
Simon Urbanek <[EMAIL PROTECTED]> writes:

> 
> Joe,
> 
> which version of R and RJDBC are you using? The behavior you describe  
> should have been fixed in RJDBC 0.1-4. Please try the latest version  
> from rforge
> install.packages("RJDBC",,"http://rforge.net/")
> and please let me know if that solves your problem.
> 
> Cheers,
> Simon

Simon,

Thank you so much.  I have been working on this for a week.

I had not been using rforge.net as a repository, only the default
mirrors (usually the IL mirror) to get my package updates.

This really rocks!

Again, thank you and have a wonderful day.

Joe

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Peter Dalgaard
Petr Savicky wrote:
> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
>   
>> For the most part, doing anything to an R object results in its
>> duplication.  You generally have to do a lot of work to NOT copy an R
>> object.
>> 
>
> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory reported by the top command on Linux may change during
> a session as follows:
>   a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
>   a[1,1] <- 0 # 3.0g
>   gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C code
> and only read it at the R level.  So the above is not a big problem for me
> (at least not now).
>
> However, there is a related thing, which could be a bug. The following
> code determines the value of NAMED field in SEXP header of an object:
>
>   SEXP getnamed(SEXP a)
>   {
>   SEXP out;
>   PROTECT(out = allocVector(INTSXP, 1));
>   INTEGER(out)[0] = NAMED(a);
>   UNPROTECT(1);
>   return(out);
>   }
>
> Now, consider the following session
>
>   u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
>   .Call("getnamed",u) # 1 (OK)
>
>   length(u)
>   .Call("getnamed",u) # 1 (OK)
>
>   dim(u)
>   .Call("getnamed",u) # 1 (OK)
>
>   nrow(u)
>   .Call("getnamed",u) # 2 (why?)
>
>   u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
>   .Call("getnamed",u) # 1 (OK)
>   ncol(u)
>   .Call("getnamed",u) # 2 (so, ncol does the same)
>
> Is this a bug?
>   
No. It is an infelicity.

The issues are that
1. length() and dim() call .Primitive directly, whereas nrow() and 
ncol() are "real" R functions
2. NAMED records whether an object has _ever_  had  0, 1, or 2+ names

During the evaluation of ncol(u), the argument x is evaluated, and at
that point the object "u" is also named "x" in the evaluation frame of
ncol(). A full(er) reference counting system might drop NAMED back to 1
when exiting ncol(), but currently, R  can only count up (and trying to
find the conditions under which it is safe to reduce NAMED will make
your head spin, believe me! )
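
The practical consequence, as a small sketch (reusing the getnamed
helper from earlier in the thread, assuming it is compiled and loaded):

    u <- matrix(as.integer(1), nrow=5, ncol=3) + as.integer(0)
    .Call("getnamed", u)  # 1: u may still be modified in place
    nrow(u)               # the closure's frame adds a reference
    .Call("getnamed", u)  # 2: from now on any assignment such as
    u[1,1] <- 2L          # this one must duplicate the whole object
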
> Petr Savicky.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>   


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Prof Brian Ripley
1) You implicitly coerced 'a' to be numeric and thereby (almost) doubled 
its size: did you intend to?  Does that explain your confusion?


2) I expected NAMED on 'a' to be incremented by nrow(a): here is my 
understanding.

When you called nrow(a) you created another reference to 'a' in the 
evaluation frame of nrow.  (At a finer level you first created a promise 
to 'a', and then dim(x) evaluated that promise, which set NAMED to 2.)
So NAMED(a) was correctly bumped to 2, and it is never reduced.

More generally, any argument to a closure that actually gets used will 
get NAMED set to 2.

Having too high a value of NAMED could never be a 'bug'.  See the 
explanation in the R Internals manual:

   When an object is about to be altered, the named field is consulted. A
   value of 2 means that the object must be duplicated before being
   changed.  (Note that this does not say that it is necessary to
   duplicate, only that it should be duplicated whether necessary or not.)


3) Memory profiling can be helpful in telling you exactly what copies get 
made.
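
For instance, a small sketch with tracemem() (this assumes an R build
configured with --enable-memory-profiling, which makes tracemem()
operational):

    a <- matrix(1, nrow=1000, ncol=1000)
    tracemem(a)         # report every duplication of 'a' from now on
    invisible(nrow(a))  # bumps NAMED via the closure's frame
    a[1,1] <- 2         # the tracemem line printed here is the copy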



On Thu, 27 Sep 2007, Petr Savicky wrote:

> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
>> For the most part, doing anything to an R object results in its
>> duplication.  You generally have to do a lot of work to NOT copy an R
>> object.
>
> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory reported by the top command on Linux may change during
> a session as follows:
>  a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
>  a[1,1] <- 0 # 3.0g
>  gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C code
> and only read it at the R level.  So the above is not a big problem for me
> (at least not now).
>
> However, there is a related thing, which could be a bug. The following
> code determines the value of NAMED field in SEXP header of an object:
>
>  SEXP getnamed(SEXP a)
>  {
>  SEXP out;
>  PROTECT(out = allocVector(INTSXP, 1));
>  INTEGER(out)[0] = NAMED(a);
>  UNPROTECT(1);
>  return(out);
>  }
>
> Now, consider the following session
>
>  u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
>  .Call("getnamed",u) # 1 (OK)
>
>  length(u)
>  .Call("getnamed",u) # 1 (OK)
>
>  dim(u)
>  .Call("getnamed",u) # 1 (OK)
>
>  nrow(u)
>  .Call("getnamed",u) # 2 (why?)
>
>  u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
>  .Call("getnamed",u) # 1 (OK)
>  ncol(u)
>  .Call("getnamed",u) # 2 (so, ncol does the same)
>
> Is this a bug?
>
> Petr Savicky.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK              Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Aggregate factor names

2007-09-27 Thread Mike Lawrence
Hi all,

A suggestion derived from discussions amongst a number of R users in
my research group: set the default column names produced by
aggregate() equal to the names of the objects in the list passed as
the 'by' argument.

ex. it is annoying to type

with(
    my.data
    ,aggregate(
        my.dv
        ,list(
            one.iv = one.iv
            ,another.iv = another.iv
            ,yet.another.iv = yet.another.iv
        )
        ,some.function
    )
)

to yield a data frame with names =
c('one.iv','another.iv','yet.another.iv','x'), when this seems more
economical:

with(
    my.data
    ,aggregate(
        my.dv
        ,list(
            one.iv
            ,another.iv
            ,yet.another.iv
        )
        ,some.function
    )
)

--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University

Website: http://memetic.ca

Public calendar: http://icalx.com/public/informavore/Public

"The road to wisdom? Well, it's plain and simple to express:
Err and err and err again, but less and less and less."
- Piet Hein

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Aggregate factor names

2007-09-27 Thread Gabor Grothendieck
You can do this:

aggregate(iris[-5], iris[5], mean)


On 9/27/07, Mike Lawrence <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> A suggestion derived from discussions amongst a number of R users in
> my research group: set the default column names produced by
> aggregate() equal to the names of the objects in the list passed as
> the 'by' argument.
>
> ex. it is annoying to type
>
> with(
>my.data
>,aggregate(
>my.dv
>,list(
>one.iv = one.iv
>,another.iv = another.iv
>,yet.another.iv = yet.another.iv
>)
>,some.function
>)
> )
>
> to yield a data frame with names =
> c('one.iv','another.iv','yet.another.iv','x'), when this seems more
> economical:
>
> with(
>my.data
>,aggregate(
>my.dv
>,list(
>one.iv
>,another.iv
>,yet.another.iv
>)
>,some.function
>)
> )
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> Website: http://memetic.ca
>
> Public calendar: http://icalx.com/public/informavore/Public
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
>- Piet Hein
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Aggregate factor names

2007-09-27 Thread Mike Lawrence
Understood, but my point is that the naming I suggest should be the  
default. One should not be 'punished' for being explicit in calling  
aggregate.


On 27-Sep-07, at 1:06 PM, Gabor Grothendieck wrote:

> You can do this:
>
> aggregate(iris[-5], iris[5], mean)
>
>
> On 9/27/07, Mike Lawrence <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> A suggestion derived from discussions amongst a number of R users in
>> my research group: set the default column names produced by
>> aggregate() equal to the names of the objects in the list passed as
>> the 'by' argument.
>>
>> ex. it is annoying to type
>>
>> with(
>>my.data
>>,aggregate(
>>my.dv
>>,list(
>>one.iv = one.iv
>>,another.iv = another.iv
>>,yet.another.iv = yet.another.iv
>>)
>>,some.function
>>)
>> )
>>
>> to yield a data frame with names =
>> c('one.iv','another.iv','yet.another.iv','x'), when this seems more
>> economical:
>>
>> with(
>>my.data
>>,aggregate(
>>my.dv
>>,list(
>>one.iv
>>,another.iv
>>,yet.another.iv
>>)
>>,some.function
>>)
>> )
>>
>> --
>> Mike Lawrence
>> Graduate Student, Department of Psychology, Dalhousie University
>>
>> Website: http://memetic.ca
>>
>> Public calendar: http://icalx.com/public/informavore/Public
>>
>> "The road to wisdom? Well, it's plain and simple to express:
>> Err and err and err again, but less and less and less."
>>- Piet Hein
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>

--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University

Website: http://memetic.ca

Public calendar: http://icalx.com/public/informavore/Public

"The road to wisdom? Well, it's plain and simple to express:
Err and err and err again, but less and less and less."
- Piet Hein

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Aggregate factor names

2007-09-27 Thread Gabor Grothendieck
You can do this too:

aggregate(iris[-5], iris["Species"], mean)

or this:

with(iris, aggregate(iris[-5], data.frame(Species), mean))

or this:

attach(iris)
aggregate(iris[-5], data.frame(Species), mean)

The point is that you already don't have to write x = x.  The only
reason you are writing it that way is that you are using list instead
of data.frame.  Just use data.frame or appropriate indexing as shown.

On 9/27/07, Mike Lawrence <[EMAIL PROTECTED]> wrote:
> Understood, but my point is that the naming I suggest should be the
> default. One should not be 'punished' for being explicit in calling
> aggregate.
>
>
> On 27-Sep-07, at 1:06 PM, Gabor Grothendieck wrote:
>
> > You can do this:
> >
> > aggregate(iris[-5], iris[5], mean)
> >
> >
> > On 9/27/07, Mike Lawrence <[EMAIL PROTECTED]> wrote:
> >> Hi all,
> >>
> >> A suggestion derived from discussions amongst a number of R users in
> >> my research group: set the default column names produced by
> >> aggregate() equal to the names of the objects in the list passed as
> >> the 'by' argument.
> >>
> >> ex. it is annoying to type
> >>
> >> with(
> >>my.data
> >>,aggregate(
> >>my.dv
> >>,list(
> >>one.iv = one.iv
> >>,another.iv = another.iv
> >>,yet.another.iv = yet.another.iv
> >>)
> >>,some.function
> >>)
> >> )
> >>
> >> to yield a data frame with names =
> >> c('one.iv','another.iv','yet.another.iv','x'), when this seems more
> >> economical:
> >>
> >> with(
> >>my.data
> >>,aggregate(
> >>my.dv
> >>,list(
> >>one.iv
> >>,another.iv
> >>,yet.another.iv
> >>)
> >>,some.function
> >>)
> >> )
> >>
> >> --
> >> Mike Lawrence
> >> Graduate Student, Department of Psychology, Dalhousie University
> >>
> >> Website: http://memetic.ca
> >>
> >> Public calendar: http://icalx.com/public/informavore/Public
> >>
> >> "The road to wisdom? Well, it's plain and simple to express:
> >> Err and err and err again, but less and less and less."
> >>- Piet Hein
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> Website: http://memetic.ca
>
> Public calendar: http://icalx.com/public/informavore/Public
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
>- Piet Hein
>
>
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Aggregate factor names

2007-09-27 Thread Prof Brian Ripley
You seem to be assuming that the argument 'by' to the "data frame" method 
of aggregate() is a call to list() with arguments which are names (and 
evaluate to factors).

When aggregate.data.frame comes to be called, the 'by' argument is a 
promise to the actual argument.  In your example the actual argument is 
the call (in a readable layout)

list(one.iv, another.iv, yet.another.iv)

but that is one of a very large number of possibilities for 'by'.  Trying 
to produce reasonable names for unnamed arguments is hard enough (see 
?cbind), but trying to deduce reasonable names for the elements of a list 
argument is one step further up the chain.  Further, if we did that, 
people who wanted the documented behaviour would no longer be able to get 
it.

I think what you want is a function that takes unnamed arguments and 
returns a named list, to replace your usage of list().  That's not so very 
hard to do, not least as in this context data.frame() will do the job.
So to extend the example on the help page

> aggregate(x = testDF, by = data.frame(by1, by2), FUN = "mean")
   by1  by2 v1 v2
1    1   95  5 55
2    2   95  7 77
3    1   99  5 55
4    2   99 NA NA
5  big damp  3 33
6 blue  dry  3 33
7  red  red  4 44
8  red  wet  1 11

However, note that the grouping variables need NOT be factors, and this 
has made them so.  So you may want to look at data.frame() and
write list_with_names() to do just that.
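
A sketch of what such a helper might look like (list_with_names is not
an existing function; like data.frame() it deparses unnamed arguments
into names, but it performs no coercion):

    list_with_names <- function(...) {
        args <- list(...)
        exprs <- as.list(substitute(list(...)))[-1]
        nm <- names(args)
        if (is.null(nm)) nm <- rep("", length(args))
        unnamed <- nm == ""
        nm[unnamed] <- sapply(exprs[unnamed], deparse)
        names(args) <- nm
        args
    }

    ## e.g.  with(my.data,
    ##            aggregate(my.dv, list_with_names(one.iv, another.iv),
    ##                      some.function))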



On Thu, 27 Sep 2007, Mike Lawrence wrote:

> Hi all,
>
> A suggestion derived from discussions amongst a number of R users in
> my research group: set the default column names produced by aggregate
> () equal to the names of the objects in the list passed to the 'by'
> object.
>
> ex. it is annoying to type
>
> with(
>   my.data
>   ,aggregate(
>   my.dv
>   ,list(
>   one.iv = one.iv
>   ,another.iv = another.iv
>   ,yet.another.iv = yet.another.iv
>   )
>   ,some.function
>   )
> )
>
> to yield a data frame with names = c
> ('one.iv','another.iv','yet.another.iv','x') when this seems more
> economical:
>
> with(
>   my.data
>   ,aggregate(
>   my.dv
>   ,list(
>   one.iv
>   ,another.iv
>   ,yet.another.iv
>   )
>   ,some.function
>   )
> )
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> Website: http://memetic.ca
>
> Public calendar: http://icalx.com/public/informavore/Public
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
>   - Piet Hein
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK              Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Unnecessary extra copy with matrix(..., dimnames=NULL) (Was: Re: modifying large R objects in place)

2007-09-27 Thread Henrik Bengtsson
As others already mentioned, in your example you are first creating an
integer matrix and then coercing it to a double matrix by assigning
(double) 1 to element [1,1].  However, even when correcting for this
mistake, there is an extra copy created when using matrix().

Try this in a fresh vanilla R session:

> print(gc())
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  136684  3.7     350000  9.4   350000  9.4
Vcells   81026  0.7     786432  6.0   473127  3.7
> x <- matrix(1, nrow=5000, ncol=5000)
> print(gc())
              used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells      136793   3.7     350000   9.4    350000   9.4
Vcells    25081043 191.4   27989266 213.6  25081056 191.4
> x[1,1] <- 2
> print(gc())
              used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells      136797   3.7     350000   9.4    350000   9.4
Vcells    25081044 191.4   52830254 403.1  50081058 382.1

So, yes, in that x[1,1] <- 2 assignment an extra copy is created.  It
is related to the fact that there is a NAMED matrix object being
created inside matrix(), cf. the last rows of matrix():

x <- .Internal(matrix(data, nrow, ncol, byrow))
dimnames(x) <- dimnames
x

Here is a patch for matrix() that avoids this problem *when dimnames
is NULL* (which is many times the case):

matrix <- function(data=NA, nrow=1, ncol=1, byrow=FALSE, dimnames=NULL) {
  data <- as.vector(data);

  if(missing(nrow)) {
nrow <- ceiling(length(data)/ncol);
  } else if(missing(ncol)) {
ncol <- ceiling(length(data)/nrow);
  }

  # Trick to avoid extra copy in the case when 'dimnames' is NULL.
  if (is.null(dimnames)) {
.Internal(matrix(data, nrow, ncol, byrow));
  } else {
x <- .Internal(matrix(data, nrow, ncol, byrow));
dimnames(x) <- dimnames;
x;
  }
} # matrix()


Try the above again in a fresh R session with this patch applied and you'll get:

> print(gc())
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  136805  3.7     350000  9.4   350000  9.4
Vcells   81122  0.7     786432  6.0   473127  3.7
> x <- matrix(1, nrow=5000, ncol=5000)
> print(gc())
              used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells      136919   3.7     350000   9.4    350000   9.4
Vcells    25081139 191.4   27989372 213.6  25081152 191.4
> x[1,1] <- 2
> print(gc())
              used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells      136923   3.7     350000   9.4    350000   9.4
Vcells    25081140 191.4   29468840 224.9  25081276 191.4

Voila!

I talked to Luke Tierney about this and he thought the internal method
should be updated to take the dimnames argument, i.e.
.Internal(matrix(data, nrow, ncol, byrow, dimnames)).  However, until
that happens, may I suggest this simple patch/workaround to go into
R v2.6.0?

Cheers

Henrik


On 9/27/07, Petr Savicky <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
> > For the most part, doing anything to an R object results in its
> > duplication.  You generally have to do a lot of work to NOT copy an R
> > object.
>
> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory reported by the top command on Linux may change during
> a session as follows:
>   a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
>   a[1,1] <- 0 # 3.0g
>   gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C code
> and only read it at the R level.  So the above is not a big problem for me
> (at least not now).
>
> However, there is a related thing, which could be a bug. The following
> code determines the value of NAMED field in SEXP header of an object:
>
>   SEXP getnamed(SEXP a)
>   {
>   SEXP out;
>   PROTECT(out = allocVector(INTSXP, 1));
>   INTEGER(out)[0] = NAMED(a);
>   UNPROTECT(1);
>   return(out);
>   }
>
> Now, consider the following session
>
>   u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
>   .Call("getnamed",u) # 1 (OK)
>
>   length(u)
>   .Call("getnamed",u) # 1 (OK)
>
>   dim(u)
>   .Call("getnamed",u) # 1 (OK)
>
>   nrow(u)
>   .Call("getnamed",u) # 2 (why?)
>
>   u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
>   .Call("getnamed",u) # 1 (OK)
>   ncol(u)
>   .Call("getnamed",u) # 2 (so, ncol does the same)
>
> Is this a bug?
>
> Petr Savicky.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R "capabilities" on a cluster node

2007-09-27 Thread Earl F. Glynn
R version 2.5.1 (2007-06-27)

I'm running some simple R jobs via the Sun Grid Engine on our Linux cluster 
in preparation for some bigger ones.

I checked R's capabilities on the cluster nodes (after failing to create a 
png file) and am getting the following warning message:
[Run on a cluster node using qrsh:]



> capabilities()
    jpeg      png    tcltk      X11 http/ftp  sockets   libxml     fifo
   FALSE    FALSE    FALSE    FALSE     TRUE     TRUE     TRUE     TRUE
  cledit    iconv      NLS  profmem
    TRUE     TRUE     TRUE    FALSE
Warning message:
unable to load shared library
'/n/site/inst/Linux-i686/bioinfo/R/2.5.1/lib/R/modules//R_X11.so':
  libSM.so.6: cannot open shared object file: No such file or directory
in: capabilities()

Is the double slash (//) in the path above a bug in how we've configured R
here (the /n/site/inst/ directory is shared but platform specific), or a
bug in how the capabilities command works?  The file does exist, so I
wondered whether the double slash in the path caused the warning above.

How can one programmatically get the info from ?Devices, which appears to
be dynamic based on one's system?  Is it safe to assume that pdf or
postscript devices are always available in R, since they're not listed in
capabilities() and seem to be shown everywhere under ?Devices?

Part of the ?Devices output on a cluster node says this:

     The following devices will be available if R was compiled to use
     them:

     * 'X11' The graphics driver for the X11 Window system
     * 'png' PNG bitmap device
     * 'jpeg' JPEG bitmap device

We can just recompile to get png or jpeg support?  Are X11 libraries used
on cluster nodes while running "headless"?  Can I create pngs or jpegs
without X11?
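
If I understand the docs correctly, one fallback sketch (assuming
ghostscript is installed on the nodes; bitmap() renders through it and
needs no X11) would be:

    if (capabilities("png")) {
        png("out.png")
    } else {
        bitmap("out.png", type = "png256")  # rendered via ghostscript
    }
    plot(1:10)
    dev.off()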



Thanks for any advice about this.

efg

Earl F. Glynn
Scientific Programmer
Stowers Institute for Medical Research

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Petr Savicky
Thank you very much for all the explanations, in particular for pointing
out that nrow is not a .Primitive, unlike dim, which is the
reason for the difference in their behavior.  (I raised the question
of a possible bug due to this difference, not just out of dissatisfaction
with nrow.)  Also, thanks for:

On Thu, Sep 27, 2007 at 05:59:05PM +0100, Prof Brian Ripley wrote:
[...]
> 2) I expected NAMED on 'a' to be incremented by nrow(a): here is my 
> understanding.
> 
> When you called nrow(a) you created another reference to 'a' in the 
> evaluation frame of nrow.  (At a finer level you first created a promise 
> to 'a' and then dim(x) evaluated that promise, which did SET_NAMED() 
> = 2.)  So NAMED(a) was correctly bumped to 2, and it is never reduced.
> 
> More generally, any argument to a closure that actually gets used will 
> get NAMED set to 2.
[...]

This explains a lot.

I appreciate also the patch to matrix by Henrik Bengtsson, which saved
me time formulating a further question just about this.

I do not know whether there is a reason to keep nrow and ncol from
being .Primitive, but if there is, the problem may be solved by
rewriting them as follows:

nrow <- function(...) dim(...)[1]
ncol <- function(...) dim(...)[2]

At least in my environment, the new versions preserved NAMED == 1.

It has a side effect that this unifies the error messages generated
by too many arguments to nrow(x) and dim(x). Currently
  a <- matrix(1:6,nrow=2)
  nrow(a,a) # Error in nrow(a, a) : unused argument(s) (1:6)
  dim(a,a) # Error: 2 arguments passed to 'dim' which requires 1

Maybe other solutions exist as well.

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Peter Dalgaard
Petr Savicky wrote:
> Thank you very much for all the explanations, in particular for pointing
> out that nrow is not a .Primitive, unlike dim, which is the
> reason for the difference in their behavior.  (I raised the question
> of a possible bug due to this difference, not just out of dissatisfaction
> with nrow.)  Also, thanks for:
>
> On Thu, Sep 27, 2007 at 05:59:05PM +0100, Prof Brian Ripley wrote:
> [...]
>   
>> 2) I expected NAMED on 'a' to be incremented by nrow(a): here is my 
>> understanding.
>>
>> When you called nrow(a) you created another reference to 'a' in the 
>> evaluation frame of nrow.  (At a finer level you first created a promise 
>> to 'a' and then dim(x) evaluated that promise, which did SET_NAMED() 
>> = 2.)  So NAMED(a) was correctly bumped to 2, and it is never reduced.
>>
>> More generally, any argument to a closure that actually gets used will 
>> get NAMED set to 2.
>> 
> [...]
>
> This explains a lot.
>
> I appreciate also the patch to matrix by Henrik Bengtsson, which saved
> me time formulating a further question just about this.
>
> I do not know whether there is a reason to keep nrow and ncol from
> being .Primitive, but if there is, the problem may be solved by
> rewriting them as follows:
>
> nrow <- function(...) dim(...)[1]
> ncol <- function(...) dim(...)[2]
>
> At least in my environment, the new versions preserved NAMED == 1.
>   
Yes, but changing the formal arguments is a bit messy, is it not?

Presumably, nrow <- function(x) eval.parent(substitute(dim(x)[1])) works 
too, but if the gain is important enough to warrant that sort of 
programming, you might as well make nrow a .Primitive.

Longer-term, I still have some hope for better reference counting, but 
the semantics of environments make it really ugly -- an environment can 
contain an object that contains the environment, a simple example being 

f <- function()
    g <- function() 0
f()

At the end of f(), we should decide whether to destroy f's evaluation 
environment. In the present example, what we need to be able to see is 
that this would remove all references to g and that the reference from g 
to f can therefore be ignored.  Complete logic for sorting this out is 
basically equivalent to a new garbage collector, and one can suspect 
that applying the logic upon every function return is going to be 
terribly inefficient. However, partial heuristics might apply.

> It has a side effect that this unifies the error messages generated
> by too many arguments to nrow(x) and dim(x). Currently
>   a <- matrix(1:6,nrow=2)
>   nrow(a,a) # Error in nrow(a, a) : unused argument(s) (1:6)
>   dim(a,a) # Error: 2 arguments passed to 'dim' which requires 1
>
> Maybe other solutions exist as well.
>
> Petr Savicky.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>   


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] windows device transparency issue

2007-09-27 Thread Austin, Matt
 

I read in a thread in r-help today that the windows device in 2.6 supports
transparency, so I tried an example and had some issues.  The density plots
should be filled with transparent color in the following example (similar to
the points); however, the color is "fully" transparent.  This works in the
Cairo device, but not in the windows device.

 

Thanks,

--Matt

Matt Austin
Statistician, Amgen Inc.

Example of problem derived from example(layout) code:

##Start
windows()

x <- pmin(3, pmax(-3, rbeta(5000, 1, 5)))
y <- pmin(3, pmax(-3, c(rnorm(2500, .25, 1), rnorm(2500))))
trtp.f <- factor(rep(c("Placebo", "Not Placebo"), each=2500),
                 levels=c("Placebo", "Not Placebo"))

xdens <- tapply(x, trtp.f, density, na.rm=TRUE)
ydens <- tapply(y, trtp.f, density, na.rm=TRUE)

topx <- max(unlist(lapply(xdens, function(x) max(x$y))))
topy <- max(unlist(lapply(ydens, function(x) max(x$y))))

xrange <- range(x, na.rm=TRUE)
yrange <- range(y, na.rm=TRUE)

nf <- layout(matrix(c(2,0,1,3), 2, 2, byrow=TRUE), c(3,1), c(1,3), TRUE)

par(mar=c(4,4,0,0), oma=c(0,0,0,0))

getcolors <- rainbow(length(levels(trtp.f)), alpha=.5)
plot(x, y, xlim=xrange, ylim=yrange, xlab="X value", ylab="Y value",
     pch=21, bg=getcolors[as.numeric(trtp.f)])

par(mar=c(0,4,1,1))
plot(x=xrange, axes=FALSE, xlim=xrange, ylim=c(0, topx), xlab="", ylab="")
#lapply(xdens, lines)

for(i in 1:length(xdens)){
  dat <- xdens[[i]]
  polygon(x=c(min(dat$x), dat$x, max(dat$x)), y=c(0, dat$y, 0),
          col=rainbow(length(xdens), alpha=.5)[i])
}

par(mar=c(4,0,1,1))
plot(x=yrange, axes=FALSE, ylim=yrange, xlim=c(0, topy), xlab="", ylab="")

for(i in 1:length(ydens)){
  dat <- ydens[[i]]
  polygon(y=c(min(dat$x), dat$x, max(dat$x)), x=c(0, dat$y, 0),
          col=rainbow(length(ydens), alpha=.5)[i])
}
##End

 

Other Info:

platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         RC
major          2
minor          6.0
year           2007
month          09
day            26
svn rev        42991
language       R
version.string R version 2.6.0 RC (2007-09-26 r42991)


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] to overcome error while computing sd for each subset of data (PR#9931)

2007-09-27 Thread anjalishitole
Full_Name: Anjali Shitole
Version: R 2.5.1
OS: 
Submission from: (NULL) (202.141.157.91)


I want to compute the coefficient of variation for each subset of data.
The subsets are heterogeneous, meaning the number of rows in each subset
differs.  All the subsets sit one after another in a single dataset.
I have used the aggregate command to calculate the mean for each subset,
but the same command for calculating sd gives an error; see the
following program:
> a<-read.table("c:/mri/data/ratioinput.txt",header=T, na.strings =
"NA",fill=TRUE)
> dim(a)
[1] 32515    27
> BusinessRisk<-a[,16]/(a[,26]/a[,27]) 
> a1<-a[,1:3]
> a2<-cbind(a1,BusinessRisk)
> c1<-aggregate(BusinessRisk,list(a2[,1]),mean)
> names(a)
 [1] "Co_Code""Co_Name""Year"   "TotSholderFund"
 [5] "TotLiab""NetBlock"   "Investments""Inventories"   
 [9] "SundryDebts""CashandBank""TotCurrAssets"  "CurrLiab"  
[13] "TotCurrLiab""TotAssets"  "NetSales"   "OpPft" 
[17] "Interest"   "GrossPft"   "AdjNetPft"  "Dividend"  
[21] "PrefDivi"   "EquityPaidUp"   "CurrentAssets"  "PBIT"  
[25] "TotExpen"   "Equity" "FaceValue" 
> c2<-aggregate(BusinessRisk,list(a2[,1]),sd(BusinessRisk,na.rm=TRUE))
Error in FUN(X[[1L]], ...) : argument "INDEX" is missing, with no default
> 
Please help me to overcome this error.
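
I guess the problem is that sd(BusinessRisk, na.rm=TRUE) gets evaluated
to a single number rather than passed as a function -- should the call
instead be something like this?

    c2 <- aggregate(BusinessRisk, list(a2[,1]), sd, na.rm=TRUE)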

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] windows device transparency issue

2007-09-27 Thread Prof Brian Ripley
Thank you for trying 2.6.0 RC.

The example is far from minimal: most semi-transparent polygon fills were 
failing due to a typo prior to r43003.  Your example now works (several 
times faster than under Cairo).

As you will see from the below, sending HTML makes your postings 
unnecessarily hard to read for users of text email clients: please do 
follow the posting guide.


On Thu, 27 Sep 2007, Austin, Matt wrote:

>
>
> I read in a thread in r-help today that the windows device in 2.6 supports
> transparency, so I tried an example and had some issues.  The density plots
> should be filled with transparent color in the following example (similar to
> the points); however, the color is "fully" transparent.  This works in the
> Cairo device, but not in the windows device.
>
>
>
> Thanks,
>
>
>
> --Matt
>
>
>
> Matt Austin
>
> Statistician, Amgen Inc.
>
>
>
>
>
> Example of problem derived from example(layout) code.
>
>
>
> ##Start
>
> windows()
>
>
>
> x <- pmin(3, pmax(-3, rbeta(5000,1,5)))
>
> y <- pmin(3, pmax(-3, c(rnorm(2500, .25, 1), rnorm(2500))) )
>
> trtp.f <-  factor(rep(c("Placebo", "Not Placebo"), each=2500),
> levels=c("Placebo", "Not Placebo"))
>
>
>
> xdens <-  tapply(x, trtp.f,  density, na.rm=TRUE)
>
> ydens <-  tapply(y, trtp.f,  density, na.rm=TRUE)
>
>
>
> topx <- max(unlist(lapply(xdens, function(x) max(x$y))))
>
> topy <- max(unlist(lapply(ydens, function(x) max(x$y))))
>
>
>
> xrange <- range(x, na.rm=TRUE)
>
> yrange <- range(y, na.rm=TRUE)
>
>
>
> nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE)
>
>
>
> par(mar=c(4,4,0,0), oma=c(0,0,0,0))
>
>
>
> getcolors <- rainbow(length(levels(trtp.f)), alpha=.5)
>
> plot(x, y, xlim=xrange, ylim=yrange,xlab="X value", ylab="Y value", pch=21,
> bg=getcolors[as.numeric(trtp.f)])
>
>
>
> par(mar=c(0,4,1,1))
>
> plot(x=xrange, axes=FALSE, xlim=xrange, ylim=c(0, topx), xlab="", ylab="")
>
> #lapply(xdens, lines)
>
>
>
> for(i in 1:length(xdens)){
>
>  dat <- xdens[[i]]
>
>  polygon(x=c(min(dat$x), dat$x, max(dat$x)) , y=c(0, dat$y, 0),   col=
> rainbow(length(xdens), alpha=.5)[i])
>
> }
>
>
>
> par(mar=c(4,0,1,1))
>
> plot(x=yrange, axes=FALSE, ylim=yrange, xlim=c(0, topy), xlab="", ylab="")
>
>
>
> for(i in 1:length(ydens)){
>
>  dat <- ydens[[i]]
>
>  polygon(y=c(min(dat$x), dat$x, max(dat$x)) , x=c(0, dat$y, 0),
> col=rainbow(length(ydens), alpha=.5)[i])
>
>
>
> }
>
> ##End
>
>
>
> Other Info:
>
>
>
> platform   i386-pc-mingw32
>
> arch   i386
>
> os mingw32
>
> system i386, mingw32
>
> status RC
>
> major  2
>
> minor  6.0
>
> year   2007
>
> month  09
>
> day26
>
> svn rev42991
>
> language   R
>
> version.string R version 2.6.0 RC (2007-09-26 r42991)
>
>
>   [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK              Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel