Re: [R] somebody help me about this error message...

2010-02-27 Thread Allan Engelhardt

You forgot to use assign() the second time:

assign(paste("a", 2, sep=""), 4)

does what you want.
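For completeness, a minimal self-contained sketch of the assign()/get() pattern (reconstructed from the question; the quotes around "a" were lost in the archive):

```r
# Create variables a1..a5 programmatically
for (i in 1:5) {
  nam <- paste("a", i, sep = "")
  assign(nam, 1:i)
}

# A plain `name <- value` cannot be used when the name is computed;
# use assign() to write and get() to read
assign(paste("a", 2, sep = ""), 4)
get("a2")  # now 4
```

That said, a list indexed by name (e.g. a <- list(); a[["a2"]] <- 4) is usually easier to work with than programmatically named variables.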

Hope this helps a little

Allan.

On 27/02/10 05:13, Joseph Lee wrote:

I created variables automatically this way:

for(i in 1:5){
    nam <- paste("a", i, sep="")
    assign(nam, 1:i)
}

and then, i want to insert new data into the a2 variable. so, i tried the
following statement

paste("a", 2, sep="") <- 4

so, i got this error message

Error in get(paste("a", 2, sep = ""))[1] <- 4 :
   target of assignment expands to non-language object

anyone knows about this error message and can tell me how to solve this
problem, please..



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help Computing Probit Marginal Effects

2010-02-27 Thread Ted Harding
On 27-Feb-10 03:52:19, Cardinals_Fan wrote:
 
 Hi,  I am a Stata user trying to transition to R.  Typically I
 compute marginal effects plots for (for example) probit models by
 drawing simulated betas using the coefficient/standard error
 estimates after I run a probit model. I then use these simulated
 betas to compute first-difference marginal effects.  My question
 is, can I do this in R?  Specifically, I was wondering if anyone
 knows how R stores the coefficient/standard error estimates after
 you estimate the model?  I assume it's a vector, but what is it
 called?
 
 Cheers
 --

Here is an example which sets up (X,Y) data using a probit mechanism,
then fits a probit model, and then extracts the information which
you seek.

  set.seed(54321)
  X <- 0.2*(-10:10)
  U <- rnorm(21)
  Y <- 1*(U <= X)  ## binary outcome 0/1, = 1 if N(0,1) <= X
  GLM  <- glm(Y ~ X, family=binomial(link="probit")) ## fit a probit
  Coef <- summary(GLM)$coef  ## apply summary() to the fit

GLM is a list with a large number of components: enter the command

  str(GLM)

and have a look at what you get! Only a few of these are displayed
when you apply print() to it:

  print(GLM)
  # Call:  glm(formula = Y ~ X, family = binomial(link = "probit"))
  # Coefficients:
  # (Intercept)            X
  #     0.08237      0.56982
  #
  # Degrees of Freedom: 20 Total (i.e. Null);  19 Residual
  # Null Deviance:      29.06
  # Residual Deviance: 23.93    AIC: 27.93

Note that you do *not* get Standard Errors from this.

However, all the information in GLM is available for processing
by other functions. In particular, summary(GLM) produces another
list with several components -- have a look at the output from

  str(summary(GLM))

One of these components (listed near the end of this output)
is coef, and it can be accessed as summary(GLM)$coef as in the
above command

  Coef <- summary(GLM)$coef

This is a matrix (in this case 2 named rows, 4 named columns):

  Coef
  #  Estimate Std. Error   z value   Pr(>|z|)
  # (Intercept) 0.0823684  0.2974595 0.2769063 0.78185207
  # X   0.5698200  0.2638657 2.1595076 0.03081081

So there is one row for each coefficient in the model (here 2,
one for Intercept, one for variable X), and four columns
(for the Estimate itself of the coefficient, for its Standard
Error, for the z-value (Est/SE), and for the P-value).

Hence you can access the estimates as

  Coef[,1]   # (the first column of the matrix)
  # (Intercept)   X 
  #   0.0823684   0.5698200 

and their respective SEs as

  Coef[,2]   # (the second column of the matrix)
  # (Intercept)   X 
  #   0.2974595   0.2638657 

I have spelled this out in detail to demonstrate that the key
to accessing information in objects constructed by R lies in
its structures (especially lists, vectors and matrices). You
can find out what is involved for any function by looking for
the section "Value" in its help page. For instance, the function
summary() when applied to a GLM uses the method summary.glm(),
so you can enter the command

  ?summary.glm

and then read what is in the section "Value". This shows that
it is a list with components whose names are

  call, family, deviance, ... , coefficients, ... , symbolic.cor

and a component with name "Name" can be accessed using $Name, as
in GLM$coef (you can use "coef" instead of "coefficients" since
the first four letters are [more than] enough to identify the name
uniquely).
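To connect this back to the original question (drawing simulated betas), one possible sketch uses the estimates together with vcov(GLM) and MASS::mvrnorm(); the number of draws and the first-difference contrast (X = 0 vs X = 1) are illustrative choices, not part of the original post:

```r
library(MASS)  # for mvrnorm()

# Refit the probit from the example above
set.seed(54321)
X <- 0.2*(-10:10)
U <- rnorm(21)
Y <- 1*(U <= X)
GLM <- glm(Y ~ X, family = binomial(link = "probit"))

# Draw simulated coefficient vectors from N(beta-hat, estimated covariance)
sim.betas <- mvrnorm(1000, mu = coef(GLM), Sigma = vcov(GLM))

# First-difference marginal effect of moving X from 0 to 1,
# evaluated for each simulated beta
fd <- pnorm(sim.betas[, 1] + sim.betas[, 2]) - pnorm(sim.betas[, 1])
quantile(fd, c(0.025, 0.5, 0.975))  # simulation-based interval
```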

Once you get used to this, things become straightforward!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 27-Feb-10   Time: 08:38:41
-- XFMail --



Re: [R] Preserving lists in a function

2010-02-27 Thread baptiste auguie
Hi,

I think I would follow this approach too, using updatelist() from the
reshape package,


updatelist <- function (x, y)
{
    common <- intersect(names(x), names(y))
    x[common] <- y[common]
    x
}

myfunction = function(list1=NULL, list2=NULL, list3=NULL){
   list1 = updatelist(list(variable1=1,
                           variable2=2,
                           variable3=3), list1)

   list2 = updatelist(list(variable1="variable1",
                           variable2="variable2",
                           variable3="variable3"), list2)

   list3 = updatelist(list(variable1="character",
                           variable2=24,
                           variable3=c(0.1,0.1,0.1,0.1),
                           variable4=TRUE), list3)

   return(list(list1=list1, list2=list2, list3=list3))
}
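Called with a partial list, the omitted elements keep their defaults. A small self-contained check (the numeric values are illustrative):

```r
# updatelist() as above (also available in the reshape package)
updatelist <- function(x, y) {
  common <- intersect(names(x), names(y))
  x[common] <- y[common]
  x
}

defaults <- list(variable1 = 1, variable2 = 2, variable3 = 3)

# Only variable2 supplied; variable1 and variable3 fall back to defaults
res <- updatelist(defaults, list(variable2 = 99))
str(res)
```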


Best regards,

baptiste

On 27 February 2010 01:51, Don MacQueen m...@llnl.gov wrote:
 Barry explained your first puzzle, but let  me add some explanation and
 examples.


  tmpfun <- function( a =3 ) {a}
  tmpfun()

 [1] 3

  tmpfun(a='x')

 [1] "x"

 Inside the function, the value of the argument is whatever the user
 supplied. The default is replaced by what the user supplies. There is no
 mechanism for retaining the default structure and filling in any missing
 parts. R never preserves the defaults when the user supplies something other
 than the default.

 For example, and using your function,

  myfunction(list1='x')

 $list1
 [1] "x"

 $list2
 $list2$variable1
 [1] "variable1"

 $list2$variable2
 [1] "variable2"

 $list2$variable3
 [1] "variable3"


 $list3
 $list3$variable1
 [1] "character"

 $list3$variable2
 [1] 24

 $list3$variable3
 [1] 0.1 0.1 0.1 0.1

 $list3$variable4
 [1] TRUE


  myfunction(list1=data.frame(a=1:2, b=c('x','y')))

 $list1
  a b
 1 1 x
 2 2 y

 $list2
 $list2$variable1
 [1] "variable1"

 $list2$variable2
 [1] "variable2"

 $list2$variable3
 [1] "variable3"


 $list3
 $list3$variable1
 [1] "character"

 $list3$variable2
 [1] 24

 $list3$variable3
 [1] 0.1 0.1 0.1 0.1

 $list3$variable4
 [1] TRUE

 What you put in is what you get out.

 I don't know that I would deal with this the way Barry did. I would probably
 write code to examine the structure of what the user supplies, compare it to
 the required structure, and then fill in.

 myf <- function(l1, l2, l3) {
   if (missing(l1)) {
     ## user did not supply l1, so set it to the default
     l1 <- list(v1=1, v2=2, v3=3)
   } else if (!is.list(l1)) {
     ## user must supply a list; if not, it's an error
     stop('l1 must be a list')
   } else {
     ## user has at least supplied a list
     ## now write code to check the names of the list that the user supplied
     ## make sure the names that the user supplied are valid; if not, stop()
     ## if the user supplied too few elements, fill in the missing ones
     ## if the user supplied too many elements, stop()
     ## if the user supplied all the correct elements, with all the correct
     ## names, use what the user supplied
   }
 }

 Looks complicated; maybe Barry's way is better...

 -Don

 At 5:56 PM -0500 2/26/10, Shang Gao wrote:

 Dear R users,

 A co-worker and I are writing a function to facilitate graph plotting in
 R. The function makes use of a lot of lists in its defaults.

 However, we discovered that R does not necessarily preserve the defaults
 if we were to input them in the form of list() when initializing the
 function. For example, if you feed the function codes below into R:

 myfunction=function(
    list1=list(variable1=1,
               variable2=2,
               variable3=3),

    list2=list(variable1="variable1",
               variable2="variable2",
               variable3="variable3"),

    list3=list(variable1="character",
               variable2=24,
               variable3=c(0.1,0.1,0.1,0.1),
               variable4=TRUE))

 {return(list(list1=list1,list2=list2,list3=list3))}

 By definition, the values associated with each variable in the lists would
 be the defaults unless the user inputs a different value while executing the
 function. But a problem arises when a variable in the list is left out
 completely (not input at all). An example is shown below:

 myfunction( list1=list(variable1=1,
                        variable2=2), #variable 3 deliberately left out

             list2=list(variable1="variable1",
                        variable3="position changed",
                        variable2="variable2"),

             list3=list(variable1="character",
                        variable2=24,
                        variable4=FALSE)) #variable 3 deliberately left out

 #The outcome of the above execution is shown below:

 $list1
 $list1$variable1
 [1] 1

 $list1$variable2
 [1] 2
 #list1$variable3 is missing. Defaults in function not assigned in this
 execution

 $list2
 $list2$variable1
 [1] "variable1"

 $list2$variable3
 [1] "position changed"

 $list2$variable2
 [1] "variable2"


 $list3
 $list3$variable1
 [1] "character"

 $list3$variable2
 [1] 24

 $list3$variable4
 [1] FALSE
 #list3$variable3 is missing. Defaults in function not assigned in this
 execution

 We later realized that the problem lies in list() commands. Hence, we
 tried to enforce the defaults on 

Re: [R] Error in mvpart example

2010-02-27 Thread Gavin Simpson
These functions (rpart, the mvpart wrapper, and summary.rpart) are
fairly complex doing many things.

For contributed packages you'd be best served by contacting the
author/maintainer. I've CC'd Glenn (the maintainer) here.

HTH

G

On Fri, 2010-02-26 at 13:55 +, Wearn, Oliver wrote:
 Dear all,
 
 I'm getting an error in one of the stock examples in the 'mvpart'
 package. I tried:
 
 require(mvpart)
 data(spider)
 fit3 <- rpart(gdist(spider[,1:12],meth="bray",full=TRUE,sq=TRUE)~water
 +twigs+reft+herbs+moss+sand,spider,method="dist") #directly from ?rpart
 summary(fit3)
 
 ...which returned the following:
 
 Error in apply(formatg(yval, digits - 3), 1, paste, collapse = ",",
 sep = "") :
   dim(X) must have a positive length
 
 This seems to be a problem with the cross-validation, since the
 "xerror" and "xstd" columns are missing from the summary table as
 well.
 
 Using the mvpart() wrapper results in the same error:
 
 fit4 <- mvpart(gdist(spider[,1:12],meth="bray",full=TRUE,sq=TRUE)~water
 +twigs+reft+herbs+moss+sand,spider,method="dist")
 summary(fit4)
 
 Note, changing the 'method' argument to method="mrt" seems, superficially,
 to solve the problem. However, when the dependent variable is a
 dissimilarity matrix, shouldn't method="dist" be used (as per the
 examples)?
 
 Thanks, in advance, for any help on this error.
 
 Oliver

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] counting the number of ones in a vector

2010-02-27 Thread Gavin Simpson
On Fri, 2010-02-26 at 10:43 -0800, David Reinke wrote:
 The length will remain the same no matter what expression appears in
 the subscript.

No it won't! x == 1 evaluates to logical and when used to *subset* x, it
*will* return the required answer. As observed with this example:

 set.seed(1)
 x <- sample(rep(1:3, times = 20))
 x
 [1] 1 1 1 1 3 2 3 3 3 1 2 3 3 1 2 2 2 1 3 2 2 1 1 2 1 2 1 1 1 1
[31] 3 3 2 3 2 2 2 3 3 3 1 3 3 1 3 2 1 1 2 2 1 1 3 2 2 3 2 2 3 3
 
 ## compare
 sum(x == 1, na.rm = TRUE)
[1] 20
 length(x)
[1] 60
 length(x[x == 1])
[1] 20
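Some equivalent counting idioms, shown on the same vector (a sketch; all give 20 here):

```r
set.seed(1)
x <- sample(rep(1:3, times = 20))

length(x[x == 1])   # the original poster's approach
sum(x == 1)         # logical TRUEs count as 1 -- the usual idiom
table(x)[["1"]]     # tabulates every distinct value
tabulate(x)[1]      # fast path for small positive integers
```

With possible missing values, sum(x == 1, na.rm = TRUE) is the safe form, since NA == 1 evaluates to NA.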

G

  I suggest this:
 
 sum(x == 1)
 
 David Reinke
 
 Senior Transportation Engineer/Economist
 Dowling Associates, Inc.
 180 Grand Avenue, Suite 250
 Oakland, California 94612-3774
 510.839.1742 x104 (voice)
 510.839.0871 (fax)
 www.dowlinginc.com
 
  Please consider the environment before printing this e-mail.
 
 Confidentiality Notice:  This e-mail message, including any attachments, is 
 for the sole use of the intended recipient(s), and may contain confidential  
 and privileged information. Any unauthorized review, use, disclosure or 
 distribution is prohibited. If you are not the intended recipient, please 
 contact the sender by reply e-mail and destroy all copies of the original 
 message.
 
 
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Randall Wrong
 Sent: Friday, February 26, 2010 6:44 AM
 To: r-help@r-project.org
 Subject: [R] counting the number of ones in a vector
 
  Dear R users,
 
 I want to count the number of ones in a vector x.
 
 That's what I did : length( x[x==1] )
 
 Is that a good solution ?
 
 Thank you very much,
 Randall
 
   [[alternative HTML version deleted]]
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Preserving lists in a function

2010-02-27 Thread Gabor Grothendieck
Or use modifyList which is in the core of R.
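A minimal sketch of modifyList() doing the same merge (the values are illustrative):

```r
defaults <- list(variable1 = 1, variable2 = 2, variable3 = 3)
user     <- list(variable2 = 99)   # variable1 and variable3 omitted

# Components of `user` override the matching defaults; the rest survive
merged <- modifyList(defaults, user)
str(merged)
```

Unlike the updatelist() sketch, modifyList() also accepts names not present in the defaults, and drops components that are set to NULL.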

On Sat, Feb 27, 2010 at 5:22 AM, baptiste auguie
baptiste.aug...@googlemail.com wrote:
 [baptiste auguie's and Don MacQueen's messages quoted in full; see the
 earlier "Preserving lists in a function" thread above]

Re: [R] two questions for R beginners

2010-02-27 Thread Johannes Huesing
Dieter Menne dieter.me...@menne-biomed.de [Fri, Feb 26, 2010 at 08:39:14AM 
CET]:
 
 
 Patrick Burns wrote:
  
  * What were your biggest misconceptions or
  stumbling blocks to getting up and running
  with R?
  
  
 (This derives partly from teaching)
 
[...]
 
 The concept of environment. With S it was worse, though.
 

Agreed, though a beginner shouldn't be exposed to this aspect.
If you start with simple examples, you can analyse away for quite
a while before you drown in variable names.

Which plotting parameters can be passed to the basic plot functions,
and which ones have to be set with par()? How do I set the min and
max values for the x and y axes? (This aspect is drowned among all
the options under ?par.)

Generally, the help pages are built like man pages, where all options
are given more or less equal consideration, even if one option is used
almost always and another only for esoteric purposes. Given that help()
is the most intuitive place to look, it may be nice to include
references to other sources like rwiki if the respective page is good,
even if it may be disruptive with respect to the display device.

-- 
Johannes Hüsing               There is something fascinating about science.
                              One gets such wholesale returns of conjecture
mailto:johan...@huesing.name  from such a trifling investment of fact.
http://derwisch.wikidot.com   (Mark Twain, Life on the Mississippi)



Re: [R] Help Computing Probit Marginal Effects

2010-02-27 Thread Peter Ehlers


On 2010-02-27 1:38, (Ted Harding) wrote:

[Ted Harding's reply quoted in full; see the earlier message in this
thread above]


I would just add one suggestion to Ted's excellent tutorial:
R has the extractor function(s) coef() for getting the coefficients
(and SEs) for various types of models.

coef(GLM)
coef(summary(GLM))

While these will produce precisely the same output in the above
example, they may be the better way to go with, say, nonlinear
models. Using the first example in ?nls:

DNase1 <- subset(DNase, Run == 1)
fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)

fm1DNase1$coef
# NULL  # - probably not what was expected

coef(fm1DNase1)
# Asym xmid scal
# 2.345180 1.483090 1.041455

Of course, looking at str(fm1DNase1) would show that there is no
component called "coefficients", but it might take a bit of head
scratching to realize that the component "m" has as a subcomponent
the getAllPars() function, which produces the output given by
coef(fm1DNase1).

I would recommend using extractor functions like coef(), resid(),
etc. where available.

  -Peter Ehlers




[R] reading data from web data sources

2010-02-27 Thread Tim Coote

Hullo
I'm trying to read some time series data of meteorological records
that are available on the web (e.g.
http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat).
I'd like to be able to read the digital data directly into R.
However, I cannot work out the right function and set of parameters
to use. It could be that the only practical route is to write a
parser, possibly in some other language, reformat the files and then
read these into R. As far as I can tell, the informal grammar of the
file is:


comments terminated by a blank line
[year number on a line on its own
daily readings lines ]+

and the daily readings are of the form:
whitespace day number [whitespace reading on day of month] 12

Readings for days in months where a day does not exist have special  
values. Missing values have a different special value.


And then I've got the problem of iterating over all relevant files to  
get a whole timeseries.


Is there a way to read in this type of file into R? I've read all of  
the examples that I can find, but cannot work out how to do it. I  
don't think that read.table can handle the separate sections of data  
representing each year. read.ftable maybe can be coerced to parse the  
data, but I cannot see how after reading the documentation and  
experimenting with the parameters.


I'm using R 2.10.1 on osx 10.5.8 and 2.10.0 on Fedora 10.

Any help/suggestions would be greatly appreciated. I can see that this  
type of issue is likely to grow in importance, and I'd also like to  
give the data owners suggestions on how to reformat their data so that  
it is easier to consume by machines, while being easy to read for  
humans.


The early records are a serious machine parsing challenge as they are  
tiff images of old notebooks ;-)


tia

Tim
Tim Coote
t...@coote.org
vincit veritas



[R] Overlap plot

2010-02-27 Thread abotaha

Hello, 

I have a plot in R (a curve over a time series) and it is working well.
I want to add a circle symbol at one place within the plot, but I do not
know how to do that.

I used matplot() because I have a lot of data in the plot.

any help please, 

cheers

-- 
View this message in context: 
http://n4.nabble.com/Overlap-plot-tp1571803p1571803.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] reading data from web data sources

2010-02-27 Thread Gabor Grothendieck
Try this.  First we read the raw lines into R using grep to remove any
lines containing a character that is not a number or space.  Then we
look for the year lines and repeat them down V1 using cumsum.  Finally
we omit the year lines.

myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
raw.lines <- readLines(myURL)
DF <- read.table(textConnection(raw.lines[!grepl("[^ 0-9.]", raw.lines)]),
                 fill = TRUE)
DF$V1 <- DF[cumsum(is.na(DF[[2]])), 1]
DF <- na.omit(DF)
head(DF)
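The same idea can be checked offline on a small in-line fragment shaped like the file (the numbers below are invented for illustration):

```r
# A toy fragment: comment header, then year lines and daily-reading lines
txt <- c("Soil temperatures, calibrated (comments end at the blank line)",
         "",
         "1910",
         " 1  5.1  5.3  6.0",
         " 2  5.2  5.4  6.1",
         "1911",
         " 1  4.9  5.0  5.8")

# Keep only lines made of digits, spaces and dots (drops comments/blanks)
num <- txt[grepl("^[ 0-9.]+$", txt)]

DF <- read.table(text = num, fill = TRUE)
DF$year <- DF[cumsum(is.na(DF[[2]])), 1]  # repeat each year down its rows
DF <- na.omit(DF)                         # drop the bare year lines
DF
```

Iterating over the decade files is then an lapply() over the URLs followed by do.call(rbind, ...).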


On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote tim+r-project@coote.org wrote:
 [Tim Coote's message quoted in full; see above]




Re: [R] Overlap plot

2010-02-27 Thread jim holtman
points(x, y, pch=1, cex=10)

adjust cex to the size circle you want.

On Sat, Feb 27, 2010 at 4:22 AM, abotaha yaseen0...@gmail.com wrote:

 Hello,

 I have plot in R (which is curve during time series) and it is working well.
 i want to add a circle symbol to one place within the plot but i do not know
 how to do that?

 I used matplot() because i have many data in the plot.

 any help please,

 cheers

 --
 View this message in context: 
 http://n4.nabble.com/Overlap-plot-tp1571803p1571803.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] two questions for R beginners

2010-02-27 Thread John Sorkin
I don't think I am a tyro, but neither am I a wizard. That being said, R has a 
number of aspects that make it difficult:

Error messages that are not helpful
Manual pages that are written in Martian
Lack of examples on some manual pages
Lack of comments in code

There are other hurdles. The concept of vectorization and its related syntax 
took a long time to understand.
John
John Sorkin
jsor...@grecc.umaryland.edu 
-Original Message-
From: Saeed Abu Nimeh sabun...@gmail.com
Cc:  r-help@r-project.org
To:  ivan.calan...@uni-hamburg.de

Sent: 2/26/2010 11:36:38 PM
Subject: Re: [R] two questions for R beginners

Hi Ivan,

On 2/26/10 6:30 AM, Ivan Calandra wrote:
 You are definitely right...
 What to do with bad beginner's questions is not a simple issue.

 If a beginner's mailing list is created, who will answer to such
 questions?

If I subscribe to the beginners mailing list, then I have to expect 
novice questions and I should be willing to help. Otherwise, I should 
not be there.

And moreover, the beginners won't take advantage of the other
 questions (I've personally learned a lot trying to understand the
 questions and answers to other's problems).

They can still subscribe to the advanced, but they will know that they 
are here to observe and learn, not to ask novice questions. You want to 
ask basic stuff, go to the beginners list :)

Not sure if you guys have been on some of the linux mailing lists out 
there, but man let me tell you, some of these lists have a RTFM attitude 
and they will fry you if you ask novice questions. Frankly, that is 
understandable, as most of the members are geeks and they have higher 
expectations. This mailing list is different, I have seen posts from 
different disciplines; biology, biostats, stats, computer science, 
oceanography, etc. So, IMO, there should be a beginners list to cope 
with such a broad community.

Thanks,
Saeed

And also, as you said, the
 problems might persist.
 The beginner's mailing list might be good in one aspect though: the
 experts who subscribe to it would be willing to help the beginners to
 get started with R, knowing that the questions might not be clearly stated.

 As you pointed out, the mailing list is not the best for basic stuff
 (the question is of course what is basic?). Not everybody knows some
 colleagues who work with R (I'm personally the 1st one to use R in my lab).
 I think, somehow and I have no idea how, documentation and guidance to
 search for help should be more accessible as soon as you start with R.
 Maybe a _*clear*_ section on the R homepage or in the introduction to
 R manual like where to find help, including all of the most common
 and useful resources available (from ? and RSiteSearch() to R Wiki and
 Crantastic).

 I hope that this whole discussion might help to make the R world better.
 Thank you Patrick for initiating it!
 Regards,
 Ivan

 Le 2/26/2010 15:09, Paul Hiemstra a écrit :
 Ivan Calandra wrote:
 Since you want input from beginners, here are some thoughts

 I had and still have two big problems with R:
 - this vectorization thing. I've read many manuals (including R
 inferno), but I'm still not completely clear about it. In simple
 examples, it's fine. But when it gets a bit more complex, then...
 Related to it, the *apply functions are still a bit difficult to
 understand. When I have to use them, I just try one and see what
 happens. I don't understand them well enough to know which one I need.
 - the second problem is where to find the functions/packages I need.
 There are many options, and that's actually the problem. R Wiki,
 Rseek, RSiteSearch, Crantastic, etc... When you start with R, you
 discover that the capabilities of R are almost unlimited and you
 don't really know where to start, where to find what you need.

 As noted in earlier posts, the mailing list is really great, but some
 people are really hard with beginners. It was noted in a discussion a
 few days ago, but it looks like some don't realize how difficult it
 is at the beginning to formulate a good question, clear, with
 self-contained example and so on. Moreover, not everybody speaks
 English natively. I don't mean that you must help, even when the
 question is really vague and not clear and whatever. I'm just saying
 that if you don't want to help (whatever the reason), you don't have
 to say it badly. But in any cases, the mailing list is still really
 helpful. As someone noted (sorry I erased the email so I don't
 remember who), it might be a good idea to split it.
 Hi everyone,

 My 2ct about the mailing list :). I understand that beginners have a
 hard time formulating a good question. But the problem is that we
 can't answer the question when it is unclear. So either I:

 - Don't bother answering
- Try to discuss with the author of the question, taking lots of time
 to find out what exactly is the question.
- Send a "read the posting guide" answer

 I mostly do the first, as I have to get things done during my PhD 

Re: [R] R Aerodynamic Package(s)?

2010-02-27 Thread Jason Rupert
I received zero responses to this post, so I guess this confirms that R is not 
the correct target language for this project.  

Maybe Octave is better suited...

Thank you again.



- Original Message 
From: Jason Rupert jasonkrup...@yahoo.com
To: R-help@r-project.org
Cc: Me jasonkrup...@yahoo.com
Sent: Tue, February 23, 2010 6:33:18 AM
Subject: R Aerodynamic Package(s)?

By any chance is anyone aware of any R Packages that contain or expand the 
aerodynamic capabilities mentioned on the following website? 

http://www.aoe.vt.edu/~mason/Mason_f/MRsoft.html


Typically I know R packages have focused on extending the statistical and 
graphing capability within R, so I was just curious if there might be a package 
that contains some aerodynamics.  

If by chance there isn't a package, is there any interest, in the development 
of a package or the use of a such an R package?

Just curious what the user/developer community thinks about this. I know 
MatLab and proprietary software executables are the typical places where 
aerodynamic analysis is performed, so any feedback about such a package 
existing/being created in R is great. 

Thanks again.



Re: [R] Preserving lists in a function

2010-02-27 Thread Barry Rowlingson
On Sat, Feb 27, 2010 at 11:29 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 Or use modifyList which is in the core of R.

 All these solutions appear to be adding on more and more code with
less and less semantics. Result: messy code which is harder to read
and debug.

 It seems that the arguments should have proper constructors with
meaningful names. A bit like an optimisation function with a control
argument. What's better:

 o = optimmy(f, control=list(niter=10,ftol=0.01))

or

o = optimmy(f,control=control(niter=10,ftol=0.01))

 here you are explicitly constructing a 'control' object that has the
options for controlling the optimiser, and it will have its own
sensible defaults which the user can selectively override. It seems to
be the correct paradigm for collecting related parameters, and indeed
is used in lots of R code.

 Done this way you get selectively overridable arguments, a meaningful
name, readable code, leveraging the built-in defaults system, and the
possibility of sticking consistency checks in your control() function.

 Tell me where that is not made of pure win?
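A minimal version of that pattern might look like this (a sketch; optimmy, niter and ftol are the illustrative names from the message, not a real API, and the constructor is named in the glm()/glm.control() convention):

```r
## The constructor owns the defaults and the consistency checks;
## callers override only what they need.
optimmy_control <- function(niter = 100, ftol = 1e-6, trace = FALSE) {
  stopifnot(niter >= 1, ftol > 0)
  list(niter = niter, ftol = ftol, trace = trace)
}

optimmy <- function(f, control = optimmy_control()) {
  # a real optimiser would iterate up to control$niter times,
  # stopping when the objective changes by less than control$ftol
  list(minimum = f(0), control = control)
}

o <- optimmy(identity, control = optimmy_control(niter = 10))
o$control$niter   # the user overrode one default; ftol keeps its default
```

The consistency check in the constructor means a bad setting fails loudly at the call site instead of deep inside the optimiser.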

Barry



Re: [R] Random Forest

2010-02-27 Thread Dror

Hi,
I'm working with randomForest package and i have 2 questions:
1. how can i drop a specific tree from the forest?
2. i'm trying to get the voting of each tree in a prediction datum using the
following code
pr <- predict(RF, NewData, type = "prob", predict.all = TRUE)
my forest has 300 trees and i get lower number of votes:
> length(pr$individual)
[1] 275
> RF
Call:
 randomForest(formula = RFformula, data = adult, ntree = 300,  mtry = 1,
keep.forest = TRUE) 
   Type of random forest: classification
 Number of trees: 300
No. of variables tried at each split: 1
Am i doing something wrong? how can i know which of the 300 trees didn't
cast a vote?

Thanks in advance
Dror
-- 
View this message in context: 
http://n4.nabble.com/Random-Forest-tp1557464p1571952.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] simple main effect.

2010-02-27 Thread Or Duek
I am very new to R and thus find those examples a bit confusing although I
believe the solution to my problems lies there.
Lets take for example an experiment in which I had two between subject
variables - Strain and treatment, and one within - exposure. all the
variables had 2 levels each.

I found an interaction between exposure and Strain and I want to compare
Strain A and B under every exposure (first and second).
The general model was with that function:
aov(duration~(Strain*exposure*treatment)+Error(subject/exposure),data)

in summary(aovmodel) there was a significant interaction between exposure
and strain.
how (using those HH packages) can I compare Strains under the conditions of
exposure?


BTW - I don't have to use aov (although its seems to be the simplest one).

Thank you very much.


On Mon, Dec 21, 2009 at 12:16 AM, Richard M. Heiberger r...@temple.edu wrote:

 For simple effects in the presence of interaction there are several
 options included in the HH package.  If you don't already have the HH
 package, you can get it with
  install.packages("HH")

 Graphically, you can plot them with the function
  interaction2wt(..., simple=TRUE)
 See the examples in
  ?HH::interaction2wt

 For tests on the simple effect of A conditional on a level of B, you
 can use the model formula B/A and look at the partition of the sums of
 squares using the split= argument
  summary(mymodel.aov, split = "put your details here")

 For multiple comparisons from designs with Error() terms, you need to
 specify the same sums of squares with an equivalent formula that doesn't
 use the Error() function.  See the maiz example in
  ?HH::MMC
 Read the example all the way to the end of the help file.

 Rich






Re: [R] Error in mvpart example

2010-02-27 Thread Peter Ehlers

On 2010-02-26 6:55, Wearn, Oliver wrote:

Dear all,

I'm getting an error in one of the stock examples in the 'mvpart' package. I 
tried:

require(mvpart)
data(spider)
fit3 <- rpart(gdist(spider[,1:12], meth = "bray", full = TRUE, sq = TRUE) ~
    water + twigs + reft + herbs + moss + sand, spider,
    method = "dist")  # directly from ?rpart
summary(fit3)
summary(fit3)

...which returned the following:

Error in apply(formatg(yval, digits - 3), 1, paste, collapse = ",", sep = "") :
   dim(X) must have a positive length

This seems to be a problem with the cross-validation, since the xerror and 
xstd columns are missing from the summary table as well.

Using the mvpart() wrapper results in the same error:

fit4 <- mvpart(gdist(spider[,1:12], meth = "bray", full = TRUE, sq = TRUE) ~
    water + twigs + reft + herbs + moss + sand, spider, method = "dist")
summary(fit4)

Note, changing the 'method' argument to "mrt" seems, superficially, to solve the 
problem. However, when the dependent variable is a dissimilarity matrix, shouldn't 
method = "dist" be used (as per the examples)?

Thanks, in advance, for any help on this error.

Oliver


The cross-validation idea is a red herring; the documentation clearly
states:
  Weights and cross-validation are currently not implemented for
  method=dist.

The error message provides a clue: apply() is not happy with what it's
being fed. Since it mentions dim, we can guess that the problem is
with the X in apply(X, ...).
This in turn suggests that formatg() may not be returning an array
and indeed in your example it returns a vector. I don't know what will
be broken if the last line in formatg() is changed to force the
returned value to be a matrix, but this will work for your example:


formatg <-
function(x, digits = unlist(options('digits')),
 format = paste("%.", digits, "g", sep='')) {
if (!is.numeric(x)) stop("x must be a numeric vector")

n <- length(x)
#
# the resultant strings could be up to 8 characters longer,
#   assume that digits = 4,  -0.dddde+104 is a worst case, where
#   dddd are the 4 significant digits.
dummy  <- paste(rep(" ", digits+8), collapse='')
temp <- .C("formatg", as.integer(n),
  as.double(x),
  rep(format, n),
  out = rep(dummy, n), NAOK = TRUE,
   PACKAGE = "mvpart")$out
if (is.matrix(x)) matrix(temp, nrow=nrow(x))
#else temp
else matrix(temp, nrow=1)
}

Source this and

  summary(fit3)

seem to return reasonable values.

  -Peter Ehlers



Re: [R] Preserving lists in a function

2010-02-27 Thread baptiste auguie
Point well taken --- grid::gpar() is also a good example; I'll make
use of your suggestion in my future coding.

Best,

baptiste

On 27 February 2010 15:02, Barry Rowlingson
b.rowling...@lancaster.ac.uk wrote:
 On Sat, Feb 27, 2010 at 11:29 AM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
 Or use modifyList which is in the core of R.

  All these solutions appear to be adding on more and more code with
 less and less semantics. Result: messy code which is harder to read
 and debug.

  It seems that the arguments should have proper constructors with
 meaningful names. A bit like an optimisation function with a control
 argument. What's better:

  o = optimmy(f, control=list(niter=10,ftol=0.01))

 or

 o = optimmy(f,control=control(niter=10,ftol=0.01))

  here you are explicitly constructing a 'control' object that has the
 options for controlling the optimiser, and it will have its own
 sensible defaults which the user can selectively override. It seems to
 be the correct paradigm for collecting related parameters, and indeed
 is used in lots of R code.

  Done this way you get selectively overridable arguments, a meaningful
 name, readable code, leveraging the built-in defaults system, and the
 possibility of sticking consistency checks in your control() function.

  Tell me where that is not made of pure win?

 Barry




[R] R from Java (cluster heatmaps)

2010-02-27 Thread Rameswara Sashi Kiran Challa
Hello All,

I am trying to get cluster heatmaps using R from Java in my application.
I have set up Rserve, with which I am able to make a TCP/IP connection to R.

I am trying to send a double[][] array (say 5x8 dimensions) to R and convert
it into matrix using as.matrix() function in R. Is it correct to do this?
Can I directly pass this array to dist() function to generate the distance
matrix ?  if not could someone please direct me how to do it ?
I want to be able to pass the matrix into R, compute a distance matrix using
dist() and then plot hierarchial cluster using hclust() and then further
plot cluster heatmaps calling the bioconductor library. Is Rserve enough for
this or will I also need rJava ?
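The R side of that pipeline works in plain base R (a sketch; the random matrix stands in for the 5x8 array arriving over Rserve, and whether rJava is also needed depends on the Java integration, not on this code):

```r
## Once the double[][] has been assigned into R as a matrix,
## dist() takes the numeric matrix directly -- no extra conversion needed.
m  <- matrix(rnorm(40), nrow = 5, ncol = 8)   # stand-in for the Java double[5][8]
d  <- dist(m)                                 # Euclidean distances between rows
hc <- hclust(d)
heatmap(m, Rowv = as.dendrogram(hc))          # base-graphics cluster heatmap
```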

Please Reply

Thanks


-- 
Sashikiran Challa
MS Cheminformatics,
School of Informatics and Computing,
Indiana University, Bloomington,IN
scha...@indiana.edu
812-606-3254




Re: [R] Error in mvpart example

2010-02-27 Thread Glenn De'ath
Thanks for the info -- I'll check it out ASAP.

Regards

Glenn

+++
Glenn De'ath
Principal Research Scientist
Australian Institute of Marine Science
Ph: +61-7-4758-1747; +61-7-4753-4314

In a world without walls and fences, who needs windows and gates.

+




-Original Message-
From: Gavin Simpson [mailto:gavin.simp...@ucl.ac.uk]
Sent: Sat 27-Feb-10 8:43 PM
To: Wearn, Oliver
Cc: r-help@r-project.org; Glenn De'ath
Subject: Re: [R] Error in mvpart example
 
These functions (rpart, the mvpart wrapper, and summary.rpart) are
fairly complex doing many things.

For contributed packages you'd be best served by contacting the
author/maintainer. I've CC'd Glenn (the maintainer) here.

HTH

G

On Fri, 2010-02-26 at 13:55 +, Wearn, Oliver wrote:
 Dear all,
 
 I'm getting an error in one of the stock examples in the 'mvpart'
 package. I tried:
 
 require(mvpart)
 data(spider)
 fit3 <- rpart(gdist(spider[,1:12], meth = "bray", full = TRUE, sq = TRUE) ~
     water + twigs + reft + herbs + moss + sand, spider,
     method = "dist")  # directly from ?rpart
 summary(fit3)
 
 ...which returned the following:
 
 Error in apply(formatg(yval, digits - 3), 1, paste, collapse = ",",
 sep = "") : 
   dim(X) must have a positive length
 
 This seems to be a problem with the cross-validation, since the
 xerror and xstd columns are missing from the summary table as
 well.
 
 Using the mvpart() wrapper results in the same error:
 
 fit4 <- mvpart(gdist(spider[,1:12], meth = "bray", full = TRUE, sq = TRUE) ~ water
 + twigs + reft + herbs + moss + sand, spider, method = "dist")
 summary(fit4)
 
 Note, changing the 'method' argument to "mrt" seems, superficially,
 to solve the problem. However, when the dependent variable is a
 dissimilarity matrix, shouldn't method = "dist" be used (as per the
 examples)?
 
 Thanks, in advance, for any help on this error.
 
 Oliver

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%








Re: [R] somebody help me about this error message...

2010-02-27 Thread John Kane

paste("a", 2, sep = "") is simply creating a new character string "a2"

Why not just 
a2 <- 4
?
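To round out the thread: if the variables really must be addressed by constructed names, assign() and get() form the symmetric pair, and a named list avoids the whole problem (a small sketch):

```r
## paste(...) <- 4 is not valid R: the left side of <- must be a name.
## assign() writes by constructed name, get() reads by it:
for (i in 1:5) assign(paste("a", i, sep = ""), 1:i)
assign(paste("a", 2, sep = ""), 4)      # replaces a2 wholesale
get(paste("a", 2, sep = ""))            # 4

## Usually cleaner: keep the values together in one named list.
vals <- setNames(lapply(1:5, seq_len), paste("a", 1:5, sep = ""))
vals[["a2"]] <- 4
```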

--- On Sat, 2/27/10, Joseph Lee seokhyun...@gmail.com wrote:

 From: Joseph Lee seokhyun...@gmail.com
 Subject: [R] somebody help me about this error message...
 To: r-help@r-project.org
 Received: Saturday, February 27, 2010, 12:13 AM
 
 I created variables automatically like this way
 
 for(i in 1:5){
      nam <- paste("a", i, sep = "")
     assign(nam,1:i)
 }
 
 and then, i want to insert a new data into a2 variable.
 so, i did next
 sentence
 
  paste("a", 2, sep = "") <- 4
 
 so, i got this error message
 
  Error in get(paste("a", 2, sep = ""))[1] <- 4 : 
    target of assignment expands to non-language object
 
  anyone who knows about this error message, please tell me how to
  solve this problem,
 please..
 -- 
 View this message in context: 
 http://n4.nabble.com/somebody-help-me-about-this-error-message-tp1571700p1571700.html
 Sent from the R help mailing list archive at Nabble.com.
 
 





Re: [R] How to add a variable to a dataframe whose values are conditional upon the values of an existing variable

2010-02-27 Thread Greg Snow
Here is another approach (I think this is the simplest):

daylkp <- c(SAT = 1, SUN = 2, MON = 3, TUE = 4, WED = 5, THU = 6, FRI = 7)

tmp.in  <- sample(names(daylkp), 25, TRUE)
tmp.out <- daylkp[tmp.in]

names(tmp.out) <- NULL # optional
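Applied to the original question, the same lookup adds the new column in one line (a sketch; the data frame here is an illustrative stand-in):

```r
daylkp <- c(SAT = 1, SUN = 2, MON = 3, TUE = 4, WED = 5, THU = 6, FRI = 7)
df <- data.frame(DOW = c("WED", "SAT", "FRI"), stringsAsFactors = FALSE)
df$DOW1 <- unname(daylkp[df$DOW])   # indexing the named vector does the recode
df
#   DOW DOW1
# 1 WED    5
# 2 SAT    1
# 3 FRI    7
```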


hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Steve Matco
 Sent: Friday, February 26, 2010 12:32 PM
 To: r-help@r-project.org
 Subject: [R] How to add a variable to a dataframe whose values are
 conditional upon the values of an existing variable
 
 Hi everyone,
 
 I am at my wits end with what I believe would be considered simple by a
 more experienced R user. I want to know how to add a variable to a
 dataframe whose values are conditional on the values of an
 existing variable. I can't seem to make an ifelse statement work for my
  situation. The existing variable in my dataframe is a character
  variable named DOW which contains abbreviated day names ("SAT", "SUN",
  "MON", ..., "FRI"). I want to add a numerical variable named DOW1 to my
  dataframe that will take on the value 1 if DOW equals "SAT", 2 if DOW
  equals "SUN", 3 if DOW equals "MON", ..., 7 if DOW equals "FRI".
 I  know this must be a simple problem but I have searched everywhere
 and tried everything I could think of. Any help would be greatly
 appreciated.
 
 Thank you,
 
 Mike
 
 
 
 



[R] Newbie help with ANOVA and lm.

2010-02-27 Thread rkevinburton
Would someone be so kind as to explain in English what the ANOVA code 
(anova.lm) is doing? I am having a hard time reconciling what the text books 
have as a brute force regression and the formula algorithm in 'R'. Specifically 
I see:

p <- object$rank
if (p > 0L) {
p1 <- 1L:p
comp <- object$effects[p1]
asgn <- object$assign[object$qr$pivot][p1]
nmeffects <- c("(Intercept)", attr(object$terms, "term.labels"))
tlabels <- nmeffects[1 + unique(asgn)]
ss <- c(unlist(lapply(split(comp^2, asgn), sum)), ssr)
df <- c(unlist(lapply(split(asgn, asgn), length)), dfr)
}
else {
ss <- ssr
df <- dfr
tlabels <- character(0L)
}
ms <- ss/df
f <- ms/(ssr/dfr)
P <- pf(f, df, dfr, lower.tail = FALSE)
 

I think I understand the check for 'p' being non-zero. 'p' is essentially the 
number of terms in the model matrix (including the intercept term if it 
exists). So in a mathematical description of a regression that included the 
intercept and one term (like dist ~ speed) you would have a model matrix of a 
column of '1's and then a column of data. The 'assign' would be a vector 
containing [0,1]. So then in finding the degrees of freedom you split the 
assign vector with itself. I am having a hard time seeing that this ever 
produces degrees of freedom that are different. So I get that the vector 'df' 
would always be something like [2,2,dfr]. But that is obviously wrong. Would 
someone care to enlighten me on what the code above is doing?
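A small illustration of the split() mechanics in question (a sketch using the built-in cars data):

```r
## For dist ~ speed, assign is c(0, 1): one entry per model-matrix column.
## split(asgn, asgn) makes one group per *term*, and the group lengths --
## not rep(2, ...) -- are the per-term degrees of freedom.
fit <- lm(dist ~ speed, data = cars)
fit$assign                        # 0 1
split(fit$assign, fit$assign)     # list(`0` = 0, `1` = 1): lengths 1 and 1
anova(fit)                        # Df column: 1 for speed, 48 residual
```

With a factor of k levels, that term would contribute k - 1 columns sharing one assign value, so its group length (and hence its df) would be k - 1.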

Thank you.

Kevin



Re: [R] simple main effect.

2010-02-27 Thread DrorD
I tried to implement Ista's procedure and would like to provide it as
a working example, with the intention to get feedback from the R
community:

The data contains three variables:
One dependent var: t.total
and two independent vars: group (between: D2C2, C2D2) and present.type
(within: C2, D2).

# First I do the overall ANOVA:
m.full=aov(t.total ~ group * present.type + Error(subj/present.type),
data=dat.net)
summary(m.full)

Error: subj
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   1430    1430  0.4224  0.528
Residuals 12  40634    3386

Error: subj:present.type
                   Df  Sum Sq Mean Sq F value    Pr(>F)
present.type        1   603.1   603.1  0.7988 0.3890145
group:present.type  1 22775.8 22775.8 30.1661 0.0001379 ***
Residuals          12  9060.1   755.0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 840 148493     177
---

Now, since the interaction is significant, I want to compute two
simple main effects: to find out if there is a significant difference
between C2 and D2 (present.type var) (i) in group D2C2 and then also
(ii) in group C2D2 (won't be shown to avoid redundancy). To achieve
that:

(1) I run the model separately for each level of group:

dat.g1 = subset(dat.net, group == "D2C2")
m.g1 = aov(t.total ~ present.type + Error(subj/present.type),
data=dat.g1)
summary(m.g1)

Error: subj
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  6  22788    3798

Error: subj:present.type
             Df  Sum Sq Mean Sq F value   Pr(>F)
present.type  1 15395.8 15395.8  18.694 0.004963 **
Residuals     6  4941.4   823.6
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 420  80658     192
---

(2) I use the error term from the overall model (dat.net) to calculate
the MS-Error term:

MS-Effect(from model m.g1) for present.type = 15395.8 with df = 1
MS-Error(from model m.full) for present.type = 755.0 with df = 12
(from Error: subj:present.type)

so we have F(1,12) = 15395.8 / 755.0
which means F = 20.4 and to calculate p-sig:
1 - pf(20.4,1,12)
--> p = 0.0007070375
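The last step can be done in one call; pf() with lower.tail = FALSE gives the upper-tail probability directly and avoids the round-off of 1 - pf() when p is tiny (a sketch with the numbers above):

```r
Fval <- 15395.8 / 755.0                           # F(1, 12) for the simple effect
pf(Fval, df1 = 1, df2 = 12, lower.tail = FALSE)   # p ~ 0.0007
```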

Well, is this the way to do it?
Is it equivalent to or different from using the HH package?

Thanks in advance and best to all,
dror

--

On Feb 27, 4:18 pm, Or Duek ord...@gmail.com wrote:
 I am very new to R and thus find those examples a bit confusing although I
 believe the solution to my problems lies there.
 Lets take for example an experiment in which I had two between subject
 variables - Strain and treatment, and one within - exposure. all the
 variables had 2 levels each.

 I found an interaction between exposure and Strain and I want to compare
 Strain A and B under every exposure (first and second).
 The general model was with that function:
 aov(duration~(Strain*exposure*treatment)+Error(subject/exposure),data)

 in summary(aovmodel) there was a significant interaction between exposure
 and strain.
 how (using those HH packages) can I compare Strains under the conditions of
 exposure?

 BTW - I don't have to use aov (although its seems to be the simplest one).

 Thank you very much.

  On Mon, Dec 21, 2009 at 12:16 AM, Richard M. Heiberger r...@temple.edu wrote:





  For simple effects in the presence of interaction there are several
  options included in the HH package.  If you don't already have the HH
  package, you can get it with
    install.packages("HH")

  Graphically, you can plot them with the function
   interaction2wt(..., simple=TRUE)
  See the examples in
   ?HH::interaction2wt

  For tests on the simple effect of A conditional on a level of B, you
  can use the model formula B/A and look at the partition of the sums of
  squares using the split= argument
    summary(mymodel.aov, split = "put your details here")

  For multiple comparisons from designs with Error() terms, you need to
  specify the same sums of squares with an equivalent formula that doesn't
  use the Error() function.  See the maiz example in
   ?HH::MMC
  Read the example all the way to the end of the help file.

  Rich





[R] scan and skip - without line breaks in the input file

2010-02-27 Thread Balzer Susanne
Dear all,

I am trying to read in big amounts of data with scan. It's only one variable, 
numeric values, separated by tabs, and there are many of them. So I was thinking 
that I could use the skip option and read in 10 values at a time - but skip 
doesn't work, probably because I don't have line breaks in the txt file. So any 
value specified for skip makes the scan function jump to the end of the file.

Does anyone have a good idea? I would be extremely grateful.

Kind regards,

Susanne Balzer




Susanne Balzer
PhD Student
Institute of Marine Research
N-5073 Bergen, Norway
Phone: +47 55 23 69 45
susanne.bal...@imr.no
www.imr.no



Re: [R] scan and skip - without line breaks in the input file

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 11:24 AM, Balzer Susanne wrote:


Dear all,

I am trying to read in big amounts of data with scan. It's only one  
variable, numeric values, separated by tabs,.. and it's many of  
them. So I was thinking that I could use the skip option and read in  
10 values at a time - but skip doesn't work, probably because I  
don't have line breaks in the txt file. So any value specified for  
skip makes the scan function jump to the end of the file.


?scan

Without a working example it is hard to be sure, but it appears from a  
rapid look at the help page that nmax is the argument you want.


 scan(textConnection('1 2 3 4 5 6 7'), nmax=4)
Read 4 items
[1] 1 2 3 4


(Ignores line-feeds)
 scan(textConnection('1 2 \n 3 4 5 6 7'), nmax=4)
Read 4 items
[1] 1 2 3 4


--
David.


Does anyone have a good idea? I would be extremely grateful.

Kind regards,

Susanne Balzer




Susanne Balzer
PhD Student
Institute of Marine Research
N-5073 Bergen, Norway
Phone: +47 55 23 69 45
susanne.bal...@imr.no
www.imr.no



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Defective help pages

2010-02-27 Thread Duncan Murdoch

On 26/02/2010 3:22 PM, Peter Danenberg wrote:

This seems to be plain text help, right?


It is.


Does the html version give the same result?


Interestingly, the HTML version seems to be whole, though it's less convenient
to access from ESS.

Do you know what program generates the plain text; and are there any
options that govern where R looks for the plain text help files?


As of R 2.10.0, the plain text is generated by the tools::Rd2txt 
function from the same source as the HTML, i.e. the parsed Rd files 
stored in the .rdb file in the package help directories.


It looks as though ESS is the problem here, but I don't really know what 
it is doing that could cause the symptoms you saw.


Duncan Murdoch



Re: [R] Newbie help with ANOVA and lm.

2010-02-27 Thread Peter Ehlers

On 2010-02-27 8:53, rkevinbur...@charter.net wrote:

Would someone be so kind as to explain in English what the ANOVA code 
(anova.lm) is doing? I am having a hard time reconciling what the text books 
have as a brute force regression and the formula algorithm in 'R'. Specifically 
I see:

 p <- object$rank
 if (p > 0L) {
     p1 <- 1L:p
     comp <- object$effects[p1]
     asgn <- object$assign[object$qr$pivot][p1]
     nmeffects <- c("(Intercept)", attr(object$terms, "term.labels"))
     tlabels <- nmeffects[1 + unique(asgn)]
     ss <- c(unlist(lapply(split(comp^2, asgn), sum)), ssr)
     df <- c(unlist(lapply(split(asgn, asgn), length)), dfr)
 }
 else {
     ss <- ssr
     df <- dfr
     tlabels <- character(0L)
 }
 ms <- ss/df
 f <- ms/(ssr/dfr)
 P <- pf(f, df, dfr, lower.tail = FALSE)


I think I understand the check for 'p' being non-zero. 'p' is essentially the 
number of terms in the model matrix (including the intercept term if it 
exists). So in a mathematical description of a regression that included the 
intercept and one term (like dist ~ speed) you would have a model matrix of a 
column of '1's and then a column of data. The 'assign' would be a vector 
containing [0,1]. So then in finding the degrees of freedom you split the 
assign matrix with itself. I am having a hard time seeing that this ever 
produces degrees of freedom that are different. So I get that the vector 'df' 
would always be something like [2,2,dfr]. But that is obviously wrong. Would 
someone care to enlighten me on what the code above is doing?



split(asgn, asgn) splits the vector (not matrix) 'asgn' into
list components. Then lapply() applies length() to each list
component which gives the associated degrees of freedom.
unlist() removes the list structure, producing a vector of dfs.
For simple regression, this results in c(1,1). The residual
dfs are then tacked on to give the df-vector df=c(1,1,dfr).
For models with an intercept the first component of df should
always be 1. But this is discarded in the output matrix.

With two numerical predictors: y ~ x1 + x2,
you should find that asgn = c(0,1,2) leading to df = c(1,1,1,dfr).
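
Peter's recipe is easy to check at the console on a small hypothetical
'asgn' vector (here a model with an intercept, one numeric predictor, and
one term contributing two columns, e.g. a three-level factor):

```r
# hypothetical assign vector: intercept (0), numeric term (1),
# and a term contributing two columns (2, 2)
asgn <- c(0, 1, 2, 2)

# one list component per term, then count the columns in each
df <- unlist(lapply(split(asgn, asgn), length))
df
# 0 1 2 
# 1 1 2 
```

The residual dfs would then be appended and the leading intercept df
discarded, exactly as described above.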

  -Peter Ehlers


Thank you.

Kevin





--
Peter Ehlers
University of Calgary



Re: [R] using grep

2010-02-27 Thread Greg Snow
Look at the gsubfn package, it gives more options and will probably make what 
you are trying to do easier.
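
For comparison, a base-R sketch with regexpr()/regmatches() also works (no
extra package; this assumes the number always directly follows the literal
text "New York"):

```r
x <- c("P Los Angeles44AZ", "P New York722AZ", "K New York20")

# match "New York" followed by digits; elements without a match drop out
m <- regmatches(x, regexpr("New York[0-9]+", x))

# strip the literal prefix and convert
vals <- as.numeric(sub("New York", "", m, fixed = TRUE))
vals
# [1] 722  20
```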

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of kayj
 Sent: Friday, February 26, 2010 11:27 AM
 To: r-help@r-project.org
 Subject: [R] using grep
 
 
 Hi All,
 
 I have a character vector with names of cities in the US. I need to
 extract
 the numbers that appear after the word New York, for example,
 
 x <- c("P Los Angeles44AZ", "P New York722AZ", "K New York20")
 
 I want the results to be
 
 722, 20
 
 
 Can I use the grep function, if so how?
 I appreciate your help, thanks,
 
 --
 View this message in context: http://n4.nabble.com/using-grep-
 tp1571102p1571102.html
 Sent from the R help mailing list archive at Nabble.com.
 



Re: [R] reading data from web data sources

2010-02-27 Thread Gabor Grothendieck
Mark Leeds pointed out to me that the code wrapped around in the post
so it may not be obvious that the regular expression in the grep is
(i.e. it contains a space):
[^ 0-9.]


On Sat, Feb 27, 2010 at 7:15 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 Try this.  First we read the raw lines into R, using grep to remove any
 lines containing a character that is not a number, space or period.  Then
 we locate the year lines and repeat each year down V1 using cumsum.
 Finally we omit the year lines.

 myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
 raw.lines <- readLines(myURL)
 DF <- read.table(textConnection(raw.lines[!grepl("[^ 0-9.]", raw.lines)]),
                  fill = TRUE)
 yr <- is.na(DF[[2]])               # TRUE on the year lines
 DF$V1 <- DF[[1]][yr][cumsum(yr)]   # repeat each year down its block
 DF <- na.omit(DF)
 head(DF)
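
To see why the cumsum() step works, here is a toy version of the
subset-then-index idiom on made-up data (hypothetical values; the real file
has twelve readings per day line):

```r
# toy frame mimicking the met file after read.table(fill = TRUE):
# "year" rows have NA in V2, daily rows carry readings
DF <- data.frame(V1 = c(1910, 1, 2, 1911, 1),
                 V2 = c(NA,   5, 6, NA,   7))

yr <- is.na(DF$V2)                # TRUE on the year lines
DF$V1 <- DF$V1[yr][cumsum(yr)]    # cumsum(yr) says which year-block each row is in
DF <- na.omit(DF)                 # drop the year lines themselves
DF
#     V1 V2
# 2 1910  5
# 3 1910  6
# 5 1911  7
```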


 On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote tim+r-project@coote.org 
 wrote:
 Hullo
 I'm trying to read some time series data of meteorological records that are
 available on the web (eg
 http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat). I'd
 like to be able to read in the digital data directly into R. However, I
 cannot work out the right function and set of parameters to use.  It could
 be that the only practical route is to write a parser, possibly in some
 other language,  reformat the files and then read these into R. As far as I
 can tell, the informal grammar of the file is:

 comments terminated by a blank line
 [year number on a line on its own
 daily readings lines ]+

 and the daily readings are of the form:
 whitespace day number [whitespace reading on day of month] 12

 Readings for days in months where a day does not exist have special values.
 Missing values have a different special value.

 And then I've got the problem of iterating over all relevant files to get a
 whole timeseries.

 Is there a way to read in this type of file into R? I've read all of the
 examples that I can find, but cannot work out how to do it. I don't think
 that read.table can handle the separate sections of data representing each
 year. read.ftable maybe can be coerced to parse the data, but I cannot see
 how after reading the documentation and experimenting with the parameters.

 I'm using R 2.10.1 on osx 10.5.8 and 2.10.0 on Fedora 10.

 Any help/suggestions would be greatly appreciated. I can see that this type
 of issue is likely to grow in importance, and I'd also like to give the data
 owners suggestions on how to reformat their data so that it is easier to
 consume by machines, while being easy to read for humans.

 The early records are a serious machine parsing challenge as they are tiff
 images of old notebooks ;-)

 tia

 Tim
 Tim Coote
 t...@coote.org
 vincit veritas






Re: [R] using grep

2010-02-27 Thread Gabor Grothendieck
Here it is using strapply in gsubfn. x is the input, followed by the
regular expression, which is just "New York" followed by a parenthesized
string of digits.  The parenthesized portion is passed to the
function, as.numeric, and then everything is simplified using c
(otherwise we would get a list, as in similar R core functions such as
strsplit).

 strapply(x, "New York(\\d+)", as.numeric, simplify = c)
[1] 722  20

On Sat, Feb 27, 2010 at 12:25 PM, Greg Snow greg.s...@imail.org wrote:
 Look at the gsubfn package, it gives more options and will probably make what 
 you are trying to do easier.

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of kayj
 Sent: Friday, February 26, 2010 11:27 AM
 To: r-help@r-project.org
 Subject: [R] using grep


  Hi All,
 
  I have a character vector with names of cities in the US. I need to
  extract
  the numbers that appear after the word New York, for example,
 
  x <- c("P Los Angeles44AZ", "P New York722AZ", "K New York20")
 
  I want the results to be
 
  722, 20
 
 
  Can I use the grep function, if so how?
  I appreciate your help, thanks,

 --
 View this message in context: http://n4.nabble.com/using-grep-
 tp1571102p1571102.html
 Sent from the R help mailing list archive at Nabble.com.






Re: [R] Converting IEEE Float in 16 char hex back to float

2010-02-27 Thread Duncan Murdoch

On 27/02/2010 12:43 AM, xlr82sas wrote:

Hi,

If I do the following

sprintf("%A", pi)
[1] "0X1.921FB54442D18P+1"

I have this 16-character hex string

hx <- "400921FB54442D18"

This is the exact hex16 representation of PI in the
IEEE float that R uses on Intel 32-bit (little-endian) Windows.
SAS uses the same representation: 11-bit exponent and 53-bit mantissa.

What I want to do is recreate the float exactly from the 16-char hex,

something like

MyPI <- readChar(hx, numeric(), 16)

or in SAS

MyPI = input("400921FB54442D18", hex16.);
put MyPI=;

MYPI=3.1415926536

What I am trying to do is set up a lossless
transfer method from SAS to R


The way I would do it is to convert the hx string to raw bytes, then 
read the raw bytes as a binary value.  I think this works for one 
string; it would need some work to handle more than one:


hexdigits <- function(s) {
   digits <- 0:15
   names(digits) <- c(0:9, LETTERS[1:6])
   digits[strsplit(s, "")[[1]]]
}

bytes <- function(s) {
   digits <- matrix(hexdigits(s), ncol=2, byrow=TRUE)
   as.raw(digits %*% c(16, 1))
}

todouble <- function(bytes) {
   con <- rawConnection(bytes)
   val <- readBin(con, "double", endian="big")
   close(con)
   val
}

todouble(bytes("400921FB54442D18"))
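
An alternative sketch for the hex-to-raw step uses strtoi() on
two-character chunks (same idea, fewer moving parts; assumes an even-length
hex string):

```r
bytes2 <- function(s) {
  # split e.g. "400921FB54442D18" into "40" "09" ... and parse base 16
  pairs <- substring(s, seq(1, nchar(s) - 1, by = 2),
                        seq(2, nchar(s),     by = 2))
  as.raw(strtoi(pairs, base = 16L))
}

# readBin() accepts a raw vector directly, so no connection is needed
readBin(bytes2("400921FB54442D18"), "double", endian = "big")
# [1] 3.141593
```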



Re: [R] scan and skip - without line breaks in the input file

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 11:47 AM, Balzer Susanne wrote:


Hei David,

Thanks for your quick response, but unfortunately n and nmax alone  
don't do the job. If I want to read items no. 11 to 20, the  
n=10 option will work, but skip=10 (to NOT read the first  
10 items) won't.


Or with your example,

scan(textConnection('1 2 3 4 5 6 7'), skip=3) will never work, while


True.


scan(textConnection('1 2 3 4 \n 5 \n 6 \n 7'), skip=3) will. But I  
don't have line breaks in my file.


Right. That was what I was trying to help you deal with.



Is there no way to specify the character for a line break in scan /  
read.table / etc.?


Why are you fixating on linefeeds when you don't have any?

 closeAllConnections()
 tc <- textConnection(paste(1:100, sep=" ", collapse=" "))
 scan(tc, nmax=10)
Read 10 items
 [1]  1  2  3  4  5  6  7  8  9 10
 scan(tc, nmax=10)
Read 10 items
 [1] 11 12 13 14 15 16 17 18 19 20
 scan(tc, nmax=10)
Read 10 items
 [1] 21 22 23 24 25 26 27 28 29 30
 scan(tc, nmax=10)
Read 10 items
 [1] 31 32 33 34 35 36 37 38 39 40
 scan(tc, nmax=10)
Read 10 items
 [1] 41 42 43 44 45 46 47 48 49 50
 scan(tc, nmax=10)
Read 10 items
 [1] 51 52 53 54 55 56 57 58 59 60
 scan(tc, nmax=10)
Read 10 items
 [1] 61 62 63 64 65 66 67 68 69 70
 scan(tc, nmax=10)
Read 10 items
 [1] 71 72 73 74 75 76 77 78 79 80
 scan(tc, nmax=10)
Read 10 items
 [1] 81 82 83 84 85 86 87 88 89 90
 scan(tc, nmax=10)
Read 10 items
 [1]  91  92  93  94  95  96  97  98  99 100
 scan(tc, nmax=10)
Read 0 items
numeric(0)

--
David.






Kind regards,

Susanne


-Opprinnelig melding-
Fra: David Winsemius [mailto:dwinsem...@comcast.net]
Sendt: 27. februar 2010 17:38
Til: Balzer Susanne
Kopi: 'r-help@r-project.org'
Emne: Re: [R] scan and skip - without line breaks in the input file


On Feb 27, 2010, at 11:24 AM, Balzer Susanne wrote:


Dear all,

I am trying to read in big amounts of data with scan. It's only one
variable, numeric values, separated by tabs,.. and it's many of
them. So I was thinking that I could use the skip option and read in
10 values at a time - but skip doesn't work, probably because I
don't have line breaks in the txt file. So any value specified for
skip makes the scan function jump to the end of the file.


?scan

Without a working example it is hard to be sure, but it appears from a
rapid look at the help page that nmax is the argument you want.


scan(textConnection('1 2 3 4 5 6 7'), nmax=4)

Read 4 items
[1] 1 2 3 4


(Ignores line-feeds)

scan(textConnection('1 2 \n 3 4 5 6 7'), nmax=4)

Read 4 items
[1] 1 2 3 4


--
David.


Does anyone have a good idea? I would be extremely grateful.

Kind regards,

Susanne Balzer




Susanne Balzer
PhD Student
Institute of Marine Research
N-5073 Bergen, Norway
Phone: +47 55 23 69 45
susanne.bal...@imr.no
www.imr.no



David Winsemius, MD
Heritage Laboratories
West Hartford, CT






Re: [R] Help Computing Probit Marginal Effects

2010-02-27 Thread Cardinals_Fan

One last question.  I'm trying to use the rnorm() function to draw a
distribution for my coefficient estimates.  Let's say I have a model y* = a
+ b1x1.  I have the coefficient estimate for b1 stored as b1 and the
standard error estimate for b1 stored as s1.  I run rnorm function as

a <- rnorm(1000, b1, s1)

and I get NA values in the vector.  If I don't use scalars it works fine.  Is
there a special way scalars are entered to make it work?  I have also tried
the dnorm command.

-- 
View this message in context: 
http://n4.nabble.com/Help-Computing-Probit-Marginal-Effects-tp1571672p1572139.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] scan and skip - without line breaks in the input file

2010-02-27 Thread Balzer Susanne
Hi David,

That looks magic and works - if and only if you keep the file connection open. 
Cool, that was the hint I needed! 

 scan("myfile.txt", nmax=10)

will always give you the first 10 items, obviously.

However, I did the workaround with tr under unix now and changed all the tabs 
into line breaks (thanks @Claudia Beleites). 
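
The tr workaround can also be simulated entirely in R, which makes it easy
to confirm that skip behaves once the tabs become line breaks (a sketch on
stand-in data; the file names in the real workflow are up to you):

```r
# stand-in for the tab-separated file contents
txt <- paste(1:30, collapse = "\t")

# turn tabs into newlines, as tr '\t' '\n' would
out <- gsub("\t", "\n", txt, fixed = TRUE)

# now skip/nmax behave as expected: skip 10 values, read the next 5
vals <- scan(textConnection(out), skip = 10, nmax = 5, quiet = TRUE)
vals
# [1] 11 12 13 14 15
```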

But good to know that scan also does the job.

Thanks again,

Susanne


-Opprinnelig melding-
Fra: David Winsemius [mailto:dwinsem...@comcast.net] 
Sendt: 27. februar 2010 18:46
Til: Balzer Susanne
Kopi: r-help@r-project.org help
Emne: Re: SV: [R] scan and skip - without line breaks in the input file


On Feb 27, 2010, at 11:47 AM, Balzer Susanne wrote:

 Hei David,

 Thanks for your quick response, but unfortunately n and nmax alone  
 don't do the job. If I want to read items no. 11 to 20, the  
 n=10 option will work, but skip=10 (to NOT read the first  
 10 items) won't.

 Or with your example,

 scan(textConnection('1 2 3 4 5 6 7'), skip=3) will never work, while

True.

 scan(textConnection('1 2 3 4 \n 5 \n 6 \n 7'), skip=3) will. But I  
 don't have line breaks in my file.

Right. That was what I was trying to help you deal with.


 Is there no way to specify the character for a line break in scan /  
 read.table / etc.?

Why are you fixating on linefeeds when you don't have any?

  closeAllConnections()
  tc <- textConnection(paste(1:100, sep=" ", collapse=" "))
  scan(tc, nmax=10)
Read 10 items
  [1]  1  2  3  4  5  6  7  8  9 10
  scan(tc, nmax=10)
Read 10 items
  [1] 11 12 13 14 15 16 17 18 19 20
  scan(tc, nmax=10)
Read 10 items
  [1] 21 22 23 24 25 26 27 28 29 30
  scan(tc, nmax=10)
Read 10 items
  [1] 31 32 33 34 35 36 37 38 39 40
  scan(tc, nmax=10)
Read 10 items
  [1] 41 42 43 44 45 46 47 48 49 50
  scan(tc, nmax=10)
Read 10 items
  [1] 51 52 53 54 55 56 57 58 59 60
  scan(tc, nmax=10)
Read 10 items
  [1] 61 62 63 64 65 66 67 68 69 70
  scan(tc, nmax=10)
Read 10 items
  [1] 71 72 73 74 75 76 77 78 79 80
  scan(tc, nmax=10)
Read 10 items
  [1] 81 82 83 84 85 86 87 88 89 90
  scan(tc, nmax=10)
Read 10 items
  [1]  91  92  93  94  95  96  97  98  99 100
  scan(tc, nmax=10)
Read 0 items
numeric(0)

-- 
David.





 Kind regards,

 Susanne


 -Opprinnelig melding-
 Fra: David Winsemius [mailto:dwinsem...@comcast.net]
 Sendt: 27. februar 2010 17:38
 Til: Balzer Susanne
 Kopi: 'r-help@r-project.org'
 Emne: Re: [R] scan and skip - without line breaks in the input file


 On Feb 27, 2010, at 11:24 AM, Balzer Susanne wrote:

 Dear all,

 I am trying to read in big amounts of data with scan. It's only one
 variable, numeric values, separated by tabs,.. and it's many of
 them. So I was thinking that I could use the skip option and read in
 10 values at a time - but skip doesn't work, probably because I
 don't have line breaks in the txt file. So any value specified for
 skip makes the scan function jump to the end of the file.

 ?scan

 Without a working example it is hard to be sure, but it appears from a
 rapid look at the help page that nmax is the argument you want.

 scan(textConnection('1 2 3 4 5 6 7'), nmax=4)
 Read 4 items
 [1] 1 2 3 4


 (Ignores line-feeds)
 scan(textConnection('1 2 \n 3 4 5 6 7'), nmax=4)
 Read 4 items
 [1] 1 2 3 4


 -- 
 David.

 Does anyone have a good idea? I would be extremely grateful.

 Kind regards,

 Susanne Balzer



 
 Susanne Balzer
 PhD Student
 Institute of Marine Research
 N-5073 Bergen, Norway
 Phone: +47 55 23 69 45
 susanne.bal...@imr.no
 www.imr.no


 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT





Re: [R] scan and skip - without line breaks in the input file

2010-02-27 Thread Peter Ehlers

David, Susanne,

 There may be a misunderstanding here. As I understand it,
Susanne wants to be able to read the _second_ (and third, etc)
100K values after reading the first 100K, presumably to be
processed separately for reasons I can't imagine. If that is
correct, then I have no solution other than inserting
delimiters before passing off to R.

But Susanne, why do you need to read your data in this
piece-meal fashion?

  -Peter Ehlers

On 2010-02-27 10:46, David Winsemius wrote:


On Feb 27, 2010, at 11:47 AM, Balzer Susanne wrote:


Hei David,

Thanks for your quick response, but unfortunately n and nmax alone
don't do the job. If I want to read items no. 11 to 20, the
n=10 option will work, but skip=10 (to NOT read the first
10 items) won't.

Or with your example,

scan(textConnection('1 2 3 4 5 6 7'), skip=3) will never work, while


True.


scan(textConnection('1 2 3 4 \n 5 \n 6 \n 7'), skip=3) will. But I
don't have line breaks in my file.


Right. That was what I was trying to help you deal with.



Is there no way to specify the character for a line break in scan /
read.table / etc.?


Why are you fixating on linefeeds when you don't have any?




--
Peter Ehlers
University of Calgary



Re: [R] two questions for R beginners

2010-02-27 Thread xlr82sas

Hi,

   I don't think you should split the list for beginners.

   On the SAS list we get questions from novices such as secretaries,
janitorial services, human resources and even top executives.

  They often approach SAS from a very intuitive standpoint. These questions
often shake the experts to the core. They ask themselves, why didn't I allow
R to do this.

For instance a novice might ask of the SAS datastep language:
Why can't I just 
Array X[3] (A,1.ROGER,26) 

You can do the above in several other integrated SAS languages
(MACRO,SCL,SAS-C,IML-sort of) at ~$5000+ per year for each except macro)

 A user asked recently
   array x[2,3,4,5] x1-x120;
Do i=1 to 2;
  Do j=1 to 3;
Do k=1 to 4;
  Do l=1 to 5; 
             x[i,j,k,l] = i*j*k*l;
  End;
End;
  End;
End;

R can do this nicely with lists but SAS can do it with SCL,Macro,IML
and C. I think SAS-IML has the most intuitive solution.

    I read Nabble, perl and SAS lists religiously. What I would like to
see is one list that somehow integrated R, SAS and perl solutions. SAS users
are trying to create integrated 'DROP DOWN' capabilities that allow
programmers to switch languages midstream to get the best solution. I often
want to respond with SAS solutions, just so R and perl can think about
adding functionality.

ie
  data new;
set data;
perl on;
  perl code;
  ...
perl off;
   sas code;
   .
   R on;
 R code;
 ;
   R off;
run;

I am trying to get SAS users to do some of their processing in R (within
SAS). I am toying with a set of tips that show intuitive SAS code beside R
code, so SAS users can become more comfortable with R.

SAS is much more intuitive than R: for instance, R 'for' loops with funny
'}'s next to the more intuitive SAS do/ends.

I could expound on the type of problems perl handles better than SAS or R,
problems R handles better than SAS or perl, and problems SAS handles better
than R or perl.
-- 
View this message in context: 
http://n4.nabble.com/two-questions-for-R-beginners-tp1569384p1572149.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Help Computing Probit Marginal Effects

2010-02-27 Thread Peter Ehlers

On 2010-02-27 10:57, Cardinals_Fan wrote:


One last question.  I'm trying to use the rnorm() function to draw a
distribution for my coefficient estimates.  Let's say I have a model y* = a
+ b1x1.  I have the coefficient estimate for b1 stored as b1 and the
standard error estimate for b1 stored as s1.  I run rnorm function as

a <- rnorm(1000, b1, s1)

and I get NA values in the vector.  If i dont use scalars it works fine.  Is


I don't understand what "don't use scalars" means.
I think that your problem is with the word "stored"? *How* are these
values 'stored'?
If you have

 b1 <- 3.14
 s1 <- 1.41

then

 rnorm(1000, b1, s1)

will not produce NAs.


there a special way scalars are entered to make it work?  I have also tried
the dnorm command.


This is a bit worrying - why would you consider dnorm when you want a
random sample? Or do you just want to plot the Normal curve with mean
equal to b1 and SD equal to s1?

--
Peter Ehlers
University of Calgary



Re: [R] scan and skip - without line breaks in the input file

2010-02-27 Thread Peter Ehlers

Talk about asleep at the switch.
My sincere apologies to both Susanne and David for my
stupid message and to group for wasting everyone's time.

Ouch, that headslap hurt.

 -Peter

On 2010-02-27 11:05, Peter Ehlers wrote:

David, Susanne,

There may be a misunderstanding here. As I understand it,
Susanne wants to be able to read the _second_ (and third, etc)
100K values after reading the first 100K, presumably to be
processed separately for reasons I can't imagine. If that is
correct, then I have no solution other than inserting
delimiters before passing off to R.

But Susanne, why do you need to read your data in this
piece-meal fashion?

-Peter Ehlers

On 2010-02-27 10:46, David Winsemius wrote:


On Feb 27, 2010, at 11:47 AM, Balzer Susanne wrote:


Hei David,

Thanks for your quick response, but unfortunately n and nmax alone
don't do the job. If I want to read items no. 11 to 20, the
n=10 option will work, but skip=10 (to NOT read the first
10 items) won't.

Or with your example,

scan(textConnection('1 2 3 4 5 6 7'), skip=3) will never work, while


True.


scan(textConnection('1 2 3 4 \n 5 \n 6 \n 7'), skip=3) will. But I
don't have line breaks in my file.


Right. That was what I was trying to help you deal with.



Is there no way to specify the character for a line break in scan /
read.table / etc.?


Why are you fixating on linefeeds when you don't have any?






--
Peter Ehlers
University of Calgary



Re: [R] Help Computing Probit Marginal Effects

2010-02-27 Thread Ted Harding
On 27-Feb-10 17:57:56, Cardinals_Fan wrote:
 
 One last question.  I'm trying to use the rnorm() function to
 draw a distribution for my coefficient estimates.  Let's say
 I have a model y* = a + b1x1.  I have the coefficient estimate
 for b1 stored as b1 and the standard error estimate for b1
 stored as s1.  I run rnorm function as
 
 a <- rnorm(1000, b1, s1)
 
 and I get NA values in the vector.  If i dont use scalars it
 works fine.  Is there a special way scalars are entered to
 make it work?  I have also tried the dnorm command.
 
 --

It should not happen that you get NAs *provided s1 >= 0 and
neither b1 nor s1 is NA*. So check what values you have stored
in b1 and s1. (In fact if you have s1 < 0 you will get NaN
rather than NA, so s1 < 0 should not be the source of the problem).

What I suspect is that, for some reason to do with your data,
you have got NA for 'a' or for 'b1'.
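
Tying this back to the probit example from earlier in the thread, the
estimates and standard errors can be pulled straight out of the summary()
coefficient matrix before simulating (a sketch; it assumes the predictor of
interest is the column named X, with glm's standard column labels):

```r
set.seed(54321)
X <- 0.2 * (-10:10)
Y <- 1 * (rnorm(21) <= X)                 # binary outcome, as in the earlier post
GLM <- glm(Y ~ X, family = binomial(link = "probit"))

Coef <- summary(GLM)$coefficients         # matrix: Estimate, Std. Error, z, p
b1 <- Coef["X", "Estimate"]
s1 <- Coef["X", "Std. Error"]

sims <- rnorm(1000, b1, s1)               # simulated betas
anyNA(sims)                               # should be FALSE: b1, s1 finite, s1 > 0
```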

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 27-Feb-10   Time: 18:25:02
-- XFMail --



Re: [R] two questions for R beginners

2010-02-27 Thread xlr82sas

Hi,

   I don't think you should split the list for beginners.

   On the SAS list we get questions from novices such as secretaries,
janitorial services, human resources and even top executives.

  They often approach SAS from a very intuitive standpoint. These questions
often shake the experts to the core. They ask themselves, why didn't I allow
R to do this.

For instance a novice might ask of the SAS datastep language:
Why can't I just 
Array X[3] (A,1.ROGER,26) 

You can do the above in several other integrated SAS languages
(MACRO, SCL, SAS-C, IML - sort of), at ~$5000+ per year for each except macro.

 A user asked recently
   array x[2,3,4,5] x1-x120;
Do i=1 to 2;
  Do j=1 to 3;
Do k=1 to 4;
  Do l=1 to 5; 
 Xijkl = i*j*k*l;
  End;
End;
  End;
End;

R can do this nicely with lists but SAS can do it with SCL,Macro,IML
and C. I think SAS-IML has the most intuitive solution.
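For comparison, one idiomatic R sketch of the nested-loop example above, using outer() instead of explicit loops:

```r
## Build the 2x3x4x5 array whose [i,j,k,l] entry is i*j*k*l,
## equivalent to the four nested SAS DO loops above.
x <- outer(outer(outer(1:2, 1:3), 1:4), 1:5)
dim(x)          # 2 3 4 5
x[2, 3, 4, 5]   # 120 = 2*3*4*5
```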

I read the Nabble, perl and SAS lists religiously; what I would like to
see is one list that somehow integrated R, SAS and perl solutions. SAS users
are trying to create integrated 'DROP DOWN' capabilities that allow
programmers to switch languages mid-stream to get the best solution. I often
want to respond with SAS solutions, just so R and perl can think about
adding functionality.

ie
  data new;
set data;
perl on;
  perl code;
  ...
perl off;
   sas code;
   .
   R on;
 R code;
 ;
   R off;
run;

I am trying to get SAS users to do some of their processing in R (within
SAS). I am toying with a set of tips that show SAS intuitive code beside R
code, so SAS users can become more comfortable with R.

SAS is much more intuitive than R in places: compare, for instance, R
'for' loops with their funny '}'s to the more intuitive SAS
do/end blocks.

I could expound on the type of problems perl handles better than SAS or R, 
problems R handles better than SAS or perl and problems SAS handles better
than R or perl.
-- 
View this message in context: 
http://n4.nabble.com/two-questions-for-R-beginners-tp1569384p1572165.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R Aerodynamic Package(s)?

2010-02-27 Thread Charles Annis, P.E.
Jason:

What are you trying to do?  Your reference link provides several Fortran
programs.  Why can't you use those?  Or you could translate them into R code
if you would like to take advantage of R's wonderful graphics and
multitudinous other statistical adjuncts.

Your request seems too broad to allow a more focused response.  Perhaps we
could be more helpful if you told us what you are trying to accomplish.

Charles Annis, P.E.

charles.an...@statisticalengineering.com
561-352-9699
http://www.StatisticalEngineering.com

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Jason Rupert
Sent: Saturday, February 27, 2010 8:39 AM
To: R-help@r-project.org
Subject: Re: [R] R Aerodynamic Package(s)?

I received zero responses to this post, so I guess this confirms that R is
not the correct target language for this project.  

Maybe Octave is better suited...

Thank you again.



- Original Message 
From: Jason Rupert jasonkrup...@yahoo.com
To: R-help@r-project.org
Cc: Me jasonkrup...@yahoo.com
Sent: Tue, February 23, 2010 6:33:18 AM
Subject: R Aerodynamic Package(s)?

By any chance is anyone aware of any R Packages that contain or expand the
aerodynamic capabilities mentioned on the following website? 

http://www.aoe.vt.edu/~mason/Mason_f/MRsoft.html


Typically I know R packages have focused on extending the statistical and
graphing capability within R, so I was just curious if there might be a
package that contains some aerodynamics.  

If by chance there isn't a package, is there any interest, in the
development of a package or the use of a such an R package?

Just curious what the user/developer community thought about this?  I know
MatLab and proprietary software executables are the typical places where
aerodynamic analysis is performed, so any feedback about such a package
existing/being created in R is great. 

Thanks again.



Re: [R] Converting IEEE Float in 16 char hex back to float

2010-02-27 Thread xlr82sas

Thanks

Nice code.

I appreciate the function. I don't know if you ever use SAS datasets, but I
am working with the developer of 'dsread' to create a lossless transfer from
SAS to R. I am also working on an in-memory Java interface which would allow
me to mix SAS and R code.

Here is the link to dsread, if any other SAS/R users are interested

http://www.oview.co.uk/dsread 

Also here is the link to SAS list on this topic.

http://tiny.cc/G4xQl
-- 
View this message in context: 
http://n4.nabble.com/Converting-IEEE-Float-in-16-char-hex-back-to-float-tp1571710p1572196.html
Sent from the R help mailing list archive at Nabble.com.



[R] Bug in ecdf? Or what am I missing?

2010-02-27 Thread Ajay Shah
  x <- c(6.6493705109108, 7.1348436721241, 8.76886994525624,
 6.12907548096037, 6.88379118678109, 7.17841879427688,
 7.90737237492867, 7.1207373264833, 7.82949407630692,
 6.90411547316105)
  plot(ecdf(x), log="x")

It does the plot fine, but complains:

  Warning message:
  In xy.coords(x, y, xlabel, ylabel, log) :
1 x value = 0 omitted from logarithmic plot

This seems to be an error since all the values in x are positive.
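One possible explanation (an educated guess, not a verified reading of the plot.ecdf source): the plotting code pads the drawn region beyond range(x), and on a log axis any generated point at or below zero is dropped with exactly this warning, in which case it is harmless. A cheap experiment is to pin xlim to the data range and see whether the warning goes away while the curve is unchanged:

```r
## Same data as above; all values are strictly positive.
x <- c(6.6493705109108, 7.1348436721241, 8.76886994525624,
       6.12907548096037, 6.88379118678109, 7.17841879427688,
       7.90737237492867, 7.1207373264833, 7.82949407630692,
       6.90411547316105)
stopifnot(all(x > 0))   # confirms the data itself cannot trigger it
plot(ecdf(x), log = "x", xlim = range(x))
```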

Thanks,

-- 
Ajay Shah  http://www.mayin.org/ajayshah  
ajays...@mayin.org http://ajayshahblog.blogspot.com
*(:-? - wizard who doesn't know the answer.



[R] Best Hardware OS For Large Data Sets

2010-02-27 Thread J. Daniel

Greetings,

I am acquiring a new computer in order to conduct data analysis.  I
currently have a 32-bit Vista OS with 3G of RAM and I consistently run into
memory allocation problems.  I will likely be required to run Windows 7 on
the new system, but have flexibility as far as hardware goes.  Can people
recommend the best hardware to minimize memory allocation problems?  I am
leaning towards dual core on a 64-bit system with 8G of RAM.  Given the
Windows constraint, is there anything I am missing here?

I know that Windows limits the RAM that a single application can access. 
Does this fact over-ride many hardware considerations?  Any way around this?

Thanks,

JD


-- 
View this message in context: 
http://n4.nabble.com/Best-Hardware-OS-For-Large-Data-Sets-tp1572129p1572129.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Overlap plot

2010-02-27 Thread abotaha

Thank you for this simple help. It is sufficient for my plot.

cheers.
-- 
View this message in context: 
http://n4.nabble.com/Overlap-plot-tp1571803p1572061.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] scan and skip - without line breaks in the input file

2010-02-27 Thread Balzer Susanne
Hi David,

Thanks for your quick response, but unfortunately n and nmax alone don't do the 
job. If I want to read items no. 11 to 20, the n=10 option will 
work, but skip=10 (to NOT read the first 10 items) won't.

Or with your example,

scan(textConnection('1 2 3 4 5 6 7'), skip=3) will never work, while

scan(textConnection('1 2 3 4 \n 5 \n 6 \n 7'), skip=3) will. But I don't have 
line breaks in my file.

Is there no way to specify the character for a line break in scan / read.table 
/ etc.?

Kind regards,

Susanne
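For what it's worth, skip= in scan counts lines, not items, which is why any skip value jumps to the end of a one-line file. One workaround is to keep the connection open and scan in chunks, since each call resumes where the previous one stopped. A sketch, with a textConnection standing in for file("yourfile.txt"):

```r
## Stand-in for file("yourfile.txt"): 30 tab-separated values, no line breaks.
con <- textConnection(paste(1:30, collapse = "\t"))
invisible(scan(con, n = 10, quiet = TRUE))  # read and discard items 1-10
scan(con, n = 10, quiet = TRUE)             # items 11-20
close(con)
```

The repeated scan(tc, nmax=10) calls shown earlier in this thread work for the same reason: the connection remembers its position between calls.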


-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: 27 February 2010 17:38
To: Balzer Susanne
Cc: 'r-help@r-project.org'
Subject: Re: [R] scan and skip - without line breaks in the input file


On Feb 27, 2010, at 11:24 AM, Balzer Susanne wrote:

 Dear all,

 I am trying to read in big amounts of data with scan. It's only one  
 variable, numeric values, separated by tabs,.. and it's many of  
 them. So I was thinking that I could use the skip option and read in  
 10 values at a time - but skip doesn't work, probably because I  
 don't have line breaks in the txt file. So any value specified for  
 skip makes the scan function jump to the end of the file.

?scan

Without a working example it is hard to be sure, but it appears from a  
rapid look at the help page that nmax is the argument you want.

  scan(textConnection('1 2 3 4 5 6 7'), nmax=4)
Read 4 items
[1] 1 2 3 4


(Ignores line-feeds)
  scan(textConnection('1 2 \n 3 4 5 6 7'), nmax=4)
Read 4 items
[1] 1 2 3 4


-- 
David.

 Does anyone have a good idea? I would be extremely grateful.

 Kind regards,

 Susanne Balzer



 
 Susanne Balzer
 PhD Student
 Institute of Marine Research
 N-5073 Bergen, Norway
 Phone: +47 55 23 69 45
 susanne.bal...@imr.no
 www.imr.no


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] R Aerodynamic Package(s)?

2010-02-27 Thread Sharpie


Charles Annis, P.E. wrote:
 
 Jason:
 
 What are you trying to do?  Your reference link provides several Fortran
 programs.  Why can't you use those?  Or you could translate them into R
 code
 if you would like to take advantage of R's wonderful graphics and
 multitudinous other statistical adjuncts.
 

It's also worth noting that R could most likely access the Fortran code
directly if it was built into a shared library using R CMD SHLIB or, better
yet, placed into the src/ directory of an R package.  Then, rather than
spending the time to re-write the Fortran in R, R could be used as an
interface which provided front end data preparation and back end data
analysis and visualization.


-Charlie


Charles Annis, P.E. wrote:
 
 Your request seems too broad to allow a more focused response.  Perhaps we
 could be more helpful if you told us what you are trying to accomplish.
 
 Charles Annis, P.E.
 
-- 
View this message in context: 
http://n4.nabble.com/R-Aerodynamic-Package-s-tp1565840p1572212.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Best Hardware OS For Large Data Sets

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 12:47 PM, J. Daniel wrote:



Greetings,

I am acquiring a new computer in order to conduct data analysis.  I
currently have a 32-bit Vista OS with 3G of RAM and I consistently  
run into
memory allocation problems.  I will likely be required to run  
Windows 7 on
the new system, but have flexibility as far as hardware goes.  Can  
people
recommend the best hardware to minimize memory allocation problems?   
I am
leaning towards dual core on a 64-bit system with 8G of RAM.  Given  
the

Windows constraint, is there anything I am missing here?


Perhaps the fact that the stable CRAN version of R for (any) Windows  
is 32-bit? It would expand your memory space somewhat but not as much  
as you might naively expect.


(There was a recent  announcement that an experimental version of a 64- 
bit R was available (even with an installer) and there are vendors who  
will supply a 64-bit Windows version for an un-announced price. The  
fact that there was not as of January support for binary packages  
seems to be a bit of a constraint on who would be able to step up to  
use full 64 bit R capabilities on Win64. I'm guessing from your  
failure to mention potential software constraints that you are not  
among that more capable group, as I am also not.)


https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html
https://stat.ethz.ch/pipermail/r-devel/2010-January/056411.html



I know that Windows limits the RAM that a single application can  
access.
Does this fact over-ride many hardware considerations?  Any way  
around this?


Thanks,

JD


--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] reading data from web data sources

2010-02-27 Thread Tim Coote
Thanks, Gabor. My take away from this and Phil's post is that I'm  
going to have to construct some code to do the parsing, rather than  
use a standard function. I'm afraid that neither approach works, yet:


Gabor's has an off-by-one error (days start on the 2nd, not the
first), and the years get messed up around the 29th day.  I think that
na.omit(DF) line is throwing out the baby with the bathwater.  It's
interesting that this approach is based on read.table; I'd assumed
that I'd need read.ftable, which I couldn't understand the
documentation for.  What is it that's removing the -999 and -888
values in this code? They seem to be gone, but I cannot see why.


Phil's reads in the data, but interleaves rows with just a year and  
all other values as NA.


Tim
On 27 Feb 2010, at 17:33, Gabor Grothendieck wrote:


Mark Leeds pointed out to me that the code wrapped around in the post
so it may not be obvious that the regular expression in the grep is
(i.e. it contains a space):
[^ 0-9.]


On Sat, Feb 27, 2010 at 7:15 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
Try this.  First we read the raw lines into R using grep to remove  
any

lines containing a character that is not a number or space.  Then we
look for the year lines and repeat them down V1 using cumsum.   
Finally

we omit the year lines.

myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"

raw.lines <- readLines(myURL)
DF <- read.table(textConnection(raw.lines[!grepl("[^ 0-9.]", raw.lines)]), fill = TRUE)
DF$V1 <- DF[cumsum(is.na(DF[[2]])), 1]
DF <- na.omit(DF)
head(DF)


On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote tim+r-project@coote.org 
 wrote:

Hullo
I'm trying to read some time series data of meteorological records  
that are

available on the web (eg
http://climate.arm.ac.uk/calibrated/soil/ 
dsoil100_cal_1910-1919.dat). I'd
like to be able to read in the digital data directly into R.  
However, I
cannot work out the right function and set of parameters to use.   
It could
be that the only practical route is to write a parser, possibly in  
some
other language,  reformat the files and then read these into R. As  
far as I

can tell, the informal grammar of the file is:

comments terminated by a blank line
[year number on a line on its own
daily readings lines ]+

and the daily readings are of the form:
whitespace day number [whitespace reading on day of month]  
12


Readings for days in months where a day does not exist have  
special values.

Missing values have a different special value.

And then I've got the problem of iterating over all relevant files  
to get a

whole timeseries.

Is there a way to read in this type of file into R? I've read all  
of the
examples that I can find, but cannot work out how to do it. I  
don't think
that read.table can handle the separate sections of data  
representing each
year. read.ftable maybe can be coerced to parse the data, but I  
cannot see
how after reading the documentation and experimenting with the  
parameters.


I'm using R 2.10.1 on osx 10.5.8 and 2.10.0 on Fedora 10.

Any help/suggestions would be greatly appreciated. I can see that  
this type
of issue is likely to grow in importance, and I'd also like to  
give the data
owners suggestions on how to reformat their data so that it is  
easier to

consume by machines, while being easy to read for humans.

The early records are a serious machine parsing challenge as they  
are tiff

images of old notebooks ;-)

tia

Tim
Tim Coote
t...@coote.org
vincit veritas






Tim Coote
t...@coote.org
vincit veritas



Re: [R] two questions for R beginners

2010-02-27 Thread Kingsford Jones
On Fri, Feb 26, 2010 at 8:00 AM, Robert Baer rb...@atsu.edu wrote:
[...]
 The things that led from frustration to independence were understanding
 the difference between data types like matrix and dataframe and learning
 there were commands to tell what you were working with at any given time.
 Did the data read in as character, numeric, or factor, etc.  Commands
 like: str, class, mode, ls, search, help, help.search, etc can help you
 figure out what you are doing.

Yes!  I think this is really key.  When I started R I had no
programming experience and thought of projects in terms of statistical
procedures and printed output (cut teeth w/ Minitab -- SPSS -- SAS).
 If I wanted to analyze data using R I looked for examples of using an
analysis function of interest (e.g, lm, princomp, rpart...) and did my
best to adapt to my project.  What was of interest was the printed
output rather than understanding the objects that I was passing and
creating. It wasn't until I buckled down and read the (admittedly
quite dry and often dense) materials describing the language that the
sailing became smooth (or at least much more rapid and took me to more
interesting places).  Important resources I recall using were An
Introduction to R (which I avoided for about the first 6mo because of
language I wasn't yet familiar with), r-help archives, man pages, and
particularly the early chapters of MASS and S Programming by VR.  But
I think the real 'a-ha' moments came by interactively exploring
objects within R.  This was vastly facilitated by the use of str and
indexing tools ([, [[, $, @).

A mantra for R beginners might be "In R we work with objects, and str
reveals their essence" ;-)

Kingsford Jones



 Rob




 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Patrick Burns
 Sent: Thursday, February 25, 2010 11:31 AM
 To: r-help@r-project.org
 Subject: [R] two questions for R beginners

 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

 * What documents helped you the most in this
 initial phase?

 I especially want to hear from people who are
 lazy and impatient.

 Feel free to write to me off-list.  Definitely
 write off-list if you are just confirming what
 has been said on-list.

 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'The R Inferno' and 'A Guide for the Unwilling S User')





Re: [R] Best Hardware OS For Large Data Sets

2010-02-27 Thread Sharpie


David Winsemius wrote:
 
 
 Perhaps the fact that the stable CRAN version of R for (any) Windows  
 is 32-bit? It would expand your memory space somewhat but not as much  
 as you might naively expect.
 
 (There was a recent  announcement that an experimental version of a 64- 
 bit R was available (even with an installer) and there are vendors who  
 will supply a 64-bit Windows version for an un-announced price. The  
 fact that there was not as of January support for binary packages  
 seems to be a bit of a constraint on who would be able to step up to  
 use full 64 bit R capabilities on Win64.
 

According to this post by Dr. Ripley:

  http://n4.nabble.com/R-on-64-Bit-td1563895.html

CRAN is building 64bit Windows packages for R 2.11 which is currently under
development.  From the looks of it, 64 bit support may be coming to Windows
with the next major release of R.



David Winsemius wrote:
 
  I'm guessing from your failure to mention potential software
 constraints that you are not  
 among that more capable group, as I am also not.)
 
 https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html
 https://stat.ethz.ch/pipermail/r-devel/2010-January/056411.html
 
 

-- 
View this message in context: 
http://n4.nabble.com/Best-Hardware-OS-For-Large-Data-Sets-tp1572129p1572256.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to add a variable to a dataframe whose values are conditional upon the values of an existing variable

2010-02-27 Thread David Freedman

there's a recode function in the Hmisc package, but it's difficult (at least
for me) to find documentation for it

library(Hmisc)
week - c('SAT', 'SUN', 'MON', 'FRI');
recode(week,c('SAT', 'SUN', 'MON', 'FRI'),1:4)

HTH
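If pulling in Hmisc just for recode feels heavy, a base-R alternative for this particular case is match(), which returns each value's position in a lookup vector:

```r
week <- c('SAT', 'SUN', 'MON', 'FRI')
## Position of each value in the lookup table -- here 1 2 3 4,
## the same mapping as the recode() call above.
match(week, c('SAT', 'SUN', 'MON', 'FRI'))
```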
-- 
View this message in context: 
http://n4.nabble.com/How-to-add-a-variable-to-a-dataframe-whose-values-are-conditional-upon-the-values-of-an-existing-vare-tp1571214p1572261.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Best Hardware OS For Large Data Sets

2010-02-27 Thread Tim Coote
Is it possible to run a Linux guest VM on the Wintel box so that you  
can run the 64 bit code?  I used to do this on XP (but not for R).

On 27 Feb 2010, at 20:03, David Winsemius wrote:



On Feb 27, 2010, at 12:47 PM, J. Daniel wrote:



Greetings,

I am acquiring a new computer in order to conduct data analysis.  I
currently have a 32-bit Vista OS with 3G of RAM and I consistently  
run into
memory allocation problems.  I will likely be required to run  
Windows 7 on
the new system, but have flexibility as far as hardware goes.  Can  
people
recommend the best hardware to minimize memory allocation  
problems?  I am
leaning towards dual core on a 64-bit system with 8G of RAM.  Given  
the

Windows constraint, is there anything I am missing here?


Perhaps the fact that the stable CRAN version of R for (any) Windows  
is 32-bit? It would expand your memory space somewhat but not as  
much as you might naively expect.


(There was a recent  announcement that an experimental version of a  
64-bit R was available (even with an installer) and there are  
vendors who will supply a 64-bit Windows version for an un-announced  
price. The fact that there was not as of January support for binary  
packages seems to a bit of a constraint on who would be able to  
step up to use full 64 bit R capabilities on Win64. I'm guessing  
from the your failure to mention potential software constraints that  
you are not among that more capable group, as I am also not.)


https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html
https://stat.ethz.ch/pipermail/r-devel/2010-January/056411.html



I know that Windows limits the RAM that a single application can  
access.
Does this fact over-ride many hardware considerations?  Any way  
around this?


Thanks,

JD


--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Tim Coote
t...@coote.org
+44 (0)7866 479 760



Re: [R] reading data from web data sources

2010-02-27 Thread Gabor Grothendieck
No one else posted so the other post you are referring to must have
been an email to you, not a post.  We did not see it.

By off-by-one I think you are referring to the row names, which are
meaningless, rather than the day numbers.  The data for day 1 is
present, not missing.  The example code did replace the day number
column with the year since the days were just sequential and therefore
derivable but its trivial to keep them if that is important to you and
we have made that change below.

The previous code used grep to kick out lines that had any character
not in the set: minus sign, space and digit but in this version we add
minus sign to that set.   We also corrected the year column and added
column names and converted all -999 strings to NA.  Due to this last
point we cannot use na.omit any more but we now have iy available that
distinguishes between year rows and other rows.

Every line here has been indented so anything that starts at the left
column must have been word wrapped in transmission.

  myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
  raw.lines <- readLines(myURL)
  DF <- read.table(textConnection(raw.lines[!grepl("[^- 0-9.]", raw.lines)]),
fill = TRUE, col.names = c("day", month.abb), na.strings = "-999")

  iy <- is.na(DF[[2]]) # is year row
  DF$year <- DF[iy, 1][cumsum(iy)]
  DF <- DF[!iy, ]

  DF
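As a follow-on sketch, the wide day-by-month data frame this code produces can be stacked into a single daily series with base reshape(). A small stand-in DF is built here so the example runs without the download; its column layout (day, Jan..Dec, year) is assumed to match the code above:

```r
## Stand-in for the DF built above: 2 days x 12 months for one year.
DF <- data.frame(day = 1:2,
                 matrix(rnorm(24), nrow = 2, ncol = 12,
                        dimnames = list(NULL, month.abb)),
                 year = 1910)

## One row per (day, month).  In the real data, impossible dates such
## as 30 Feb would come out as NA from as.Date and could be dropped.
long <- reshape(DF, direction = "long", varying = month.abb,
                v.names = "value", timevar = "month", times = 1:12)
long$date <- as.Date(sprintf("%d-%02d-%02d", long$year, long$month, long$day))
long <- long[order(long$date), c("date", "value")]
head(long)
```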


On Sat, Feb 27, 2010 at 3:28 PM, Tim Coote tim+r-project@coote.org wrote:
 Thanks, Gabor. My take away from this and Phil's post is that I'm going to

I think the other 'post' must have been directly to you. We didn't see it.

 have to construct some code to do the parsing, rather than use a standard
 function. I'm afraid that neither approach works, yet:

 Gabor's gets has an off-by-one error (days start on the 2nd, not the first),
 and the years get messed up around the 29th day.  I think that na.omit (DF)
 line is throwing out the baby with the bathwater.  It's interesting that
 this approach is based on read.table, I'd assumed that I'd need read.ftable,
 which I couldn't understand the documentation for.  What is it that's
 removing the -999 and -888 values in this code -they seem to be gone, but I
 cannot see why.

 Phil's reads in the data, but interleaves rows with just a year and all
 other values as NA.

 Tim
 On 27 Feb 2010, at 17:33, Gabor Grothendieck wrote:

 Mark Leeds pointed out to me that the code wrapped around in the post
 so it may not be obvious that the regular expression in the grep is
 (i.e. it contains a space):
 [^ 0-9.]


 On Sat, Feb 27, 2010 at 7:15 AM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:

 Try this.  First we read the raw lines into R using grep to remove any
 lines containing a character that is not a number or space.  Then we
 look for the year lines and repeat them down V1 using cumsum.  Finally
 we omit the year lines.

 myURL <-
 "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
 raw.lines <- readLines(myURL)
 DF <- read.table(textConnection(raw.lines[!grepl("[^
 0-9.]", raw.lines)]), fill = TRUE)
 DF$V1 <- DF[cumsum(is.na(DF[[2]])), 1]
 DF <- na.omit(DF)
 head(DF)


 On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote tim+r-project@coote.org
 wrote:

 Hullo
 I'm trying to read some time series data of meteorological records that
 are
 available on the web (eg
 http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat).
 I'd
 like to be able to read in the digital data directly into R. However, I
 cannot work out the right function and set of parameters to use.  It
 could
 be that the only practical route is to write a parser, possibly in some
 other language,  reformat the files and then read these into R. As far
 as I
 can tell, the informal grammar of the file is:

 comments terminated by a blank line
 [year number on a line on its own
 daily readings lines ]+

 and the daily readings are of the form:
 whitespace day number [whitespace reading on day of month] 12

 Readings for days in months where a day does not exist have special
 values.
 Missing values have a different special value.

 And then I've got the problem of iterating over all relevant files to
 get a
 whole timeseries.

 Is there a way to read in this type of file into R? I've read all of the
 examples that I can find, but cannot work out how to do it. I don't
 think
 that read.table can handle the separate sections of data representing
 each
 year. read.ftable maybe can be coerced to parse the data, but I cannot
 see
 how after reading the documentation and experimenting with the
 parameters.

 I'm using R 2.10.1 on osx 10.5.8 and 2.10.0 on Fedora 10.

 Any help/suggestions would be greatly appreciated. I can see that this
 type
 of issue is likely to grow in importance, and I'd also like to give the
 data
 owners suggestions on how to reformat their data so that it is easier to
 consume by machines, while being easy to read for humans.

 The early records are a serious machine parsing challenge as they are
 tiff
 images of old 

Re: [R] simple main effect.

2010-02-27 Thread RICHARD M. HEIBERGER
 Let's take for example an experiment in which I had two between-subject
 variables - Strain and treatment, and one within - exposure. All the
 variables had 2 levels each.

 I found an interaction between exposure and Strain and I want to compare
 Strain A and B under every exposure (first and second).
 The general model was with that function:
 aov(duration~(Strain*exposure*treatment)+Error(subject/exposure),data)

 in summary(aovmodel) there was a significant interaction between exposure
 and strain.
 how (using those HH packages) can I compare Strains under the conditions of
 exposure?

Your example is structurally similar to the maiz example in ?MMC.
Therefore the answer will also be similar.  It is not possible to do
any more without the exact
structure of your data.  As I indicated before, the duration variable
can be random numbers.
I need the full dataset with the actual values for Strain, exposure,
treatment, and subject.
You are welcome to use A,B,C,D for treatment levels and 1,2,...,n for
subject ID.

Rich
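
Rich's point that the duration values can be random numbers suggests a quick
way to post a reproducible structure.  A hedged sketch of such a dataset
(the factor names come from the thread; the sizes, level labels and the
balanced 2x2 between-subject layout are my own assumptions):

```r
# toy 2 (Strain) x 2 (treatment) between x 2 (exposure) within design
set.seed(1)
n <- 16                                   # hypothetical number of subjects
dat <- expand.grid(exposure = factor(c("first", "second")),
                   subject  = factor(1:n))
# between-subject factors: constant within each subject's pair of rows
dat$Strain    <- factor(rep(c("A", "B"),   each = 2, length.out = nrow(dat)))
dat$treatment <- factor(rep(c("T1", "T2"), each = 4, length.out = nrow(dat)))
dat$duration  <- rnorm(nrow(dat))         # random, as Rich suggests
aovmodel <- aov(duration ~ Strain * exposure * treatment +
                  Error(subject / exposure), data = dat)
summary(aovmodel)
```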

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reading data from web data sources

2010-02-27 Thread Phil Spector

Sorry, I forgot to cc the group:

Tim -
   Here's a way to read the data into a list, with one entry per year:

x = 
read.table('http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat',
header=FALSE,fill=TRUE,skip=13)
cts = apply(x,1,function(x)sum(is.na(x)))
wh = which(cts == 12)
start = wh+1
end = c(wh[-1] - 1,nrow(x))
ans = mapply(function(i,j)x[i:j,],start,end,SIMPLIFY=FALSE)
names(ans) = x[wh,1]

Hope this helps.
- Phil Spector



On Sat, 27 Feb 2010, Gabor Grothendieck wrote:


No one else posted so the other post you are referring to must have
been an email to you, not a post.  We did not see it.

By "one off" I think you are referring to the row names, which are
meaningless, rather than the day numbers.  The data for day 1 is
present, not missing.  The example code did replace the day number
column with the year since the days were just sequential and therefore
derivable, but it's trivial to keep them if that is important to you and
we have made that change below.

The previous code used grep to kick out lines that had any character
not in the set: space, digit and dot, but in this version we add the
minus sign to that set.  We also corrected the year column, added
column names and converted all -999 strings to NA.  Due to this last
point we cannot use na.omit any more, but we now have iy available that
distinguishes between year rows and other rows.

Every line here has been indented so anything that starts at the left
column must have been word wrapped in transmission.

 myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
 raw.lines <- readLines(myURL)
 DF <- read.table(textConnection(raw.lines[!grepl("[^- 0-9.]", raw.lines)]),
   fill = TRUE, col.names = c("day", month.abb), na.strings = "-999")

 iy <- is.na(DF[[2]]) # is year row
 DF$year <- DF[iy, 1][cumsum(iy)]
 DF <- DF[!iy, ]

 DF


On Sat, Feb 27, 2010 at 3:28 PM, Tim Coote tim+r-project@coote.org wrote:

Thanks, Gabor. My take away from this and Phil's post is that I'm going to


I think the other `post`` must have been directly to you.  We didn`t see it.


have to construct some code to do the parsing, rather than use a standard
function. I'm afraid that neither approach works, yet:

Gabor's code has an off-by-one error (days start on the 2nd, not the first),
and the years get messed up around the 29th day.  I think that the na.omit(DF)
line is throwing out the baby with the bathwater.  It's interesting that
this approach is based on read.table; I'd assumed that I'd need read.ftable,
which I couldn't understand the documentation for.  What is it that's
removing the -999 and -888 values in this code - they seem to be gone, but I
cannot see why.

Phil's reads in the data, but interleaves rows with just a year and all
other values as NA.

Tim

Re: [R] reading data from web data sources

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 4:33 PM, Gabor Grothendieck wrote:


No one else posted so the other post you are referring to must have
been an email to you, not a post.  We did not see it.

By "one off" I think you are referring to the row names, which are
meaningless, rather than the day numbers.  The data for day 1 is
present, not missing.  The example code did replace the day number
column with the year since the days were just sequential and therefore
derivable, but it's trivial to keep them if that is important to you and
we have made that change below.

The previous code used grep to kick out lines that had any character
not in the set: space, digit and dot, but in this version we add the
minus sign to that set.  We also corrected the year column, added
column names and converted all -999 strings to NA.  Due to this last
point we cannot use na.omit any more, but we now have iy available that
distinguishes between year rows and other rows.

Every line here has been indented so anything that starts at the left
column must have been word wrapped in transmission.

 myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
 raw.lines <- readLines(myURL)
 DF <- read.table(textConnection(raw.lines[!grepl("[^- 0-9.]", raw.lines)]),
   fill = TRUE, col.names = c("day", month.abb), na.strings = "-999")

 iy <- is.na(DF[[2]]) # is year row
 DF$year <- DF[iy, 1][cumsum(iy)]
 DF <- DF[!iy, ]

 DF


Wouldn't they be of more value if they were sequential?

dta <- data.matrix(DF[, -c(1, 14)])
dtafrm <- data.frame(rdta = dta[!is.na(dta)],
  dom = DF[row(dta)[!is.na(dta)], 1],
  month = col(dta)[!is.na(dta)])
# adding a year column would be trivial.
 sum(dtafrm$month == 2)
[1] 282
 sum(dtafrm$month == 12)
[1] 310

plot(dtafrm$rdta, type = "l")

Yes, I know that zoo() might be better, but I'm still a zoobie, or would
that be newzer?


So, is there a zooisher function I should learn that would strip out
the NA's and incorporate the data values?


--
David.




On Sat, Feb 27, 2010 at 3:28 PM, Tim Coote tim+r-project@coote.org 
 wrote:
Thanks, Gabor. My take away from this and Phil's post is that I'm  
going to


I think the other `post`` must have been directly to you.  We didn`t  
see it.


have to construct some code to do the parsing, rather than use a  
standard

function. I'm afraid that neither approach works, yet:

Gabor's gets has an off-by-one error (days start on the 2nd, not  
the first),
and the years get messed up around the 29th day.  I think that  
na.omit (DF)
line is throwing out the baby with the bathwater.  It's interesting  
that
this approach is based on read.table, I'd assumed that I'd need  
read.ftable,

which I couldn't understand the documentation for.  What is it that's
removing the -999 and -888 values in this code -they seem to be  
gone, but I

cannot see why.

Phil's reads in the data, but interleaves rows with just a year and  
all

other values as NA.

Tim
On 27 Feb 2010, at 17:33, Gabor Grothendieck wrote:

Mark Leeds pointed out to me that the code wrapped around in the  
post

so it may not be obvious that the regular expression in the grep is
(i.e. it contains a space):
[^ 0-9.]


On Sat, Feb 27, 2010 at 7:15 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:


Try this.  First we read the raw lines into R using grep to  
remove any
lines containing a character that is not a number or space.  Then  
we
look for the year lines and repeat them down V1 using cumsum.   
Finally

we omit the year lines.

myURL -
http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat 


raw.lines - readLines(myURL)
DF - read.table(textConnection(raw.lines[!grepl([^
0-9.],raw.lines)]), fill = TRUE)
DF$V1 - DF[cumsum(is.na(DF[[2]])), 1]
DF - na.omit(DF)
head(DF)


On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote tim+r-project@coote.org 


wrote:


Hullo
I'm trying to read some time series data of meteorological  
records that

are
available on the web (eg
http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat) 
.

I'd
like to be able to read in the digital data directly into R.  
However, I
cannot work out the right function and set of parameters to  
use.  It

could
be that the only practical route is to write a parser, possibly  
in some
other language,  reformat the files and then read these into R.  
As far

as I
can tell, the informal grammar of the file is:

comments terminated by a blank line
[year number on a line on its own
daily readings lines ]+

and the daily readings are of the form:
whitespace day number [whitespace reading on day of  
month] 12


Readings for days in months where a day does not exist have  
special

values.
Missing values have a different special value.

And then I've got the problem of iterating over all relevant  
files to

get a
whole timeseries.

Is there a way to read in this type of file into R? I've read  
all of the
examples that I can find, but cannot work out how to do it. I  
don't

think
that 

[R] help with Gantt chart

2010-02-27 Thread Zoppoli, Gabriele (NIH/NCI) [G]
Hi,

I don't know how to solve this error that is returned, even though I understand it:

library(plotrix)

Ymd.format <- "%Y/%m/%d"
gantt.info <- list(labels =
  c("First task", "Second task (1st part)", "Third task (1st part)",
    "Second task (2nd part)", "Third task (2nd part)",
    "Fourth task", "Fifth task", "Sixth task"),
  starts =
  as.POSIXct(strptime(
    c("2010/01/01", "2010/07/01", "2010/10/01", "2011/04/01",
      "2011/07/01", "2011/07/01", "2012/01/01", "2012/07/01"),
    format = Ymd.format)),
  ends =
  as.POSIXct(strptime(
    c("2010/06/30", "2010/09/31", "2011/03/31", "2011/06/31",
      "2011/12/31", "2012/06/30", "2012/06/30", "2012/12/31"),
    format = Ymd.format)),
  priorities = c(1, 2, 3, 4, 5))
vgridpos <- as.POSIXct(strptime(c("2010/01/01", "2010/04/01", "2010/07/01",
  "2010/10/01", "2011/01/01", "2011/04/01", "2011/07/01", "2011/10/01",
  "2012/01/01", "2012/04/01", "2012/07/01", "2010/10/01"), format = Ymd.format))
vgridlab <-
  c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
gantt.chart(gantt.info, main = "My First AIRC Grant Gantt Chart",
  priority.legend = TRUE, vgridpos = vgridpos, vgridlab = vgridlab, hgrid = TRUE)

Error in if (any(x$starts > x$ends)) stop("Can't have a start date after an end
date") :
  missing value where TRUE/FALSE needed
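
The NA that defeats the any() test usually comes from calendar dates that do
not exist: strptime() returns NA for them, and comparing against NA yields NA
rather than TRUE/FALSE.  Note that two of the ends values above, 2010/09/31
and 2011/06/31, are of this form.  A minimal base-R illustration:

```r
# strptime() yields NA for a date that does not exist on the calendar
bad <- as.POSIXct(strptime("2010/09/31", format = "%Y/%m/%d"))
is.na(bad)    # TRUE: September has only 30 days
ok <- as.POSIXct(strptime("2010/09/30", format = "%Y/%m/%d"))
is.na(ok)     # FALSE
```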


Thanks 


Gabriele Zoppoli, MD
Ph.D. Fellow, Experimental and Clinical Oncology and Hematology, University of 
Genova, Genova, Italy
Guest Researcher, LMP, NCI, NIH, Bethesda MD

Work: 301-451-8575
Mobile: 301-204-5642
Email: zoppo...@mail.nih.gov


Re: [R] New methods for generic functions show and print : some visible with ls(), some not

2010-02-27 Thread Joris Meys
Thank you both for your answers.

On Fri, Feb 26, 2010 at 7:58 PM, Duncan Murdoch murd...@stats.uwo.cawrote:


 You aren't seeing the print method, you are seeing a newly created print
 generic function.  As Uwe mentioned, print() is not an S4 generic, so when
 you create your print method, a new S4 generic also gets created.  You
 should be using show(), which will be called by print() when necessary.

 When you say clear the memory, I'm not sure what you have in mind, but S4
 methods are not stored in your workspace, so rm(list=ls()) won't delete
 them.  You need removeMethod() to get rid of a method.

 Duncan Murdoch


What I meant with clear the memory is exactly rm(list=ls()). I use the
same analysis on different rather big datasets, so I have to make some space
once in a while. Losing the print generic (thx for the correction) every
time is something I considered highly inconvenient. I'll use the show method,
thank you both for the tip.

Do I understand it right that every generic I define in a normal script file
is saved in the workspace, and thus can be removed with rm() ?

Cheers
Joris



Re: [R] reading data from web data sources

2010-02-27 Thread Phil Spector

Tim -
   I don't understand what you mean about interleaving rows.  I'm guessing
that you want a single large data frame with all the data, and not a 
list with each year separately.  If that's the case:


x = 
read.table('http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat',
header=FALSE,fill=TRUE,skip=13)
cts = apply(x,1,function(x)sum(is.na(x)))
wh = which(cts == 12)
start = wh+1
end = c(wh[-1] - 1,nrow(x))
ans = mapply(function(i,j)x[i:j,],start,end,SIMPLIFY=FALSE)
names(ans) = x[wh,1]
alldat = do.call(rbind,ans)
alldat$year = rep(names(ans),sapply(ans,nrow))
names(alldat) = c('day',month.name,'year')

On the other hand, if you want a long data frame with month, day, year 
and value:


longdat = reshape(alldat,idvar=c('day','year'),
  varying=list(month.name),direction='long',times=month.name)
names(longdat)[c(3,4)] = c('Month','value')

Next , if you want to create a Date variable:

longdat = transform(longdat,date=as.Date(paste(Month,day,year),'%B %d %Y'))
longdat = na.omit(longdat)
longdat = longdat[order(longdat$date),]

and finally:

zoodat = zoo(longdat$value,longdat$date)

which should be suitable for time series analysis.
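
For the other half of the original question, iterating over all relevant
files, a sketch along the same lines (the naming pattern for the other
decades, and the fixed skip of 13 header lines, are assumptions extrapolated
from the 1910-1919 URL; check what the site actually serves):

```r
# hypothetical: one file per decade, same naming pattern as 1910-1919
decades <- seq(1910, 1990, by = 10)
urls <- sprintf(
  "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_%d-%d.dat",
  decades, decades + 9)

read.one <- function(u) read.table(u, header = FALSE, fill = TRUE, skip = 13)
# raw <- lapply(urls, read.one)   # then apply the parsing above to each
# element and concatenate the resulting zoo series with do.call(rbind, ...)
```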

Hope this helps.
- Phil

On Sat, 27 Feb 2010, Tim Coote wrote:

Thanks, Gabor. My take away from this and Phil's post is that I'm going to 
have to construct some code to do the parsing, rather than use a standard 
function. I'm afraid that neither approach works, yet:


Gabor's code has an off-by-one error (days start on the 2nd, not the first), 
and the years get messed up around the 29th day.  I think that the na.omit(DF) 
line is throwing out the baby with the bathwater.  It's interesting that this 
approach is based on read.table; I'd assumed that I'd need read.ftable, which 
I couldn't understand the documentation for.  What is it that's removing the 
-999 and -888 values in this code - they seem to be gone, but I cannot see 
why.


Phil's reads in the data, but interleaves rows with just a year and all other 
values as NA.


Tim

[R] Combining 2 columns into 1 column many times in a very large dataset

2010-02-27 Thread Sherri Rose
Combining 2 columns into 1 column many times in a very large dataset

The clumsy solutions I am working on are not going to be very fast if I can
get them to work and the true dataset is ~1500 X 45000 so they need to be
efficient. I've searched the R help files and the archives for this list and
have some possible workable solutions for 2) and 3) but not my question 1).
However, I include 2) and 3) in case anyone has recommendations that would
be efficient.

Here is a toy example of the data structure:

n = 10   # toy sample size (implied by the length-10 vectors below)
pop = data.frame(status = rbinom(n, 1, .42), sex = rbinom(n, 1, .5),
  age = round(rnorm(n, mean = 40, sd = 10)), disType = rbinom(n, 1, .2),
  rs123 = c(1,3,1,3,3,1,1,1,3,1), rs123.1 = rep(1, n),
  rs157 = c(2,4,2,2,2,4,4,4,2,2),
  rs157.1 = c(4,4,4,2,4,4,4,4,2,2), rs132 = c(4,4,4,4,4,4,4,4,2,2),
  rs132.1 = c(4,4,4,4,4,4,4,4,4,4))

Thus, there are a few columns of basic demographic info and then the rest of
the columns are biallelic SNP info.  Ex: rs123 is allele 1 of rs123 and
rs123.1 is the second allele of rs123.

1) I need to merge all the biallelic SNP data that is currently in 2 columns
into 1 column, so, for example: rs123 and rs123.1 into one column (but
within the dataset):
11
31
11
31
31
11
11
11
31
11
2) I need to identify the least frequent SNP value (in the above example it
is 31).
3) I need to replace the least frequent SNP value with 1 and the other(s)
with 0.
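
The three steps can be sketched as a loop over the pairs (written against the
toy pop above with n = 10; it assumes the paired columns are named rs<No> and
rs<No>.1 as in the example):

```r
snps <- c("rs123", "rs157", "rs132")   # base names of the allele pairs
for (s in snps) {
  s2   <- paste0(s, ".1")
  comb <- paste0(pop[[s]], pop[[s2]])          # 1) merge the 2 columns
  rare <- names(which.min(table(comb)))        # 2) least frequent value
  pop[[s]] <- as.numeric(comb == rare)         # 3) recode: rare -> 1, else 0
  pop[[s2]] <- NULL                            # drop the now-merged column
}
```

For the real 1500 x 45000 data, snps could be derived instead of typed, e.g.
sub("\\.1$", "", grep("\\.1$", names(pop), value = TRUE)).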

Thank you for any assistance,
-S.R.



Re: [R] New methods for generic functions show and print : some visible with ls(), some not

2010-02-27 Thread Duncan Murdoch

On 27/02/2010 6:15 PM, Joris Meys wrote:

Thank you both for your answers.

On Fri, Feb 26, 2010 at 7:58 PM, Duncan Murdoch murd...@stats.uwo.cawrote:


You aren't seeing the print method, you are seeing a newly created print
generic function.  As Uwe mentioned, print() is not an S4 generic, so when
you create your print method, a new S4 generic also gets created.  You
should be using show(), which will be called by print() when necessary.

When you say clear the memory, I'm not sure what you have in mind, but S4
methods are not stored in your workspace, so rm(list=ls()) won't delete
them.  You need removeMethod() to get rid of a method.

Duncan Murdoch



What I meant with clear the memory is exactly rm(list=ls()). I use the
same analysis on different rather big datasets, so I have to make some space
once in a while. Losing the print generic (thx for the correction) every
time is something I considered highly inconvenient. I'll use the show method,
thank you both for the tip.

Do I understand it right that every generic I define in a normal script file
is saved in the workspace, and thus can be removed with rm() ?


I think so if you do it in the normal way.  It's possible to create them 
elsewhere, so every generic might not mean every generic.
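
A small sketch of that distinction (the class and method names here are made
up):

```r
setClass("myRes", representation(x = "numeric"))
setMethod("show", "myRes",
          function(object) cat("myRes with", length(object@x), "values\n"))

r <- new("myRes", x = 1:3)
r                               # auto-printing dispatches to show()

rm(list = ls())                 # clears workspace objects such as r ...
existsMethod("show", "myRes")   # ... but the method is still registered
removeMethod("show", "myRes")   # this is what actually removes it
```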


Duncan Murdoch



[R] Change the scale on a barplot's y axis

2010-02-27 Thread Thomas Levine
I have grades data. I read them from a csv in letter-grade format. I
then converted them to levels

levels(grades$grade)=c('A+','A','A-','B+','B','B-','C+','C','C-','D+','D','D-')

And then to numbers

grades$gp=grades$grade
levels(grades$gp)=c(4.3,4.0,3.7, 3.3,3.0,2.7, 2.3,2.0,1.7, 1.3,1.0,0.7)
grades$gp=as.numeric(as.character(grades$gp))

And I'm plotting them in a barplot

barplot(gp[order(gp)], width = n[order(gp)],
  ylab = "Class Median Grade",
  xlab = "Class, scaled to number of students in the class",
  main = "Class Median Grades for Cornell University weighted by class size")

I would like to change the scale on the bar graph such that it reads

c('A+','A','A-','B+','B','B-','C+','C','C-','D+','D','D-')

in the locations

c(4.3,4.0,3.7, 3.3,3.0,2.7, 2.3,2.0,1.7, 1.3,1.0,0.7)

Any ideas?

Tom



[R] Reducing a matrix

2010-02-27 Thread Juliet Ndukum
I wish to rearrange the matrix, df, such that there are no repeated x
values. In particular, for each value of x that is repeated, the corresponding y
value should fall under the appropriate column.  For example, the x value 3
appears 4 times under the different columns of y, i.e. y1, y2, y3, y4. The output
should be such that for the lone value of 3 kept for x, the corresponding
row entries will be 7 under column y1, 16 under column y2, 12 under column y3
and 18 under column y4. This should work for the other rows of x with repeated
values.
df
   x y1 y2 y3 y4
1   3  7 NA NA NA
2   3 NA 16 NA NA
3   3 NA NA 12 NA
4   3 NA NA NA 18
5   6  8 NA NA NA
6  10 NA NA  2 NA
7  10 NA 11 NA NA
8  14 NA NA NA  8
9  14 NA  9 NA NA
10 15 NA NA NA 11
11 50 NA NA 13 NA
12 50 20 NA NA NA

The output should be:

   x y1 y2 y3 y4
1  3  7 16 12 18
2  6  8 NA NA NA
3 10 NA 11  2 NA
4 14 NA  9 NA  8
5 15 NA NA NA 11
6 50 20 NA 13 NA

Can anyone write code that would produce these results?
Thank you in advance for your help.

JN


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reducing a matrix

2010-02-27 Thread Linlin Yan
Try this:
df[!duplicated(df[,'x']),]
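
Note that duplicated() keeps only the first row for each x, so the non-NA y
values carried by the later duplicate rows are lost.  If the merged table in
the question is the goal, an aggregate()-based sketch (assuming at most one
non-NA value per y column within each x group):

```r
merged <- aggregate(df[, -1], by = list(x = df$x),
                    FUN = function(v) if (all(is.na(v))) NA else v[!is.na(v)][1])
merged
```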

On Sun, Feb 28, 2010 at 8:56 AM, Juliet Ndukum jpnts...@yahoo.com wrote:
 I wish to rearrange the matrix, df, such that there are no repeated x
 values. In particular, for each value of x that is repeated, the corresponding y
 value should fall under the appropriate column.  For example, the x value 3
 appears 4 times under the different columns of y, i.e. y1, y2, y3, y4. The
 output should be such that for the lone value of 3 kept for x, the
 corresponding row entries will be 7 under column y1, 16 under column y2, 12
 under column y3 and 18 under column y4. This should work for the other rows
 of x with repeated values.
 df
   x y1 y2 y3 y4
 1   3  7 NA NA NA
 2   3 NA 16 NA NA
 3   3 NA NA 12 NA
 4   3 NA NA NA 18
 5   6  8 NA NA NA
 6  10 NA NA  2 NA
 7  10 NA 11 NA NA
 8  14 NA NA NA  8
 9  14 NA  9 NA NA
 10 15 NA NA NA 11
 11 50 NA NA 13 NA
 12 50 20 NA NA NA

 The output should be:

    x y1 y2 y3 y4
 1  3  7 16 12 18
 2  6  8 NA NA NA
 3 10 NA 11  2 NA
 4 14 NA  9 NA  8
 5 15 NA NA NA 11
 6 50 20 NA 13 NA

 Can anyone write code that would produce these results?
 Thank you in advance for your help.

 JN





Re: [R] reading data from web data sources

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 6:17 PM, Phil Spector wrote:


Tim -
  I don't understand what you mean about interleaving rows.  I'm  
guessing
that you want a single large data frame with all the data, and not a  
list with each year separately.  If that's the case:


x = read.table('http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat' 
,

   header=FALSE,fill=TRUE,skip=13)
cts = apply(x,1,function(x)sum(is.na(x)))
wh = which(cts == 12)
start = wh+1
end = c(wh[-1] - 1,nrow(x))
ans = mapply(function(i,j)x[i:j,],start,end,SIMPLIFY=FALSE)
names(ans) = x[wh,1]
alldat = do.call(rbind,ans)
alldat$year = rep(names(ans),sapply(ans,nrow))
names(alldat) = c('day',month.name,'year')

On the other hand, if you want a long data frame with month, day,  
year and value:


longdat = reshape(alldat,idvar=c('day','year'),
  
varying=list(month.name),direction='long',times=month.name)

names(longdat)[c(3,4)] = c('Month','value')

Next , if you want to create a Date variable:

longdat = transform(longdat,date=as.Date(paste(Month,day,year),'%B  
%d %Y'))

longdat = na.omit(longdat)
longdat = longdat[order(longdat$date),]

and finally:

zoodat = zoo(longdat$value,longdat$date)

which should be suitable for time series analysis.


OK, I think I get it:

(From Gabor's DF)

 dta <- data.matrix(DF[, -c(1, 14)])
 dtafrm <- data.frame(rdta = dta[!is.na(dta)],
   d.o.m = DF[row(dta)[!is.na(dta)], 1],
   month = col(dta)[!is.na(dta)],
   year = DF[row(dta)[!is.na(dta)], 14])

 library(zoo)

 zoodat2 <- with(dtafrm, zoo(rdta, as.Date(paste(month, d.o.m, year),
   "%m %d %Y")))

 str(zoodat2)
‘zoo’ series from 1910-01-01 to 1919-12-31
  Data: num [1:3652] 6.4 6.5 6.3 6.7 6.7 6.8 7 7.1 7.1 7.2 ...
  Index: Class 'Date'  num [1:3652] -21915 -21914 -21913 -21912 -21911 ...







Hope this helps.
   - Phil



Re: [R] Automate generation of multiple reports using odfWeave

2010-02-27 Thread Max Kuhn
 On a more complicated note, is there a way to embed the station name in a
 header or footer of the document? It seems there is no way to evaluate a
 chunk or an inline \Sexpr{...} in a header or footer?
 This would put station ID on every report page, making reading  comparing
 multiple reports much easier. Right now, if reports are converted to PDF,
 they all have title \Sexpr{listString(letters[1:5])} making navigation
 between them very cumbersome. I could adjust the title in ODT, but again,
 cannot embed any variable into it. Is there a way to set the title from
 odfWeave?

I have a feeling that we took that functionality out at some point (I
think when we moved to only sweaving the content.xml file). The bit
with listString was a test that I used and it has remained in the
document.

-- 

Max

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Change the scale on a barplot's y axis

2010-02-27 Thread S Ellison
Thomas,

You could perhaps do a tad better by simply adding a right-hand-side
axis using axis():

axis(4, at=c(4.3,4.0,3.7, 3.3,3.0,2.7, 2.3,2.0,1.7, 1.3,1.0,0.7),
labels=c('A+','A','A-','B+','B','B-','C+','C','C-','D+','D','D-'),
las=1)

That way you have both numeric and grade scales.

If you want a left-hand grade scale only, first suppress the axes in the
barplot using axes=FALSE, and then add the axes using axis(1) and
axis(2,..) with the ... as above.
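A minimal sketch of that second option (made-up grade values and an abbreviated scale; not Steve's code):

```r
## Suppress the default numeric axis, then draw a grade axis on the left.
gp <- c(3.7, 3.0, 4.0, 2.3)                  # toy grade-point values
barplot(gp, axes = FALSE, ylim = c(0, 4.3))  # no default y axis
axis(2, at = c(4.3, 4.0, 3.7, 3.3, 3.0, 2.7, 2.3),
     labels = c('A+','A','A-','B+','B','B-','C+'), las = 1)
```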

Incidentally, I'm not sure I'd have converted your numbers that way, but
if it's worked it's worked.

Steve E
 Thomas Levine thomas.lev...@gmail.com 02/28/10 12:44 AM 
I have grades data. I read them from a csv in letter-grade format. I
then converted them to levels

levels(grades$grade)=c('A+','A','A-','B+','B','B-','C+','C','C-','D+','D','D-')

And then to numbers

grades$gp=grades$grade
levels(grades$gp)=c(4.3,4.0,3.7, 3.3,3.0,2.7, 2.3,2.0,1.7, 1.3,1.0,0.7)
grades$gp=as.numeric(as.character(grades$gp))

And I'm plotting them in a barplot

barplot(gp[order(gp)], width=n[order(gp)], ylab="Class Median
Grade", xlab="Class, scaled to number of students in the
class", main="Class Median Grades for Cornell University weighted by
class size")

I would like to change the scale on the bar graph such that it reads

c('A+','A','A-','B+','B','B-','C+','C','C-','D+','D','D-')

in the locations

c(4.3,4.0,3.7, 3.3,3.0,2.7, 2.3,2.0,1.7, 1.3,1.0,0.7)

Any ideas?

Tom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reducing a matrix

2010-02-27 Thread Jorge Ivan Velez
Hi Juliet,

Here is a suggestion using aggregate():

# aux function
foo <- function(x){
 y <- sum(x, na.rm = TRUE)
 ifelse(y==0, NA, y)
}

# result
aggregate(df[,-1], list(df$x), foo)

Here, df is your data.

HTH,
Jorge


On Sat, Feb 27, 2010 at 7:56 PM, Juliet Ndukum  wrote:

 I wish to rearrange the matrix, df, such that there are no repeated x
 values. In particular, for each value of x that is repeated, the corresponding
 y value should fall under the appropriate column.  For example, the x value 3
 appears 4 times under the different columns of y, i.e. y1,y2,y3,y4. The
 output should be such that for the lone value of 3 selected for x, the
 corresponding row entries will be 7 under column y1, 16 under column y2, 12
 under column y3 and 18 under column y4. This should work for the other rows
 of x with repeated values.
 df
   x y1 y2 y3 y4
 1   3  7 NA NA NA
 2   3 NA 16 NA NA
 3   3 NA NA 12 NA
 4   3 NA NA NA 18
 5   6  8 NA NA NA
 6  10 NA NA  2 NA
 7  10 NA 11 NA NA
 8  14 NA NA NA  8
 9  14 NA  9 NA NA
 10 15 NA NA NA 11
 11 50 NA NA 13 NA
 12 50 20 NA NA NA

 The output should be:

   x y1 y2 y3 y4
 1   3  7 16 12 18
 2   6  8 NA NA NA
 3  10 NA 11  2 NA
 4  14 NA 9 NA  8
 5 15 NA NA NA 11
 6 50 20 NA 13 NA

 Can anyone write code for me that would produce these results?
 Thank you in advance for your help.

 JN



[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Change the scale on a barplot's y axis

2010-02-27 Thread Thomas Levine
Yay! That's perfect. Thanks, Steve!

Tom

2010/2/27 S Ellison s.elli...@lgc.co.uk:
 Thomas,

 You could perhaps do a tad better by simply adding a right-hand-side
 axis using axis():

 axis(4, at=c(4.3,4.0,3.7, 3.3,3.0,2.7, 2.3,2.0,1.7, 1.3,1.0,0.7),
 labels=c('A+','A','A-','B+','B','B-','C+','C','C-','D+','D','D-'),
 las=1)

 That way you have both numeric and grade scales.

 if you want a left-hand grade scale only, first suppress the axes in the
 barplot using axes=FALSE, and then add the axes using axis(1) and
 axis(2,..) with the ... as above.

 Incidentally, I'm not sure I'd have converted your numbers that way, but
 if it's worked it's worked.

 Steve E
 Thomas Levine thomas.lev...@gmail.com 02/28/10 12:44 AM 
 I have grades data. I read them from a csv in letter-grade format. I
 then converted them to levels

 levels(grades$grade)=c('A+','A','A-','B+','B','B-','C+','C','C-','D+','D','D-')

 And then to numbers

 grades$gp=grades$grade
 levels(grades$gp)=c(4.3,4.0,3.7, 3.3,3.0,2.7, 2.3,2.0,1.7, 1.3,1.0,0.7)
 grades$gp=as.numeric(as.character(grades$gp))

 And I'm plotting them in a barplot

 barplot(gp[order(gp)], width=n[order(gp)], ylab="Class Median
 Grade", xlab="Class, scaled to number of students in the
 class", main="Class Median Grades for Cornell University weighted by
 class size")

 I would like to change the scale on the bar graph such that it reads

 c('A+','A','A-','B+','B','B-','C+','C','C-','D+','D','D-')

 in the locations

 c(4.3,4.0,3.7, 3.3,3.0,2.7, 2.3,2.0,1.7, 1.3,1.0,0.7)

 Any ideas?

 Tom

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 ***
 This email and any attachments are confidential. Any u...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combining 2 columns into 1 column many times in a very large datasetB

2010-02-27 Thread Phil Spector

Sherri -
   Here's one way:

nms = c('rs123','rs157','rs132')
lowf = function(one,two){
 both = paste(pop[[one]],pop[[two]],sep='')
 tt = table(both)
 lowfreq = names(tt)[which.min(tt)]
 ifelse(both == lowfreq,1,0)
}
res = mapply(lowf,nms,paste(nms,'.1',sep=''),SIMPLIFY=FALSE)
names(res) = paste(names(res),'_new',sep='')
pop = data.frame(pop,res)

It doesn't deal with the case of ties with regard to the frequency
of the SNP value, but it should be easy to modify if that's 
an issue.
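One possible tie-aware tweak (a sketch of my own, not Phil's code): compare counts against min() so that every least-frequent value is flagged, not just the first one which.min() returns.

```r
## Toy SNP pairs: "22" and "31" tie for least frequent (count 1 each).
both <- c("11", "31", "11", "22")
tt <- table(both)
lowfreq <- names(tt)[tt == min(tt)]  # all least-frequent values
as.integer(both %in% lowfreq)
## [1] 0 1 0 1
```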


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Sat, 27 Feb 2010, Sherri Rose wrote:


*Combining  2 columns into 1 column many times in a very large dataset*

The clumsy solutions I am working on are not going to be very fast if I can
get them to work and the true dataset is ~1500 X 45000 so they need to be
efficient. I've searched the R help files and the archives for this list and
have some possible workable solutions for 2) and 3) but not my question 1).
However, I include 2) and 3) in case anyone has recommendations that would
be efficient.

Here is a toy example of the data structure:
n = 10  # rows in the toy data (the SNP vectors below have length 10)
pop = data.frame(status = rbinom(n, 1, .42), sex = rbinom(n, 1, .5),
age = round(rnorm(n, mean=40, 10)), disType = rbinom(n, 1, .2),
rs123=c(1,3,1,3,3,1,1,1,3,1), rs123.1=rep(1, n),
rs157=c(2,4,2,2,2,4,4,4,2,2),
rs157.1=c(4,4,4,2,4,4,4,4,2,2),  rs132=c(4,4,4,4,4,4,4,4,2,2),
rs132.1=c(4,4,4,4,4,4,4,4,4,4))

Thus, there are a few columns of basic demographic info and then the rest of
the columns are biallelic SNP info.  Ex: rs123 is allele 1 of rs123 and
rs123.1 is the second allele of rs123.

1) I need to merge all the biallelic SNP data that is currently in 2 columns
into 1 column, so, for example: rs123 and rs123.1 into one column (but
within the dataset):
11
31
11
31
31
11
11
11
31
11
2) I need to identify the least frequent SNP value (in the above example it
is 31).
3) I need to replace the least frequent SNP value with 1 and the other(s)
with 0.

Thank you for any assistance,
-S.R.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Which system.time() component to use?

2010-02-27 Thread Ravi Varadhan

Hi,

The `system.time(expr)' command provides 3 different times for evaluating the 
expression `expr'; the first two are user and system CPU, and the third one is 
total elapsed time.  Suppose I want to compare two different computational 
procedures for performing the same task: which component of `system.time' is 
most meaningful, in the sense that it most accurately reflects the computational 
effort of the algorithm and does not depend upon the idiosyncrasies of the 
operating system?

I have always been using the first component of `system.time', which is the 
user CPU.  Should I use the sum of user and system CPU or is the total elapsed 
time a better measure?  I would appreciate UseR's feedback on this.  

Thanks very much.

Best,
Ravi.


Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reducing a matrix

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 8:43 PM, Jorge Ivan Velez wrote:


Hi Juliet,

Here is a suggestion using aggregate():

# aux function
foo <- function(x){
   y <- sum(x, na.rm = TRUE)
   ifelse(y==0, NA, y)
   }

# result
aggregate(df[,-1], list(df$x), foo)


That does work in this example but might give unexpected results if
there were sums to 0 from paired -7 and 7's, or even multiple values of
any sort. (Throwing an error might be a good thing if multiple values
in groups were not expected, but such is not reported as an error in
this code.) If the OP wanted just the first non-NA value within her
groups then:


 aggregate(df[,-1], list(df$x), function(x) ifelse(
 all(is.na(x)),
  NA,
  na.exclude(x)[1]))
  Group.1 y1 y2 y3 y4
1   3  7 16 12 18
2   6  8 NA NA NA
3  10 NA 11  2 NA
4  14 NA  9 NA  8
5  15 NA NA NA 11
6  50 20 NA 13 NA

Munging the example

 df[2,2] <- 6

 aggregate(df[,-1], list(df$x), function(x) ifelse(all(is.na(x)),  
NA, na.exclude(x)[1]))

  Group.1 y1 y2 y3 y4
1   3  7 16 12 18   # first value taken.
2   6  8 NA NA NA
3  10 NA 11  2 NA
4  14 NA  9 NA  8
5  15 NA NA NA 11
6  50 20 NA 13 NA
 foo <- function(x){
+    y <- sum(x, na.rm = TRUE)
+    ifelse(y==0, NA, y)
+    }

 # result
 aggregate(df[,-1], list(df$x), foo)
  Group.1 y1 y2 y3 y4
1   3 13 16 12 18# summed values appear.
2   6  8 NA NA NA
3  10 NA 11  2 NA
4  14 NA  9 NA  8
5  15 NA NA NA 11
6  50 20 NA 13 NA



Here, df is your data.

HTH,
Jorge


On Sat, Feb 27, 2010 at 7:56 PM, Juliet Ndukum  wrote:

I wish to rearrange the matrix, df, such that there are no repeated x
values. In particular, for each value of x that is repeated, the
corresponding y value should fall under the appropriate column. For
example, the x value 3 appears 4 times under the different columns of y,
i.e. y1,y2,y3,y4. The output should be such that for the lone value of 3
selected for x, the corresponding row entries will be 7 under column y1,
16 under column y2, 12 under column y3 and 18 under column y4. This
should work for the other rows of x with repeated values.
df
 x y1 y2 y3 y4
1   3  7 NA NA NA
2   3 NA 16 NA NA
3   3 NA NA 12 NA
4   3 NA NA NA 18
5   6  8 NA NA NA
6  10 NA NA  2 NA
7  10 NA 11 NA NA
8  14 NA NA NA  8
9  14 NA  9 NA NA
10 15 NA NA NA 11
11 50 NA NA 13 NA
12 50 20 NA NA NA

The output should be:

 x y1 y2 y3 y4
1   3  7 16 12 18
2   6  8 NA NA NA
3  10 NA 11  2 NA
4  14 NA 9 NA  8
5 15 NA NA NA 11
6 50 20 NA 13 NA

Can any write for me a code that would produce these results.
Thank you in advance for your help.

JN



  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Which system.time() component to use?

2010-02-27 Thread Gabor Grothendieck
Try this:

 system.time(Sys.sleep(60))
    user  system elapsed
    0.00    0.00   60.05
 pt <- proc.time(); Sys.sleep(60); proc.time() - pt
    user  system elapsed
    0.00    0.00   60.01

On Sat, Feb 27, 2010 at 9:33 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:

 Hi,

 The `system.time(expr)' command provide 3 different times for evaluating the 
 expression `expr'; the first two are user and system CPUs and the third one 
 is total elapsed time.  Suppose I want to compare two different computational 
 procedures for performing the same task, which component of `system.time' is 
 most meaningful in the sense that it most accurately reflects the 
 computational effort of the algorithm, and does not depend upon the 
 idiosyncrasies of the operating system.

 I have always been using the first component of `system.time', which is the 
 user CPU.  Should I use the sum of user and system CPU or is the total 
 elapsed time a better measure?  I would appreciate UseR's feedback on this.

 Thanks very much.

 Best,
 Ravi.
 

 Ravi Varadhan, Ph.D.
 Assistant Professor,
 Division of Geriatric Medicine and Gerontology
 School of Medicine
 Johns Hopkins University

 Ph. (410) 502-2619
 email: rvarad...@jhmi.edu

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lapply with data frame

2010-02-27 Thread Noah Silverman

I'm a bit confused on how to use lapply with a data.frame.

For example.

lapply(data, function(x) print(x))

WHAT exactly is passed to the function.  Is it each ROW in the data 
frame, one by one, or each column, or the entire frame in one shot?


What I want to do is apply a function to each row in the data frame.  Is 
lapply the right way?


A second application is to normalize a column value by group.  For 
example, if I have the following table:

id    group    value    norm
1     A        3.2
2     A        3.0
3     A        3.1
4     B        5.5
5     B        6.0
6     B        6.2
etc...

The long version would be:
foreach (group in unique(data$group)){
data$norm[group==group] <- data$value[group==group] / 
sum(data$value[group==group])

}

There must be a faster way to do this with lapply.  (Ideally, I'd then 
use mclapply to run on multi-cores and really crank up the speed.)


Any suggestions?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in tapply when reordering levels of a factor

2010-02-27 Thread Thomas Levine
I have this

 grades$grade
...
[4009] A  B  A- A- A- B+ A  A- B+ B  A  B  B  B  A  A- A  A- A- B+ A- A  A  B+
[4033] A- A- A- A  A- B  A  A  A- A
Levels: A A- A+ B B- B+ C  C+

I want to change the order of the levels

 reorder(grades$grade,c('A+','A','A-','B+','B','B-','C+','C'))
Error in tapply(X, x, FUN, ...) : arguments must have same length

What am I doing wrong? Thanks

Tom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapply with data frame

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 9:49 PM, Noah Silverman wrote:


I'm a bit confused on how to use lapply with a data.frame.

For example.

lapply(data, function(x) print(x))

WHAT exactly is passed to the function.  Is it each ROW in the data  
frame,


No.


one by one, or each column,


Yes. Dataframes are lists of columns.


or the entire frame in one shot?

What I want to do is apply a function to each row in the data frame.   
Is lapply the right way?


No. Use apply(dtfrm, 1, ..)
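A tiny illustration of that row-wise apply() (my toy matrix, not from the thread); note that apply() coerces a data frame to a matrix before iterating:

```r
## Each call of the supplied function receives one row as a vector.
m <- rbind(c(1, 3), c(2, 4))   # two rows
apply(m, 1, sum)
## [1] 4 6
```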



A second application is to normalize a column value by group.


Which is, as you suggested, a different problem for which apply()  
would not be particularly useful because you have a group. Hence  
tapply or one of its variants, aggregate() or by() would be used:


For your example, I am guessing that:

tapply(dfrm$value, dfrm$group, sum)

... might be more economical (at least in single core practice.)
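To get all the way to the normalized column the OP asked for, the group sums from tapply() can be indexed back by group name; a small sketch on toy data (mine, not David's):

```r
dfrm <- data.frame(group = c("A", "A", "B"), value = c(3, 1, 5))
sums <- tapply(dfrm$value, dfrm$group, sum)          # A = 4, B = 5
## Look up each row's group sum by name, dropping the names attribute.
dfrm$norm <- dfrm$value / unname(sums[as.character(dfrm$group)])
dfrm$norm
## [1] 0.75 0.25 1.00
```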


--
David


 For example, if I have the following table:
id    group    value    norm
1     A        3.2
2     A        3.0
3     A        3.1
4     B        5.5
5     B        6.0
6     B        6.2


I could not quite figure out how that might have been printed on a  
console,  since there are more variable names than columns



etc...


Yes. I do think there is more than you are revealing.



The long version would be:


foreach is not a base function:


foreach (group in unique(data$group)){
   data$norm[group==group] <- data$value[group==group] / sum(data 
$value[group==group])

}

There must be a faster way to do this with lapply.  (Ideally, I'd  
then use mclapply to run on multi-cores and really crank up the  
speed.)


Learn your basics first. Libraries or packages need to be specified.



Any suggestions?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapply with data frame

2010-02-27 Thread jim holtman
 x <- read.table(textConnection("id    group    value
+ 1    A    3.2
+ 2    A    3.0
+ 3    A    3.1
+ 4    B    5.5
+ 5    B    6.0
+ 6    B    6.2"), header=TRUE)
 # dataframe is processed by column by lapply
 lapply(x, c)
$id
[1] 1 2 3 4 5 6

$group
[1] 1 1 1 2 2 2

$value
[1] 3.2 3.0 3.1 5.5 6.0 6.2

 # normalize by group
 x$norm <- ave(x$value, x$group, FUN=function(a) a / sum(a))
 x
  id group value  norm
1  1 A   3.2 0.3440860
2  2 A   3.0 0.3225806
3  3 A   3.1 0.3333333
4  4 B   5.5 0.3107345
5  5 B   6.0 0.3389831
6  6 B   6.2 0.3502825


On Sat, Feb 27, 2010 at 9:49 PM, Noah Silverman n...@smartmediacorp.com wrote:
 I'm a bit confused on how to use lapply with a data.frame.

 For example.

 lapply(data, function(x) print(x))

 WHAT exactly is passed to the function.  Is it each ROW in the data frame,
 one by one, or each column, or the entire frame in one shot?

 What I want to do apply a function to each row in the data frame.  Is lapply
 the right way.

 A second application is to normalize a column value by group.  For example,
 if I have the following table:
 id    group    value      norm
 1    A            3.2
 2    A            3.0
 3    A            3.1
 4    B            5.5
 5    B            6.0
 6    B            6.2
 etc...

 The long version would be:
 foreach (group in unique(data$group)){
     data$norm[group==group] <- data$value[group==group] /
  sum(data$value[group==group])
 }

 There must be a faster way to do this with lapply.  (Ideally, I'd then use
 mclapply to run on multi-cores and really crank up the speed.)

 Any suggestions?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in tapply when reordering levels of a factor

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 10:01 PM, Thomas Levine wrote:


I have this


grades$grade

...
[4009] A  B  A- A- A- B+ A  A- B+ B  A  B  B  B  A  A- A  A- A- B+  
A- A  A  B+

[4033] A- A- A- A  A- B  A  A  A- A
Levels: A A- A+ B B- B+ C  C+

I want to change the order of the levels


reorder(grades$grade,)


Try instead:

grades$grade <- factor(grades$grade,
  levels = c('A+','A','A-','B+','B','B-','C+','C'))


Error in tapply(X, x, FUN, ...) : arguments must have same length

What am I doing wrong? Thanks

Tom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reading data from web data sources

2010-02-27 Thread Gabor Grothendieck
Here is a continuation to turn DF into a zoo series:   It depends on
the fact that all NAs are structural, i.e. they indicate dates which
cannot exist such as Feb 31 as opposed to missing data.  dd is the
data as one long series with component names being the dates in the
indicated format.  That is converted to a zoo series in the next
statement using Date class:

dd <- na.omit(unlist(by(DF[2:13], DF$year, c)))

library(zoo)
z <- zoo(unname(dd), as.Date(names(dd), "%Y.%b%d"))

Here are the first few and last few in z:
 head(z)
1910-01-01 1910-01-02 1910-01-03 1910-01-04 1910-01-05 1910-01-06
       6.4        6.5        6.3        6.7        6.7        6.8
 tail(z)
1919-12-26 1919-12-27 1919-12-28 1919-12-29 1919-12-30 1919-12-31
       6.7        6.6        6.6        6.5        6.4        6.4



On Sat, Feb 27, 2010 at 4:33 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 No one else posted so the other post you are referring to must have
 been an email to you, not a post.  We did not see it.

 By one off I think you are referring to the row names, which are
 meaningless, rather than the day numbers.  The data for day 1 is
 present, not missing.  The example code did replace the day number
 column with the year since the days were just sequential and therefore
 derivable but its trivial to keep them if that is important to you and
 we have made that change below.

 The previous code used grep to kick out lines that had any character
 not in the set: minus sign, space and digit but in this version we add
 minus sign to that set.   We also corrected the year column and added
 column names and converted all -999 strings to NA.  Due to this last
 point we cannot use na.omit any more but we now have iy available that
 distinguishes between year rows and other rows.

 Every line here has been indented so anything that starts at the left
 column must have been word wrapped in transmission.

  myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
  raw.lines <- readLines(myURL)
  DF <- read.table(textConnection(raw.lines[!grepl("[^- 0-9.]", raw.lines)]),
    fill = TRUE, col.names = c("day", month.abb), na.strings = "-999")

  iy <- is.na(DF[[2]]) # is year row
  DF$year <- DF[iy, 1][cumsum(iy)]
  DF <- DF[!iy, ]

  DF


 On Sat, Feb 27, 2010 at 3:28 PM, Tim Coote tim+r-project@coote.org 
 wrote:
 Thanks, Gabor. My take away from this and Phil's post is that I'm going to

 I think the other `post`` must have been directly to you.  We didn`t see it.

 have to construct some code to do the parsing, rather than use a standard
 function. I'm afraid that neither approach works, yet:

 Gabor's has an off-by-one error (days start on the 2nd, not the 1st),
 and the years get messed up around the 29th day.  I think that na.omit(DF)
 line is throwing out the baby with the bathwater.  It's interesting that
 this approach is based on read.table, I'd assumed that I'd need read.ftable,
 which I couldn't understand the documentation for.  What is it that's
 removing the -999 and -888 values in this code - they seem to be gone, but I
 cannot see why.

 Phil's reads in the data, but interleaves rows with just a year and all
 other values as NA.

 Tim
 On 27 Feb 2010, at 17:33, Gabor Grothendieck wrote:

 Mark Leeds pointed out to me that the code wrapped around in the post
 so it may not be obvious that the regular expression in the grep is
 (i.e. it contains a space):
 [^ 0-9.]


 On Sat, Feb 27, 2010 at 7:15 AM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:

 Try this.  First we read the raw lines into R using grep to remove any
 lines containing a character that is not a number or space.  Then we
 look for the year lines and repeat them down V1 using cumsum.  Finally
 we omit the year lines.

 myURL <-
 "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
 raw.lines <- readLines(myURL)
 DF <- read.table(textConnection(raw.lines[!grepl("[^ 0-9.]",
 raw.lines)]), fill = TRUE)
 DF$V1 <- DF[cumsum(is.na(DF[[2]])), 1]
 DF <- na.omit(DF)
 head(DF)


 On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote tim+r-project@coote.org
 wrote:

 Hullo
 I'm trying to read some time series data of meteorological records that
 are
 available on the web (eg
 http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat).
 I'd
 like to be able to read in the digital data directly into R. However, I
 cannot work out the right function and set of parameters to use.  It
 could
 be that the only practical route is to write a parser, possibly in some
 other language,  reformat the files and then read these into R. As far
 as I
 can tell, the informal grammar of the file is:

 comments terminated by a blank line
 [year number on a line on its own
 daily readings lines ]+

 and the daily readings are of the form:
 <whitespace> <day number> [<whitespace> <reading on day of month>] * 12

 Readings for days in months where a day does not exist have special
 values.
 Missing values have a different special value.

 And then 

[R] Editing a function

2010-02-27 Thread learner1978

I am beginner to R.

I have written a function:

f= function(n=100,p=0.5){
X=rbinom(100,n,p)
(mean(X)-n*P)/sqrt(n*p*(1-p))
}

But I made a mistake by typing P instead of p. How do I edit this
function and fix my mistake? If I use edit(f) it opens an edit window
where I am able to change the function but when I type f I see the same
old function. R does not seem to save my change even though it prompts me to
save before I close the edit window. I do not want to retype the whole
function all over again.
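A plausible explanation of the behaviour described: edit(f) returns the edited copy rather than changing f in place, so the result must be assigned back (fix(f) bundles both steps). A non-interactive sketch of the same idea, patching the body directly:

```r
f <- function(n = 100, p = 0.5){
  X <- rbinom(100, n, p)
  (mean(X) - n*P)/sqrt(n*p*(1 - p))   # the P typo
}
g <- f
## Patch the second statement of the body in the copy.
body(g)[[3]] <- quote((mean(X) - n*p)/sqrt(n*p*(1 - p)))
f <- g   # without this assignment the fix would be lost
## interactively: f <- edit(f), or simply fix(f)
```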

-- 
View this message in context: 
http://n4.nabble.com/Editing-a-function-tp1572251p1572251.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pass an array of array from Java to R- Rserve

2010-02-27 Thread Rameswara Sashi Kiran Challa
hello all,

Could someone please tell me how should I pass a double[][] (matrix of any
size) that I have in Java, into R using Rserve.

Thanks
Sashikiran



-- 
Sashikiran Challa
MS Cheminformatics,
School of Informatics and Computing,
Indiana University, Bloomington,IN
scha...@indiana.edu
812-606-3254

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bwplot() {lattice}

2010-02-27 Thread Peng Cai
Thanks a lot Deepayan, one question:

Is it possible to place these boxplots side-by-side instead of
superimposing them? Something like this:
http://www.imachordata.com/wp-content/uploads/2009/09/boxplot.png

library(lattice)
bwplot(yield ~ variety, data = barley, col = 1, pch = 16,
  panel = panel.superpose, panel.groups = panel.bwplot,
  auto.key=list(space="right"),
  groups = year, scales=(x=list(rot=45)))

Thanks,
Peng

On Fri, Feb 26, 2010 at 3:51 AM, Deepayan Sarkar
deepayan.sar...@gmail.comwrote:

 On Fri, Feb 26, 2010 at 8:30 AM, Peng Cai pengcaimaill...@gmail.com
 wrote:
  Hi All,
 
  I'm trying to plot boxplot graph. I tried barchart with groups= option
 and
  it worked fine. But when I try to generate same kind of graph using
  bwplot(), groups= option doesn't seem to work. Though this works,
 
  yield ~ variety | site * year
 
  I'm thinking why groups= doesn't work in this case, can anyone help
  please...

 Let's see...you have exactly one observation per site/variety/year
 combination (otherwise the barchart wouldn't have made sense). So in
 the boxplot you want (which is supposed to summarize a distribution,
 not a single point), you only have that single point to plot. For
 that, you can use

 dotplot(yield ~ variety | site, data = barley, auto.key = TRUE,
groups = year, layout = c(6,1), scales=(x=list(rot=45)))

 If you try to come up with a more sensible example, you would realize
 that boxplots are already grouped (the grouping variable is the
 categorical variable in the formula y ~ x, not the 'groups' argument).
 Compare

 ## Is this really what you want?
 bwplot(yield ~ variety, data = barley, col = 1, pch = 16,
   panel = panel.superpose, panel.groups = panel.bwplot,
   groups = year, scales=(x=list(rot=45)))

 bwplot(yield ~ year | variety, data = barley,
   scales=(x=list(rot=45)), layout = c(10, 1))

 -Deepayan


 
  #Code:
  library(lattice)
  barchart(yield ~ variety | site, data = barley,
  groups = year, layout = c(1,6),
   auto.key = list(points = FALSE, rectangles = TRUE, space = "right"))
 
  bwplot(yield ~ variety | site, data = barley,
  groups = year, layout = c(6,1), scales=(x=list(rot=45)),
   auto.key = list(points = FALSE, rectangles = TRUE, space = "right"))


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Editing a function

2010-02-27 Thread Ben Bolker
learner1978 sakp4mcl at gmail.com writes:

 
 
 I am beginner to R.
 
 I have written a function:
 
 f= function(n=100,p=0.5){
 X=rbinom(100,n,p)
 (mean(X)-n*P)/sqrt(n*p*(1-p))
 }
 
 But I made a mistake by typing P instead of p. How do I edit this
 function and improve my mistake.

  Two answers:

(1) [short term] fix(f)

(2) [long term] develop your code in a text editor (Tinn-R, emacs,
  R source code editor accessible from the menu ...) and cut & paste 
  or source() as necessary.
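Once edited, the function should read as follows (an editor's restatement; the only fix is the lowercase p, as described in the question):

```r
# Standardized mean of a binomial sample; the typo was P where p belongs.
f <- function(n = 100, p = 0.5) {
  X <- rbinom(100, n, p)
  (mean(X) - n * p) / sqrt(n * p * (1 - p))
}
```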



Re: [R] Editing a function

2010-02-27 Thread David Winsemius


On Feb 27, 2010, at 11:15 PM, Ben Bolker wrote:


learner1978 sakp4mcl at gmail.com writes:




I am beginner to R.

I have written a function:

f= function(n=100,p=0.5){
X=rbinom(100,n,p)
(mean(X)-n*P)/sqrt(n*p*(1-p))
}

But I made a mistake by typing P instead of p. How do I edit this
function and improve my mistake.


 Two answers:

(1) [short term] fix(f)

(2) [long term] develop your code in a text editor (Tinn-R, emacs,
  R source code editor accessible from the menu ...) and cut & paste
 or source() as necessary.


3) up-arrow?



--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Which system.time() component to use?

2010-02-27 Thread Ravi Varadhan

Thanks, Gabor.  Your reply is helpful, but it still doesn't answer whether I 
should use the sum of the first two components of system.time (user + system 
CPU) or only the first one (user CPU).  

Ravi.


Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Gabor Grothendieck ggrothendi...@gmail.com
Date: Saturday, February 27, 2010 9:47 pm
Subject: Re: [R] Which system.time() component to use?
To: Ravi Varadhan rvarad...@jhmi.edu
Cc: r-help@r-project.org


 Try this:
 
  system.time(Sys.sleep(60))
user  system elapsed
0.000.00   60.05
  pt <- proc.time(); Sys.sleep(60); proc.time() - pt
user  system elapsed
0.000.00   60.01
 
 On Sat, Feb 27, 2010 at 9:33 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:
 
  Hi,
 
  The `system.time(expr)' command provides 3 different times for 
 evaluating the expression `expr'; the first two are user and system 
 CPUs and the third one is total elapsed time.  Suppose I want to 
 compare two different computational procedures for performing the same 
 task, which component of `system.time' is most meaningful in the sense 
 that it most accurately reflects the computational effort of the 
 algorithm, and does not depend upon the idiosyncrasies of the 
 operating system.
 
  I have always been using the first component of `system.time', which 
 is the user CPU.  Should I use the sum of user and system CPU or is 
 the total elapsed time a better measure?  I would appreciate UseR's 
 feedback on this.
 
  Thanks very much.
 
  Best,
  Ravi.
  
 
  Ravi Varadhan, Ph.D.
  Assistant Professor,
  Division of Geriatric Medicine and Gerontology
  School of Medicine
  Johns Hopkins University
 
  Ph. (410) 502-2619
  email: rvarad...@jhmi.edu
 
 



Re: [R] Reducing a matrix

2010-02-27 Thread jim holtman
Will this work for you:

 x <- read.table(textConnection("  x y1 y2 y3 y4
+ 1   3  7 NA NA NA
+ 2   3 NA 16 NA NA
+ 3   3 NA NA 12 NA
+ 4   3 NA NA NA 18
+ 5   6  8 NA NA NA
+ 6  10 NA NA  2 NA
+ 7  10 NA 11 NA NA
+ 8  14 NA NA NA  8
+ 9  14 NA  9 NA NA
+ 10 15 NA NA NA 11
+ 11 50 NA NA 13 NA
+ 12 50 20 NA NA NA"), header=TRUE)

 t(sapply(split(x, x$x), function(.grp){
+ sapply(.grp, function(.col) .col[which.max(!is.na(.col))])
+ }))
x y1 y2 y3 y4
3   3  7 16 12 18
6   6  8 NA NA NA
10 10 NA 11  2 NA
14 14 NA  9 NA  8
15 15 NA NA NA 11
50 50 20 NA 13 NA
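An equivalent one-liner (an editor's sketch, not posted in the thread) uses aggregate(), keeping the first non-NA value of each y column per x group:

```r
# Assumes the data frame 'x' read in above; na.action = na.pass keeps
# the NA cells so our own FUN can pick the first non-missing entry.
aggregate(. ~ x, data = x, na.action = na.pass,
          FUN = function(v) v[!is.na(v)][1])
```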



On Sat, Feb 27, 2010 at 7:56 PM, Juliet Ndukum jpnts...@yahoo.com wrote:
 I wish to rearrange the matrix, df, so that there are no repeated x 
 values. In particular, for each value of x that is repeated, the corresponding y 
 value should fall under the appropriate column.  For example, the x value 3 
 appears 4 times under the different columns of y, i.e. y1, y2, y3, y4. The 
 output should be such that for the lone value of 3 selected for x, the 
 corresponding row entries will be 7 under column y1, 16 under column y2, 12 
 under column y3 and 18 under column y4. This should work for the other rows 
 of x with repeated values.
 df
   x y1 y2 y3 y4
 1   3  7 NA NA NA
 2   3 NA 16 NA NA
 3   3 NA NA 12 NA
 4   3 NA NA NA 18
 5   6  8 NA NA NA
 6  10 NA NA  2 NA
 7  10 NA 11 NA NA
 8  14 NA NA NA  8
 9  14 NA  9 NA NA
 10 15 NA NA NA 11
 11 50 NA NA 13 NA
 12 50 20 NA NA NA

 The output should be:

   x y1 y2 y3 y4
 1   3  7 16 12 18
 2   6  8 NA NA NA
 3  10 NA 11  2 NA
 4  14 NA 9 NA  8
 5 15 NA NA NA 11
 6 50 20 NA 13 NA

 Can anyone write for me code that would produce these results.
 Thank you in advance for your help.

 JN








-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Which system.time() component to use?

2010-02-27 Thread Gabor Grothendieck
The last component seems the most meaningful since it's the amount of
time you actually waited for the code to run.

On Sat, Feb 27, 2010 at 11:44 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:

 Thanks, Gabor.  Your reply is helpful, but it still doesn't answer whether I 
 should use the sum of the first two components of system.time (user + system 
 CPU) or only the first one (user CPU).

 Ravi.
 

 Ravi Varadhan, Ph.D.
 Assistant Professor,
 Division of Geriatric Medicine and Gerontology
 School of Medicine
 Johns Hopkins University

 Ph. (410) 502-2619
 email: rvarad...@jhmi.edu


 - Original Message -
 From: Gabor Grothendieck ggrothendi...@gmail.com
 Date: Saturday, February 27, 2010 9:47 pm
 Subject: Re: [R] Which system.time() component to use?
 To: Ravi Varadhan rvarad...@jhmi.edu
 Cc: r-help@r-project.org


 Try this:

  system.time(Sys.sleep(60))
    user  system elapsed
    0.00    0.00   60.05
  pt <- proc.time(); Sys.sleep(60); proc.time() - pt
    user  system elapsed
    0.00    0.00   60.01

 On Sat, Feb 27, 2010 at 9:33 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:
 
  Hi,
 
  The `system.time(expr)' command provides 3 different times for
 evaluating the expression `expr'; the first two are user and system
 CPUs and the third one is total elapsed time.  Suppose I want to
 compare two different computational procedures for performing the same
 task, which component of `system.time' is most meaningful in the sense
 that it most accurately reflects the computational effort of the
 algorithm, and does not depend upon the idiosyncrasies of the
 operating system.
 
  I have always been using the first component of `system.time', which
 is the user CPU.  Should I use the sum of user and system CPU or is
 the total elapsed time a better measure?  I would appreciate UseR's
 feedback on this.
 
  Thanks very much.
 
  Best,
  Ravi.
  
 
  Ravi Varadhan, Ph.D.
  Assistant Professor,
  Division of Geriatric Medicine and Gerontology
  School of Medicine
  Johns Hopkins University
 
  Ph. (410) 502-2619
  email: rvarad...@jhmi.edu
 
 




Re: [R] Which system.time() component to use?

2010-02-27 Thread Gabor Grothendieck
Also you might want to try out this which will repeatedly run your
benchmarks to average out the values and make comparisons easier:

http://rbenchmark.googlecode.com



On Sat, Feb 27, 2010 at 11:55 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 The last component seems the most meaningful since its the amount of
 time you actually waited for the code to run.

 On Sat, Feb 27, 2010 at 11:44 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:

 Thanks, Gabor.  Your reply is helpful, but it still doesn't answer whether I 
 should use the sum of the first two components of system.time (user + system 
 CPU) or only the first one (user CPU).

 Ravi.
 

 Ravi Varadhan, Ph.D.
 Assistant Professor,
 Division of Geriatric Medicine and Gerontology
 School of Medicine
 Johns Hopkins University

 Ph. (410) 502-2619
 email: rvarad...@jhmi.edu


 - Original Message -
 From: Gabor Grothendieck ggrothendi...@gmail.com
 Date: Saturday, February 27, 2010 9:47 pm
 Subject: Re: [R] Which system.time() component to use?
 To: Ravi Varadhan rvarad...@jhmi.edu
 Cc: r-help@r-project.org


 Try this:

  system.time(Sys.sleep(60))
    user  system elapsed
    0.00    0.00   60.05
  pt <- proc.time(); Sys.sleep(60); proc.time() - pt
    user  system elapsed
    0.00    0.00   60.01

 On Sat, Feb 27, 2010 at 9:33 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:
 
  Hi,
 
  The `system.time(expr)' command provides 3 different times for
 evaluating the expression `expr'; the first two are user and system
 CPUs and the third one is total elapsed time.  Suppose I want to
 compare two different computational procedures for performing the same
 task, which component of `system.time' is most meaningful in the sense
 that it most accurately reflects the computational effort of the
 algorithm, and does not depend upon the idiosyncrasies of the
 operating system.
 
  I have always been using the first component of `system.time', which
 is the user CPU.  Should I use the sum of user and system CPU or is
 the total elapsed time a better measure?  I would appreciate UseR's
 feedback on this.
 
  Thanks very much.
 
  Best,
  Ravi.
  
 
  Ravi Varadhan, Ph.D.
  Assistant Professor,
  Division of Geriatric Medicine and Gerontology
  School of Medicine
  Johns Hopkins University
 
  Ph. (410) 502-2619
  email: rvarad...@jhmi.edu
 
 





Re: [R] Which system.time() component to use?

2010-02-27 Thread jim holtman
A lot depends on what you are trying to measure.

You should add the system and user CPU times to get a better idea of
the CPU utilization.  For some classes of problems it might be good to
separate them if you were doing a lot of I/O or other system calls
that might be using time, but for 99% of the cases adding them is the
way to go.

You also want to look at elapsed times.  If the script is CPU bound
the elapsed and total CPU times should be close.  In the case that
Gabor gave of sleeping for 60 seconds, no CPU time was used, but it
was 60 seconds of elapsed time.  If there is a big difference, it
might be due to a lot of I/O or possibly paging if you did not have
enough memory.
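As a concrete illustration (an editor's sketch, not from the thread), the user + system sum can be compared across two implementations of the same task:

```r
# Column means via apply() vs. the optimized colMeans(); summing the
# user.self and sys.self components approximates total CPU effort.
m  <- matrix(rnorm(1e6), ncol = 100)
t1 <- system.time(for (i in 1:20) apply(m, 2, mean))
t2 <- system.time(for (i in 1:20) colMeans(m))
t1[["user.self"]] + t1[["sys.self"]]  # CPU time, apply()
t2[["user.self"]] + t2[["sys.self"]]  # CPU time, colMeans()
```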

On Sat, Feb 27, 2010 at 11:44 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:

 Thanks, Gabor.  Your reply is helpful, but it still doesn't answer whether I 
 should use the sum of the first two components of system.time (user + system 
 CPU) or only the first one (user CPU).

 Ravi.
 

 Ravi Varadhan, Ph.D.
 Assistant Professor,
 Division of Geriatric Medicine and Gerontology
 School of Medicine
 Johns Hopkins University

 Ph. (410) 502-2619
 email: rvarad...@jhmi.edu


 - Original Message -
 From: Gabor Grothendieck ggrothendi...@gmail.com
 Date: Saturday, February 27, 2010 9:47 pm
 Subject: Re: [R] Which system.time() component to use?
 To: Ravi Varadhan rvarad...@jhmi.edu
 Cc: r-help@r-project.org


 Try this:

  system.time(Sys.sleep(60))
    user  system elapsed
    0.00    0.00   60.05
  pt <- proc.time(); Sys.sleep(60); proc.time() - pt
    user  system elapsed
    0.00    0.00   60.01

 On Sat, Feb 27, 2010 at 9:33 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:
 
  Hi,
 
  The `system.time(expr)' command provides 3 different times for
 evaluating the expression `expr'; the first two are user and system
 CPUs and the third one is total elapsed time.  Suppose I want to
 compare two different computational procedures for performing the same
 task, which component of `system.time' is most meaningful in the sense
 that it most accurately reflects the computational effort of the
 algorithm, and does not depend upon the idiosyncrasies of the
 operating system.
 
  I have always been using the first component of `system.time', which
 is the user CPU.  Should I use the sum of user and system CPU or is
 the total elapsed time a better measure?  I would appreciate UseR's
 feedback on this.
 
  Thanks very much.
 
  Best,
  Ravi.
  
 
  Ravi Varadhan, Ph.D.
  Assistant Professor,
  Division of Geriatric Medicine and Gerontology
  School of Medicine
  Johns Hopkins University
 
  Ph. (410) 502-2619
  email: rvarad...@jhmi.edu
 
 





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] help with Gantt chart

2010-02-27 Thread jim holtman
You might want to debug your data.  Anytime there is an error message,
take a look at your data.  You have some illegal dates in your 'end'
(2010/9/31 and 2010/6/31 are not legal and are probably causing your
error).  Simply printing out your test data would have shown that:

 gantt.info
$labels
[1] First task Second task (1st part) Third task (1st
part)  Second task (2nd part) Third task (2nd part)
[6] Fourt task Fifth task Sixth task

$starts
[1] 2010-01-01 EST 2010-07-01 EDT 2010-10-01 EDT 2011-04-01
EDT 2011-07-01 EDT 2011-07-01 EDT 2012-01-01 EST
[8] 2012-07-01 EDT

$ends
[1] 2010-06-30 EDT NA   2011-03-31 EDT NA
 2011-12-31 EST 2012-06-30 EDT 2012-06-30 EDT
[8] 2012-12-31 EST

$priorities
[1] 1 2 3 4 5
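A quick pre-flight check (an editor's sketch, not from the thread) locates the offending entries, since strptime() returns NA for impossible calendar dates:

```r
# NA positions in the parsed vector point at the illegal 'ends' dates.
ends <- c("2010/06/30", "2010/09/31", "2011/03/31", "2011/06/31",
          "2011/12/31", "2012/06/30", "2012/06/30", "2012/12/31")
parsed <- as.POSIXct(strptime(ends, format = "%Y/%m/%d"))
which(is.na(parsed))  # 2 and 4: Sep 31 and Jun 31 do not exist
```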




On Sat, Feb 27, 2010 at 6:07 PM, Zoppoli, Gabriele (NIH/NCI) [G]
zoppo...@mail.nih.gov wrote:
 Hi,

 I don't know to solve this error that is returned, even though I understand 
 it:

 library(plotrix)

 Ymd.format <- "%Y/%m/%d"
  gantt.info <- list(labels=
  c("First task","Second task (1st part)","Third task (1st part)",
  "Second task (2nd part)","Third task (2nd part)",
        "Fourt task","Fifth task","Sixth task"),
  starts=
  as.POSIXct(strptime(
  c("2010/01/01","2010/07/01","2010/10/01","2011/04/01","2011/07/01","2011/07/01","2012/01/01","2012/07/01"),
  format=Ymd.format)),
  ends=
  as.POSIXct(strptime(
  c("2010/06/30","2010/09/31","2011/03/31","2011/06/31","2011/12/31","2012/06/30","2012/06/30","2012/12/31"),
  format=Ymd.format)),
  priorities=c(1,2,3,4,5))
  vgridpos <- as.POSIXct(strptime(c("2010/01/01","2010/04/01","2010/07/01",
  "2010/10/01","2011/01/01","2011/04/01","2011/07/01","2011/10/01",
  "2012/01/01","2012/04/01","2012/07/01","2010/10/01"),format=Ymd.format))
  vgridlab <-
  c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
  gantt.chart(gantt.info,main="My First AIRC Grant Gantt Chart",
  priority.legend=TRUE,vgridpos=vgridpos,vgridlab=vgridlab,hgrid=TRUE)

 Error in if (any(x$starts > x$ends)) stop("Can't have a start date after an 
 end date") :
  missing value where TRUE/FALSE needed


 Thanks


 Gabriele Zoppoli, MD
 Ph.D. Fellow, Experimental and Clinical Oncology and Hematology, University 
 of Genova, Genova, Italy
 Guest Researcher, LMP, NCI, NIH, Bethesda MD

 Work: 301-451-8575
 Mobile: 301-204-5642
 Email: zoppo...@mail.nih.gov




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] Types of missingness

2010-02-27 Thread Christian Raschke
Dear R-List,

My question concerns missing values. Specifically, is it possible to 
use different types of missingness in a dataset and not a 
one-size-fits-all NA?
For example, data may be missing because of an outright refusal by a 
respondent to answer a question, or because she didn't know an answer, 
or because the item simply did not apply. In later analysis it is 
sometimes useful to be able to distinguish between the cases, but 
nonetheless have them all treated as missing when using, say, lm( ).
In Stata this is possible by using different missing value indicators. 
The standard one is a period '.' whereas '.a' and '.b' etc. are treated 
as missing too, but can all be distinguished from one another (they are 
even ordinal such that . < .a < .b).
To give a simplistic example in R, let

  dat <- data.frame(
+ hours = c(36, 40, 40, 0, 37.5, 0, 36, 20, 40),
+ wage = c( 15.5, 7.5, 8, -1, 17.5, -1, -2, 13, -2))
  dat
   hours wage
1  36.0 15.5
2  40.0  7.5
3  40.0  8.0
4   0.0 -1.0
5  37.5 17.5
6   0.0 -1.0
7  36.0 -2.0
8  20.0 13.0
9  40.0 -2.0


where for wages -1 indicates "didn't work" and -2 indicates "refused to 
respond". How could I replace the negative values for wages with 
missingness indicators to use the data frame in, for instance, lm( ), but 
later operate only on those observations who refused to respond?
Of course I can always work around this somehow, especially in this easy 
example, but as data frames get larger and cases more complex the 
workarounds seem more and more klutzy to me.
So, if there is an easy way to do this that I have overlooked, I would 
be grateful for any advice or references.

Best,
Christian

-- 
Christian Raschke
Department of Economics
and
ISDS Research Lab (HSRG)
Louisiana State University
Patrick Taylor Hall, Rm 2128
Baton Rouge, LA 70803
cras...@lsu.edu
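One common workaround (an editor's sketch, not posted in the thread; the column name wage.miss is illustrative) recodes the sentinels to NA for modelling while keeping the reason in a parallel factor:

```r
dat <- data.frame(
  hours = c(36, 40, 40, 0, 37.5, 0, 36, 20, 40),
  wage  = c(15.5, 7.5, 8, -1, 17.5, -1, -2, 13, -2))

# Record *why* each wage is missing before overwriting it with NA.
dat$wage.miss <- factor(ifelse(dat$wage == -1, "not working",
                        ifelse(dat$wage == -2, "refused", "observed")))
dat$wage[dat$wage < 0] <- NA

fit <- lm(wage ~ hours, data = dat)   # NA rows are dropped automatically
subset(dat, wage.miss == "refused")   # operate only on the refusals
```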





Re: [R] Reducing a matrix

2010-02-27 Thread Dimitris Rizopoulos

if you don't mind having zeros instead of NAs, then yet another solution is:

df <- read.table(textConnection("x y1 y2 y3 y4
1   3  7 NA NA NA
2   3 NA 16 NA NA
3   3 NA NA 12 NA
4   3 NA NA NA 18
5   6  8 NA NA NA
6  10 NA NA  2 NA
7  10 NA 11 NA NA
8  14 NA NA NA  8
9  14 NA  9 NA NA
10 15 NA NA NA 11
11 50 NA NA 13 NA
12 50 20 NA NA NA"), header = TRUE)
closeAllConnections()

out - rowsum(df[-1], df$x, na.rm = TRUE)
out$x - as.numeric(row.names(out))
out


I hope it helps.

Best,
Dimitris


On 2/28/2010 1:56 AM, Juliet Ndukum wrote:

I wish to rearrange the matrix, df, so that there are no repeated x 
values. In particular, for each value of x that is repeated, the corresponding y 
value should fall under the appropriate column.  For example, the x value 3 
appears 4 times under the different columns of y, i.e. y1, y2, y3, y4. The output 
should be such that for the lone value of 3 selected for x, the corresponding 
row entries will be 7 under column y1, 16 under column y2, 12 under column y3 
and 18 under column y4. This should work for the other rows of x with repeated 
values.
df
x y1 y2 y3 y4
1   3  7 NA NA NA
2   3 NA 16 NA NA
3   3 NA NA 12 NA
4   3 NA NA NA 18
5   6  8 NA NA NA
6  10 NA NA  2 NA
7  10 NA 11 NA NA
8  14 NA NA NA  8
9  14 NA  9 NA NA
10 15 NA NA NA 11
11 50 NA NA 13 NA
12 50 20 NA NA NA

The output should be:

x y1 y2 y3 y4
1   3  7 16 12 18
2   6  8 NA NA NA
3  10 NA 11  2 NA
4  14 NA 9 NA  8
5 15 NA NA NA 11
6 50 20 NA 13 NA

Can anyone write for me code that would produce these results.
Thank you in advance for your help.

JN







--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
