date:20110801



On Aug 1, 2011, at 3:41 AM, Paola Lecca wrote:


Dear R users,

I'm trying to fit an ODE to an experimental time series. In the
attachment you find the R code I wrote using modFit and modCost of the FME
package and the file of the time series.


This is getting a bit tiresome. None of the three duplicate such  
messages have had successful attachment of any code. Why don't you  
look at what got distributed to the list?


The rule I have developed is that I should assume that any file not  
ending in .pdf or .txt will get scrubbed by the mail server. I realize  
it is not an exact rule, but it keeps me from submitting files ending  
in .r or .rdata because I know they will get scrubbed. For some reason  
a recent rewrite of the Posting Guide appears to have left out this  
information which my memory tells me used to be there last year.


--
David.


When I run summary(Fit) I obtain this error message, and the values of the
parameters are equal to the initial guesses I gave to them.

The problem is not due to the fact that I have only one equation (I tried
also with more equations, but I still obtain this error).

I would appreciate if someone could help me in understanding the reason of
the error and in fixing it.

Thanks for your attention,
Paola Lecca.

Here the error:


summary(Fit)

Parameters:
              Estimate Std. Error t value Pr(>|t|)
pro1_strength        1         NA      NA       NA

Residual standard error: 2.124 on 10 degrees of freedom
Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
In addition: Warning message:
In summary.modFit(Fit) : Cannot estimate covariance; system is singular





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
*Paola Lecca, PhD*
*The Microsoft Research - University of Trento*
*Centre for Computational and Systems Biology*
*Piazza Manci 17 38123 Povo/Trento, Italy*
*Phome: +39 0461282843*
*Fax: +39 0461282814*


David Winsemius, MD
West Hartford, CT


[R] Help with modFit of FME package 2

2011-08-01 Thread Paola Lecca

* Apologies for multiple posting *
I attached a .r file to my previous e-mail, which the rules of the mailing
list do not permit. Again, please accept my sincere apologies for this.

I am re-sending the e-mail with a .txt attachment in the hope that someone can
help me solve my problem.


I'm trying to fit an ODE to an experimental time series. In the
attachment you find the R code I wrote using modFit and modCost of the FME
package and the file of the time series.

When I run summary(Fit) I obtain this error message, and the values of the
parameters are equal to the initial guesses I gave  to them.

The problem is not due to the fact that I have only one equation (I tried
also with more equations, but I still obtain this error).

I would appreciate if someone could help me in understanding the reason of
the error and in fixing it.

Thanks for your attention,
Paola Lecca.

Here the error:

summary(Fit)

Parameters:
              Estimate Std. Error t value Pr(>|t|)
pro1_strength        1         NA      NA       NA

Residual standard error: 2.124 on 10 degrees of freedom
Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
In addition: Warning message:
In summary.modFit(Fit) : Cannot estimate covariance; system is singular








-- 
*Paola Lecca, PhD*
*The Microsoft Research - University of Trento*
*Centre for Computational and Systems Biology*
*Piazza Manci 17 38123 Povo/Trento, Italy*
*Phome: +39 0461282843*
*Fax: +39 0461282814*
time    pp1_mrna
0   0
2   2.754
4   2.958
6   4.058
8   3.41
10  3.459
12  2.453
14  1.234
16  2.385
18  3.691
20  3.252
require(deSolve)
require(FME)


##########
#  PART 1
##########

# Differential equations
model_1_part_1 <- function(t, S, parameters)
 {
  with(as.list(parameters), {
  #
  cod1 = pro1_strength
  #
  pp1_mrna_degradation_rate <- 1
  #
  v1 = cod1
  v2 = pp1_mrna_degradation_rate * S[1]
  #
  dS1 = v1 - v2
  #
  list(c(dS1))
 })
}


# Parameters
parms_part_1 <- c(pro1_strength = 1.0)

# Initial values of the species concentration
S <- c(pp1_mrna = 0)

times <- seq(0, 20, by = 2)

# Solve the system
ode_solutions_part_1 <- ode(S, times, model_1_part_1, parms = parms_part_1)
ode_solutions_part_1

summary(ode_solutions_part_1)

## Default plot method
plot(ode_solutions_part_1)


# Estimate of the parameters
experiment <- read.table("./wild_pp1_mrna.txt", header=TRUE)

rw <- dim(experiment)[1]

names <- array("", rw)
for (i in 1:rw)
{
 names[i] <- "pp1_mrna"
}
names

observed_data_part_1 <- data.frame(name = names,
   time = experiment[,1], val = experiment[,2])

observed_data_part_1

ode_solutions_part_1

Cost_function <- function (pars) {
 out <- ode_solutions_part_1
 cost <- modCost(model = out, obs = observed_data_part_1, y = "val")
 cost
 }

Cost_function(parms_part_1)

# Fit the model to the observed data
Fit <- modFit(f = Cost_function, p = parms_part_1)
Fit


# Summary of the fit
summary(Fit)

# Model coefficients
coef(Fit)

# Deviance of the fit
deviance(Fit)
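
A possible reading of the error, offered as a sketch and not as a confirmed diagnosis from the thread: the posted cost function never uses its `pars` argument, so every parameter value yields the same cost, the sensitivities are all zero, and the covariance cannot be estimated. The version below re-solves the ODE with the candidate parameters on every call. The names `model`, `obs`, `cost_function` and `fit` are illustrative, and the observations are typed in from the attached data file instead of being read from disk.

```r
library(deSolve)
library(FME)

# One-equation model from the post: dS/dt = pro1_strength - 1 * S
model <- function(t, S, parameters) {
  with(as.list(parameters), {
    dS1 <- pro1_strength - 1 * S[1]
    list(c(dS1))
  })
}

# Observations in long format (name, time, value), as used with modCost(y = "val")
obs <- data.frame(
  name = "pp1_mrna",
  time = seq(0, 20, by = 2),
  val  = c(0, 2.754, 2.958, 4.058, 3.41, 3.459, 2.453, 1.234, 2.385, 3.691, 3.252)
)

# The cost now depends on pars because the model is solved inside the function
cost_function <- function(pars) {
  out <- ode(y = c(pp1_mrna = 0), times = obs$time, func = model, parms = pars)
  modCost(model = out, obs = obs, y = "val")
}

fit <- modFit(f = cost_function, p = c(pro1_strength = 1))
coef(fit)
```

With the cost tied to the parameters, `summary(fit)` should be able to report a standard error; whether the one-parameter model describes the data well is a separate question.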

Re: [R] Odp: converting factor to numeric gives NAs introduced by coercion

If you are not going to be using factors, then you can keep everything
as character (if there are non-numerics in a column) by adding
'as.is=TRUE' as a parameter to the 'read.table' family of functions.
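
As a small illustration of that advice (the inline data and the column names `site` and `temp` are made up for the example, not taken from the poster's file), reading with `as.is = TRUE` keeps a mixed column as character, and the numeric conversion can then be done explicitly:

```r
# Tab-separated text with decimal commas, as read.delim2 expects;
# one entry in the temp column is not a number
txt <- c("site\ttemp", "A\t1,5", "B\tmissing", "C\t3,0")

d <- read.delim2(text = paste(txt, collapse = "\n"), header = TRUE, as.is = TRUE)
str(d)  # temp is character, not factor

# Convert explicitly; the non-numeric entry becomes NA (warning suppressed on purpose)
temp_num <- suppressWarnings(as.numeric(sub(",", ".", d$temp, fixed = TRUE)))
temp_num
```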

On Mon, Aug 1, 2011 at 7:55 AM, Petr PIKAL petr.pi...@precheza.cz wrote:
 Hi

 Hi,

 I have a dataframe that I imported from a .txt file by:

 skogTemp <- read.delim2("Skogaryd_shoot_data.txt", header=TRUE,
 fill=TRUE)

 and the data are factors, how can avoid factors from the beginning?
 Although
 the file contains both characters and numbers.

 You have got an answer but here are some comments. If you have characters
 and numbers in one column the character values are converted to NA by
 as.numeric


 I tried to convert some of the columns from factor to numeric and as I
 understood it you can not use only as.numeric but as.character first. I
 got
 this warning message:

  skogTemp_1 <- as.numeric(as.character(skogTemp_1[,2:4]))
 Warning message:
 NAs introduced by coercion

 What is skogTemp_1? I presume skogTemp is data frame and in that case you
 can not use such construction directly.
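
 A sketch of one way to do the conversion column by column (the data frame `skog` and its columns are invented for illustration): `as.character()` applied to several data frame columns at once does not return the cell values, so each factor column is converted separately with `lapply()`.

```r
# Hypothetical data frame with numbers stored as factors, one entry non-numeric
skog <- data.frame(
  id = c("a", "b", "c"),
  t1 = factor(c("1.5", "2.5", "n/a")),
  t2 = factor(c("10", "20", "30"))
)

num_cols <- c("t1", "t2")

# Convert each column on its own: factor -> character -> numeric
skog[num_cols] <- lapply(skog[num_cols], function(col) {
  suppressWarnings(as.numeric(as.character(col)))
})

str(skog)  # t1 and t2 are numeric; the "n/a" entry is NA
```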


 I have lots of NAs in my data. Tries to check what class I had now but
 another warning is given me:
  class(skogTemp_1[,2])


 skogTemp_1 is probably a vector with only one dimension therefore you get
 this error.
 class(skogTemp_1)

 shall give you the desired result, however I prefer

 ?str

 Regards
 Petr

 Error in skogTemp_1[, 2] : incorrect number of dimensions
  class(skogTemp_1[1,2])
 Error in skogTemp_1[1, 2] : incorrect number of dimensions

 frustrating... I don't know what this mean.

 Can anyone help?

 Thank you,
 Angelica







-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: [R] export/import matrix



On Jul 31, 2011, at 7:54 PM, Rosario Garcia Gil wrote:


Hello

I have a problem on keeping the format when I export a matrix file  
with the write.table() function.




The quick answer is ... don't do that. Use save() if you want to  
preserve the attributes of an R object. And that especially applies if  
you don't understand the differences between R object types.
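
A minimal sketch of the save() route, using a small made-up matrix in place of the poster's data; the `units` attribute is hypothetical and only there to show that attributes survive the round trip:

```r
# Small numeric matrix standing in for the poster's data
m <- matrix(as.numeric(100:129), nrow = 5)
attr(m, "units") <- "metres"  # hypothetical extra attribute

f <- tempfile(fileext = ".RData")
save(m, file = f)             # stores the object itself, not a text rendering

original <- m
rm(m)
load(f)                       # restores an object named m

identical(m, original)
```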


I have discarded a longer answer that complained about your failure to  
provide complete code.

--
David

When I import the data volcano from rgl package it looks like this in R:

data[1:5,]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]  100  100  101  101  101  101  101  100  100   100   101   101   102   102
[2,]  101  101  102  102  102  102  102  101  101   101   102   102   103   103
[3,]  102  102  103  103  103  103  103  102  102   102   103   103   104   104
[4,]  103  103  104  104  104  104  104  103  103   103   103   104   104   104
[5,]  104  104  105  105  105  105  105  104  104   103   104   104   105   105


I use this data to represent a 3D map with the following script and
it works PERFECTLY!

y <- 2*data
x <- 10 * (1:nrow(y))
z <- 10 * (1:ncol(y))
ylim <- range(y)
ylen <- ylim[2] - ylim[1] + 1
colorlut <- terrain.colors(ylen)
col <- colorlut[y-ylim[1] + 1]
rgl.open()
rgl.surface(x,z,y, color=col, back="lines")



Then I export it with write.table(data, file="datam.txt",
row.names=TRUE, col.names=TRUE),

when I import it back into R again with read.table("datam.txt") it
looks like this in R:

   V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
1 100 100 101 101 101 101 101 100 100 100 101 101 102 102 102 102 103 104 103
2 101 101 102 102 102 102 102 101 101 101 102 102 103 103 103 103 104 105 104
3 102 102 103 103 103 103 103 102 102 102 103 103 104 104 104 104 105 106 105
4 103 103 104 104 104 104 104 103 103 103 103 104 104 104 105 105 106 107 106
5 104 104 105 105 105 105 105 104 104 103 104 104 105 105 105 106 107 108 108


The script I mentioned before no longer works on it; even if I
convert it to a matrix with as.matrix it still does not work.

I have read the PDF on R data import/export and searched with Google,
but I have not found any answer to my problem.

I am sorry if the answer is very obvious but I have tried for more
than a week.

Any help is really welcome, thanks in advance.
Rosario


David Winsemius, MD
West Hartford, CT


Re: [R] export/import matrix

If you are just exporting it so you can read it back into R later, it
is better to use save/load since that keeps the data in the internal
format, so it will look the same.

Can you describe what you are going to be doing with the data that you
'export'; that might help us come up with a solution to your problem.
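
If a text file is needed after all, the following sketch (with a made-up matrix `m`, not the poster's data) shows what comes back from the write.table()/read.table() round trip and one way to recover a plain matrix from it. Whether this alone fixes the plotting script is an assumption that has not been checked against rgl.

```r
m <- matrix(as.numeric(100:129), nrow = 5)

f <- tempfile(fileext = ".txt")
write.table(m, file = f, row.names = TRUE, col.names = TRUE)

back <- read.table(f)
class(back)                  # a data frame with columns V1, V2, ...

# Convert back to a bare numeric matrix: drop the names, restore storage mode
m2 <- as.matrix(back)
dimnames(m2) <- NULL
storage.mode(m2) <- "double"

identical(m2, m)
```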

On Sun, Jul 31, 2011 at 7:54 PM, Rosario Garcia Gil
m.rosario.gar...@slu.se wrote:
 Hello

 I have a problem on keeping the format when I export a matrix file with the 
 write.table() function.

 When I import the data volcano from rgl package it looks like this in R:


 data[1:5,]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,]  100  100  101  101  101  101  101  100  100   100   101   101   102   102
 [2,]  101  101  102  102  102  102  102  101  101   101   102   102   103   103
 [3,]  102  102  103  103  103  103  103  102  102   102   103   103   104   104
 [4,]  103  103  104  104  104  104  104  103  103   103   103   104   104   104
 [5,]  104  104  105  105  105  105  105  104  104   103   104   104   105   105

 I use this data to represent a 3D map with the following script and it works
 PERFECTLY!

 y <- 2*data
 x <- 10 * (1:nrow(y))
 z <- 10 * (1:ncol(y))
 ylim <- range(y)
 ylen <- ylim[2] - ylim[1] + 1
 colorlut <- terrain.colors(ylen)
 col <- colorlut[y-ylim[1] + 1]
 rgl.open()
 rgl.surface(x,z,y, color=col, back="lines")


 Then I export it with write.table(data, file="datam.txt", row.names=TRUE,
 col.names=TRUE),

 when I import it back into R again with read.table("datam.txt") it looks like
 this in R:


    V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
 1 100 100 101 101 101 101 101 100 100 100 101 101 102 102 102 102 103 104 103
 2 101 101 102 102 102 102 102 101 101 101 102 102 103 103 103 103 104 105 104
 3 102 102 103 103 103 103 103 102 102 102 103 103 104 104 104 104 105 106 105
 4 103 103 104 104 104 104 104 103 103 103 103 104 104 104 105 105 106 107 106
 5 104 104 105 105 105 105 105 104 104 103 104 104 105 105 105 106 107 108 108

 The script I mentioned before no longer works on it; even if I convert it to a
 matrix with as.matrix it still does not work.

 I have read the PDF on R data import/export and searched with Google, but I
 have not found any answer to my problem.

 I am sorry if the answer is very obvious but I have tried for more than a
 week.

 Any help is really welcome, thanks in advance.
 Rosario




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Use dump or write? or what?

Can you define more precisely what you want to do with the data?  I
would suggest that you keep each of the outputs (objects) of the test
in a 'list' that way you can access each one and do what you need.
You can also 'save' the list and later 'load' it into another session.
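
A rough sketch of that suggestion, with simulated vectors standing in for 'tempA' and 'tempB' and a hypothetical list name `results`:

```r
set.seed(1)
tempA <- rnorm(20)
tempB <- rnorm(20, mean = 0.5)

# Keep every test object in a named list, one entry per input file
results <- list()
results[["file1"]] <- list(
  two_sample = t.test(tempA, tempB, var.equal = TRUE),
  welch      = t.test(tempA, tempB, var.equal = FALSE)
)

# Save the whole list and restore it later, possibly in another session
f <- tempfile(fileext = ".RData")
save(results, file = f)
rm(results)
load(f)

results$file1$welch$p.value
```

Because the full test objects are stored, statistics, confidence intervals and p-values can all be pulled out later without rerunning anything.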

On Sun, Jul 31, 2011 at 11:41 PM, Matt Curcio matt.curcio...@gmail.com wrote:
 Greetings all,
 I am calculating two t-test values for each of many files, then saving them
 to a file, calculating another set and appending, and so on.
 But I can't figure out how to write the results to a file and then append
 subsequent t-tests.
 (maybe too tired ;} )
 I have tried to use dump and file.append to no avail.

 ttest_results = tempfile()

 two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE)
 welch_ttest <- t.test (tempA, tempB, var.equal = FALSE)

 dump ("two_sample_ttest", file = "dumpdata.txt", append=TRUE)
 ttest_results <- file.append (ttest_results, two_sample_ttest)

 Any suggestions,
 M
 --



 Matt Curcio
 M: 401-316-5358
 E: matt.curcio...@gmail.com





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: [R] Is R the right choice for simulating first passage times of random walks?

2011-08-01 Thread Paul Menzel

On Sunday, 31.07.2011, at 23:32 -0500, R. Michael Weylandt wrote:
 Glad to help -- I haven't taken a look at Dennis' solution (which may be far
 better than mine), but if you do want to keep going down the path outlined
 below you might consider the following:

I will try Dennis’ solution right away but looked at your suggestions
first. Thank you very much.

 Instead of throwing away a simulation if something starts negative, why not
 just multiply the entire sample by -1: that lets you still use the sample
 and saves you some computations: of course you'll have to remember to adjust
 your final results accordingly.

That is a nice suggestion. For a symmetric random walk this is indeed
possible and equivalent to looking when the walk first hits zero.

 This might avoid the loop:
 
 x = ## Whatever x is.
 xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
 which.max((x>=0) & (xLag <0)) + 1 # Depending on how you've decided to count
 # things, this +1 may be extraneous.

 The inner expression sets a 0 except where there is a switch from negative
 to positive and a one there: the which.max function returns the location of
 the first maximum, which is the first 1, in the vector. If you are
 guaranteed the run starts negative, then the location of the first positive
 should give you the length of the negative run.

That is the same idea as from Bill [1]. The problem is that when the walk
never returns to zero in a sample, `which.max(«everything FALSE»)`
returns 1 [2]. That is no problem though, when we do not have to worry
about a walk starting with a positive value, and adding 1 (+1) can be
omitted when we count the epoch of first hitting 0 instead of how long
the walk stayed negative, which is always one less.

Additionally my check `(x>=0) & (xLag<0)` is redundant when we know we
start with a negative value. `(x>=0)` should be good enough in this
case.
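
One way to guard against the all-FALSE case, as a sketch (the helper name `first_hit` is made up for illustration): return `NA` explicitly when the walk never reaches zero, so that such walks are not silently counted as hitting at epoch 1.

```r
# Epoch at which a walk that starts negative first reaches a value >= 0,
# or NA if it never does within the simulated length
first_hit <- function(x) {
  hit <- x >= 0
  if (any(hit)) which.max(hit) else NA_integer_
}

first_hit(c(-1, -2, -1, 0, 1))    # walk reaches zero at epoch 4
first_hit(c(-1, -2, -3))          # never reaches zero: NA
which.max(c(FALSE, FALSE, FALSE)) # the pitfall: returns 1
```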

 This all gives you,
 
 f4 <- function(n = 10, # number of simulations
                length = 10) # length of iterated sum
 {
   R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
 
   R = apply(R,1,cumsum)
 
   R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row

The line above seems to look the columns instead of rows. I think the
following is correct since after the `apply()` above the random walks
are in the columns.

R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]
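
A quick check of that orientation, as a sketch with small made-up dimensions: `apply(..., 1, cumsum)` returns each row's cumulative sum as a column, so the walks end up in the columns and the sign flip has to index columns.

```r
set.seed(42)
n   <- 6  # number of walks
len <- 5  # steps per walk

steps <- matrix(sample(c(-1L, 1L), len * n, replace = TRUE), nrow = n)
R <- apply(steps, 1, cumsum)  # result is len x n: one walk per column
dim(R)

# Flip every walk whose first step is +1 so that all walks start at -1
pos <- R[1, ] == 1
R[, pos] <- -R[, pos]

all(R[1, ] == -1)
```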

 fTemp <- function(x) {
 
 xLag = c(0,x[-length(x)])
 return(which.max((x>=0) & (xLag <0))+1)
 }
 
 countNegative = apply(R,2,fTemp)
 tabulate(as.vector(countNegative), length)
  }
 
 That just crashed my computer though, so I wouldn't recommend it for large
 n,length.

Welcome to my world. I would have never thought that simulating random
walks with a length of say a million would create that much data and
push common desktop systems with let us say 4 GB of RAM to their limits.

 Instead, you can help a little by combining the lagging and the `&`
 all in one.
 
 f4 <- function(n = 10, length = 10)
 {
 R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
 R = apply(R,1,cumsum)
 R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row
 R = (cbind(rep(0,NROW(R)),R)<0)&(cbind(R,rep(0,NROW(R)))>=0)
 countNegative = apply(R,1,which.max) + 1
 return (tabulate(as.vector(countNegative), length) )
 
  }

I left that one out, because as written above the check can be
shortened.

 Of course, this is all starting to approach a very specific question that
 could actually be approached much more efficiently if it's your end goal
 (though I think I remember from your first email a different end goal):

That is true. But to learn some optimization techniques on a simple
example is much appreciated and will hopefully help me later on for the
iterated random walk cases.

 We can use the symmetry and restartability of RW to do the following:
 
 x = cumsum(sample(c(-1L,1L),BIGNUMBER,replace=T))
 D  = diff(which(x == 0))

Nice!

 This will give you a vector of how long x stays positive or negative at a
 time. Thinking through some simple translations lets you see that this set
 has the same distribution as how long a RW that starts negative stays
 negative.

I have to write those translations down. On first sight though we need
again to handle the case where it stays negative the whole time. `D`
then has length 0 and we have to count that for a walk longer than
`BIGNUMBER`.
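
A sketch of that bookkeeping, under two assumptions that differ slightly from the quoted code: the walk is taken to start at 0 at time 0, so the first excursion is counted too, and `big_number` stands in for `BIGNUMBER`. The final, unfinished excursion is kept separately as a censored length, which also covers the case where the walk never returns to zero.

```r
set.seed(7)
big_number <- 10000L

x <- cumsum(sample(c(-1L, 1L), big_number, replace = TRUE))
zeros <- which(x == 0)

# Completed excursion lengths, counting from the start at time 0
D <- diff(c(0L, zeros))

# Length of the last excursion that had not returned to zero when the
# simulation stopped; equals big_number if the walk never returned at all
censored <- if (length(zeros) == 0) big_number else big_number - max(zeros)

table(D)
censored
```

A simple symmetric walk can only be back at zero after an even number of steps, which gives a cheap sanity check on `D`.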

 Again, this is only good for answering a very specific question
 about random walks and may not be useful if you have other more complicated
 questions in sight.

Just testing for 0 for the iterated cases will not be enough for
iterated random walks since an iterated random walk can go from negative
to non-negative without being zero at this time/epoch.

I implemented all your suggestions and got

[R] Problem Fixed: axes label

2011-08-01 Thread ogbos okike

Hi Peter,
Many thanks. It worked.
Regards
Ogbos

On 1 August 2011 14:05, Peter Ehlers ehl...@ucalgary.ca wrote:

 On 2011-08-01 03:32, ogbos okike wrote:

 Dear All,
 I am trying to put 10^-8 st km^-2day^-1 on x-axis of my plot. I tried using
 ylab = expression(paste("st / ", plain(km)^2, " / day")) to see if I can
 at least get the unit before thinking about the power of 10 (10^-8).

 However, ylab = expression(paste("st / ", plain(km)^2, " / day")) didn't
 give the result I expected. The power 2 in km was missing.


 Works for me. But I don't see the need for paste() or plain().
 Try this:

  plot(0, ylab="", xlab="")
  title(ylab = expression(10^{-8} ~ st ~ km^{-2} ~ day^{-1}))

 Replace any '~' with '*' if you don't want the space.
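
 To make the difference between the two operators concrete, here is a sketch that builds both variants of the label; the names `lab_spaced` and `lab_tight` are illustrative, and the plot is drawn on a null PDF device only so that the example runs without a screen.

```r
# '~' inserts a space between terms, '*' juxtaposes them without one
lab_spaced <- expression(10^{-8} ~ st ~ km^{-2} ~ day^{-1})
lab_tight  <- expression(10^{-8} * st * km^{-2} * day^{-1})

pdf(NULL)                      # draw to a null device (no file is written)
plot(0, ylab = "", xlab = "")
title(ylab = lab_spaced)       # swap in lab_tight to remove the spaces
dev.off()
```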

 Peter Ehlers

 


 I will be glad for any help on how to label 10^-8 st km^-2day^-1 on the
 axis.
 Many thanks
 Regards
 Ogbos


Re: [R] Use dump or write? or what?

2011-08-01 Thread Matt Curcio

Greetings all,
Thanks for all your help so far.
Let me give a better idea of what I am doing.  I have hundreds of
files that I need to plow through with a t-test and correlation test.
BTW, 'tempA' and 'tempB' are simply columns of numbers from a gene-chip
experiment that spits out DNA 'amounts'. So I have set up a loop to
read the files and carry out the tests, but I need to save the results for
later inspection (and Jim H, you are probably right, for later inspection).
By inspection I mean I don't know what I want to do with it yet.
Remember: That's why they call it Research.

So it seems that 'save/load' might be a good alternative for my work.
Any suggestions,
M
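
A sketch of what such a loop could look like. Everything specific here is an assumption for illustration: the files are simulated in a temporary directory, each is assumed to have columns named `tempA` and `tempB`, and the results are written with `saveRDS()`, a single-object relative of save/load.

```r
set.seed(123)

# Simulate three small input files in a scratch directory
dir <- tempfile("chips")
dir.create(dir)
for (i in 1:3) {
  write.table(data.frame(tempA = rnorm(15), tempB = rnorm(15)),
              file = file.path(dir, sprintf("chip%02d.txt", i)),
              row.names = FALSE)
}

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# Run all tests per file and keep the complete test objects
results <- lapply(files, function(f) {
  d <- read.table(f, header = TRUE)
  list(
    two_sample  = t.test(d$tempA, d$tempB, var.equal = TRUE),
    welch       = t.test(d$tempA, d$tempB, var.equal = FALSE),
    correlation = cor.test(d$tempA, d$tempB)
  )
})
names(results) <- basename(files)

# Compact overview for a first look
summary_table <- data.frame(
  file         = names(results),
  p_two_sample = sapply(results, function(r) r$two_sample$p.value),
  p_welch      = sapply(results, function(r) r$welch$p.value),
  cor_estimate = sapply(results, function(r) unname(r$correlation$estimate))
)

saveRDS(results, file.path(dir, "ttest_results.rds"))
summary_table
```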

On Sun, Jul 31, 2011 at 11:41 PM, Matt Curcio matt.curcio...@gmail.com wrote:
 Greetings all,
 I am calculating two t-test values for each of many files, then saving them
 to a file, calculating another set and appending, and so on.
 But I can't figure out how to write the results to a file and then append
 subsequent t-tests.
 (maybe too tired ;} )
 I have tried to use dump and file.append to no avail.

 ttest_results = tempfile()

 two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE)
 welch_ttest <- t.test (tempA, tempB, var.equal = FALSE)

 dump ("two_sample_ttest", file = "dumpdata.txt", append=TRUE)
 ttest_results <- file.append (ttest_results, two_sample_ttest)

 Any suggestions,
 M
 --



 Matt Curcio
 M: 401-316-5358
 E: matt.curcio...@gmail.com




-- 


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com


[R] formula used by R to compute the t-values in a linear regression

2011-08-01 Thread Samuel Le

Hello,



I was wondering if someone knows the formula used by the function lm to compute 
the t-values.



I am trying to implement a linear regression myself. Assuming that I have K 
variables, and N observations, the formula I am using is:

For the k-th variable, t-value= b_k/sigma_k



With b_k is the coefficient for the k-th variable, and sigma_k =(t(x) x )^(-1) 
_kk is its standard deviation.



I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 -(sum x_{k,i}^2))



With sigma: the estimated standard deviation of the residuals,

Sigma = sqrt(1/(N-K-1)*Sum epsilon_i^2)



With:

N: number of observations

K: number of variables



This formula comes from my old course of econometrics.

For some reason it doesn't match the t-value produced by R (I am off by about 
1%). I can match the other results produced by R (coefficients of the 
regression, r squared, etc.).



I would be grateful if someone could provide some clarifications.



Samuel



Re: [R] formula used by R to compute the t-values in a linear regression



On Aug 1, 2011, at 9:27 AM, Samuel Le wrote:


Hello,


I was wondering if someone knows the formula used by the function lm  
to compute the t-values.


I am trying to implement a linear regression myself. Assuming that I  
have K variables, and N observations, the formula I am using is:


For the k-th variable, t-value= b_k/sigma_k

With b_k is the coefficient for the k-th variable, and sigma_k  
=(t(x) x )^(-1) _kk is its standard deviation.


I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 -(sum x_{k,i}^2))

With sigma: the estimated standard deviation of the residuals,

Sigma = sqrt(1/(N-K-1)*Sum epsilon_i^2)

With:

N: number of observations

K: number of variables

This formula comes from my old course of econometrics.

For some reason it doesn't match the t-value produced by R (I am off  
by about 1%). I can match the other results produced by R  
(coefficients of the regression, r squared, etc.).


Usually such a small difference results from using different degrees
of freedom. Have you reduced the df's appropriately after considering
the number of other estimated parameters? Just quoting code from your
econometrics reference is not enough to answer the question. We would
need to see code... (as the message states at the end of every posting.)
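
To illustrate how a degrees-of-freedom slip shows up, here is a sketch on simulated data (all variable names are made up). The residual standard deviation computed with N - K - 1 matches `lm`; dividing by N instead shrinks it by a fixed factor, and every t-value is inflated by the same factor.

```r
set.seed(1)
N <- 50  # observations
K <- 2   # regressors, excluding the intercept

x1 <- rnorm(N)
x2 <- rnorm(N)
y  <- 1 + 2 * x1 - x2 + rnorm(N)

fit <- lm(y ~ x1 + x2)
rss <- sum(residuals(fit)^2)

sigma_right <- sqrt(rss / (N - K - 1))  # denominator used by lm
sigma_wrong <- sqrt(rss / N)            # a common slip

all.equal(sigma_right, summary(fit)$sigma)
ratio <- sigma_right / sigma_wrong      # equals sqrt(N / (N - K - 1))
ratio
```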




I would be grateful if someone could provide some clarifications.



Samuel




David Winsemius, MD
West Hartford, CT


Re: [R] formula used by R to compute the t-values in a linear regression

2011-08-01 Thread peter dalgaard


On Aug 1, 2011, at 15:27 , Samuel Le wrote:

 Hello,
 
 
 
 I was wondering if someone knows the formula used by the function lm to 
 compute the t-values.
 
 
 
 I am trying to implement a linear regression myself. Assuming that I have K 
 variables, and N observations, the formula I am using is:
 
 For the k-th variable, t-value= b_k/sigma_k
 
 
 
 With b_k is the coefficient for the k-th variable, and sigma_k =(t(x) x 
 )^(-1) _kk is its standard deviation.
 
 
 
 I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 -(sum x_{k,i}^2))
 
 
 
 With sigma: the estimated standard deviation of the residuals,
 
 Sigma = sqrt(1/(N-K-1)*Sum epsilon_i^2)
 
 
 
 With:
 
 N: number of observations
 
 K: number of variables
 
 
 
 This formula comes from my old course of econometrics.
 
 For some reason it doesn't match the t-value produced by R (I am off by about 
 1%). I can match the other results produced by R (coefficients of the 
 regression, r squared, etc.).
 
 
 
 I would be grateful if someone could provide some clarifications.


AFAICT, your formula only holds for K=1. Otherwise, the formula for sigma_k 
involves matrix inversion. Also, even for K=1, beware that textbook formulas 
like SSDx = SSx - (Sx)^2/n involve subtraction of nearly equal quantities and 
easily loses multiple digits of precision, so software tends to use rather more 
careful algorithms.
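
Both points can be sketched on simulated data (names are illustrative, and `solve()` is used here for clarity only; it is not how `lm` computes things internally). The first part obtains the standard errors for K > 1 from the diagonal of the inverse of X'X; the second shows the textbook sum-of-squares formula losing all its digits when the values are large relative to their spread.

```r
set.seed(2)
N  <- 40
x1 <- rnorm(N)
x2 <- rnorm(N)
y  <- 1 + 0.5 * x1 - 2 * x2 + rnorm(N)

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)                  # includes the intercept column

# Standard errors from the matrix formula: sqrt(s^2 * diag((X'X)^-1))
s2   <- sum(residuals(fit)^2) / (N - ncol(X))
se   <- sqrt(diag(solve(crossprod(X))) * s2)
tval <- coef(fit) / se

all.equal(unname(tval), unname(summary(fit)$coefficients[, "t value"]))

# Cancellation: the same five points shifted far from zero
x        <- 1e9 + 1:5
textbook <- sum(x^2) - sum(x)^2 / length(x)  # subtracts two numbers near 5e18
stable   <- sum((x - mean(x))^2)             # centres first
c(textbook = textbook, stable = stable)
```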

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Døden skal tape! --- Nordahl Grieg


Re: [R] Reading name-value data

2011-08-01 Thread Hadley Wickham

Yes!  Would you mind filing an issue so I don't forget?
Hadley

On Friday, July 29, 2011, Stavros Macrakis macra...@alum.mit.edu wrote:
 Perfect!  Thanks!
 By the way, I see that, unlike base rbind, it does not work for vectors
 and lists:
  rbind(c(a=1),c(b=2)) => matrix(1:2,2,1,dimnames=list(NULL,"a"))
    == as.matrix(data.frame(a=1:2))
 but
  rbind.fill(c(a=1),c(b=2)) => NULL
 Shouldn't it give something like
  matrix(c(1,NA,NA,2),2,2,dimnames=list(NULL,c("a","b")))
 or
  data.frame(a=c(1,NA),b=c(NA,2))
 If, on the other hand, it insists on data.frames as input, it should err
 out if given non-data-frames.
 -s

 On Thu, Jul 28, 2011 at 19:30, Hadley Wickham had...@rice.edu wrote:

 Use plyr::rbind.fill?   That does match up columns by name.
 Hadley
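
 For readers without plyr at hand, the same name-based alignment can be sketched in base R. This is an illustrative alternative, not what rbind.fill does internally; the helper `parse_line` and the inline `lines` vector (copied from the example data in the question) are assumptions of the sketch.

```r
lines <- c("a=1,b=2,d=5", "b=4,c=3,e=3", "a=5,d=1")

# Turn one "name=value,name=value" line into a named numeric vector
parse_line <- function(line) {
  pairs  <- strsplit(strsplit(line, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
  values <- as.numeric(vapply(pairs, `[`, character(1), 2))
  names(values) <- vapply(pairs, `[`, character(1), 1)
  values
}

records   <- lapply(lines, parse_line)
all_names <- sort(unique(unlist(lapply(records, names))))

# Index each record by the full set of names; absent names yield NA
aligned <- t(vapply(records,
                    function(r) unname(r[all_names]),
                    numeric(length(all_names))))
colnames(aligned) <- all_names

result <- as.data.frame(aligned)
result
```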

 On Thu, Jul 28, 2011 at 5:23 PM, Stavros Macrakis macra...@alum.mit.edu
wrote:
  I have a file of data where each line is a series of name-value pairs,
but
  where the names are not necessarily the same from line to line, e.g.
 a=1,b=2,d=5
 b=4,c=3,e=3
 a=5,d=1
  I would like to create a data frame which lines up the data in the
  corresponding columns.  In this case, this would be
  data.frame(a = c(1, NA, 5), b = c(2, 4, NA), c = c(NA, 3, NA),
             d = c(5, NA, 1), e = c(NA, 3, NA))
  One way I can think of doing this is to read in the data as one 'long'
data
  frame per line with a unique ID, e.g. line one becomes
   cbind(id=1,data.frame(variable=c('a','b','d'),value=c(1,2,5)))
  then rbind all the lines and use the reshape package function 'cast'.
  Is there a more straightforward way?  (I'd have thought rbind would
line up
  columns by name, but it doesn't.)
  -s
 
 



 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/ http://had.co.nz/



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] formula used by R to compute the t-values in a linear regression

2011-08-01 Thread S Ellison

 

 -Original Message-
 [mailto:r-help-boun...@r-project.org] On Behalf Of Samuel Le
 Subject: [R] formula used by R to compute the t-values in a 
 linear regression
 I was wondering if someone knows the formula used by the 
 function lm to compute the t-values.

Typing 
summary.lm

I found the standard error and t calculation (around lines 58-62 of the 
resulting listing):
resvar <- rss/rdf
R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
se <- sqrt(diag(R) * resvar)
est <- z$coefficients[Qr$pivot[p1]]
tval <- est/se

You can also find (rather further up) that the degrees of freedom df used are 
taken directly from the linear model $df (z$df in the function). Others noted 
that incorrect df often cause problems, so checking that you're using the 
correct df is possible by inspecting the lm summary.

The standard errors are apparently (as is usual for a least squares problem, I 
think) taken from the diagonal of  the inverse of the hessian, multiplied by 
the residual variance. Unfortunately I could not get at the hessian calculation 
quite as easily (it looks like it uses a function that's not exported from 
stats) so that's left as an exercise in browsing source code ... 
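
A small sketch of the same calculation done by hand, which may help when checking a home-grown regression against lm. The data are simulated and the variable names are illustrative; solve(crossprod(X)) is used here in place of the chol2inv call on the QR factor, which is a swapped-in but equivalent way to get the inverse of t(X) %*% X for a full-rank design.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)                 # includes the intercept column

rdf    <- fit$df.residual                # n minus number of coefficients
resvar <- sum(residuals(fit)^2) / rdf    # residual variance
XtXinv <- solve(crossprod(X))            # inverse of t(X) %*% X
se     <- sqrt(diag(XtXinv) * resvar)    # standard errors
tval   <- coef(fit) / se                 # t-values

cbind(by_hand = tval, summary_lm = coef(summary(fit))[, "t value"])
```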

S Ellison



Re: [R] formula used by R to compute the t-values in a linear regression

2011-08-01 Thread Samuel Le

Exactly.
My formula holds only for k=1, this is how I generated it.

Do you have any references concerning the  rather more careful algorithms?

Thanks,
Samuel

-Original Message-
From: peter dalgaard [mailto:pda...@gmail.com]
Sent: 01 August 2011 14:45
To: Samuel Le
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] formula used by R to compute the t-values in a linear 
regression


On Aug 1, 2011, at 15:27 , Samuel Le wrote:

 Hello,



 I was wondering if someone knows the formula used by the function lm to 
 compute the t-values.



 I am trying to implement a linear regression myself. Assuming that I have K 
 variables, and N observations, the formula I am using is:

 For the k-th variable, t-value= b_k/sigma_k



 With b_k is the coefficient for the k-th variable, and sigma_k =(t(x) x 
 )^(-1) _kk is its standard deviation.



 I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 -(sum x_{k,i}^2))



 With sigma: the estimated standard deviation of the residuals,

 Sigma = sqrt(1/(N-K-1)*Sum epsilon_i^2)



 With:

 N: number of observations

 K: number of variables



 This formula comes from my old course of econometrics.

 For some reason it doesn't match the t-value produced by R (I am off by about 
 1%). I can match the other results produced by R (coefficients of the 
 regression, r squared, etc.).



 I would be grateful if someone could provide some clarifications.


AFAICT, your formula only holds for K=1. Otherwise, the formula for sigma_k 
involves matrix inversion. Also, even for K=1, beware that textbook formulas 
like SSDx = SSx - (Sx)^2/n involve subtraction of nearly equal quantities and 
easily loses multiple digits of precision, so software tends to use rather more 
careful algorithms.
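
A tiny illustration of the cancellation problem mentioned here, using made-up numbers with a large mean and a small spread. The variable names are illustrative. The centred version returns the exact sum of squared deviations, while the textbook shortcut subtracts two quantities near 5e16 and cannot recover it.

```r
x <- 1e8 + 1:5
n <- length(x)

# Textbook shortcut: SSx - (Sx)^2 / n, a difference of two huge, nearly equal numbers
naive <- sum(x^2) - sum(x)^2 / n

# Centre first, then square: the deviations are -2..2, so the answer is 10
stable <- sum((x - mean(x))^2)

c(naive = naive, stable = stable)
```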

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Døden skal tape! --- Nordahl Grieg










Re: [R] formula used by R to compute the t-values in a linear regression

2011-08-01 Thread Samuel Le

Yes, that's what I was looking for.
Many thanks,

Samuel

-Original Message-
From: S Ellison [mailto:s.elli...@lgcgroup.com]
Sent: 01 August 2011 15:16
To: Samuel Le; r-h...@stat.math.ethz.ch
Subject: RE: formula used by R to compute the t-values in a linear regression

 -Original Message-
 [mailto:r-help-boun...@r-project.org] On Behalf Of Samuel Le
 Subject: [R] formula used by R to compute the t-values in a
 linear regression
 I was wondering if someone knows the formula used by the
 function lm to compute the t-values.

Typing
summary.lm

I found the standard error and t calculation (around lines 58-62 of the 
resulting listing):
resvar <- rss/rdf
R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
se <- sqrt(diag(R) * resvar)
est <- z$coefficients[Qr$pivot[p1]]
tval <- est/se

You can also find (rather further up) that the degrees of freedom df used are 
taken directly from the linear model $df (z$df in the function). Others noted 
that incorrect df often cause problems, so checking that you're using the 
correct df is possible by inspecting the lm summary.

The standard errors are apparently (as is usual for a least squares problem, I 
think) taken from the diagonal of  the inverse of the hessian, multiplied by 
the residual variance. Unfortunately I could not get at the hessian calculation 
quite as easily (it looks like it uses a function that's not exported from 
stats) so that's left as an exercise in browsing source code ...

S Ellison


[R] General indexing in multidimensional arrays

2011-08-01 Thread Jannis


Dear R community,


I have a general question regarding indexing in multidimensional arrays.

Imagine I have a three-dimensional array and I only want to extract one 
vector along a single dimension from it:

data <- array(rnorm(64),dim=c(4,4,4))

result <- data[1,1,]

If I want to extract more than one of these vectors, it would now really 
help me to supply a logical matrix of the size of the first two dimensions:

indices <- matrix(FALSE,ncol=4,nrow=4)
indices[1,3] <- TRUE
indices[4,1] <- TRUE

result <- data[indices,]

This, however, would give me an error. I am used to this kind of indexing 
from Matlab and was wondering whether there exists an easy way to do 
this in R without supplying complicated index matrices of all three 
dimensions or logical vectors of the size of the whole matrix?

The only way I could imagine would be to:

result <- data[rep(as.vector(indices),times=4)]

but this seems rather complicated and also depends on the order of the 
dimensions I want to extract.

I do not want R to copy Matlab's behaviour, I am just wondering whether I 
missed one concept of indexing in R?




Thanks a lot
Jannis


Re: [R] memory problem; Error: cannot allocate vector of size 915.5 Mb

2011-08-01 Thread Dimitris.Kapetanakis

Thanks a lot for the help. 

Actually, I am using a Mac (R for Mac OS X GUI 1.40-devel Leopard
build 32-bit (5751)), but I think I can get access to Windows 7 64-bit. What
I am trying to do is a maximization through grid search, because I am not
sure that any of the optim() methods works well enough in my case (at least,
they all give quite different results). I need the optimization for a Monte
Carlo analysis of the Smoothed Maximum Score estimator, so I want it to be
as efficient as possible; but since I am something of an amateur at R and at
programming in general, I doubt that I can achieve that.

Thanks again for your help

Dimitris


[R] Problem with gam()

2011-08-01 Thread pjura


Dear group,
I have been experiencing problems with the gam() function since updating R to
version 2.13.1. The function in both the gam and mgcv packages has stopped
working. Before the update, the same code ran fine.
The function from the gam package yields the following warning:

Residual degrees of freedom are negative or zero.  This occurs when the sum of 
the parametric and nonparametric degrees of freedom exceeds the number of 
observations.  The model is probably too complex for the amount of data 
available.

while gam() from mgcv crashes R.

Did I miss something?

Thank you in advance.

PJ


[R] Problem with gam() after R update

2011-08-01 Thread Przemek Jura


Dear group,
I have been experiencing problems with the gam() function since updating R to
version 2.13.1. The function in both the gam and mgcv packages has stopped
working. Before the update, the same code ran fine.
The function from the gam package yields the following warning:

Residual degrees of freedom are negative or zero.  This occurs when the sum of 
the parametric and nonparametric degrees of freedom exceeds the number of 
observations.  The model is probably too complex for the amount of data 
available

while gam() from mgcv crashes R.

Did I miss something?

Thank you in advance.

PJ

[R] Plotting question

2011-08-01 Thread Andrew McCulloch

Hi,

I use R to draw my graphs. I have 100 points on a simple xy-plot. The points
are distinguished by a third variable which is categorical with 10 levels. I
have been plotting x against y and using gray scales to distinguish the level
of the categorical variable for each point. It looks ok to me but a journal
reviewer says this is not any use. I cannot afford to pay for colour prints.
Any ideas on what is the best way to distinguish 10 groups on an xy scatter
plot?



If all else fails I can just remove the graph and give them a table of 
regression coefficients. 


Thanks.

Yours Sincerely
Andrew McCulloch


[R] How to colour specific edges in a dendrogram

2011-08-01 Thread Jan Teichmann

Dear Mailing-list

I used hclust to make a dendrogram of 2613 leaves. I also have a list
with the names of certain labels which are of interest and I would like
to visualize where they appear within the dendrogram. I found an example
of how to use dendrapply to colour the labels, but the problem is that with
2613 leaves I cannot plot the labels, as it gets super messy.

I have now tried to write a function using dendrapply() to colour the edges
of the leaves of interest red. Unfortunately, I have not managed to write
this function. Could someone help me out with the stub of a function
colouring edges?

I have the
  dendrogram
  list of labels to colour their edges

I would like to colour the edges between the final leaf node and their
parental node.

Thank you very much for your help!
Jan
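
One possible stub for the kind of function asked for here, offered as a sketch rather than a tested answer to the 2613-leaf case. It assumes that setting the "edgePar" attribute on a leaf node colours the edge between that leaf and its parent when the dendrogram is plotted; the USArrests subset, the object of_interest and the function name colour_leaf_edges are illustrative stand-ins for the poster's own data.

```r
hc   <- hclust(dist(USArrests[1:20, ]))
dend <- as.dendrogram(hc)

# Labels whose leaf edges should be drawn in red (illustrative choice)
of_interest <- c("Alaska", "Florida")

colour_leaf_edges <- function(node) {
  # Only leaves carry a single label; inner nodes are returned untouched
  if (is.leaf(node) && attr(node, "label") %in% of_interest) {
    attr(node, "edgePar") <- list(col = "red")
  }
  node
}

dend_col <- dendrapply(dend, colour_leaf_edges)

# leaflab = "none" suppresses the labels, which matters with thousands of leaves
if (interactive()) plot(dend_col, leaflab = "none")
```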



[R] Impact of multiple imputation on correlations

2011-08-01 Thread lifty . gere

Dear all,

I have been attempting to use multiple imputation (MI) to handle missing data 
in my study. I use the mice package in R for this. The deeper I get into this 
process, the more I realize I first need to understand some basic concepts 
which I hope you can help me with.

For example, let us consider two arbitrary variables in my study that have the 
following missingness pattern:

Variable 1 available, Variable 2 available: 51 (of 118 observations, 43%)
Variable 1 available, Variable 2 missing: 37 (31,3%)
Variable 1 missing, Variable 2 available: 10 (8,4%)
Variable 1 missing, Variable 2 missing: 20 (16,9%)

I am interested in the correlation between Variable 1 and Variable 2.

Q1. Does it even make sense for me to use MI (or anything else, really) to 
replace my missing data when such large fractions are not available?

Plot 1 (http://imgur.com/KFV9yCmV1sl) provides a scatter plot of these example 
variables in the original data. The correlation coefficient r = -0.34 and p = 
0.016.

Q2. I notice that correlations between variables in imputed data (pooled 
estimates over all imputations) are much lower and less significant than the 
correlations in the original data. For this example, the pooled estimates for 
the imputed data show r = -0.11 and p = 0.22.

Since this seems to happen in all the variable combinations that I have looked 
at, I would like to know if MI is known to have this behavior, or whether this 
is specific to my imputation. 

Q3. When going through the imputations, the distribution of the individual 
variables (min, max, mean, etc.) matches the original data. However, 
correlations and least-square line fits vary quite a bit from imputation to 
imputation (see Plot 2, http://imgur.com/KFV9ylCmV1s). Is this normal?

Q4. Since my results differ (quite significantly) between the original and 
imputed data, which one should I trust?

Thank you for your help in advance.
Tina
--


Re: [R] [R-Forge] R 2.13.1 can't find package binaries on R-Forge

2011-08-01 Thread Stefan Theussl


Dear all,

this must have been a temporary problem. In this case I assume that the 
build cycle did not finish in time, i.e., binaries were synced to the 
staging area although not all were built.


best,
stefan


On 07/31/2011 05:52 PM, David Winsemius wrote:

On Jul 31, 2011, at 11:26 AM, Michael Friendly wrote:

   

[Env: Win XP]
I've just upgraded from R 2.12.2 to R 2.13.1.  As part of my upgrade
process, I typically install some in-development
packages from R-Forge that are not on CRAN.  But for the first time,
it doesn't work.

e.g.,

install.packages("p3d", repos="http://R-Forge.R-project.org")

trying URL
'http://R-Forge.R-project.org/bin/windows/contrib/2.13/p3d_0.02-2.zip'
Error in download.file(url, destfile, method, mode = "wb", ...) :
   cannot open URL
'http://R-Forge.R-project.org/bin/windows/contrib/2.13/p3d_0.02-2.zip'
In addition: Warning message:
In download.file(url, destfile, method, mode = "wb", ...) :
   cannot open: HTTP status was '404 Not Found'
Warning in download.packages(pkgs, destdir = tmpd, available =
available,  :
   download of package 'p3d' failed

The list of packages I install this way is:

special <- c("p3d", "patchDVI", "spacemakeR", "spida")
install.packages(special, repos="http://R-Forge.R-project.org")


Is this just an R-Forge problem?
 

I'm not informed about the workings of R-Forge, but did you notice
that there were no packages in that bin/windows directory whose
alphabetical collation would be after lowercase i? That seems to
suggest some sort of system error encountered before the next package
after ipreds was completed.

On the project page the binaries for windows are listed as offline.

https://r-forge.r-project.org/R/?group_id=431

I don't see any C modules in the source. Have you tried installing
from source?
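
A sketch of what installing from source could look like, assuming the source tarballs are present on R-Forge even while the Windows binaries are offline. The wrapper name install_rforge_source is illustrative; type = "source" is the standard install.packages argument for requesting the source package, and on Windows this is only expected to work without extra build tools for packages that contain no compiled code.

```r
install_rforge_source <- function(pkgs,
                                  repos = "http://R-Forge.R-project.org") {
  # type = "source" asks for the source package instead of a Windows binary
  install.packages(pkgs, repos = repos, type = "source")
}

# Example call (needs network access, so it is left commented out):
# install_rforge_source(c("p3d", "patchDVI", "spacemakeR", "spida"))
```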





Re: [R] General indexing in multidimensional arrays

2011-08-01 Thread Duncan Murdoch


On 11-08-01 5:38 AM, Jannis wrote:

Dear R community,


I have a general question regarding indexing in multidimensional arrays.

Imagine I have a three-dimensional array and I only want to extract one
vector along a single dimension from it:


data <- array(rnorm(64),dim=c(4,4,4))

result <- data[1,1,]

If I want to extract more than one of these vectors, it would now really
help me to supply a logical matrix of the size of the first two dimensions:


indices <- matrix(FALSE,ncol=4,nrow=4)
indices[1,3] <- TRUE
indices[4,1] <- TRUE

result <- data[indices,]

This, however, would give me an error. I am used to this kind of indexing
from Matlab and was wondering whether there exists an easy way to do
this in R without supplying complicated index matrices of all three
dimensions or logical vectors of the size of the whole matrix?

The only way I could imagine would be to:

result <- data[rep(as.vector(indices),times=4)]

but this seems rather complicated and also depends on the order of the
dimensions I want to extract.


I do not want R to copy Matlab's behaviour, I am just wondering whether I
missed one concept of indexing in R?



Base R doesn't have anything like that as far as I know.  The closest is 
matrix indexing: you construct a 3 column matrix whose rows are the 
indices of each element you want to extract.
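
A minimal sketch of the matrix-indexing approach described here, applied to the example from the question. The intermediate names sel, k and idx are illustrative; the only assumption is that the third dimension is the one to be kept whole.

```r
set.seed(42)
data <- array(rnorm(64), dim = c(4, 4, 4))

indices <- matrix(FALSE, ncol = 4, nrow = 4)
indices[1, 3] <- TRUE
indices[4, 1] <- TRUE

# (row, col) positions of the TRUE cells, in column-major order
sel <- which(indices, arr.ind = TRUE)
k   <- dim(data)[3]

# Three-column index matrix: every selected (row, col) paired with 1..k
idx <- cbind(sel[rep(seq_len(nrow(sel)), each = k), , drop = FALSE],
             rep(seq_len(k), times = nrow(sel)))

# One column per selected cell, one row per step along the third dimension
result <- matrix(data[idx], nrow = k)
result
```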


Possibly plyr or some other package has functions to do this.

Duncan Murdoch


[R] error message jpeg62.dll missing

2011-08-01 Thread Rocky Hyacinth

Dear R-help

We are getting an error message `jpeg62.dll missing'.

We are running Windows 7 64-bit, from a Mac using Boot Camp.

Do you know of this error message, and can you give us help trying to
resolve the problem?

many thanks
Rocky

Rocky Hyacinth
Technician
Department of Archaeology
University of Sheffield
United Kingdom


Re: [R] Plotting question

2011-08-01 Thread Duncan Murdoch


On 11-08-01 5:44 AM, Andrew McCulloch wrote:

Hi,

I use R to draw my graphs. I have 100 points on a simple xy-plot. The points are
distinguished by a third variable which is categorical with 10 levels. I have
been plotting x against y and using gray scales to distinguish the level of the
categorical variable for each point. It looks ok to me but a journal reviewer
says this is not any use. I cannot afford to pay for colour prints. Any ideas on
what is the best way to distinguish 10 groups on an xy scatter plot?


Plot digits or letters or other symbols.
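
A short sketch of this suggestion with simulated data, assuming the third variable is stored as a factor with 10 levels. The names x, y, g and plot_by_symbol are illustrative. Each level gets its own black-and-white plotting symbol, so no colour or gray scale is needed.

```r
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)
g <- factor(sample(letters[1:10], 100, replace = TRUE), levels = letters[1:10])

plot_by_symbol <- function(x, y, g) {
  # One plotting symbol per factor level (pch 1 to 10 for ten levels)
  plot(x, y, pch = as.integer(g), col = "black")
  legend("topleft", legend = levels(g), pch = seq_len(nlevels(g)),
         ncol = 2, cex = 0.8)
  # Alternative: print the level's letter itself at each point
  # plot(x, y, type = "n"); text(x, y, labels = as.character(g), cex = 0.8)
  invisible(NULL)
}

if (interactive()) plot_by_symbol(x, y, g)
```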

Duncan Murdoch





If all else fails I can just remove the graph and give them a table of
regression coefficients.


Thanks.

Yours Sincerely
Andrew McCulloch


Re: [R] fitting a sinus curve

2011-08-01 Thread Marianne.ZEYRINGER

Dear David and Hans- Werner,
Thank you very much for your help. I would now like to compare whether a
polynomial or the sine model fits better. How can I see the R-squared or
the F-statistic for the sine regression, so as to be able to compare
it with the polynomial model?
Thanks a lot and have a nice evening.
Best, 
Marianne
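
One way to get these statistics, sketched under the assumption that the sine model is fitted with lm() on explicit cos/sin terms so that summary() reports R-squared and the F-statistic just as for the polynomial. The data are simulated (watt_sim is a made-up two-peak profile, not the household data from the thread), and fit_stats is an illustrative helper.

```r
set.seed(1)
hours <- seq(0, 23.75, by = 0.25)      # 96 quarter-hour readings
w     <- 2 * pi * hours / 24           # one full period per day
watt_sim <- 120 - 30 * cos(w) - 40 * sin(2 * w) +
  rnorm(length(hours), sd = 5)

trig_fit <- lm(watt_sim ~ cos(w) + sin(w) + cos(2 * w) + sin(2 * w))
poly_fit <- lm(watt_sim ~ poly(hours, 4))

# Pull the comparable summary statistics out of a fitted lm
fit_stats <- function(fit) {
  s <- summary(fit)
  c(r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    F             = unname(s$fstatistic["value"]))
}

comparison <- rbind(trig = fit_stats(trig_fit), poly = fit_stats(poly_fit))
print(round(comparison, 3))

# AIC is another common yardstick when the two models are not nested
AIC(trig_fit, poly_fit)
```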

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Hans W Borchers
Sent: Friday, July 29, 2011 12:21 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] fitting a sinus curve

David Winsemius dwinsemius at comcast.net writes:

 
 
 On Jul 28, 2011, at 1:07 PM, Hans W Borchers wrote:
 
  maaariiianne marianne.zeyringer at ec.europa.eu writes:
 
  Dear R community!
  I am new to R and would be very grateful for any kind of help. I am
  a PhD student and need to fit a model to an electricity load 
  profile of a household (curve with two peaks). I was thinking of 
  checking whether a polynomial of 4th order, a sine/cosine combination 
  or a combination of 3 parabolas fits the data best. I have problems 
  with the sine/cosine
  regression:

time <- c(
0.00, 0.15,  0.30,  0.45, 1.00, 1.15, 1.30, 1.45, 2.00, 2.15, 2.30,
2.45, 3.00, 3.15, 3.30, 3.45, 4.00, 4.15, 4.30, 4.45, 5.00, 5.15, 5.30,
5.45, 6.00, 6.15, 6.30, 6.45, 7.00, 7.15, 7.30, 7.45, 8.00, 8.15, 8.30,
8.45, 9.00, 9.15, 9.30, 9.45, 10.00, 10.15, 10.30, 10.45, 11.00, 11.15,
11.30, 11.45, 12.00, 12.15, 12.30, 12.45, 13.00, 13.15, 13.30, 13.45,
14.00, 14.15, 14.30, 14.45, 15.00, 15.15, 15.30, 15.45, 16.00, 16.15,
16.30, 16.45, 17.00, 17.15, 17.30, 17.45, 18.00, 18.15, 18.30, 18.45,
19.00, 19.15, 19.30, 19.45, 20.00, 20.15, 20.30, 20.45, 21.00, 21.15,
21.30, 21.45, 22.00, 22.15, 22.30, 22.45, 23.00, 23.15, 23.30, 23.45) 

watt <- c(
94.1, 70.8, 68.2, 65.9, 63.3, 59.5, 55, 50.5, 46.6, 43.9, 42.3, 41.4,
40.8, 40.3, 39.9, 39.5, 39.1, 38.8, 38.5, 38.3, 38.3, 38.5, 39.1, 40.3,
42.4, 45.6, 49.9, 55.3, 61.6, 68.9, 77.1, 86.1, 95.7, 105.8, 115.8,
124.9, 132.3, 137.6, 141.1, 143.3, 144.8, 146, 147.2, 148.4, 149.8,
151.5, 153.5, 156, 159, 162.4, 165.8, 168.4, 169.8, 169.4, 167.6, 164.8,
161.5, 158.1, 154.9, 151.8, 149, 146.5, 144.4, 142.7, 141.5, 140.9,
141.7, 144.9, 151.5, 161.9, 174.6, 187.4, 198.1, 205.2, 209.1, 211.1,
212.2, 213.2, 213, 210.4, 203.9, 192.9, 179, 164.4, 151.5, 141.9, 135.3,
131, 128.2, 126.1, 124.1, 121.6, 118.2, 113.4, 107.4, 100.8)

  df <- data.frame(time,  watt)
  lmfit <- lm(time ~ watt + cos(time) + sin(time),  data = df)
 
  Your regression formula does not make sense to me.
  You seem to expect a periodic function within 24 hours, and if not 
  it would still be possible to subtract the trend and then look at a 
  periodic solution.
  Applying a trigonometric regression results in the following
  approximations:

library(pracma)
plot(2*pi*time/24, watt, col="red")
ts  <- seq(0, 2*pi, len = 100)
xs6 <- trigApprox(ts, watt, 6)
xs8 <- trigApprox(ts, watt, 8)
lines(ts, xs6, col="blue", lwd=2)
lines(ts, xs8, col="green", lwd=2)
grid()

  where as examples the trigonometric fits of degree 6 and 8 are used.
  I would not advise to use higher orders, even if the fit is not 
  perfect.
 
 Thank you ! That is a real gem of a worked example. Not only did it 
 introduce me to a useful package I was not familiar with, but there 
 was even a worked example in one of the help pages that might have 
 specifically answered the question about getting a 2nd(?) order trig 
 regression. If I understood the commentary on that page, this method 
 might also be appropriate for an irregular time series, whereas 
 trigApprox and trigPoly would not?


That's true. For the moment, the trigPoly() function works correctly
only with equidistant data between 0 and 2*pi.


 This is adapted from the trigPoly help page in Hans Werner's pracma
 package:


The error I made myself was to take the 'time' variable literally,
though obviously the numbers after the decimal point were meant as
minutes. Thus

  time <- seq(0, 23.75, len = 96)

would be a better choice.
The rest in your adaptation is absolutely correct.
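
If the original hh.mm values are to be kept rather than regenerated, a conversion along the following lines would give the same decimal hours. This is a sketch assuming the two digits after the decimal point are always minutes; the function name hhmm_to_hours is illustrative.

```r
hhmm_to_hours <- function(t) {
  h <- floor(t)
  m <- round((t - h) * 100)   # digits after the point read as minutes
  h + m / 60
}

hhmm_to_hours(c(0.00, 0.15, 0.30, 0.45, 1.00, 23.45))
# gives 0, 0.25, 0.5, 0.75, 1 and 23.75 hours
```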

  A <- cbind(1, cos(pi*time/24),   sin(pi*time/24),
                cos(2*pi*time/24), sin(2*pi*time/24))
  (ab <- qr.solve(A, watt))
  # [1] 127.29131 -26.88824 -10.06134 -36.22793 -38.56219
  ts <- seq(0, pi, length.out = 100)
  xs <- ab[1] + ab[2]*cos(ts)   + ab[3]*sin(ts)   +
                ab[4]*cos(2*ts) + ab[5]*sin(2*ts)
  plot(pi*time/24, watt, col = "red",
       xlim=c(0, pi), ylim=range(watt), main = "Trigonometric Regression")
  lines(ts, xs, col="blue")


 Hans:  I corrected the spelling of Trigonometric, but other than 
 that I may well have introduced other errors for which I would be 
 happy to be corrected. For instance, I'm unsure of the terminology 
 regarding the ordinality of this model. I'm also not sure if my pi/24 
 and 2*pi/24 factors were correct in normalizing the time scale, 
 although the prediction seemed sensible.


And yes, this curve is the best

Re: [R] memory problem; Error: cannot allocate vector of size 915.5 Mb

On Aug 1, 2011, at 3:04 AM, Dimitris.Kapetanakis wrote:

Thanks a lot for the help.

Actually, I am using a Mac (R for Mac OS X GUI 1.40-devel Leopard
build 32-bit (5751)), but I think I can get access to Windows 7 64-bit.

I don't think that was what Holtman was advising. You just need more
available memory, no need to use Win7. The Mac platform has been 64-bit
capable longer than the Windoze OS, anyway. The way you get there
might be as simple as rebooting, not starting any other applications,
and re-running your code. Success depends upon how much addressable
memory you have, which you did not state. All of the stuff below is
immaterial to these considerations.

What I am trying to do is a maximization through grid search, because I am
not sure that any of the optim() methods works well enough in my case (at
least, they all give quite different results). I need the optimization for a
Monte Carlo analysis of the Smoothed Maximum Score estimator, so I want it
to be as efficient as possible; but since I am something of an amateur at R
and at programming in general, I doubt that I can achieve that.

Your code ran without problem on my Mac running Leopard using an R64
GUI session with 32 GB RAM (R.app GUI 1.41 (5866)).

str(G.search)
num [1:4000, 1:3] 1 1 1 1 1 1 1 1 1 1 ...

I have no idea whether it produced meaningful results, but a 120
million item matrix is not a problem with enough physical memory. It's
only around a Gig. Your error indicated a problem with allocating
915.5 Mb. That should be possible (although borderline) in 4GB Mac
running 32 bit R. (32 bit R is more memory efficient when working with
physical memory of 4 GB or less because the pointer size is smaller.)
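
The arithmetic behind these figures can be checked directly. A small sketch, assuming the matrix holds ordinary double-precision numbers at 8 bytes each; the variable names are illustrative.

```r
n_items          <- 120e6    # 120 million matrix entries
bytes_per_double <- 8
mb <- n_items * bytes_per_double / 2^20
mb    # about 915.5, matching the size in the error message
```

R needs a single block of that size for the vector, so the allocation can fail even when the total free memory is somewhat larger than the figure reported.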

--
david.


David Winsemius, MD
West Hartford, CT


Re: [R] Plotting problems directional or rose plots

2011-08-01 Thread kitty

Hi again,

I have tried playing around with the code given to me by Alan and Jim. Thank
you for the code, but unfortunately I can't seem to get either of them to
work. Alan's does not work with the sample data, and Jim's gives the
error:

Error in radial.grid(labels = labels, label.pos = label.pos, radlab =
radlab,  :
  could not find function "boxed.labels"

I have also tried rose plots in the heR.Misc library, to no avail.

Sorry, does anyone know how to get the plots I need?

Thank you all for reading this and for your help

k.

On Tue, Jul 26, 2011 at 10:20 PM, kitty kitty.a1...@gmail.com wrote:

 Hi,

 I'm trying to get a plot that looks somewhat like the attached image
 (sketched in word).
 I think I need something called a rose diagram, but I can't get it to do
 what I want. I'm happy to use any library.

 Essentially, I want a circle with degree slices every 10 degrees with 0 at
 the top representing north, and
 'tick marks' around the outside in 10 degree increments to match the slices
 (so the slices need to be offset by 5 degrees so the 0 degree slice actually
 faces north).
 I then want to be able to colour in the slices depending on the distance
 that the factor extends to; so for example the 9000 dist is the largest in
 the example so should fill the slice,
 a distance in this plot of 4500 would fill halfway up the slice.
 I also want to be able to specify the colour of each slice so that I can
 relate it back to the spatial correlograms I have.

 I have added some sample data below.

 Thank you for reading my post,
 All help is greatly appreciated,
 K

 sample data:

 #distance factor extends to
 dist <- c(5000,7000,9000,4500,6000,500)

 #direction
 angle <- c(0,10,20,30,40,50)

 #list of desired colours; order corresponds to associated angle/direction
 color.list <- c('red','blue','green','yellow','pink','black')

 (my real data is from 0 to 350 degrees, and so I have corresponding
 distance and colour data for each 10 degree increment).
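
A base-graphics sketch of the kind of plot described in this request, drawn with polygon() so that no add-on package is needed. It is not the code posted by Alan or Jim; the function names wedge and draw_rose are illustrative, and the layout choices (10-degree slices centred on each bearing, radius proportional to distance, tick labels just outside the circle) are assumptions read off the description.

```r
dist       <- c(5000, 7000, 9000, 4500, 6000, 500)
angle      <- c(0, 10, 20, 30, 40, 50)
color.list <- c("red", "blue", "green", "yellow", "pink", "black")

# Outline of one slice: the plot centre plus an arc of the given radius.
# Bearings are compass degrees: 0 = north (up), increasing clockwise.
wedge <- function(centre_deg, radius, width_deg = 10, n = 20) {
  theta <- seq(centre_deg - width_deg / 2, centre_deg + width_deg / 2,
               length.out = n) * pi / 180
  cbind(x = c(0, radius * sin(theta)), y = c(0, radius * cos(theta)))
}

draw_rose <- function(dist, angle, col, rmax = max(dist)) {
  lim <- c(-1.2, 1.2) * rmax
  plot(0, 0, type = "n", xlim = lim, ylim = lim, asp = 1,
       axes = FALSE, xlab = "", ylab = "")
  # Slices centred on each bearing, so the 0-degree slice spans -5 to +5
  for (i in seq_along(dist)) {
    polygon(wedge(angle[i], dist[i]), col = col[i], border = "grey40")
  }
  # Outer circle at the largest distance, with ticks every 10 degrees
  circle <- seq(0, 2 * pi, length.out = 361)
  lines(rmax * sin(circle), rmax * cos(circle))
  ticks_deg <- seq(0, 350, by = 10)
  ticks <- ticks_deg * pi / 180
  segments(rmax * sin(ticks), rmax * cos(ticks),
           1.04 * rmax * sin(ticks), 1.04 * rmax * cos(ticks))
  text(1.12 * rmax * sin(ticks), 1.12 * rmax * cos(ticks),
       labels = ticks_deg, cex = 0.6)
  invisible(NULL)
}

if (interactive()) draw_rose(dist, angle, color.list)
```

With the full data set, dist, angle and color.list would simply have 36 entries each.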





Re: [R] General indexing in multidimensional arrays



On Aug 1, 2011, at 10:50 AM, Duncan Murdoch wrote:


On 11-08-01 5:38 AM, Jannis wrote:

Dear R community,


I have a general question regarding indexing in multidimensional arrays.

Imagine I have a three-dimensional array and I only want to extract one
vector along a single dimension from it:


data <- array(rnorm(64),dim=c(4,4,4))

result <- data[1,1,]

If I want to extract more than one of these vectors, it would now really
help me to supply a logical matrix of the size of the first two dimensions:


indices <- matrix(FALSE,ncol=4,nrow=4)
indices[1,3] <- TRUE
indices[4,1] <- TRUE

result <- data[indices,]


Is this the right answer?

 result <- which(indices, arr.ind=TRUE)
 result
     row col
[1,]   4   1
[2,]   1   3

 apply(result, 1, function(x) data[x[1], x[2], ])
            [,1]       [,2]
[1,]  1.62880528  0.7781005
[2,] -0.08861725 -2.1791674
[3,]  0.78242531 -1.0352826
[4,]  1.40012118 -1.2541230

if so, it should be possible to encapsulate that behavior in a  
function.
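
A sketch of such a function, wrapping the which()/apply() steps shown in this reply. The name extract_along_third and the column labels are illustrative; it assumes a three-dimensional array, a logical matrix matching its first two dimensions, and at least one TRUE cell.

```r
extract_along_third <- function(arr, mask) {
  stopifnot(length(dim(arr)) == 3, is.matrix(mask), is.logical(mask),
            all(dim(mask) == dim(arr)[1:2]))
  pos <- which(mask, arr.ind = TRUE)     # one row per TRUE cell: (row, col)
  out <- apply(pos, 1, function(p) arr[p[1], p[2], ])
  colnames(out) <- paste(pos[, 1], pos[, 2], sep = ",")
  out
}

set.seed(1)
data <- array(rnorm(64), dim = c(4, 4, 4))
indices <- matrix(FALSE, ncol = 4, nrow = 4)
indices[1, 3] <- TRUE
indices[4, 1] <- TRUE

result <- extract_along_third(data, indices)
result   # columns "4,1" and "1,3", one row per step along the third dimension
```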



--
David Winsemius, MD
West Hartford, CT



This, however, would give me an error. I am used to this kind of indexing
from Matlab and was wondering whether there exists an easy way to do
this in R without supplying complicated index matrices of all three
dimensions or logical vectors of the size of the whole matrix?

The only way I could imagine would be to:

result <- data[rep(as.vector(indices),times=4)]

but this seems rather complicated and also depends on the order of the
dimensions I want to extract.


I do not want R to copy Matlab's behaviour, I am just wondering whether I
missed one concept of indexing in R?



Base R doesn't have anything like that as far as I know. The closest  
is matrix indexing: you construct a 3 column matrix whose rows are  
the indices of each element you want to extract.
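
For what it is worth, the matrix indexing mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the seed, the helper names rc, k and idx, and the final reshaping step are not from the thread. Each row of idx names one element as (row, column, slice).

```r
set.seed(1)  # assumption: fixed seed so the example is repeatable
data <- array(rnorm(64), dim = c(4, 4, 4))

indices <- matrix(FALSE, ncol = 4, nrow = 4)
indices[1, 3] <- TRUE
indices[4, 1] <- TRUE

# (row, col) pairs selected by the logical matrix
rc <- which(indices, arr.ind = TRUE)
k  <- dim(data)[3]

# Build the 3-column index matrix: repeat every (row, col) pair once per
# slice of the third dimension and attach the slice number.
idx <- cbind(rc[rep(seq_len(nrow(rc)), each = k), , drop = FALSE],
             rep(seq_len(k), times = nrow(rc)))

# One column per selected (row, col) pair, one row per slice
result <- matrix(data[idx], nrow = k)
```

Here result[, 1] holds data[4, 1, ] and result[, 2] holds data[1, 3, ], because which() reports the TRUE cells in column-major order.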




Possibly plyr or some other package has functions to do this.



Re: [R] Reorganize(stack data) a dataframe inducing names

2011-08-01 Thread Francesca

Dear Contributors
thanks for any help you can provide. I searched the threads
but I could not find any query that satisfied my needs.
This is my database:
       index time         values
13732  27965 DATA.Q211.SUM.Index    04/08/11         1.42
13733  27974 DATA.Q211.SUM.Index    05/10/11         1.45
13734  27984 DATA.Q211.SUM.Index    06/01/11         1.22
13746  28615 DATA.Q211.TDS.Index    04/07/11         1.35
13747  28624 DATA.Q211.TDS.Index    05/20/11         1.40
13754  29262 DATA.Q211.UBS.Index    05/02/11         1.30
13755  29272 DATA.Q211.UBS.Index    05/03/11         1.48
13761  29915 DATA.Q211.UCM.Index    04/28/11         1.43
13768  30565 DATA.Q211.VDE.Index    05/02/11         1.48
13775  31215 DATA.Q211.WF.Index     04/14/11         1.44
13776  31225 DATA.Q211.WF.Index     05/12/11         1.42
13789  31865 DATA.Q211.WPC.Index    04/01/11         1.40
13790  31875 DATA.Q211.WPC.Index    04/08/11         1.42
13791  31883 DATA.Q211.WPC.Index    05/10/11         1.43
13804  32515 DATA.Q211.XTB.Index    04/29/11         1.50
13805  32525 DATA.Q211.XTB.Index    05/30/11         1.40
13806  32532 DATA.Q211.XTB.Index    06/28/11         1.43

I need to select only the rows of this database that correspond to each
of the first occurrences of the string represented in column
index. In the example shown I would like to obtain a new
data.frame which is

       index time         values
13732  27965 DATA.Q211.SUM.Index    04/08/11         1.42
13746  28615 DATA.Q211.TDS.Index    04/07/11         1.35
13754  29262 DATA.Q211.UBS.Index    05/02/11         1.30
13761  29915 DATA.Q211.UCM.Index    04/28/11         1.43
13768  30565 DATA.Q211.VDE.Index    05/02/11         1.48
13775  31215 DATA.Q211.WF.Index     04/14/11         1.44
13789  31865 DATA.Q211.WPC.Index    04/01/11         1.40
13804  32515 DATA.Q211.XTB.Index    04/29/11         1.50

As you can see, it is not the whole string that changes, but only a
substring of it. I want to select only the first row in which each
distinct substring appears.
I know how to select rows according to a substring condition on the
index column, but I cannot use it here because the substring changes
and, moreover, the number of occurrences per substring is variable.

Thank you for any help you can provide.
Francesca


Re: [R] Plotting question

2011-08-01 Thread Gene Leynes

plot(1:10, pch=letters[1:10])
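
A slightly fuller sketch of the same idea, with one letter per level of a 10-level factor. The data here are simulated purely for illustration; x, y, grp and sym are made-up names, not the poster's variables.

```r
set.seed(1)  # assumption: simulated data standing in for the real 100 points
x   <- rnorm(100)
y   <- x + rnorm(100)
grp <- factor(sample(LETTERS[1:10], 100, replace = TRUE),
              levels = LETTERS[1:10])

# pch accepts a character vector; each point is drawn with its own letter
sym <- as.character(grp)
plot(x, y, pch = sym, cex = 0.8)
```

Since the letters are the level labels themselves, no separate legend is needed in this sketch.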


[R] Write.table Question

2011-08-01 Thread Margaux Keller

Hi,

I'm trying to create an abbreviated data file from a larger version. I can
use the subset command to create a value for this data:

dat <- subset(raw.data, select=c(SNP, Pvalue))

 head(dat)
   SNP Pvalue
1 rs11 0.6516
2 rs12 0.3311
3 rs13 0.5615

but when I try to write.table using:

write.table(dat, file = "/path/to/my/data.txt", sep = " ", col.names=NA)

I end up with a file that looks like this:

 SNP Pvalue
1 rs11 0.6516
2 rs12 0.3311
3 rs13 0.5615

when what I want is something that looks like this:

rs11   0.6516
rs12   0.3311
rs13   0.5615

What should I be including?

Thanks,
Margaux


[R] 5 arguments passed to .Internal(matrix) which requires 7

Hello,

I am having a problem with the function matrix. Specifically, when I pass
three arguments (two more being instantiated in the function), I get the
following error message:

Error in matrix(0, 30, 10) :
  5 arguments passed to .Internal(matrix) which requires 7


I looked into it, and someone has suggested that this may be the function
from an old version of R. I recently changed my source path from the lucid
version to the maverick version and installed all of the R packages I need
like so, but why would this change the matrix() function? Also, how does R
know that I passed five arguments (only three being given) if the matrix()
function is supposed to take seven arguments?

Thank you,

Robert


Re: [R] Problem with gam() after R update



On Aug 1, 2011, at 5:01 AM, Przemek Jura wrote:


Dear group,
I have been experiencing problems with the gam() function since updating R to
version 2.13.1.
The function in both the gam and mgcv packages stopped working. Before, with
the same code, everything was fine.


Reports like this often turn out to be inaccurate because either the  
(not offered) code  was not the same or the (also not offered) data  
was different. Did you reinstall these packages? How? How many  
versions up was the update? sessionInfo()?



The function from the gam package yields the following warning:

Residual degrees of freedom are negative or zero.  This occurs when  
the sum of the parametric and nonparametric degrees of freedom  
exceeds the number of observations.  The model is probably too  
complex for the amount of data available


That certainly looks like an informative error message. What do you  
want us to do about it?




while gam() from mgcv crashes R.


A  report of a real crash should go to the package maintainer with a  
lot more detail than you have provided above.




Did I miss something?


Perhaps reading the Posting Guide?



Thank you in advance.

PJ


David Winsemius, MD
West Hartford, CT


Re: [R] Plotting question

IMHO:

On Mon, Aug 1, 2011 at 7:51 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 On 11-08-01 5:44 AM, Andrew McCulloch wrote:

 Hi,

 I use R to draw my graphs. I have 100 points on a simple xy-plot. The
 points are distinguished by a third variable which is categorical with
 10 levels. I have been plotting x against y and using gray scales to
 distinguish the level of the categorical variable for each point. It
 looks ok to me but a journal reviewer says this is not any use. I cannot
 afford to pay for colour prints. Any ideas on what is the best way to
 distinguish 10 groups on an xy scatter plot?

 Plot digits or letters or other symbols.

 Duncan Murdoch

No, this does not work. See Cleveland's books (e.g. Visualizing
Data). 10 is too many symbols to constantly refer to a legend to keep
straight, and digits or letters do not allow you to readily perceive
the pattern. (Caveat: If most of the data are only 2 or 3 of the
symbols, then these can work).

I think the OP's idea of using gray scales was better. I would dispute
the reviewer and refer them to appropriate references. Alternatively,
thermometer plots (aka filled rectangle plots) would be best. Again,
Cleveland's books provide scientific justification rather than merely
the (possibly uninformed) aesthetic opinion of a reviewer. Presumably,
the journal editor would accept hard data and psychological research
in preference to opinions.




 If all else fails I can just remove the graph and give them a table of
 regression coefficients.

No. I think your attempt to use a graph is a much better way to go.
Try to resist poor practices such as just publishing summary
statistics.

Cheers,
Bert


 Thanks.

 Yours Sincerely
 Andrew McCulloch





-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] General indexing in multidimensional arrays

2011-08-01 Thread Gene Leynes

What do you think about this?

apply(data, 3, '[', indices)
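
To see what that one-liner returns, here is a small self-contained run. The seed and the name by_slice are assumptions added for illustration. apply() walks over the third dimension, and each 4x4 slice is subset with the logical matrix, so the result has one row per selected (row, col) pair and one column per slice.

```r
set.seed(1)  # assumption: fixed seed so the example is repeatable
data <- array(rnorm(64), dim = c(4, 4, 4))

indices <- matrix(FALSE, ncol = 4, nrow = 4)
indices[1, 3] <- TRUE
indices[4, 1] <- TRUE

# Each slice data[, , k] is indexed as slice[indices], giving 2 values
by_slice <- apply(data, 3, `[`, indices)

dim(by_slice)   # 2 rows (selected cells) by 4 columns (slices)
t(by_slice)     # transposed: one column per selected cell
```

The rows follow column-major order of the logical matrix, so row 1 corresponds to cell [4,1] and row 2 to cell [1,3].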



Re: [R] Write.table Question

2011-08-01 Thread Ivan Calandra


Hi Margaux,

Check the row.names and col.names arguments of write.table.
See ?write.table

write.table(dat, file = "/path/to/my/data.txt", sep = " ",
col.names=FALSE, row.names=FALSE)
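
A small runnable sketch of the same call, for checking the result. The three-row data frame is rebuilt from the head(dat) output, and the temporary file, the tab separator and quote = FALSE are assumptions added here. By default write.table puts double quotes around character and factor columns, so quote = FALSE is probably needed to get bare SNP names.

```r
# Stand-in for the real data, rebuilt from the head(dat) output
dat <- data.frame(SNP = c("rs11", "rs12", "rs13"),
                  Pvalue = c(0.6516, 0.3311, 0.5615),
                  stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")  # assumption: temp file instead of the real path

# No row names, no header line, no quotes around the SNP names
write.table(dat, file = out, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

lines <- readLines(out)
print(lines)
```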


HTH,
Ivan



--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Dept. Mammalogy
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php


Re: [R] How to make a nomogam and Calibration plot

2011-08-01 Thread Frank Harrell

Kindly do not attach questions in a separate document.

Install and read the documentation for the R rms package, and see handouts
at http://biostat.mc.vanderbilt.edu/rms

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context:
http://r.789695.n4.nabble.com/How-to-make-a-nomogam-and-Calibration-plot-tp3710068p3710126.html
Sent from the R help mailing list archive at Nabble.com.

[R] possible reason for merge not working

2011-08-01 Thread world peace

Hi Guys,

working on a merge for 2 data frames.

Using the command:

x <- merge(annotatedData, UCSCgenes, by.x="names",
by.y="Ensembl.Gene.ID", all.x=TRUE)

names and Ensembl.Gene.ID are columns with similar elements from the x
and y data frames.

annotatedData has 8909 entries, and so does x (as expected). x has columns
for UCSCgenes, but there is no data in them, all NA, as if no match
exists.
This is not true, as I can manually see and find many similarities
between the names and UCSCgenes columns.

I am wondering if there is a syntax error or a logical one.

comments appreciated.

Thanks
Dan


[R] How to make a nomogam and Calibration plot

2011-08-01 Thread sytangping

Dear R users,

I am a new R user and have hit an obstacle while writing an academic
article. I want to make a nomogram to predict the risk of prostate cancer
(PCa) using several factors which have been selected by a logistic
regression run in SPSS. A calibration plot is also needed to
validate the prediction accuracy of the nomogram.
However, although I have tried many times and read a lot of posts on this
topic, I still cannot figure out how to draw the nomogram and the
calibration plot. My dataset and detailed questions are in the two
attached files. I would be very grateful if someone could take the time to
help with my questions.

Warmest regards!

Ping Tang

Dataset.xls: http://r.789695.n4.nabble.com/file/n3710068/Dataset.xls
R_help.doc: http://r.789695.n4.nabble.com/file/n3710068/R_help.doc

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-make-a-nomogam-and-Calibration-plot-tp3710068p3710068.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Reorganize(stack data) a dataframe inducing names

Try this:  had to add extra names to your data since it was not clear
how it was organized.  Next time use 'dput' to enclose data.

> x <- read.table(textConnection("index time  key date   values
+ 13732  27965 DATA.Q211.SUM.Index 04/08/11 1.42
+ 13733  27974 DATA.Q211.SUM.Index 05/10/11 1.45
+ 13734  27984 DATA.Q211.SUM.Index 06/01/11 1.22
+ 13746  28615 DATA.Q211.TDS.Index 04/07/11 1.35
+ 13747  28624 DATA.Q211.TDS.Index 05/20/11 1.40
+ 13754  29262 DATA.Q211.UBS.Index 05/02/11 1.30
+ 13755  29272 DATA.Q211.UBS.Index 05/03/11 1.48
+ 13761  29915 DATA.Q211.UCM.Index 04/28/11 1.43
+ 13768  30565 DATA.Q211.VDE.Index 05/02/11 1.48
+ 13775  31215 DATA.Q211.WF.Index 04/14/11 1.44
+ 13776  31225 DATA.Q211.WF.Index 05/12/11 1.42
+ 13789  31865 DATA.Q211.WPC.Index 04/01/11 1.40
+ 13790  31875 DATA.Q211.WPC.Index 04/08/11 1.42
+ 13791  31883 DATA.Q211.WPC.Index 05/10/11 1.43
+ 13804  32515 DATA.Q211.XTB.Index 04/29/11 1.50
+ 13805  32525 DATA.Q211.XTB.Index 05/30/11 1.40
+ 13806  32532 DATA.Q211.XTB.Index 06/28/11 1.43")
+ , header = TRUE
+ , as.is = TRUE
+ )
> closeAllConnections()
> x
   index  time                 key     date values
1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
2  13733 27974 DATA.Q211.SUM.Index 05/10/11   1.45
3  13734 27984 DATA.Q211.SUM.Index 06/01/11   1.22
4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
5  13747 28624 DATA.Q211.TDS.Index 05/20/11   1.40
6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
7  13755 29272 DATA.Q211.UBS.Index 05/03/11   1.48
8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
11 13776 31225  DATA.Q211.WF.Index 05/12/11   1.42
12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
13 13790 31875 DATA.Q211.WPC.Index 04/08/11   1.42
14 13791 31883 DATA.Q211.WPC.Index 05/10/11   1.43
15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50
16 13805 32525 DATA.Q211.XTB.Index 05/30/11   1.40
17 13806 32532 DATA.Q211.XTB.Index 06/28/11   1.43
> # get index of first occurrence of 'key' column
> indx <- !duplicated(x$key)
> x[indx,]
   index  time                 key     date values
1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50
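
If only the middle part of the key (SUM, TDS, ...) should define a group, rather than the whole string, one possible variation is sketched below. It assumes the keys always look like DATA.Q211.<code>.Index, so the code is the third dot-separated field. The small data frame and the names code and first are made up for illustration.

```r
# Small stand-in for the real data (assumed layout: DATA.Q211.<code>.Index)
x <- data.frame(
  key    = c("DATA.Q211.SUM.Index", "DATA.Q211.SUM.Index",
             "DATA.Q211.TDS.Index", "DATA.Q211.WF.Index",
             "DATA.Q211.WF.Index"),
  date   = c("04/08/11", "05/10/11", "04/07/11", "04/14/11", "05/12/11"),
  values = c(1.42, 1.45, 1.35, 1.44, 1.42),
  stringsAsFactors = FALSE
)

# Third dot-separated field of each key
code <- vapply(strsplit(x$key, ".", fixed = TRUE),
               function(p) p[3], character(1))

# Keep the first row seen for each distinct code
first <- x[!duplicated(code), ]
first
```

With the sample data in this thread the result is the same as using the whole key, since only the middle field varies.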






Re: [R] possible reason for merge not working

What you see and what the data really is may be two different
things.  You should have at least enclosed an 'str' of the two data
frames; even better would be a subset of the data using 'dput'.  Most
likely your problem is that your data is not what you 'expect' it to
be.
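
A rough sketch of how such a check might go. The two tiny data frames and their IDs are invented stand-ins (only the column names come from the original post), and the clean() helper is hypothetical. The idea is to list the keys that have no exact partner, then normalise whitespace and case before merging, assuming those are the differences involved.

```r
# Invented stand-ins; note the trailing space and the lower-case ID
annotatedData <- data.frame(names = c("ENSG01 ", "ensg02", "ENSG03"),
                            score = 1:3, stringsAsFactors = FALSE)
UCSCgenes <- data.frame(Ensembl.Gene.ID = c("ENSG01", "ENSG02", "ENSG03"),
                        symbol = c("A", "B", "C"), stringsAsFactors = FALSE)

str(annotatedData$names)   # factor or character? any odd values?

# Keys on the left that have no exact match on the right
unmatched <- setdiff(annotatedData$names, UCSCgenes$Ensembl.Gene.ID)
print(unmatched)

# Hypothetical clean-up: strip surrounding whitespace, unify case
clean <- function(k) toupper(trimws(as.character(k)))
annotatedData$names       <- clean(annotatedData$names)
UCSCgenes$Ensembl.Gene.ID <- clean(UCSCgenes$Ensembl.Gene.ID)

x <- merge(annotatedData, UCSCgenes, by.x = "names",
           by.y = "Ensembl.Gene.ID", all.x = TRUE)
```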





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: [R] possible reason for merge not working

Dan,

If the variables you are merging by are character variables, there may be 
subtle differences that you haven't noticed, e.g., capitalization or 
spacing.  You can look for differences by listing off the unique values:

table(c(annotatedData$names, UCSCgenes$Ensembl.Gene.ID))

Jean


`·.,,  (((º   `·.,,  (((º   `·.,,  (((º

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA





Re: [R] possible reason for merge not working

2011-08-01 Thread R. Michael Weylandt michael.weyla...@gmail.com



On Aug 1, 2011, at 12:17 PM, world peace wrote:


Hi Guys,

working on a merge for 2 data frames.

Using the command:

x <- merge(annotatedData, UCSCgenes, by.x="names",
by.y="Ensembl.Gene.ID", all.x=TRUE)

names and Ensembl.Gene.ID are columns with similar elements from the x
and y data frames.

annotatedData has 8909 entries, so has x(as expected). x has columns
for UCSCgenes, but there is no data in them, all n/a, as if no match
exists.
This is not true as I can manually see and find many similarities


The merge function does not work on similarities. Matches need to be  
exact.



between the names and UCSCgenes columns.

I am wondering if there is any syntax error, or logical.


Probably logical.

--

David Winsemius, MD
West Hartford, CT


Re: [R] Is R the right choice for simulating first passage times of random walks?

I've only got a 20 minute layover, but three quick remarks:

1) Do a sanity check on your data size: if you want a million walks of a 
thousand steps, that already gets you to a billion integers to store. Even at a 
very low bound of one byte each, that's already 1GB for the data, and you still 
have to process it all and run the OS. If you bump this to walks of length 10k, 
you are in big trouble. 

Considered like that, it shouldn't surprise you that you are getting near 
memory limits. 

If you really do need such a large simulation and are willing to make the 
time/space tradeoff, it may be worth doing simulations in smaller batches (say 
50-100) and aggregating the needed stats for analysis. Also, consider direct 
use of the rm() function for memory management. 
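
A minimal sketch of the batching idea, under assumptions: the function name first_return_counts, its arguments and the choice of "first return to zero" as the statistic are illustrative, not the poster's actual code. Only one batch of walks is held in memory at a time, and only the tabulated hitting times are kept. (R stores each integer in 4 bytes, so the real footprint is about four times the one-byte lower bound above.)

```r
# Hypothetical helper: tabulate the first return to 0 of simple symmetric
# random walks, simulating batch_size walks at a time.
first_return_counts <- function(n_walks, n_steps, batch_size = 100L) {
  counts <- integer(n_steps)  # counts[t]: walks whose first return is at step t
  never  <- 0L                # walks with no return within n_steps
  done   <- 0L
  while (done < n_walks) {
    b <- min(batch_size, n_walks - done)
    # one walk per column; this matrix is the only large object alive
    steps <- matrix(sample(c(-1L, 1L), n_steps * b, replace = TRUE),
                    nrow = n_steps)
    # first step at which the partial sum equals 0, or NA if it never does
    hit <- apply(steps, 2, function(s) match(0L, cumsum(s)))
    counts <- counts + tabulate(hit[!is.na(hit)], nbins = n_steps)
    never  <- never + sum(is.na(hit))
    done   <- done + b
  }
  list(counts = counts, never = never)
}

set.seed(42)
res <- first_return_counts(n_walks = 1000L, n_steps = 50L)
res$never   # walks that stayed away from 0 for all 50 steps
```

Walks that never return are counted separately instead of being forced into a bin, which sidesteps the question of what value to assign them.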

2) If you know that which.max()==1 can't happen for your data, might this trick 
be easier than forcing it through some tricky logic inside the which.max()?

X=which.max(...)
if(X[1]==1) X=Inf # or whatever value
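
For illustration, the situation this trick guards against, using a made-up four-step walk. The match() alternative is an addition here, not something proposed in the thread; it returns NA instead of 1 when there is no hit.

```r
x <- c(-1, -2, -3, -2)         # a walk that never reaches 0

naive <- which.max(x >= 0)     # all FALSE, so the "first maximum" is position 1
safe  <- match(TRUE, x >= 0)   # NA: no hit in this sample

naive
safe
```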

3) I don't have any texts at hand to confirm this, but isn't the expected value 
of the first hit time of a RW infinite? I think a handwaving proof can be 
squeezed out of the optional stopping theorem with T=min(T_a,T_b) for a<0<b and 
let a -> -Inf. 

If I remember right, this suggests you are trying to calculate a CI for a 
distribution with no finite moments, a difficult task to say the least. 

Hope these help and I'll write a more detailed reply to your notes below later,

Michael Weylandt

PS - what's an iterated RW? This is all outside my field (hence my spitball on 
#2 above)

PS2 - sorry about the row/column mix-up: I usually think of sample paths as 
rows...

On Aug 1, 2011, at 8:49 AM, Paul Menzel paulepan...@users.sourceforge.net 
wrote:

 On Sunday, 31.07.2011, at 23:32 -0500, R. Michael Weylandt wrote:
 Glad to help -- I haven't taken a look at Dennis' solution (which may be far
 better than mine), but if you do want to keep going down the path outlined
 below you might consider the following:
 
 I will try Dennis’ solution right away but looked at your suggestions
 first. Thank you very much.
 
 Instead of throwing away a simulation if something starts negative, why not
 just multiply the entire sample by -1: that lets you still use the sample
 and saves you some computations: of course you'll have to remember to adjust
 your final results accordingly.
 
 That is a nice suggestion. For a symmetric random walk this is indeed
 possible and equivalent to looking when the walk first hits zero.
 
 This might avoid the loop:
 
 x = ## Whatever x is.
 xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
 which.max((x>=0) & (xLag<0)) + 1 # Depending on how you've decided to count
 # things, this +1 may be extraneous.
 
 The inner expression sets a 0 except where there is a switch from negative
 to positive and a one there: the which.max function returns the location of
 the first maximum, which is the first 1, in the vector. If you are
 guaranteed the run starts negative, then the location of the first positive
 should give you the length of the negative run.
 
 That is the same idea as from Bill [1]. The problem is that when the walk
 never returns to zero in a sample, `which.max(<everything FALSE>)`
 returns 1 [2]. That is no problem, though, when we do not have to worry
 about a walk starting with a positive value, and adding 1 (+1) can be
 omitted when we count the epochs of first hitting 0 instead of the time
 the walk stayed negative, which is always one less.
 
 Additionally my check `(x>=0) & (xLag<0)` is redundant when we know we
 start with a negative value. `(x>=0)` should be good enough in this
 case.
 
 This all gives you,
 
 f4 <- function(n = 10, # number of simulations
   length = 10) # length of iterated sum
 {
   R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
 
   R = apply(R,1,cumsum)
 
   # If the first element in the row is positive, flip the entire row
   R[R[,1]==(1),] = -1 * R[R[,1]==(-1),]
 
 The line above seems to look at the columns instead of the rows. I think the
 following is correct, since after the `apply()` above the random walks
 are in the columns.
 
   R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]
 
   fTemp <- function(x) {
     xLag = c(0,x[-length(x)])
     return(which.max((x>=0) & (xLag<0))+1)
   }
 
   countNegative = apply(R,2,fTemp)
   tabulate(as.vector(countNegative), length)
 }
 
 That just crashed my computer though, so I wouldn't recommend it for large
 n,length.
 
 Welcome to my world. I would have never thought that simulating random
 walks with a length of say a million would create that much data and
 push common desktop systems with let us say 4 GB of RAM to their limits.
 
 Instead, you can help a little by combining the lagging and the &
 all in one.
 
 f4 <- function(n = 10, length = 10)
 {
R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
R = apply(R,1,cumsum)

Re: [R] possible reason for merge not working

2011-08-01 Thread world peace

The answer was indeed in subtle differences, and 'str' did help.
The problem is solved.
Thanks, everybody, for the comments, which were all very useful.

Best,


On Mon, Aug 1, 2011 at 12:25 PM, jim holtman jholt...@gmail.com wrote:
 What you see and what the data really is may be two different
 things.  You should have at least enclosed an 'str' of the two data
 frames; even better would be a subset of the data using 'dput'.  Most
 likely your problem is that your data is not what you 'expect' it to
 be.
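
 The thread never says what the "subtle differences" were, so the sketch
 below uses made-up data with one hypothetical cause (trailing whitespace in
 the key column) just to show how `str()` and a set comparison can expose a
 key mismatch before `merge()`. The column names follow the original post;
 the values are invented.

```r
# Invented data: the keys look the same when printed but differ by a space.
annotatedData <- data.frame(names = c("ENSG01 ", "ENSG02 "),
                            score = c(1.5, 2.5),
                            stringsAsFactors = FALSE)
UCSCgenes <- data.frame(Ensembl.Gene.ID = c("ENSG01", "ENSG02"),
                        symbol = c("A", "B"),
                        stringsAsFactors = FALSE)

# Inspect the real structure of both key columns.
str(annotatedData$names)
str(UCSCgenes$Ensembl.Gene.ID)

# Keys present on the left but not on the right (here: all of them).
unmatched <- setdiff(annotatedData$names, UCSCgenes$Ensembl.Gene.ID)

# The merge "works" but every right-hand column is NA.
bad <- merge(annotatedData, UCSCgenes,
             by.x = "names", by.y = "Ensembl.Gene.ID", all.x = TRUE)

# After cleaning the key, the same call finds the matches.
annotatedData$names <- trimws(annotatedData$names)
good <- merge(annotatedData, UCSCgenes,
              by.x = "names", by.y = "Ensembl.Gene.ID", all.x = TRUE)
```

 Other causes (factor versus character keys, different case, version
 suffixes on the IDs) would show up in the same `str()`/`setdiff()` check.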

 On Mon, Aug 1, 2011 at 12:17 PM, world peace buysellrentof...@gmail.com 
 wrote:
 Hi Guys,

 working on a merge for 2 data frames.

 Using the command:

 x <- merge(annotatedData, UCSCgenes, by.x="names",
 by.y="Ensembl.Gene.ID", all.x=TRUE)

 names and Ensembl.Gene.ID are columns with similar elements from the x
 and y data frames.

 annotatedData has 8909 entries, so has x(as expected). x has columns
 for UCSCgenes, but there is no data in them, all n/a, as if no match
 exists.
 This is not true as I can manually see and find many similarities
 between the names and UCSCgenes columns.

 I am wondering if there is any syntax error, or logical.

 comments appreciated.

 Thanks
 Dan

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?



Re: [R] Impact of multiple imputation on correlations

2011-08-01 Thread Joshua Wiley

Hi Tina,

That is quite a bit of missingness, especially considering the sample
size is not large to begin with.  This would make me treat *any*
result cautiously.  That said, if you have a reasonable idea what the
mechanism causing the missingness is or if from additional variables
in your study, you can model the missing data mechanism sufficiently
that you are confident (for some definition of confident) that the
missingness is random after accounting for your model (conditional
independence, I forget if Rubin calls it MCAR or MAR), you are in a
reasonable place to use MI and draw inferences from the results.

Even if you are uncertain about this, it is *not* any better to just
say, well there was too much missing data for me to feel safe using
MI so here is the correlation based just on the observed data.  That
_will be biased_ unless the missing data mechanism is completely
random (even unconditioned on anything else in your study; for example
if participants flipped coins to decide which questions to respond
to).

When averaging correlations, it is conventional to average the inverse
hyperbolic tangent of the correlations and then use the hyperbolic
tangent to transform the averaged value back to the original units
(also known as Fisher's Z transformation).  The mice package may do
this automatically if there is a function to compute pooled
correlations.
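
A minimal sketch of that averaging in base R, assuming the per-imputation
correlations have already been extracted into a plain numeric vector. The
five values below are invented for illustration and are not from Tina's
data; `atanh()` is Fisher's z, i.e. 0.5 * log((1 + r) / (1 - r)).

```r
# Hypothetical correlations, one per imputed data set.
r_imp <- c(-0.05, -0.20, -0.11, -0.15, -0.02)

# Transform to Fisher's z, average on that scale, transform back.
z_imp    <- atanh(r_imp)
r_pooled <- tanh(mean(z_imp))

# For comparison: the naive average on the correlation scale.
r_naive <- mean(r_imp)
```

For correlations this close to zero the two averages barely differ; the
transformation matters more as |r| approaches 1. This only pools the point
estimate and says nothing about its standard error.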

How results differ between simply deleting cases with any value unobserved
and using MI varies.  There may be no difference, a larger difference,
or a smaller difference.

Looking at the scatter plot matrix from the different imputations, I
do not know that I would actually classify that as varying quite a
bit.  I realize the sign of the slope changes some, but that is not
too surprising because all of them are somewhat close to flat.  You
can compare the between imputation variance to the within imputation
variance (I think mice gives you this information).

I partly addressed your last question at the beginning---I would
certainly not trust the correlation obtained simply by deleting
missingness, but I also would not trust the result obtained using MI
unless it was well set up.  Although you have shown us some of the
data, you have not mentioned how you modelled the missingness.  This
can have a substantial impact on your results (and also their
trustworthiness).  mice provides a number of different models and you
have a choice in what variables you use if you collect a lot in your
study.

Given all of this, I would suggest finding a local statistician or
consultant to talk with about this.  Your question(s) are more
statistical than they are R related.  Also, in addition to learning
more about MI (there are several good books and articles on it that
you can look up or email me offlist and I can provide references if
you want), someone who is there can be more helpful because they will
have access to your whole dataset and can work with you to find the
best variables/model to model the missing data mechanism.

I hope this helps and good luck,

Josh


On Mon, Aug 1, 2011 at 12:03 AM,  lifty.g...@gmx.de wrote:
 Dear all,

 I have been attempting to use multiple imputation (MI) to handle missing data 
 in my study. I use the mice package in R for this. The deeper I get into this 
 process, the more I realize I first need to understand some basic concepts 
 which I hope you can help me with.

 For example, let us consider two arbitrary variables in my study that have 
 the following missingness pattern:

 Variable 1 available, Variable 2 available: 51 (of 118 observations, 43%)
 Variable 1 available, Variable 2 missing: 37 (31.3%)
 Variable 1 missing, Variable 2 available: 10 (8.4%)
 Variable 1 missing, Variable 2 missing: 20 (16.9%)

 I am interested in the correlation between Variable 1 and Variable 2.

 Q1. Does it even make sense for me to use MI (or anything else, really) to 
 replace my missing data when such large fractions are not available?

 Plot 1 (http://imgur.com/KFV9yCmV1sl) provides a scatter plot of these 
 example variables in the original data. The correlation coefficient r = -0.34 
 and p = 0.016.

 Q2. I notice that correlations between variables in imputed data (pooled 
 estimates over all imputations) are much lower and less significant than the 
 correlations in the original data. For this example, the pooled estimates for 
 the imputed data show r = -0.11 and p = 0.22.

 Since this seems to happen in all the variable combinations that I have 
 looked at, I would like to know if MI is known to have this behavior, or 
 whether this is specific to my imputation.

 Q3. When going through the imputations, the distribution of the individual 
 variables (min, max, mean, etc.) matches the original data. However, 
 correlations and least-square line fits vary quite a bit from imputation to 
 imputation (see Plot 2, http://imgur.com/KFV9ylCmV1s). Is this normal?

 Q4. Since my results differ (quite significantly) between the original and

[R] 5 arguments passed to .Internal(matrix) which requires 7

Hello,

I am having a problem with the function matrix. Specifically, when I pass
three arguments (two more being instantiated in the function), I get the
following error message:

Error in matrix(0, 30, 10) :
  5 arguments passed to .Internal(matrix) which requires 7


I looked into it, and someone has suggested that this may be the function
from an old version of R. I recently changed my source path from the lucid
version to the maverick version and installed all of the R packages I need
like so, but why would this change the matrix() function? Also, how does R
know that I passed five arguments (only three being given) if the matrix()
function is supposed to take seven arguments?

Thank you,

Robert


[R] Inserting column in between

2011-08-01 Thread Bansal, Vikas

Dear all,

I have a very simple question. I have a data frame of 50 columns and I want to 
insert a column in the 30th position, but I do not want to delete the existing 
column. Is it possible to insert a column in between, so that the new values 
are in the 30th column, the old 30th column becomes the 31st, the 31st becomes 
the 32nd, and so on, until the 50th column is the 51st? I will be very thankful 
to you.


Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

Re: [R] 5 arguments passed to .Internal(matrix) which requires 7

Robert,

What code did you run to get that error?

Do you get the error if the only code that you run is ...
 matrix(0, 30, 10)

You gave three arguments to matrix, which requires none, but can take up 
to five.
In the function matrix there is a call to .Internal(matrix) which requires 
7 arguments.
See ...
 matrix
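
A few checks that could be run in a fresh session to narrow this down. This
is only a diagnostic sketch: the idea that a stale or masked `matrix()` is
involved is an assumption taken from Robert's own description, not something
the thread confirms.

```r
# Which R is actually running? A mismatch between the R binary and the base
# packages it loads is one possible (unconfirmed) cause of this error.
R.version.string

# Where does the matrix() being called come from? "base" is expected; anything
# else suggests a user-defined or package function is masking it.
matrix_home <- environmentName(environment(matrix))

# The formal arguments a caller can supply to matrix().
matrix_args <- names(formals(matrix))

# The call from the thread, run on its own.
m <- matrix(0, 30, 10)
dim(m)
```

If `matrix_home` is "base" and the last call still fails, comparing
`R.version.string` with the version of the installed base packages would be
the next thing to look at.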

Jean


`·.,,  (((º   `·.,,  (((º   `·.,,  (((º

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA





From: Robert Pfister rw...@virginia.edu
To: r-help@r-project.org
Date: 08/01/2011 11:56 AM
Subject: [R]  5 arguments passed to .Internal(matrix) which requires 7
Sent by: r-help-boun...@r-project.org



Hello,

I am having a problem with the function matrix. Specifically, when I pass
three arguments (two more being instantiated in the function), I get the
following error message:

Error in matrix(0, 30, 10) :
  5 arguments passed to .Internal(matrix) which requires 7


I looked into it, and someone has suggested that this may be the function
from an old version of R. I recently changed my source path from the lucid
version to the maverick version and installed all of the R packages I need
like so, but why would this change the matrix() function? Also, how does R
know that I passed five arguments (only three being given) if the matrix()
function is supposed to take seven arguments?

Thank you,

Robert


Re: [R] 5 arguments passed to .Internal(matrix) which requires 7

2011-08-01 Thread Jeff Newmiller

Y'know, you aren't likely to get many responses with this kind of request. Why 
don't you go read the posting guidelines and come back with:

R version info
Sample data
Actual commands used, so we can reproduce the problem
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Robert Pfister rw...@virginia.edu wrote:

Hello,

I am having a problem with the function matrix. Specifically, when I pass
three arguments (two more being instantiated in the function), I get the
following error message:

Error in matrix(0, 30, 10) :
5 arguments passed to .Internal(matrix) which requires 7


I looked into it, and someone has suggested that this may be the function
from an old version of R. I recently changed my source path from the lucid
version to the maverick version and installed all of the R packages I need
like so, but why would this change the matrix() function? Also, how does R
know that I passed five arguments (only three being given) if the matrix()
function is supposed to take seven arguments?

Thank you,

Robert


Re: [R] Limited number of principal components in PCA

Providing the data will help, but the first thing I noted is that you have more 
columns (variables) than rows (cases). PCA will return a maximum of (the number 
of columns) or (the number of rows-1) components, whichever is less. With 84 
columns and 66 rows, you can get no more than 65 components. If the variables 
are highly correlated, you will get fewer components and that probably explains 
the reduction to 54. I would guess the variables are highly correlated and the 
first eigenvalue is very large.
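
One further thing that may be worth checking, sketched here on simulated
data: the original call uses `na.action=na.omit`, which drops every row
containing a missing value before the PCA is computed, and that also lowers
the number of components. The data frame `Q` below is a random stand-in, and
the choice of 12 incomplete rows is made only so the toy example lands on
54; the thread does not say how many rows of the real data had missing
values.

```r
set.seed(1)

# Hypothetical stand-in for Q: 66 rows (peaks) by 84 columns (gauges).
Q <- as.data.frame(matrix(rnorm(66 * 84), nrow = 66, ncol = 84))

# Put one missing value in each of 12 rows, so only 54 rows are complete.
for (i in 1:12) Q[i, i] <- NA
n_complete <- sum(complete.cases(Q))

# Same call as in the original post.
fit <- prcomp(~ ., data = Q, scale. = TRUE, retx = FALSE, na.action = na.omit)

n_sdev    <- length(fit$sdev)        # number of components reported
n_nonzero <- sum(fit$sdev > 1e-8)    # components with non-negligible variance
```

Comparing `sum(complete.cases(Q))` on the real data with the number of
reported components would show whether missing values play a role here.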

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Joshua Wiley
Sent: Friday, July 29, 2011 10:20 PM
To: William Armstrong
Cc: r-help@r-project.org
Subject: Re: [R] Limited number of principal components in PCA

Hi Billy,

Can you provide your data?  You could attach it as a text file or
provide it by pasting the output of:

dput(Q)

into an email.  It would help if we could reproduce what you are
doing.  You might also consider a list or forum that is more
statistics oriented than Rhelp, as your questions are more related to
the statistics than the software itself (but still, if you give us
data, you will probably get farther).

Cheers,

Josh

On Fri, Jul 29, 2011 at 11:33 AM, William Armstrong
william.armstr...@noaa.gov wrote:
 Hi all,

 I am attempting to run PCA on a matrix (nrow=66, ncol=84) using 'prcomp'
 (stats package).  My data (referred to as 'Q' in the code below) are
 separate river streamflow gaging stations (columns) and peak instantaneous
 discharge (rows).  I am attempting to use PCA to identify regions that
 vary together.

 I am entering the following command:

 test_pca_Q <- prcomp(~.,data=Q,scale.=TRUE,retx=FALSE,na.action=na.omit)

 It is outputting 54 'standard deviation' numbers (which are the
 sqrt(eigenvalues) in respect to a certain PC, am I correct?), and 54
 'rotation' numbers, which are the variable loadings with respect to a given
 PC.

 I have two questions:

 1.) Why is it only outputting 54 PCs and standard deviations?  If I have 84
 variables isn't the maximum number of PCs I can create 84 as well?

 2.) Can I now use the 'rotation' values to find clusters of gages that are
 acting together, or is there another step I must take?

 Thank you very much for your insight.

 Billy


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Limited-number-of-principal-components-in-PCA-tp3704956p3704956.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/


Re: [R] Inserting column in between

x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)])

On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas vikas.ban...@kcl.ac.uk wrote:
 Dear all,

 I have a very simple question.I have data frame of 50 columns and i want to 
 insert a column in 30th position.But i do not want to delete that column.Is 
 it possible to include a column in between, so that new values are in 30th 
 column and 30 th column is now 31st and 31st is 32nd..so on and 50th 
 column is 51st..?I will be very thankful to you.





-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] Inserting column in between

Doesn't work -- you lose column names.

Try this instead:

yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50])

Adjust column names after via:

names(yourframe)[30:51] <- c(newcolname, names(yourframe[30:50]))

Cheers,
Bert

On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
 x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)])

 On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas vikas.ban...@kcl.ac.uk wrote:
 Dear all,

 I have a very simple question.I have data frame of 50 columns and i want to 
 insert a column in 30th position.But i do not want to delete that column.Is 
 it possible to include a column in between, so that new values are in 30th 
 column and 30 th column is now 31st and 31st is 32nd..so on and 50th 
 column is 51st..?I will be very thankful to you.





 --
 Sarah Goslee
 http://www.functionaldiversity.org





-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] Plotting problems directional or rose plots

Searching R Graphical Manual (http://www.oga-lab.net/RGM2/, mirror
http://www.oga-lab.net/RGM2/) shows possible candidates in packages circular
(windrose), IDPmisc (plot.rose), climatol (rosavent), openair (windRose),
and oce (as.windrose).

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of kitty
Sent: Monday, August 01, 2011 10:39 AM
To: r-help@r-project.org
Subject: Re: [R] Plotting problems directional or rose plots

Hi again,

I have tried playing around with the code given to me by Alan and Jim. Thank
you for the code, but unfortunately I can't seem to get either of them to
work... Alan's does not work with the sample data and Jim's is giving the
error:

Error in radial.grid(labels = labels, label.pos = label.pos, radlab =
radlab,  :
  could not find function "boxed.labels"

I have also tried Rose plots in the (heR.Misc) library, to no avail.

Sorry, does anyone know how to get the plots I need?

Thank you all for reading this and for your help

k.

On Tue, Jul 26, 2011 at 10:20 PM, kitty kitty.a1...@gmail.com wrote:

 Hi,

 I'm trying to get a plot that looks somewhat like the attached image
 (sketched in Word).
 I think I need something called a rose diagram, but I can't get it to do
 what I want. I'm happy to use any library.

 Essentially, I want a circle with slices every 10 degrees, with 0 at
 the top representing north, and 'tick marks' around the outside in 10
 degree increments to match the slices (so the slices need to be offset by
 5 degrees so the 0 degree slice actually faces north).
 I then want to be able to colour in the slices depending on the distance
 that the factor extends to; so for example the 9000 dist is the largest in
 the example, so it should fill the slice, and a distance in this plot of
 4500 would fill halfway up the slice.
 I also want to be able to specify the colour of each slice so that I can
 relate it back to the spatial correlograms I have.

 I have added some sample data below.

 Thank you for reading my post,
 All help is greatly appreciated,
 K

 sample data:

 #distance factor extends to
 dist <- c(5000,7000,9000,4500,6000,500)

 #direction
 angle <- c(0,10,20,30,40,50)

 #list of desired colours; order corresponds to associated
 #angle/direction
 color.list <- c('red','blue','green','yellow','pink','black')

 (my real data is from 0 to 350 degrees, and so I have corresponding
 distance and colour data for each 10 degree increment).
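
 One way the plot described above could be drawn with base graphics only,
 as a rough sketch rather than a finished solution. The function names
 `wedge_xy` and `rose_plot` are invented for this example; the sample data
 are the ones from the post. Each slice is drawn as a polygon centred on its
 compass bearing, so the 0 degree slice spans -5 to +5 degrees and points
 north.

```r
# Vertices of one wedge centred on compass bearing `angle_deg` (0 = north,
# clockwise), `width_deg` wide, reaching out to radius `r`.
wedge_xy <- function(angle_deg, r, width_deg = 10, n = 21) {
  a <- seq(angle_deg - width_deg / 2, angle_deg + width_deg / 2,
           length.out = n) * pi / 180
  # compass bearing -> Cartesian: x = r * sin(a), y = r * cos(a)
  list(x = c(0, r * sin(a)), y = c(0, r * cos(a)))
}

rose_plot <- function(dist, angle, col, max_dist = max(dist)) {
  lim <- c(-1.15, 1.15) * max_dist
  plot(0, 0, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "",
       xlim = lim, ylim = lim)
  # one filled wedge per direction, radius proportional to distance
  for (i in seq_along(dist)) {
    w <- wedge_xy(angle[i], dist[i])
    polygon(w$x, w$y, col = col[i], border = "grey40")
  }
  # outer circle at the largest distance
  th <- seq(0, 2 * pi, length.out = 361)
  lines(max_dist * sin(th), max_dist * cos(th))
  # tick marks every 10 degrees, labels every 30 degrees
  ticks <- seq(0, 350, by = 10) * pi / 180
  segments(max_dist * sin(ticks), max_dist * cos(ticks),
           1.03 * max_dist * sin(ticks), 1.03 * max_dist * cos(ticks))
  lab <- seq(0, 330, by = 30)
  text(1.1 * max_dist * sin(lab * pi / 180),
       1.1 * max_dist * cos(lab * pi / 180), labels = lab, cex = 0.7)
  invisible(NULL)
}

dist <- c(5000, 7000, 9000, 4500, 6000, 500)
angle <- c(0, 10, 20, 30, 40, 50)
color.list <- c("red", "blue", "green", "yellow", "pink", "black")
```

 Calling `rose_plot(dist, angle, color.list)` then draws the six sample
 slices; with the full 0 to 350 degree data the same call fills the circle.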





Re: [R] Inserting column in between -- better way?

Folks:

I consider my reply below rather clumsy: One has to keep track of
index numbers other than that which is inserted and must separately
change column names. Is there an essentially better way to do this,
either via base R or via an R package? I leave it to you to define
essentially better.
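
One possible base R answer, as a sketch: append the new column at the end
and then reorder the columns by position, so only the insertion position
has to be tracked and the name is set in the same step. The helper name
`insert_col` and its arguments are made up for this example.

```r
# Hypothetical helper: insert `col` under `name` so that it ends up as
# column number `pos` of data frame `df`.
insert_col <- function(df, col, name, pos) {
  n <- ncol(df)
  stopifnot(is.data.frame(df), pos >= 1, pos <= n + 1, !(name %in% names(df)))
  df[[name]] <- col                                  # new column goes last
  ord <- append(seq_len(n), n + 1L, after = pos - 1L) # move it to `pos`
  df[, ord, drop = FALSE]
}

x <- data.frame(A = 1:3, B = 1:3, C = 1:3, D = 1:3, E = 1:3)
y <- insert_col(x, 4:6, "newcol", 3)
names(y)
```

For the 50-column case in this thread the call would be
`insert_col(yourframe, newcolumn, "newcolname", 30)`. This version is for
data frames only; a matrix would need `cbind()` instead.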

Thanks.

Cheers,
Bert

On Mon, Aug 1, 2011 at 10:17 AM, Bert Gunter bgun...@gene.com wrote:
 Doesn't work -- you lose column names.

 Try this instead:

 yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50])

 Adjust column names after via:

 names(yourframe)[30:51] <- c(newcolname, names(yourframe[30:50]))

 Cheers,
 Bert

 On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
 x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)])

 On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas vikas.ban...@kcl.ac.uk 
 wrote:
 Dear all,

 I have a very simple question.I have data frame of 50 columns and i want to 
 insert a column in 30th position.But i do not want to delete that column.Is 
 it possible to include a column in between, so that new values are in 30th 
 column and 30 th column is now 31st and 31st is 32nd..so on and 50th 
 column is 51st..?I will be very thankful to you.





 --
 Sarah Goslee
 http://www.functionaldiversity.org





 --
 Men by nature long to get on to the ultimate truths, and will often
 be impatient with elementary studies or fight shy of them. If it were
 possible to reach the ultimate truths without the elementary studies
 usually prefixed to them, these would not be preparatory studies but
 superfluous diversions.

 -- Maimonides (1135-1204)

 Bert Gunter
 Genentech Nonclinical Biostatistics




-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] Accessing the index of factor in by() function

2011-08-01 Thread Merik Nanish

Since I didn't get an answer to this question, I'm rephrasing my question in
simpler terms:

I have  a dataframe and I want to split it based on the levels of one of its
columns, and apply a function to each section of the data. Output of the
function may be drawing a plot, returning  a value, whatever. I want to do
it efficiently though (for loops are very slow).

How can I do that?
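
A minimal sketch of the split-and-apply pattern asked about here, using the
three vectors from the original post quoted further down. Iterating over
`names(pieces)` is one way to have the factor level available inside the
function; the summary computed per group is arbitrary and only stands in
for whatever plotting or calculation is needed.

```r
id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)
dat   <- data.frame(id, month, value)

# One data frame per level of id; the list names are the levels.
pieces <- split(dat, dat$id)

# Apply a function to each piece, with the level (k) known inside it.
res <- lapply(names(pieces), function(k) {
  d <- pieces[[k]]
  c(id = as.numeric(k), n = nrow(d), mean_value = mean(d$value))
})
summary_tab <- do.call(rbind, res)
summary_tab
```

Note that `lapply()` is still a loop under the hood, so this is mainly about
tidier code rather than a guaranteed speed-up.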

M

On Tue, Jul 26, 2011 at 10:12 AM, Ista Zahn iz...@psych.rochester.edu wrote:

 Hi Merik,
 Please keep the mailing list copied.

 On Tue, Jul 26, 2011 at 6:44 AM, Merik Nanish merik.nan...@gmail.com
 wrote:
  You can convert my data into a dataframe simply by dat <- data.frame(id,
  month, value). That doesn't help though.

 Can you be more specific? What is the problem you are having?

  And no, that's not what I'm looking for. What I intend to do is for by
  to loop through the data based on levels of the id factor (1, 2, and 3),
  and for each level, for my function to print out the values of value and
  month belonging to the section of data with that id.

 OK, easy enough:

 dat.tmp <- data.frame(id, month, value)
 my.plot <- function(dat) {print(dat[, c("id", "value")])}
 by(dat.tmp, id, my.plot)

  Right now, I achieve this with a for loop but I want to avoid looping in
  the data as much as possible.

 Why? What do you have against loops?

 Best,
 Ista

 
  On Tue, Jul 26, 2011 at 12:18 AM, Ista Zahn iz...@psych.rochester.edu
  wrote:
 
  Hi Merik,
  by() works most easily with data.frames. Is this what you are after?
 
  my.plot <- function(dat) { print(dat$value);
  print(dat$month[dat$id==dat$value]) }
  by(dat.tmp, id, my.plot)
 
  Best,
  Ista
 
  On Mon, Jul 25, 2011 at 9:19 PM, Merik Nanish merik.nan...@gmail.com
  wrote:
   Hello,
  
   Here are three vectors to give context to my question below:
  
   id <- c(1,1,1,1,1,2,2,2,3,3,3)
   month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
   value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)
  
   and I want to plot value over month separately for each id. Before I
   can do that, I need to section both month and value, based on ID. I
   create a my.plot function like this (at this point, it doesn't draw any
   plots, it is just an effort to help me understand what I'm doing):
  
   my.plot <- function(y) { print(y); print(month[id==y]) }
  
   Now, I tried:
  
   by(value, id, my.plot)
  
   But of course, it didn't do what I wanted. I realized that the parameter
   passed to my.plot is a section of value per ID, and not the ID value
   itself. Question is, how can I get the value of factor ID at each level
   of by()?
  
   Please advise,
  
   Merik
  
  
 
 
 
  --
  Ista Zahn
  Graduate student
  University of Rochester
  Department of Clinical and Social Psychology
  http://yourpsyche.org
 
 



 --
 Ista Zahn
 Graduate student
 University of Rochester
 Department of Clinical and Social Psychology
 http://yourpsyche.org



Re: [R] Inserting column in between

Not when I do it.

> a <- data.frame(A=1:10, B=11:20, D=31:40, E=41:50)
> a
    A  B  D  E
1   1 11 31 41
2   2 12 32 42
3   3 13 33 43
4   4 14 34 44
5   5 15 35 45
6   6 16 36 46
7   7 17 37 47
8   8 18 38 48
9   9 19 39 49
10 10 20 40 50
> b <- cbind(a[,1:2], C=21:30, a[,3:4])
> b
    A  B  C  D  E
1   1 11 21 31 41
2   2 12 22 32 42
3   3 13 23 33 43
4   4 14 24 34 44
5   5 15 25 35 45
6   6 16 26 36 46
7   7 17 27 37 47
8   8 18 28 38 48
9   9 19 29 39 49
10 10 20 30 40 50

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Bert Gunter
Sent: Monday, August 01, 2011 12:18 PM
To: Sarah Goslee
Cc: r-help@r-project.org
Subject: Re: [R] Inserting column in between

Doesn't work -- you lose column names.

Try this instead:

yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50])

Adjust column names after via:

names(yourframe)[30:51] <- c(newcolname, names(yourframe[30:50]))

Cheers,
Bert

On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee sarah.gos...@gmail.com
wrote:
 x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)])

 On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas vikas.ban...@kcl.ac.uk
wrote:
 Dear all,

 I have a very simple question.I have data frame of 50 columns and i want
to insert a column in 30th position.But i do not want to delete that
column.Is it possible to include a column in between, so that new values are
in 30th column and 30 th column is now 31st and 31st is 32nd..so on and
50th column is 51st..?I will be very thankful to you.





 --
 Sarah Goslee
 http://www.functionaldiversity.org





-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] Inserting column in between

Bert,

On Mon, Aug 1, 2011 at 1:17 PM, Bert Gunter gunter.ber...@gene.com wrote:
 Doesn't work -- you lose column names.

But I don't lose column names:

> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
> x
  A B C D E
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
> newcol <- 4:6
> cbind(x[,1:2], newcol, x[,3:ncol(x)])
  A B newcol C D E
1 1 1      4 1 1 1
2 2 2      5 2 2 2
3 3 3      6 3 3 3

It's even possible to change names in the cbind() statement:

> cbind(x[,1:2], Y=newcol, x[,3:ncol(x)])
  A B Y C D E
1 1 1 4 1 1 1
2 2 2 5 2 2 2
3 3 3 6 3 3 3

If for some reason it isn't working for you, you might try explicitly calling
cbind.data.frame() instead of the default cbind().


 Try this instead:

 yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50])

 Adjust column names after via:

 names(yourframe)[30:51] <- c(newcolname, names(yourframe[30:50]))

This shouldn't be necessary, I think. What happens if you use my
above example?

Sarah


 Cheers,
 Bert

 On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
 x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)])

 On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas vikas.ban...@kcl.ac.uk 
 wrote:
 Dear all,

 I have a very simple question. I have a data frame of 50 columns and I want
 to insert a column in the 30th position, but I do not want to delete the
 existing column. Is it possible to insert a column in between, so that the
 new values are in the 30th column, the old 30th column becomes the 31st,
 the 31st becomes the 32nd, and so on until the 50th column becomes the
 51st? I will be very thankful to you.






-- 
Sarah Goslee
http://www.functionaldiversity.org

Re: [R] Inserting column in between -- better way?

Bert,

On Mon, Aug 1, 2011 at 1:27 PM, Bert Gunter gunter.ber...@gene.com wrote:
 Folks:

 I consider my reply below rather clumsy: One has to keep track of
 index numbers other than that which is inserted and must separately
 change column names. Is there an essentially better way to do this,
 either via base R or via an R package? I leave it to you to define
 "essentially better".

Having tried your solution with sample data, I'd have to agree. :)
Your approach does mess up the column names, and also doesn't work
if x is a matrix rather than data frame. Mine, using the full cbind(), works
in both cases, preserving the column names and running even if x is
a matrix.

It could be written as a function, but since it's only one line and
really only requires knowing at what position you'd like to add
the new column, it hardly seems worth it unless it's something
to be done repeatedly.

> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
> newcol <- 4:6
> cbind(x[,1:2], newcol, x[,3:ncol(x)])
  A B newcol C D E
1 1 1      4 1 1 1
2 2 2      5 2 2 2
3 3 3      6 3 3 3


> x[,3:6] <- cbind(newcol, x[,3:5])
> x
  A B C D E E.1
1 1 1 4 1 1   1
2 2 2 5 2 2   2
3 3 3 6 3 3   3


> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
> x <- as.matrix(x)
> cbind(x[,1:2], newcol, x[,3:ncol(x)])
     A B newcol C D E
[1,] 1 1      4 1 1 1
[2,] 2 2      5 2 2 2
[3,] 3 3      6 3 3 3
> x[,3:6] <- cbind(newcol, x[,3:5])
Error in x[, 3:6] <- cbind(newcol, x[, 3:5]) : subscript out of bounds
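
For anyone who does need this repeatedly, here is a rough sketch of the one-liner wrapped in a function. The name insert_col and its arguments are made up for illustration and do not come from any package; the sketch assumes x is a data frame and uses drop = FALSE so that insertion at the first or last position also keeps the column names.

```r
# Hypothetical helper: insert `newcol` so that it becomes column `pos` of the
# data frame `x`; `name` is the name given to the inserted column.
insert_col <- function(x, newcol, pos, name = "newcol") {
  stopifnot(is.data.frame(x), pos >= 1, pos <= ncol(x) + 1)
  new <- data.frame(newcol)
  names(new) <- name
  idx   <- seq_len(ncol(x))
  left  <- x[, idx <  pos, drop = FALSE]   # columns that stay in front
  right <- x[, idx >= pos, drop = FALSE]   # columns that shift one to the right
  cbind(left, new, right)
}

x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
names(insert_col(x, 4:6, 3))        # new column lands in position 3
names(insert_col(x, 4:6, 1, "Y"))   # insertion at the very front
```

The caller still has to know the position, but the index arithmetic lives in one place.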

Sarah

 Thanks.

 Cheers,
 Bert

-- 
Sarah Goslee
http://www.functionaldiversity.org

Re: [R] Inserting column in between -- better way?

2011-08-01 Thread Ista Zahn

On Mon, Aug 1, 2011 at 1:43 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 Bert,

 On Mon, Aug 1, 2011 at 1:27 PM, Bert Gunter gunter.ber...@gene.com wrote:
 Folks:

 I consider my reply below rather clumsy: One has to keep track of
 index numbers other than that which is inserted and must separately
 change column names. Is there an essentially better way to do this,
 either via base R or via an R package? I leave it to you to define
 "essentially better".

A variation on the theme that I prefer for aesthetic reasons is

a <- data.frame(A=1:10, B=11:20, D=31:40, E=41:50)
a$F <- 21:30
a <- a[, c(1:2, 5, 3:4)]

I doubt that it is essentially better, as it still requires keeping
track of the index, but to me this is easier to follow.
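
One way to avoid tracking index numbers at all is to look the position up by column name. The following is only a sketch: move_after is a made-up helper name, and the new column is called C here purely for illustration.

```r
# Hypothetical helper: place column `new_name` right after column `after`,
# finding positions by name rather than by number.
move_after <- function(df, new_name, after) {
  stopifnot(new_name %in% names(df), after %in% names(df))
  others <- setdiff(names(df), new_name)
  df[, append(others, new_name, after = match(after, others)), drop = FALSE]
}

a <- data.frame(A=1:10, B=11:20, D=31:40, E=41:50)
a$C <- 21:30                           # append at the end first
a <- move_after(a, "C", after = "B")   # then reorder by name
names(a)                               # "A" "B" "C" "D" "E"
```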

Best,
Ista



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

Re: [R] Accessing the index of factor in by() function

Merik,

You did get an answer to the question, and it's even included in the material
below.

What doesn't work for you in Ista's suggestion?

id <- c(1,1,1,1,1,2,2,2,3,3,3)
month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)
dat.tmp <- data.frame(id, month, value)

my.plot <- function(dat) {print(dat[, c("id", "value")])}
by(dat.tmp, id, my.plot)

But if for some reason you need to get the separate sections, not just
act on them, this might also work:


dat.split <- split(dat.tmp, dat.tmp$id)
lapply(dat.split, my.plot)
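
As a small illustration of the split() route when the function returns a value, and of how the id level itself can be reached, here is a sketch using the same data. The per-group summary (the month with the largest value) is made up for the example.

```r
id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)
dat.tmp <- data.frame(id, month, value)

# One data frame per id; names(dat.split) carries the id levels.
dat.split <- split(dat.tmp, dat.tmp$id)

# Hypothetical per-group summary: the month with the largest value.
peak.month <- sapply(dat.split, function(d) d$month[which.max(d$value)])
peak.month

# The group label is available by iterating over names(dat.split).
for (g in names(dat.split)) {
  cat("id", g, ":", nrow(dat.split[[g]]), "rows\n")
}
```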

Sarah

On Mon, Aug 1, 2011 at 1:34 PM, Merik Nanish merik.nan...@gmail.com wrote:
 Since I didn't get an answer to this question, I'm rephrasing my question in
 simpler terms:

 I have  a dataframe and I want to split it based on the levels of one of its
 columns, and apply a function to each section of the data. Output of the
 function may be drawing a plot, returning  a value, whatever. I want to do
 it efficiently though (for loops are very slow).

 How can I do that?

 M

 On Tue, Jul 26, 2011 at 10:12 AM, Ista Zahn iz...@psych.rochester.edu wrote:

 Hi Merik,
 Please keep the mailing list copied.

 On Tue, Jul 26, 2011 at 6:44 AM, Merik Nanish merik.nan...@gmail.com
 wrote:
  You can convert my data into a dataframe simply by dat <- data.frame(id,
  month, value). That doesn't help though.

 Can you be more specific? What is the problem you are having?

  And no, that's not what I'm looking for. What I intend to do is for by to
  loop through the data based on levels of id factor (1, 2, and 3), and for
  each level, for my function to print out the values of value and month
  belonging to the section of data with that id.

 OK, easy enough:

 dat.tmp <- data.frame(id, month, value)
 my.plot <- function(dat) {print(dat[, c("id", "value")])}
 by(dat.tmp, id, my.plot)

  Right now, I achieve this with a for loop but I want to avoid looping in
  the data as much as possible.

 Why? What do you have against loops?

 Best,
 Ista

 
  On Tue, Jul 26, 2011 at 12:18 AM, Ista Zahn iz...@psych.rochester.edu
  wrote:
 
  Hi Merik,
  by() works most easily with data.frames. Is this what you are after?
 
  my.plot <- function(dat) { print(dat$value);
  print(dat$month[dat$id==dat$value]) }
  by(dat.tmp, id, my.plot)
 
  Best,
  Ista
 
  On Mon, Jul 25, 2011 at 9:19 PM, Merik Nanish merik.nan...@gmail.com
  wrote:
   Hello,
  
   Here are three vectors to give context to my question below:

   id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
   month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5)
   value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)

   and I want to plot value over month separately for each id. Before I
   can do that, I need to section both month and value, based on ID. I
   create a my.plot function like this (at this point, it doesn't draw any
   plots, it is just an effort to help me understand what I'm doing):

   my.plot <- function(y) { print(y); print(month[id==y]) }

   Now, I tried:

   by(value, id, my.plot)

   But of course, it didn't do what I wanted. I realized that the parameter
   passed to my.plot is a section of value per ID, and not the ID value
   itself. Question is, how can I get the value of factor ID at each level
   of by()?
  
   Please advise,
  
   Merik
  



-- 
Sarah Goslee
http://www.functionaldiversity.org

Re: [R] Inserting column in between

Thanks Sarah and David.

Yes, but note this:

> z <- data.frame(a=1:2,b=3:4)
> z
  a b
1 1 3
2 2 4
> newdat <- 5:6
> cbind(z[,1],newdat,z[,2])
       newdat  
[1,] 1      5 3
[2,] 2      6 4

> cbind.data.frame(z[,1],newdat,z[,2])
  z[, 1] newdat z[, 2]
1      1      5      3
2      2      6      4

Aha moment! -- You need drop=FALSE:

> cbind(z[,1,drop=FALSE],newdat,z[,2,drop=FALSE])
  a newdat b
1 1      5 3
2 2      6 4


So your solution does not work in general (and you may not have
intended it to); while mine does, but is blatantly clumsy. I would say
the better approach is merely to add the drop = FALSE option to
yours even though it is unnecessary in your simple example:

> cbind(x[,1:2,drop = FALSE], newcol, x[,3:ncol(x),drop = FALSE])

... and I would definitely count this as an R 'gotcha' . (and it has
gotcha'ed me before).
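
A minimal sketch of the gotcha in isolation, using the same z as above. The list-style indexing z[j] shown at the end is an alternative not mentioned so far in the thread; it never drops to a vector.

```r
z <- data.frame(a = 1:2, b = 3:4)

# Selecting a single column with [ , j] drops to a plain vector by default...
class(z[, 1])                 # "integer"
# ...while drop = FALSE keeps a one-column data frame, name included.
class(z[, 1, drop = FALSE])   # "data.frame"
names(z[, 1, drop = FALSE])   # "a"

# List-style indexing z[j] always returns a data frame.
newdat <- 5:6
res <- cbind(z[1], newdat, z[2])
names(res)                    # "a" "newdat" "b"
```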

Cheers,
-- Bert


[R] fill Matrix quicker

2011-08-01 Thread monk

dear all,

i have a quite simple question, i want to fill up a Matrix like done in the
following function,
but the performance is very bad for large dimensions
is there a way to do this like with apply or something similar?


makeMatrix <- function(a, b, dim) {
  X <- matrix(0, ncol=dim, nrow=dim)

  for (i in 1:dim) {
    for (j in 1:dim) {
      if (i == j) {X[i,j] <- a}
      else {X[i,j] <- exp((-1*abs(i-j))/(3*b))}
    }
  }
  X
}

Re: [R] error message jpeg62.dll missing

2011-08-01 Thread Prof Brian Ripley


See the footer of this and every R-help message.

In particular, that DLL is not used by R itself, so this is probably 
something called from a third-party package.


A number of packages used to use that DLL (which is rather out of 
date), but no longer do, so is your R actually current? (The posting 
guide asked you to update *before* posting; it also asked you for 'at 
a minimum' information.)


On Mon, 1 Aug 2011, Rocky Hyacinth wrote:


Dear R-help

We are getting an error message `jpeg62.dll missing'.

We are running Windows 7 64-bit, from a Mac using Boot Camp.

Do you know of this error message, and can you give us help trying to
resolve the problem?

many thanks
Rocky

Rocky Hyacinth
Technician
Department of Archaeology
University of Sheffield
United Kingdom

[[alternative HTML version deleted]]


And not to send HTML 




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fill Matrix quicker

Making use of the row() and col() functions speeds things up a bit.

makeMatrix2 <- function(a, b, dim) {
  X <- matrix(NA, ncol=dim, nrow=dim)
  X <- exp( (-1*abs(row(X) - col(X)))/(3*b) )
  diag(X) <- a
  X
}

system.time(makeMatrix(1, 2, 1000))
system.time(makeMatrix2(1, 2, 1000))
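
Before comparing timings it may be worth confirming that the two versions give the same matrix. A sketch of such a check follows; both functions are restated so the block runs on its own, and the dimension 50 is an arbitrary choice.

```r
# Loop version, as in the original post.
makeMatrix <- function(a, b, dim) {
  X <- matrix(0, ncol = dim, nrow = dim)
  for (i in 1:dim) {
    for (j in 1:dim) {
      if (i == j) X[i, j] <- a else X[i, j] <- exp(-abs(i - j) / (3 * b))
    }
  }
  X
}

# Vectorized version using row() and col().
makeMatrix2 <- function(a, b, dim) {
  X <- matrix(NA, ncol = dim, nrow = dim)
  X <- exp(-abs(row(X) - col(X)) / (3 * b))
  diag(X) <- a
  X
}

all.equal(makeMatrix(1, 2, 50), makeMatrix2(1, 2, 50))   # TRUE
```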

Jean


`·.,,  (((º   `·.,,  (((º   `·.,,  (((º

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA




Re: [R] 5 arguments passed to .Internal(matrix) which requires 7

Yes, even if I only run the command matrix(0,30,10) I get the error. I am
running R with Ubuntu 10.10 (maverick) with R version:

R version 2.13.1 (2011-07-08)


When I check the function matrix, I can see that it is only passing five
arguments to the function .Internal() (shown below).

function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
{
    data <- as.vector(data)
    if (missing(nrow))
        nrow <- ceiling(length(data)/ncol)
    else if (missing(ncol))
        ncol <- ceiling(length(data)/nrow)
    .Internal(matrix(data, nrow, ncol, byrow, dimnames))
}
<environment: namespace:base>



On Mon, Aug 1, 2011 at 1:02 PM, Jean V Adams jvad...@usgs.gov wrote:


 Robert,

 What code did you run to get that error?

 Do you get the error if the only code that you run is ...
  matrix(0, 30, 10)

 You gave three arguments to matrix, which requires none, but can take up to
 five.
 In the function matrix there is a call to .Internal(matrix) which requires
 7 arguments.
 See ...
  matrix

 Jean


 `·.,,  (((º   `·.,,  (((º   `·.,,  (((º

 Jean V. Adams
 Statistician
 U.S. Geological Survey
 Great Lakes Science Center
 223 East Steinfest Road
 Antigo, WI 54409  USA




  From: Robert Pfister rw...@virginia.edu
  To: r-help@r-project.org
  Date: 08/01/2011 11:56 AM
  Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7
  Sent by: r-help-boun...@r-project.org



 Hello,

 I am having a problem with the function matrix. Specifically, when I pass
 three arguments (two more being instantiated in the function), I get the
 following error message:

 Error in matrix(0, 30, 10) :
  5 arguments passed to .Internal(matrix) which requires 7


 I looked into it, and someone has suggested that this may be the function
 from an old version of R. I recently changed my source path from the lucid
 version to the maverick version and installed all of the R packages I need
 like so, but why would this change the matrix() function? Also, how does R
 know that I passed five arguments (only three being given) if the matrix()
 function is supposed to take seven arguments?

 Thank you,

 Robert

Re: [R] 5 arguments passed to .Internal(matrix) which requires 7

That's interesting.  My function matrix() looks like this:

function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL) 
{
    if (is.object(data) || !is.atomic(data)) 
        data <- as.vector(data)
    .Internal(matrix(data, nrow, ncol, byrow, dimnames, missing(nrow), 
        missing(ncol)))
}
<environment: namespace:base>

I'm running Windows R version 2.13.0 (2011-04-13).
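
A few base R calls that might help narrow down which copy of matrix() is being picked up. This is only a diagnostic sketch: the helper count_internal_args is made up here, and the idea that the R-level base code and the running binary come from different R versions is a guess, not something confirmed in this thread.

```r
# Which R is actually running, and from which installation?
R.version.string
R.home()

# Where does matrix() come from? Normally only "package:base".
find("matrix")

# Hypothetical helper: count the arguments that a function's body hands to
# the inner call inside .Internal(...).
count_internal_args <- function(f) {
  calls <- Filter(function(e) is.call(e) &&
                    identical(e[[1]], as.name(".Internal")),
                  as.list(body(f)))
  if (length(calls) == 0) return(NA_integer_)
  length(calls[[1]][[2]]) - 1L
}
count_internal_args(base::matrix)

# Library locations that are searched for packages.
.libPaths()
```

If the count printed for base::matrix differs from the number named in the error message, the R code of the base package and the binary are out of step, which would suggest checking how R was installed or upgraded.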

Jean



Re: [R] fill Matrix quicker

2011-08-01 Thread Patrick Burns


Most certainly you can speed it up:

> X <- exp(-abs(row(X) - col(X)) / (3*b))
> diag(X) <- a

should do what you want.  This is called
'vectorization' and is discussed lots of
places -- for instance, in the two documents
mentioned below in my signature.
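
Another vectorized spelling of the same idea uses outer(), which does not need X to exist beforehand. This is a sketch with assumed example values; a, b and n stand in for the poster's a, b and dim.

```r
# Assumed example values for illustration.
a <- 1; b <- 2; n <- 5

idx <- seq_len(n)
# outer() evaluates the function on every (i, j) pair in one vectorized call.
X <- outer(idx, idx, function(i, j) exp(-abs(i - j) / (3 * b)))
diag(X) <- a

X[1, 2]   # exp(-1/6), the value one step off the diagonal
```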

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

Re: [R] Inserting column in between -- better way?

Actually Sarah's method fails if the insertion is after the first or before
the last column:

> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3)
> newcol <- 4:6
> cbind(x[,1], newcol, x[,2:ncol(x)])
  x[, 1] newcol B C D E
1      1      4 1 1 1 1
2      2      5 2 2 2 2
3      3      6 3 3 3 3

> cbind(x[,1:4], newcol, x[,ncol(x)])
  A B C D newcol x[, ncol(x)]
1 1 1 1 1      4            1
2 2 2 2 2      5            2
3 3 3 3 3      6            3

Inserting drop=FALSE fixes them.

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

[R] Identifying US holidays

2011-08-01 Thread Dimitri Liakhovitski

Hello!

I am trying to identify which ones of a vector of dates are US
holidays. And, ideally, which is which. And I do not know (a-priori)
which dates those should be.
I have, for example:
> x <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by="day")
(x)

I think chron should help me here - but maybe I am not using it properly:

library(chron)
is.holiday(x) # Says that none of those dates are holidays

?is.holiday says that "holidays" is an object that should list the
holidays. But I want to figure out which of my dates are US holidays
and don't want to provide a list of holidays myself.

Package timeDate does almost what I need:
library(timeDate)
holidayNYSE(2008:2010)
holidayNYSE()

However, I don't need all the NYSE holidays (like Good Friday). Just
the major US holidays - New Years, MLK, Memorial Day, Independence
Day, Labor Day, Halloween, Thanksgiving, Christmas.
Is there any way to identify major US holidays?
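
One base R possibility is to compute the dates from their calendar rules. The following is only a sketch: the helper names nth_weekday, last_weekday and us_holidays are made up, the holiday list is the one given above, and "observed" dates (when a fixed holiday falls on a weekend) are not handled.

```r
# Hypothetical helper: date of the n-th given weekday in a month.
# wday follows as.POSIXlt(): 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
nth_weekday <- function(year, month, wday, n) {
  first <- as.Date(sprintf("%d-%02d-01", year, month))
  offset <- (wday - as.POSIXlt(first)$wday) %% 7
  first + offset + 7 * (n - 1)
}

# Hypothetical helper: last given weekday in a month.
last_weekday <- function(year, month, wday) {
  d <- nth_weekday(year, month, wday, 5)
  if (as.POSIXlt(d)$mon + 1 != month) d <- d - 7   # 5th one spilled over
  d
}

us_holidays <- function(year) {
  h <- c(as.Date(sprintf("%d-01-01", year)),
         nth_weekday(year, 1, 1, 3),    # 3rd Monday of January
         last_weekday(year, 5, 1),      # last Monday of May
         as.Date(sprintf("%d-07-04", year)),
         nth_weekday(year, 9, 1, 1),    # 1st Monday of September
         as.Date(sprintf("%d-10-31", year)),
         nth_weekday(year, 11, 4, 4),   # 4th Thursday of November
         as.Date(sprintf("%d-12-25", year)))
  names(h) <- c("New Year's Day", "MLK Day", "Memorial Day",
                "Independence Day", "Labor Day", "Halloween",
                "Thanksgiving", "Christmas")
  h
}

x <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
h <- us_holidays(2011)
hit <- match(format(x), format(h))    # NA where the date is not a holiday
found <- data.frame(date = x[!is.na(hit)],
                    holiday = names(h)[hit[!is.na(hit)]])
found
```

The result lists which dates in x are holidays and which holiday each one is.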

Thanks a lot!

-
Dimitri Liakhovitski
marketfusionanalytics.com

Re: [R] error in self-made function - cannot deal with objects of length = 1

2011-08-01 Thread Berend Hasselman


bjmjarrett wrote:
 
 ...
 rate <- function(x){
   storage <- matrix(nrow=length(x),ncol=1)
   ifelse(length(x)==1,storage[1,] <- NA,{
   storage[1,] <- x[1]/max(x)
   for(i in 2:length(x)){
   p <- i-1
   storage[i,] <- ((x[i] - x[p]) / max(x))
   }
   })
   return(storage)
   }
 
 but I end up with this error when I try and use the above function in
 tapply():
 
 Error in ans[!test & !nas] <- rep(no, length.out = length(ans))[!test &  : 
 replacement has length zero
 
 

ifelse is for vector arguments.
You should use if() {...} else {...}

But why not just

c(x[1], diff(x))/max(x)
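
To spell this out as a sketch: ifelse() tries to use the value of its third argument as replacement data, and a block that ends in a for loop evaluates to NULL, which would explain the "replacement has length zero" message. With a plain if/else and the diff() shortcut the function could look like this. The egg counts are made up, and the result is a plain vector rather than the one-column matrix of the original.

```r
rate <- function(x) {
  if (length(x) == 1) {
    NA                          # no previous value to compare with
  } else {
    c(x[1], diff(x)) / max(x)   # first value, then successive differences
  }
}

eggs <- c(2, 5, 9, 10)          # made-up counts for illustration
rate(eggs)                      # 0.2 0.3 0.4 0.1
rate(7)                         # NA
```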

Berend

Re: [R] example package for devel newcomers

2011-08-01 Thread Brian Diggs


On 7/31/2011 6:24 PM, Alexandre Aguiar wrote:

On Sunday 31 July 2011, you wrote:

My memory is that this question gets asked every few months and one of
the stock answers is to use the function 'package.skeleton' in the
utils package as a starting point.


Got that from docs. And actually I already have most of the code written.
My question addresses known tricks and impressions by experienced R
interface programmers. This kind of stuff can be really useful. For
instance, tricks are much better than docs when embedding php.

Thanx.


Hadley Wickham is working on this sort of thing.  I know he has given a 
master class on package development.  Some things related to that are on 
the wiki associated with his devtools package: 
https://github.com/hadley/devtools/wiki
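
For readers who have not seen the package.skeleton starting point mentioned earlier in this thread, a minimal sketch follows. The package name demoPkg and the toy function hello are made up, and the skeleton is written to a temporary directory so nothing lands in the working directory.

```r
# Put the objects that should go into the package in their own environment.
e <- new.env()
e$hello <- function(name = "world") paste0("Hello, ", name, "!")  # toy function

# Write the skeleton into a fresh temporary directory.
out <- tempfile("skeleton_")
dir.create(out)
package.skeleton(name = "demoPkg", list = "hello", environment = e, path = out)

# DESCRIPTION, NAMESPACE, R/ and man/ are created as a starting point.
list.files(file.path(out, "demoPkg"))
```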


--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University


[R] Error while trying to install a package

2011-08-01 Thread Sushil Amirisetty


Hi Everyone, 

When I try to install a package using 

> install.packages("agricolae") 
--- Please select a CRAN mirror for use in this session --- 
| 


The cursor keeps blinking, but I don't get a popup menu to choose a CRAN mirror. 
Is it due to my proxy server settings? I tried echo $http_proxy, and it is 
blank, so no proxy is set. Please help me. 

Thanks, 
Sushil.
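
One thing that might be worth trying, as a sketch rather than a confirmed fix: set the CRAN mirror explicitly so that R does not need to open a graphical menu at all. The mirror URL below is an assumption chosen for illustration; any CRAN mirror should do.

```r
# Set the repository up front so install.packages() does not prompt for a mirror
# (assumption: the cloud mirror URL is just one possible choice)
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")

# With the option set, this should go straight to the download:
# install.packages("agricolae")

# Alternatively, ask for a text menu instead of a popup window:
# chooseCRANmirror(graphics = FALSE)
```

The two calls that need network access or user input are left commented out so the snippet can be pasted safely.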



Re: [R] fill Matrix quicker

2011-08-01 Thread monk

Thanks a lot, that will do the trick.

--
View this message in context: 
http://r.789695.n4.nabble.com/fill-Matrix-quicker-tp3710428p3710533.html
Sent from the R help mailing list archive at Nabble.com.


[R] error in self-made function - cannot deal with objects of length = 1

2011-08-01 Thread bjmjarrett

I have a function to calculate the rate of increase (the difference between
the value and the previous value divided by the total number of eggs in a
year) of egg production over the course of a year:

rate <- function(x){
   storage <- matrix(nrow=length(x),ncol=1)
   storage[1,] <- x[1] / max(x) # as there is no previous value
   for( i in 2:length(x)){
   p <- i - 1
   storage[i,] <- (x[i] - x[p]) / max(x)
   }
   return(storage)
}

However, as it requires the subtraction of one term with the previous term
it fails when dealing with objects with length = 1 (when only one reading
has been taken in a year). I have tried adding an ifelse() function into
`rate' with NA added for length 1: 

rate <- function(x){
storage <- matrix(nrow=length(x),ncol=1)
ifelse(length(x)==1,storage[1,] <- NA,{
storage[1,] <- x[1]/max(x)
for(i in 2:length(x)){
p <- i-1
storage[i,] <- ((x[i] - x[p]) / max(x))
}
})
return(storage)
}

but I end up with this error when I try and use the above function in
tapply():

Error in ans[!test & !nas] <- rep(no, length.out = length(ans))[!test &  : 
replacement has length zero
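
A short sketch of two things that may be going on here, offered as a likely explanation rather than a confirmed diagnosis. First, 2:length(x) counts down when x has length one, so the loop in the first version still runs. Second, a for loop evaluates to NULL, and ifelse() cannot use NULL as its replacement value. The variable names below are made up for illustration.

```r
# 1. With a single reading, 2:length(x) is 2:1, not an empty sequence
x <- 5
countdown <- 2:length(x)        # the loop would visit i = 2 and then i = 1
safe_idx  <- seq_along(x)[-1]   # empty, so a loop over it runs zero times
print(countdown)
print(safe_idx)

# 2. A for loop returns NULL, so a { ... } block ending in a loop is NULL too
loop_value <- for (i in 1:3) i
print(is.null(loop_value))

# ifelse() then has nothing to fill its result with when the test is FALSE
msg <- tryCatch(ifelse(FALSE, 1, NULL),
                error = function(e) conditionMessage(e))
print(msg)
```

If this reading is right, the error would come from the groups with more than one reading, not from the length-one groups.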

Thanks in advance,

Ben

--
View this message in context: 
http://r.789695.n4.nabble.com/error-in-self-made-function-cannot-deal-with-objects-of-length-1-tp3710555p3710555.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] 5 arguments passed to .Internal(matrix) which requires 7