subject:"\[R\] R\-Help"

Re: [R] R help on loops

2013-06-07 Thread Berend Hasselman



Please, please do not use html formatted mail.
It is clearly requested by the mailing list
The code of your last mail is a mess and when replying it becomes even more of 
a mess.

I told you to do 

siml[g,] - optm(perm=20)

See the comma after g.

and not what you are now doing with siml[g]-optm(perm=20)[g]

I'm giving up.

Berend


On 07-06-2013, at 12:15, Laz lmra...@ufl.edu wrote:

 Hi,
 
 I am almost getting there, but still have errors. Thanks for your help. I 
 have tried improving but I get the following errors below:
 
 
 
 
  
 itn-function(it){
 
 + 
 siml-matrix(NA,ncol=5,nrow=it)
 
 + 
 for(g in 1:it){
 
 + 
 siml[g]-optm(perm=20)[g]
 
 + 
 }
 
 + 
 siml
 
 + 
 }
 
  
 itn(3)
 
  [,1] [,2] [,3] [,4] [,5]
 [1,] 0.8873775898   NA   NA   NA   NA
 [2,] 0.0015584824   NA   NA   NA   NA
 [3,] 0.0001414317   NA   NA   NA   NA
 
 
 
  
 itn-function(it){
 
 + 
 siml-matrix(NA,ncol=5,nrow=it)
 
 + 
 for(g in 1:it){
 
 + 
 siml[g]-optm(perm=20)
 
 + 
 }
 
 + 
 siml
 
 + 
 }
 
  
 itn(3)
 
   [,1] [,2] [,3] [,4] [,5]
 [1,] 0.8880941   NA   NA   NA   NA
 [2,] 0.8869727   NA   NA   NA   NA
 [3,] 0.8877045   NA   NA   NA   NA
 
 Warning messages:
 
 1: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 
 2: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 
 3: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 
 
 I expect something close to
 
 
 average   sd  se   min   max
 0.8881969 0.0008215379 0.000410769 0.8873842 0.8890167
 
   0.884659 0.0004215379 0.000410769 0.2342 0.676307
 
   0.8885839 0.0001215379 0.0002112 0.82752992 0.8836337
 
 
 
 
   
 Thanks fpr you help.
 
 
 On 6/7/2013 5:24 AM, Berend Hasselman wrote:
 On 07-06-2013, at 10:59, Laz lmra...@ufl.edu
  wrote:
 
 
 Dear R users,
 
 I am stuck here: My first function returns a vector of 5 values.
 In my second function, I want to repeat this, a number of times, say 10 
 so that I have 10 rows and five columns but I keep on getting errors.
 
 See the code and results below:
 
 optm -function(perm, verbose = FALSE)
 {
   trace-c()
   for (k in 1:perm){
 trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1]
 perm[k]-k
 mat-cbind(trace, perm = seq(perm))
   }
   if (verbose){
 cat(***starting matrix\n)
 print(mat)
   }
   # iterate till done
   while(nrow(mat)  1){
 high - diff(mat[, 'trace'])  0
 if (!any(high)) break  # done
 # find which one to delete
 delete - which.max(high) + 1L
 mat - mat[-delete, ]
 newmat-apply(mat,2,mean)[1]
 sdm-sd(mat[,1])
 sem-sdm/sqrt(nrow(mat))
 maxv-mat[1,1]
 minv-mat[nrow(mat),1]
   }
   stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv)
   stats
 }
 
 
 test-optm(perm=20)
 test
 
 average   sd   se   min  max
 trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241
 
 
 itn-function(it){
 siml-matrix(NA,ncol=5,nrow=length(it))
   for(g in 1:it){
siml[g]-optm(perm=20)
   }
 siml-cbind(siml=siml)
 siml
 }
 
 
 ans-itn(5)
 
 Warning messages:
 1: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 2: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 3: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 4: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 5: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 
 ans
 
   [,1]  [,2]  [,3]  [,4]  [,5]
 [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469
 
 
 
 1. Not reproducible code. Where does function Rspatswap come from?
 
 2. You have several errors in function itn:
 Argument it is a scalar: length(it) is 1. You need to do siml - 
 matrix(NA,ncol=5,nrow=it)
 Next in the g-loop you want to fill row g so do: siml[g,] - …..
 
 Finally why are you doing siml - cbind(siml=siml)?
 Seems superfluous to me. Delete the line.
 
 Berend
 
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help on loops

2013-05-31 Thread Laz

Dear R Users,

I created a  function  which returns a value every time it's run (A 
simplified toy example is attached on this mail).

For example spat(a,b,c,d) # run the first time and it gives you ans1= 
10, perm=1 ;
  run the second time and gives you ans2= 7, perm=2 etc
I  store both the result and the iteration on a matrix called vector 
with columns:1==ans, column2==permutation

The rule I want to implement is: compare between ans1 and ans2. If 
ans1ans2, keep both ans1 and ans2. if ans1ans2, drop the whole row of 
the second permutation (that is drop both ans2 and perm2 but continue 
counting all permutations).
  Re-run the function for the 3rd time and repeat comparison between the 
value of the last run and the current value obtained.
Return  matrix ans with the saved ans and their permutations and another 
full matrix with all results including the dropped ans and their 
permutation numbers.

I need to repeat this process 1000 times but I am getting stuck below. 
See attached R code.
The code below works only for the first 6 permutations. suppose I want 
1000 permutations, where do I keep the loop?


Example: Not a perfect code though it somehow works:
testx-function(perm){
  optA-c()
  #while(perm=2){
for (k in 1:perm){
  #repeat {
  optA[k]-sample(1:1000,1,replace=TRUE)
perm[k]-k
}
mat2-as.matrix(cbind(optA=optA,perm=perm))
  result-mat2
lenm-nrow(result)
if(result[1,1]=result[2,1]) result-result[1,]
  return(list(mat2=mat2,result=result))
  #}
if(perm2){
  mat2-as.matrix(cbind(optA=optA,perm=perm))
  result-mat2
  lenm-nrow(result)
  if(result[1,1]=result[2,1]) result-result[1,]
if(result[lenm-1,1]=result[lenm,1]) result-result[-lenm,]
   if(result[(lenm-2),1]=result[(lenm-1),1]) 
result-result[-(lenm-1),]
   if(result[(lenm-3),1]=result[(lenm-2),1]) 
result-result[-(lenm-2),]
   if(result[(lenm-4),1]=result[(lenm-3),1]) 
result-result[-(lenm-3),]
  if(result[(lenm-5),1]=result[(lenm-4),1]) 
result-result[-(lenm-4),]
   return(list(mat2=mat2,result=result))
  }
}
## Now calling my function below:

testx(perm=6)



$mat2
  optA perm
[1,]  2721
[2,]  8582
[3,]  8343
[4,]  8624
[5,]  6505
[6,]  4056

$result
optA perm
  2721


 testx(perm=2)
$mat2
  optA perm
[1,]  3981
[2,]  8162

$result
optA perm
  3981


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help on loops

2013-05-31 Thread jim holtman

Is this what you want?  I was not clear on your algorithm, but is looks
like you wanted descending values:

 testx -
+ function(n, verbose = FALSE)
+ {
+ mat - cbind(optA = sample(n, n, TRUE), perm = seq(n))
+ if (verbose){
+ cat(***starting matrix\n)
+ print(mat)
+ }
+ # iterate till done
+ while(nrow(mat)  1){
+ high - diff(mat[, 'optA'])  0
+ if (!any(high)) break  # done
+ # find which one to delete
+ delete - which.max(high) + 1L
+ mat - mat[-delete, ]
+ }
+ mat
+ }
 testx(20, verbose = TRUE)
***starting matrix
  optA perm
 [1,]   131
 [2,]   122
 [3,]73
 [4,]   104
 [5,]   115
 [6,]46
 [7,]   117
 [8,]28
 [9,]69
[10,]5   10
[11,]6   11
[12,]   18   12
[13,]9   13
[14,]   16   14
[15,]   18   15
[16,]9   16
[17,]2   17
[18,]7   18
[19,]   15   19
[20,]7   20
 optA perm
[1,]   131
[2,]   122
[3,]73
[4,]46
[5,]28
[6,]2   17



On Fri, May 31, 2013 at 2:02 PM, Laz lmra...@ufl.edu wrote:

 Dear R Users,

 I created a  function  which returns a value every time it's run (A
 simplified toy example is attached on this mail).

 For example spat(a,b,c,d) # run the first time and it gives you ans1=
 10, perm=1 ;
   run the second time and gives you ans2= 7, perm=2 etc
 I  store both the result and the iteration on a matrix called vector
 with columns:1==ans, column2==permutation

 The rule I want to implement is: compare between ans1 and ans2. If
 ans1ans2, keep both ans1 and ans2. if ans1ans2, drop the whole row of
 the second permutation (that is drop both ans2 and perm2 but continue
 counting all permutations).
   Re-run the function for the 3rd time and repeat comparison between the
 value of the last run and the current value obtained.
 Return  matrix ans with the saved ans and their permutations and another
 full matrix with all results including the dropped ans and their
 permutation numbers.

 I need to repeat this process 1000 times but I am getting stuck below.
 See attached R code.
 The code below works only for the first 6 permutations. suppose I want
 1000 permutations, where do I keep the loop?


 Example: Not a perfect code though it somehow works:
 testx-function(perm){
   optA-c()
   #while(perm=2){
 for (k in 1:perm){
   #repeat {
   optA[k]-sample(1:1000,1,replace=TRUE)
 perm[k]-k
 }
 mat2-as.matrix(cbind(optA=optA,perm=perm))
   result-mat2
 lenm-nrow(result)
 if(result[1,1]=result[2,1]) result-result[1,]
   return(list(mat2=mat2,result=result))
   #}
 if(perm2){
   mat2-as.matrix(cbind(optA=optA,perm=perm))
   result-mat2
   lenm-nrow(result)
   if(result[1,1]=result[2,1]) result-result[1,]
 if(result[lenm-1,1]=result[lenm,1]) result-result[-lenm,]
if(result[(lenm-2),1]=result[(lenm-1),1])
 result-result[-(lenm-1),]
if(result[(lenm-3),1]=result[(lenm-2),1])
 result-result[-(lenm-2),]
if(result[(lenm-4),1]=result[(lenm-3),1])
 result-result[-(lenm-3),]
   if(result[(lenm-5),1]=result[(lenm-4),1])
 result-result[-(lenm-4),]
return(list(mat2=mat2,result=result))
   }
 }
 ## Now calling my function below:

 testx(perm=6)



 $mat2
   optA perm
 [1,]  2721
 [2,]  8582
 [3,]  8343
 [4,]  8624
 [5,]  6505
 [6,]  4056

 $result
 optA perm
   2721


  testx(perm=2)
 $mat2
   optA perm
 [1,]  3981
 [2,]  8162

 $result
 optA perm
   3981


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 123, Issue 30

2013-05-27 Thread Neotropical bat risk assessments

Hi all are there any R packages that include circular stats similar to 
Oriana (http://www.kovcomp.co.uk/oriana/newver4.html)?


I am interested in looking at annual patterns of bat activity where data 
will have date/times and relative abundance values for each Date.


I would like to have a circular plot with the circumference axis the 
12 months of the year and then a value of relative abundance and likely 
with ggplot2 this can be set to color= species.


Tnx

Bruce

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 123, Issue 30

2013-05-27 Thread Jim Lemon


On 05/27/2013 10:28 AM, Neotropical bat risk assessments wrote:

Hi all are there any R packages that include circular stats similar to
Oriana (http://www.kovcomp.co.uk/oriana/newver4.html)?

I am interested in looking at annual patterns of bat activity where data
will have date/times and relative abundance values for each Date.

I would like to have a circular plot with the circumference axis the
12 months of the year and then a value of relative abundance and likely
with ggplot2 this can be set to color= species.

Tnx

Bruce


Hi Bruce,
Here is a possibility:

library(plotrix)
batact-matrix(c(sin(seq(0,1.833*pi,length=12))+2+rnorm(36)/4),
 nrow=3,byrow=TRUE)
batpos-seq(0,1.833*pi,length=12)
radial.plot(batact,batpos,rp.type=ps,main=Bat activity by month,
 line.col=2:4,radial.lim=0:4,label.pos=batpos,labels=month.abb,
 point.symbols=16:18,point.col=2:4,label.prop=1.1,start=pi/2,
 clockwise=TRUE)
legend(-3.5,0.5,paste(Species,1:3),pch=16:18,col=2:4)

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-Help: nparLD Package Non-parametric Repeated Measures

2013-05-16 Thread Gerrit Eichner


Hello, James,

see my comments inline.

... Main issue/question: In R the nparLD ANOVA-type Test showed a 
significant p-value for diel period, no effect of season, and no 
interaction between diel period and season. But a post-hoc Wilcoxon 
Signed-Rank Test did NOT find a significant difference (p = 0.054) for 
diel period (day vs night) body temperature.


How is it possible to have a significant effect for day vs night, based on
the nparLD package, but NO significant difference between day and night for
the post-hoc Wilcoxon test?


Those tests -- in general -- test different hypotheses and use different 
test statistics, the first one, in addition, using a distributional 
approximation to obtain a p-value. You may want to look at the references 
cited in the respective online help pages for technical details.



Also, if I only have two levels of the fixed effect (day vs night), do I
need to run a post-hoc test or just look at the mean values after the
ANOVA-type test?


Hm, actually no (of course depending on your scientific question) since 
the ANOVA-table -- if you believe its p-value -- already tells you that 
there is a (marginally) significant main Diel-period-effect, i.e., a 
difference between day and night.


However, I recommend that you consult with a local statistician since this 
is actually not an R-problem and since I have the impression that there 
exist some uncertainties in your statistical expertise.


 Hth  --  Gerrit


Data info:

The repeated measurements on the 7 subjects had 2 fixed effects:

1. Diel period (day or night)
2. Season (Spring, summer, and fall)(Subplot Factor)

Mean values for body temperature and for diel period are below. Diel column
(D=Day, N = Night). State column (RT=Spring, RF = Summer, PT = Fall).
Subject, N=7. NA = missing value.

All comments (good and bad) are greatly appreciated!

Thanks,
James



-- output of sessionInfo():

[code]

data=read.csv(file.choose(), header=TRUE)
attach(data)
data

 stp diel state subject
1  26.2DRT   1
2  26.4NRT   1
3  24.1DRT   2
4NANRT   2
5NADRT   3
6  25.2NRT   3
7  27.1DRT   4
8  26.5NRT   4
9  26.9DRT   5
10 27.1NRT   5
11 26.2DRT   6
12 26.0NRT   6
13 26.3DRT   7
14 26.7NRT   7
15 26.0DRF   1
16 26.6NRF   1
17 24.2DRF   2
18 25.6NRF   2
19 25.6DRF   3
20 26.6NRF   3
21 26.1DRF   4
22 26.9NRF   4
23 27.2DRF   5
24 27.4NRF   5
25 26.2DRF   6
26 26.7NRF   6
27 27.2DRF   7
28 27.5NRF   7
29 25.0DPT   1
30 24.8NPT   1
31   NADPT   2
32   NANPT   2
33   NADPT   3
34   NANPT   3
35 26.7DPT   4
36 26.9NPT   4
37 27.6DPT   5
38 27.5NPT   5
39 25.2DPT   6
40 24.9NPT   6
41 27.1DPT   7
42 27.0NPT   7



ex.f2-ld.f2(y=stp, time1=diel, time2=state, subject=subject,

time1.name=Diel, time2.name=State, description=FALSE)


ex.f2$ANOVA.test

  Statistic   dfp-value
Diel   4.9028447 1.00 0.02681249
State  0.2332795 1.374320 0.70586274
Diel:State 2.1937783 1.062943 0.13717393
[/code]

[code]

detach(data)
data=read.csv(file.choose(), header=TRUE)
attach(data)
data

   day night
1  26.2  26.4
2  26.0  26.6
3  25.0  24.8
4  24.2  25.6
5  25.6  26.6
6  27.1  26.5
7  26.1  26.9
8  26.7  26.9
9  26.9  27.1
10 27.2  27.4
11 27.6  27.5
12 26.2  26.0
13 26.2  26.7
14 25.2  24.9
15 26.3  26.7
16 27.2  27.5
17 27.1  27.0


library(coin)



wilcoxsign_test(day ~ night, distribution=exact)


   Exact Wilcoxon-Signed-Rank Test

data:  y by x (neg, pos)
stratified by block
Z = -1.9234, p-value = 0.05482
alternative hypothesis: true mu is not equal to 0

[/code]

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help: Batch read files based on names in a list

2013-05-15 Thread arun

HI,
You could use:
(# with 3 files in my folder)
lfiles-list.files(pattern=.txt)
 lfiles
#[1] file1.txt file2.txt file3.txt
lst1-lapply(lfiles,function(x) 
read.table(x,header=TRUE,sep=,stringsAsFactors=FALSE))
lst1
#[[1]]
#  col1 col2
#1    1  0.5
#2    2  0.2
#3    3  0.3
#4    4  0.3
#5    5  0.1
#6    6  0.2
#
#[[2]]
 # col1 col3
#1    1    A
#2    2    B
#3    3    C
#
#[[3]]
 # col1 col4
#1    1  0.1
#2    2  0.5
#3    4  0.9
library(plyr)
 join_all(lst1,by=col1)
#  col1 col2 col3 col4
#1    1  0.5    A  0.1
#2    2  0.2    B  0.5
#3    3  0.3    C   NA
#4    4  0.3 NA  0.9
#5    5  0.1 NA   NA
#6    6  0.2 NA   NA

join_all(lst1,by=col1,type=inner)
#  col1 col2 col3 col4
#1    1  0.5    A  0.1
#2    2  0.2    B  0.5

#or
 Reduce(function(...) merge(...,all=TRUE),lst1)
#  col1 col2 col3 col4
#1    1  0.5    A  0.1
#2    2  0.2    B  0.5
#3    3  0.3    C   NA
#4    4  0.3 NA  0.9
#5    5  0.1 NA   NA
#6    6  0.2 NA   NA

#Suppose you don't want to read file3.txt
 lfilesSub-lfiles[!lfiles%in% file3.txt]
lfilesSub
#[1] file1.txt file2.txt

A.K.








- Original Message -
From: Jonathan Dry dry...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Wednesday, May 15, 2013 1:51 PM
Subject: [R] R help: Batch read files based on names in a list

*

I am currently reading in a series of files, applying the same functions to
them one at a time, and then merging the resulting data frames e.g.:

MyRows - c(RowA, RowB, RowC)File1_DF - 
read.delim(DirectoryToFiles\\File1_Folder\\File1.txt, 
stringsAsFactors=FALSE, check.names=FALSE)File1_DF - 
as.data.frame(t(File1_DF[MyRows,]))File1_DF - 
as.data.frame(t(File1_DF))mergeDF - merge(mergeDF,File1_DF, by.x = 
Row.names, by.y=row.names)File2_DF - 
read.delim(DirectoryToFiles\\File2_Folder\\File2.txt, 
stringsAsFactors=FALSE, check.names=FALSE)File2_DF - 
as.data.frame(t(File2_DF[MyRows,]))File2_DF - 
as.data.frame(t(File2_DF))mergeDF - merge(mergeDF,File2_DF, by.x = 
Row.names, by.y=row.names)

...etc

I want to know if I can use a list of the filenames c(File1, File2,
File2) etc. and apply a function to do this in a more automated fasion?
This would involve using the list value in the directory path to read in
the file i.e.

*MyFilesValue*_DF - 
read.delim(DirectoryToFolders\\*MyFilesValue*_Folder\\*MyFilesValue*.txt,
 stringsAsFactors=FALSE, check.names=FALSE)

Any help appreciated
*

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help: Batch read files based on names in a list

2013-05-15 Thread Jonathan Dry

*

I am currently reading in a series of files, applying the same functions to
them one at a time, and then merging the resulting data frames e.g.:

MyRows - c(RowA, RowB, RowC)File1_DF - 
read.delim(DirectoryToFiles\\File1_Folder\\File1.txt, 
stringsAsFactors=FALSE, check.names=FALSE)File1_DF - 
as.data.frame(t(File1_DF[MyRows,]))File1_DF - 
as.data.frame(t(File1_DF))mergeDF - merge(mergeDF,File1_DF, by.x = 
Row.names, by.y=row.names)File2_DF - 
read.delim(DirectoryToFiles\\File2_Folder\\File2.txt, 
stringsAsFactors=FALSE, check.names=FALSE)File2_DF - 
as.data.frame(t(File2_DF[MyRows,]))File2_DF - 
as.data.frame(t(File2_DF))mergeDF - merge(mergeDF,File2_DF, by.x = 
Row.names, by.y=row.names)

...etc

I want to know if I can use a list of the filenames c(File1, File2,
File2) etc. and apply a function to do this in a more automated fasion?
This would involve using the list value in the directory path to read in
the file i.e.

*MyFilesValue*_DF - 
read.delim(DirectoryToFolders\\*MyFilesValue*_Folder\\*MyFilesValue*.txt,
 stringsAsFactors=FALSE, check.names=FALSE)

Any help appreciated
*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help: Batch read files based on names in a list

2013-05-15 Thread Enrico Schumann

On Wed, 15 May 2013, Jonathan Dry dry...@gmail.com writes:

 *

 I am currently reading in a series of files, applying the same functions to
 them one at a time, and then merging the resulting data frames e.g.:

MyRows - c(RowA, RowB, RowC)File1_DF - 
read.delim(DirectoryToFiles\\File1_Folder\\File1.txt, 
stringsAsFactors=FALSE, check.names=FALSE)File1_DF - 
as.data.frame(t(File1_DF[MyRows,]))File1_DF - 
as.data.frame(t(File1_DF))mergeDF - merge(mergeDF,File1_DF, by.x = 
Row.names, by.y=row.names)File2_DF - 
read.delim(DirectoryToFiles\\File2_Folder\\File2.txt, 
stringsAsFactors=FALSE, check.names=FALSE)File2_DF - 
as.data.frame(t(File2_DF[MyRows,]))File2_DF - 
as.data.frame(t(File2_DF))mergeDF - merge(mergeDF,File2_DF, by.x = 
Row.names, by.y=row.names)

 ...etc

 I want to know if I can use a list of the filenames c(File1, File2,
 File2) etc. and apply a function to do this in a more automated fasion?
 This would involve using the list value in the directory path to read in
 the file i.e.

Something like this?

  files - dir(my_directory)
  for (f in files) {
  ## do something with file 'f'
  }



-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-Help: nparLD Package Non-parametric Repeated Measures

2013-05-14 Thread James Casey

Hi,

I'm trying to analyze repeated measurements of body temperature data
collected from 7 randomly chosen subjects (e.g. turtles). I am using R,
along with the nparLD package to test for an effect of diel period (fixed
factor: day or night) and season (sub-plot fixed factor: spring, summer,
fall) on body temperature.

Based on this set-up (LD-F2), I am using the non-parametric nparLD
pacakge([url]http://www.inside-r.org/packages/cran/nparLD/docs/ld.f2[/url])
because data transformations were unsuccessful and I am randomly missing
some paired values.

Main issue/question: In R the nparLD ANOVA-type Test showed a significant
p-value for diel period, no effect of season, and no interaction between
diel period and season. But a post-hoc Wilcoxon Signed-Rank Test did NOT
find a significant difference (p = 0.054) for diel period (day vs night)
body temperature.

How is it possible to have a significant effect for day vs night, based on
the nparLD package, but NO significant difference between day and night for
the post-hoc Wilcoxon test?

Also, if I only have two levels of the fixed effect (day vs night), do I
need to run a post-hoc test or just look at the mean values after the
ANOVA-type test?

Data info:

The repeated measurements on the 7 subjects had 2 fixed effects:

1. Diel period (day or night)
2. Season (Spring, summer, and fall)(Subplot Factor)

Mean values for body temperature and for diel period are below. Diel column
(D=Day, N = Night). State column (RT=Spring, RF = Summer, PT = Fall).
Subject, N=7. NA = missing value.

All comments (good and bad) are greatly appreciated!

Thanks,
James



 -- output of sessionInfo():

[code]
 data=read.csv(file.choose(), header=TRUE)
 attach(data)
 data
  stp diel state subject
1  26.2DRT   1
2  26.4NRT   1
3  24.1DRT   2
4NANRT   2
5NADRT   3
6  25.2NRT   3
7  27.1DRT   4
8  26.5NRT   4
9  26.9DRT   5
10 27.1NRT   5
11 26.2DRT   6
12 26.0NRT   6
13 26.3DRT   7
14 26.7NRT   7
15 26.0DRF   1
16 26.6NRF   1
17 24.2DRF   2
18 25.6NRF   2
19 25.6DRF   3
20 26.6NRF   3
21 26.1DRF   4
22 26.9NRF   4
23 27.2DRF   5
24 27.4NRF   5
25 26.2DRF   6
26 26.7NRF   6
27 27.2DRF   7
28 27.5NRF   7
29 25.0DPT   1
30 24.8NPT   1
31   NADPT   2
32   NANPT   2
33   NADPT   3
34   NANPT   3
35 26.7DPT   4
36 26.9NPT   4
37 27.6DPT   5
38 27.5NPT   5
39 25.2DPT   6
40 24.9NPT   6
41 27.1DPT   7
42 27.0NPT   7


ex.f2-ld.f2(y=stp, time1=diel, time2=state, subject=subject,
time1.name=Diel, time2.name=State, description=FALSE)

 ex.f2$ANOVA.test
   Statistic   dfp-value
Diel   4.9028447 1.00 0.02681249
State  0.2332795 1.374320 0.70586274
Diel:State 2.1937783 1.062943 0.13717393
[/code]

[code]
 detach(data)
 data=read.csv(file.choose(), header=TRUE)
 attach(data)
 data
day night
1  26.2  26.4
2  26.0  26.6
3  25.0  24.8
4  24.2  25.6
5  25.6  26.6
6  27.1  26.5
7  26.1  26.9
8  26.7  26.9
9  26.9  27.1
10 27.2  27.4
11 27.6  27.5
12 26.2  26.0
13 26.2  26.7
14 25.2  24.9
15 26.3  26.7
16 27.2  27.5
17 27.1  27.0

 library(coin)

 wilcoxsign_test(day ~ night, distribution=exact)

Exact Wilcoxon-Signed-Rank Test

data:  y by x (neg, pos)
 stratified by block
Z = -1.9234, p-value = 0.05482
alternative hypothesis: true mu is not equal to 0

[/code]

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help for creating expression data of Differentially expressed genes

2013-05-08 Thread Vivek Das

Hi Arun,

 I am still facing trouble as I can see the data output is identical for
all rows when I am using this merge function. It seems that since in my
data2 which I have provided I have not given you the exact genes I have.
There are likely to be repeatations of ID in both the files but the problem
now is that when the loop is running for the merge on basis of ID it is
printing on the rows for which the ID's are repeated more than once and
which does not have any repeatations in the data.txt they are being
ignored. Say I had a locus XLOC_002126 which is the ID and its repeated for
10 times in the data1.txt so when my merging is taking place its only
working on those ID's which are repeated more than once in data1.txt and
merging them with their respective attributes. And this is happening for
the number of times it is being repeated in data1.txt and the next data on
which it merges is also the same for which in data1.txt we have more
repeats for ID. Here is an example of the output I am getting below.



  ID test_id gene locus Sample_118p_0 Sample_118rp3_0 Sample_118rz_0
Sample_118z_0 Sample_132p1_0 Sample_132p2_0 Sample_132p3_0 Sample_132rp1_0
Sample_132rp3_0 Sample_132rp4_0 Sample_132rz1_0 Sample_132rz2_0
Sample_132z_0 Sample_141p1_0 Sample_141p2_0 Sample_141p3_0 Sample_141p4_0
Sample_141z_0 Sample_183p1_0 Sample_183p2_0 Sample_183p3_0 Sample_183z_0
Sample_91p_0 Sample_91rp1_0 Sample_91rp3_0 Sample_91rp4_0 Sample_91rz_0
XLOC_002126 XLOC_002126 MPZ chr1:161274524-161279762 0.32181 0.333882
0.174569 0.29143 1.56295 1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938
0.597619 0.999418 0.675425 0.624723 1.023 0.361899 1.23395 1.80139 1.30457
0.692972 1.42658 1280.78 76.5147 4.67875 468.667  XLOC_002126 XLOC_002126
MPZ chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295
1.67143 1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ
chr1:161274524-161279762 0.32181 0.333882 0.174569 0.29143 1.56295 1.67143
1.09774 0 0 0.0238811 0.0456828 0.0171938 0.597619 0.999418 0.675425
0.624723 1.023 0.361899 1.23395 1.80139 1.30457 0.692972 1.42658 1280.78
76.5147 4.67875 468.667  XLOC_002126 XLOC_002126 MPZ

Re: [R] R help for creating expression data of Differentially expressed genes

2013-05-07 Thread arun

Hi Vivek,

May be this helps:
set.seed(35)
 dat1- cbind(ID=1:8, 
as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))

set.seed(38)
dat2- cbind(ID= sample(1:20,8,replace=FALSE), 
as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
colnames(dat2)[-1]-gsub(V,X,colnames(dat2)[-1])
 merge(dat1[,1:2],dat2[,1:31],by=ID)
#  ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
#1  1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3  45   9
#2  3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21  26  20
#3  6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29  50  42
#4  7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13  18  38
#5  8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42  25   5
#  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
#1  14  27   3  21   6  44  33  42  10  29
#2  48  13   8  47  18   9  23   9  44   3
#3  25  14  31  19  14   6  26  13   6  49
#4  43  28  15   6   9  19  43  21  41  21
#5   1  27  18   3  42   5  16  39  46  47
A.K.



- Original Message -
From: Vivek Das vd4mm...@gmail.com
To: arun smartpink...@yahoo.com
Cc: 
Sent: Tuesday, May 7, 2013 3:45 PM
Subject: R help for creating expression data of Differentially expressed genes

Hi Arun,

I need some help regarding R scripting. I have two data file one containing 
seven columns and the other containing 33. Both files have unique identifier as 
ID. I want to create another file which should have the first two columns of 
the first file and and the 31 columns of the second file matched on the basis 
of ID. The first file is having gene I'd and gene names of around 500 and I 
want the output file which is having all of those and other attributes as well. 
I want to get the output file having all attributes matching with the I'd of 
the first file. So that I get output of 500 rows with all the attributes of 
second file. I am new to R but having trouble with merge function in R. If you 
can help it will be great.

Regards,
Vivek

Sent from my iPad

On 07/mag/2013, at 21:13, arun smartpink...@yahoo.com wrote:

 HI Ye,
 
 For the NA in ID column, 
 
 
 
 Hi
 dat1- read.table(text=
 ObsNumber     ID          Weight
      1                 0001         12
      2                 0001          13
      3                 0001           14
      4                  0002         16
       5                 0002         17
      6                   N/A          18  
 ,sep=,header=TRUE,colClass=c(numeric,character,numeric),na.strings=N/A)
  unlist(lapply(split(dat1,dat1$ID),function(x) 
with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE)
 #[1] 0001_1 0001_2 0001_3 0002_1 0002_2
 A.K.
 
 From: Ye Lin ye...@lbl.gov
 To: arun smartpink...@yahoo.com 
 Cc: R help r-help@r-project.org 
 Sent: Tuesday, May 7, 2013 2:54 PM
 Subject: Re: [R] create unique ID for each group
 
 
 
 Thanks A.K. But I have NA in ID column, so when I apply the code, it gives 
 me error saying the replacement as less rows than the data has. Anyway for 
 ID=N/A, return sth like N/A_1 in order as well?
 
 
 
 
 
 
 On Tue, May 7, 2013 at 11:17 AM, arun smartpink...@yahoo.com wrote:
 
 H,
 Sorry, a mistake:
 dat1$UniqueID-unlist(lapply(split(dat1,dat1$ID),function(x) 
 with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE)
 dat1
  # ObsNumber   ID Weight UniqueID
 #1         1 0001     12   0001_1
 #2         2 0001     13   0001_2
 #3         3 0001     14   0001_3
 #4         4 0002     16   0002_1
 #5         5 0002     17   0002_2
 
 dat2$UniqueID-unlist(lapply(split(dat2,dat2$ID),function(x) 
 with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE)
 
 A.K.
 
 
 
 
 
 - Original Message -
 
 From: arun smartpink...@yahoo.com
 To: Ye Lin ye...@lbl.gov
 Cc: R help r-help@r-project.org
 Sent: Tuesday, May 7, 2013 2:10 PM
 Subject: Re: [R] create unique ID for each group
 
 
 
 Hi,
 
 Try this:
 dat1- read.table(text=
 ObsNumber     ID          Weight
      1                 0001         12
      2                 0001          13
      3                 0001           14
      4                  0002         16
       5                 0002         17
 ,sep=,header=TRUE,colClass=c(numeric,character,numeric))
 dat2- read.table(text=
 ID               Height
 0001            3.2
 0001             2.6
 0001             3.2
 0002             2.2
 0002              2.6
 ,sep=,header=TRUE,colClass=c(character,numeric))
 dat1$UniqueID-with(dat1,as.character(interaction(ID,ObsNumber,sep=_)))
  
dat2$UniqueID-with(dat2,as.character(interaction(ID,rownames(dat2),sep=_)))
  dat2
 #    ID Height UniqueID
 #1 0001    3.2   0001_1
 #2 0001    2.6   0001_2
 #3 0001    3.2   0001_3
 #4 0002    2.2   0002_4
 #5 0002    2.6   0002_5
 A.K.
 
 
 
 - Original Message -
 From: Ye Lin ye...@lbl.gov
 To: R help r-help@r-project.org
 Cc:
 Sent: Tuesday, May 7,

Re: [R] R help for creating expression data of Differentially expressed genes

2013-05-07 Thread Vivek Das

HI Arun,

My data sets are as in the provided files. I am providing the sample files.
I guess this will give a better idea to the type of working I want to do
with the two files and the kind or script am trying to write. Hope you can
give me some suggestions regarding this. I am new to R so having trouble to
use different functions to use this for my working.

Anyone who can help me out with this can be of great help.


--

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek@ieo.eu
vchris...@yahoo.co.in
vd4mm...@gmail.com


On Tue, May 7, 2013 at 10:36 PM, arun smartpink...@yahoo.com wrote:

 Hi Vivek,

 May be this helps:
 set.seed(35)
  dat1- cbind(ID=1:8,
 as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))

 set.seed(38)
 dat2- cbind(ID= sample(1:20,8,replace=FALSE),
 as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
 colnames(dat2)[-1]-gsub(V,X,colnames(dat2)[-1])
  merge(dat1[,1:2],dat2[,1:31],by=ID)
 #  ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18
 X19 X20
 #1  1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3
 45   9
 #2  3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21
 26  20
 #3  6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29
 50  42
 #4  7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13
 18  38
 #5  8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42
 25   5
 #  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
 #1  14  27   3  21   6  44  33  42  10  29
 #2  48  13   8  47  18   9  23   9  44   3
 #3  25  14  31  19  14   6  26  13   6  49
 #4  43  28  15   6   9  19  43  21  41  21
 #5   1  27  18   3  42   5  16  39  46  47
 A.K.



 - Original Message -
 From: Vivek Das vd4mm...@gmail.com
 To: arun smartpink...@yahoo.com
 Cc:
 Sent: Tuesday, May 7, 2013 3:45 PM
 Subject: R help for creating expression data of Differentially expressed
 genes

 Hi Arun,

 I need some help regarding R scripting. I have two data file one
 containing seven columns and the other containing 33. Both files have
 unique identifier as ID. I want to create another file which should have
 the first two columns of the first file and and the 31 columns of the
 second file matched on the basis of ID. The first file is having gene I'd
 and gene names of around 500 and I want the output file which is having all
 of those and other attributes as well. I want to get the output file having
 all attributes matching with the I'd of the first file. So that I get
 output of 500 rows with all the attributes of second file. I am new to R
 but having trouble with merge function in R. If you can help it will be
 great.

 Regards,
 Vivek

 Sent from my iPad

 On 07/mag/2013, at 21:13, arun smartpink...@yahoo.com wrote:

  HI Ye,
 
  For the NA in ID column,
 
 
 
  Hi
  dat1- read.table(text=
  ObsNumber ID  Weight
   1 0001 12
   2 0001  13
   3 0001   14
   4  0002 16
5 0002 17
   6   N/A  18
 
 ,sep=,header=TRUE,colClass=c(numeric,character,numeric),na.strings=N/A)
   unlist(lapply(split(dat1,dat1$ID),function(x)
 with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE)
  #[1] 0001_1 0001_2 0001_3 0002_1 0002_2
  A.K.
  
  From: Ye Lin ye...@lbl.gov
  To: arun smartpink...@yahoo.com
  Cc: R help r-help@r-project.org
  Sent: Tuesday, May 7, 2013 2:54 PM
  Subject: Re: [R] create unique ID for each group
 
 
 
  Thanks A.K. But I have NA in ID column, so when I apply the code, it
 gives me error saying the replacement as less rows than the data has.
 Anyway for ID=N/A, return sth like N/A_1 in order as well?
 
 
 
 
 
 
  On Tue, May 7, 2013 at 11:17 AM, arun smartpink...@yahoo.com wrote:
 
  H,
  Sorry, a mistake:
  dat1$UniqueID-unlist(lapply(split(dat1,dat1$ID),function(x)
 with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE)
  dat1
   # ObsNumber   ID Weight UniqueID
  #1 1 0001 12   0001_1
  #2 2 0001 13   0001_2
  #3 3 0001 14   0001_3
  #4 4 0002 16   0002_1
  #5 5 0002 17   0002_2
 
  dat2$UniqueID-unlist(lapply(split(dat2,dat2$ID),function(x)
 with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE)
 
  A.K.
 
 
 
 
 
  - Original Message -
 
  From: arun smartpink...@yahoo.com
  To: Ye Lin ye...@lbl.gov
  Cc: R help r-help@r-project.org
  Sent: Tuesday, May 7, 2013 2:10 PM
  Subject: Re: [R] create unique ID for each group
 
 
 
  Hi,
 
  Try this:
  dat1- read.table(text=
  ObsNumber ID  Weight
   1 0001 12

Re: [R] R help for creating expression data of Differentially expressed genes

2013-05-07 Thread arun

HI,
Assuming that out_dat.txt is the output you expected.


 dat1- read.table(data1.txt,header=TRUE,stringsAsFactors=FALSE)
dat2- read.table(data2.txt,header=TRUE,stringsAsFactors=FALSE)
out_dat- read.table(out_data.txt,header=TRUE,stringsAsFactors=FALSE)
 out_dat2-merge(dat1[,1:4],dat2,by=ID)
 identical(out_dat,out_dat2)
#[1] TRUE
A.K.






From: Vivek Das vd4mm...@gmail.com
To: arun smartpink...@yahoo.com 
Cc: R help r-help@r-project.org 
Sent: Tuesday, May 7, 2013 6:07 PM
Subject: Re: R help for creating expression data of Differentially expressed 
genes



HI Arun,

My data sets are as in the provided files. I am providing the sample files. I 
guess this will give a better idea to the type of working I want to do with the 
two files and the kind or script am trying to write. Hope you can give me some 
suggestions regarding this. I am new to R so having trouble to use different 
functions to use this for my working.

Anyone who can help me out with this can be of great help.



--

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek@ieo.eu
            vchris...@yahoo.co.in
            vd4mm...@gmail.com



On Tue, May 7, 2013 at 10:36 PM, arun smartpink...@yahoo.com wrote:

Hi Vivek,

May be this helps:
set.seed(35)
 dat1- cbind(ID=1:8, 
as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))

set.seed(38)
dat2- cbind(ID= sample(1:20,8,replace=FALSE), 
as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
colnames(dat2)[-1]-gsub(V,X,colnames(dat2)[-1])
 merge(dat1[,1:2],dat2[,1:31],by=ID)
#  ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
#1  1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3  45   9
#2  3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21  26  20
#3  6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29  50  42
#4  7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13  18  38
#5  8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42  25   5
#  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
#1  14  27   3  21   6  44  33  42  10  29
#2  48  13   8  47  18   9  23   9  44   3
#3  25  14  31  19  14   6  26  13   6  49
#4  43  28  15   6   9  19  43  21  41  21
#5   1  27  18   3  42   5  16  39  46  47

A.K.



- Original Message -

From: Vivek Das vd4mm...@gmail.com
To: arun smartpink...@yahoo.com
Cc:

Sent: Tuesday, May 7, 2013 3:45 PM
Subject: R help for creating expression data of Differentially expressed genes

Hi Arun,

I need some help regarding R scripting. I have two data file one containing 
seven columns and the other containing 33. Both files have unique identifier 
as ID. I want to create another file which should have the first two columns 
of the first file and and the 31 columns of the second file matched on the 
basis of ID. The first file is having gene I'd and gene names of around 500 
and I want the output file which is having all of those and other attributes 
as well. I want to get the output file having all attributes matching with the 
I'd of the first file. So that I get output of 500 rows with all the 
attributes of second file. I am new to R but having trouble with merge 
function in R. If you can help it will be great.

Regards,
Vivek

Sent from my iPad

On 07/mag/2013, at 21:13, arun smartpink...@yahoo.com wrote:

 HI Ye,

 For the NA in ID column,



 Hi
 dat1- read.table(text=
 ObsNumber     ID          Weight
      1                 0001         12
      2                 0001          13
      3                 0001           14
      4                  0002         16
       5                 0002         17
      6                   N/A          18 
 ,sep=,header=TRUE,colClass=c(numeric,character,numeric),na.strings=N/A)
  unlist(lapply(split(dat1,dat1$ID),function(x) 
with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE)
 #[1] 0001_1 0001_2 0001_3 0002_1 0002_2
 A.K.
 
 From: Ye Lin ye...@lbl.gov
 To: arun smartpink...@yahoo.com
 Cc: R help r-help@r-project.org
 Sent: Tuesday, May 7, 2013 2:54 PM
 Subject: Re: [R] create unique ID for each group



 Thanks A.K. But I have NA in ID column, so when I apply the code, it gives 
 me error saying the replacement as less rows than the data has. Anyway for 
 ID=N/A, return sth like N/A_1 in order as well?






 On Tue, May 7, 2013 at 11:17 AM, arun smartpink...@yahoo.com wrote:

 H,
 Sorry, a mistake:
 dat1$UniqueID-unlist(lapply(split(dat1,dat1$ID),function(x) 
 with(x,as.character(interaction(ID,seq_len(nrow(x)),sep=_,use.names=FALSE)
 dat1
  # ObsNumber   ID Weight UniqueID
 #1         1 0001     12   0001_1
 #2         2 0001     13   0001_2
 #3         3 0001     14   0001_3
 #4         4 0002     16   0002_1
 #5         5

Re: [R] R help - bootstrap with survival analysis

2013-04-30 Thread Terry Therneau


This comes up regularly.  Type ?print.survfit and look at the comments there under 
value.

Terry T.

-  begin included message 

Hi,

I'm not sure if this is the proper way to ask questions, sorry if not.  But
here's my problem:

I'm trying to do a bootstrap estimate of the mean for some survival data.
Is there a way to specifically call upon the rmean value, in order to store
it in an object? I've used print(...,print.rmean=T) to print the summary of
survfit, but I'm not sure how to access only rmean because it does not show
up under attributes for survfit.

Thanks for any help in advance!

Fayaaz Khatri

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help - bootstrap with survival analysis

2013-04-29 Thread Fayaaz Khatri

Hi,

I'm not sure if this is the proper way to ask questions, sorry if not.  But
here's my problem:

I'm trying to do a bootstrap estimate of the mean for some survival data.
Is there a way to specifically call upon the rmean value, in order to store
it in an object? I've used print(...,print.rmean=T) to print the summary of
survfit, but I'm not sure how to access only rmean because it does not show
up under attributes for survfit.

Thanks for any help in advance!

Fayaaz Khatri

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 121, Issue 5

2013-03-26 Thread Kerry

Thank you Jason!  actually, there have been two solutions and one is yours

setting row.names=F works great, additionally what Ive been having problems 
with is the European version of Word 2010.  Apparently it sets delimiters of 
; instead of , as in the English/USA version.  This is something that 
messes not only with R exporting txt or csv files, but will effect Office 
products, ArcGIS and several more.  So having to go into the Region and 
Language settings of the computer and changing the defaults has cleared up 
quite a bit of the issues I was having with R.
 
~K


Not all who wander are lost
K. L. Nicholson PhD
Associate Wildlife Biologist®





 From: Law, Jason jason@portlandoregon.gov


Sent: Tuesday, March 5, 2013 5:31 PM
Subject: RE: R-help Digest, Vol 121, Issue 5

On R 2.15.2 and ArcGIS 9.3.1, it works for me in ArcCatalog but you have to 
follow the particulars here:

http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Accessing_delimited_text_file_data

For example:

write.table(test, '***.tab', sep = '\t', row.names = F)

The extension .tab and sep = '\t' are required for text files.  Didn't test 
row.names=T but I wouldn't count on that working either.

Jason Law
Statistician
City of Portland, Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason@portlandoregon.gov

-Original Message-
Date: Mon, 04 Mar 2013 10:48:39 -0500
From: Duncan Murdoch murdoch.dun...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org
Subject: Re: [R] Mysterious issues with reading text files from R in
    ArcGIS and Excel
Message-ID: 5134c257.6020...@gmail.com
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 04/03/2013 10:09 AM, Kerry wrote:
 It seems within the last ~3 months Ive been having issues with writing text 
 or csv files from a R data frame.  The problem is multifold and it is hard to 
 filter  out what is going on and where the problem is.  So, Im hoping someone 
 else has come across this and may provide insight.

I think you need to provide a simple example for us to try, either by 
putting a small example of one of your files online for us to download, 
or (better) by giving us self-contained code to duplicate the problem.

You might also get better help (especially about ArcGIS) on the 
R-sig-Geo mailing list: https://stat.ethz.ch/mailman/listinfo/r-sig-geo.

Duncan Murdoch




 My current settings for R:
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 locale:

 [1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    
 LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
 [5] LC_TIME=Swedish_Sweden.1252

 attached base packages:
 [1] tcltk     stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] adehabitat_1.8.11 shapefiles_0.6    foreign_0.8-51    tkrplot_0.0-23    
 ade4_1.5-1

 loaded via a namespace (and not attached):
 [1] tools_2.15.2

 I am using Microsoft Excel 2010 and ArcGIS 10.1sp1 for Desktop

 Basically, no matter what data frame I am working on, when I export it to a 
 text file to be use in Excel or ArcGIS problems arise.  Im not sure if it is 
 R or these other programs, maybe forums for ArcGIS might be more appropriate, 
 but this problem only occurs when I use tables that have been produced from 
 an R session.

 When I try to open a text file in Excel, either I get an error message stating
 The file you are trying to open is in a different format than specified by 
 the file extension.  Verify that the file is not corrupted and is from a 
 trusted source.
 Followed by
 Excel has detected that 'file.txt' is a SYLK file, but cannot load it.  
 Either the file has errors or is not a SYLK file format.  Click OK to open 
 the file in a different format
 Then the file opens


 Otherwise, the file opens fine the first time through - and looks ok. I 
 can't figure out what Im doing different between the two commands of 
 write.table as they are always written the same:
 write.csv(file, file = D:/mylocations/fileofinterest.csv) or 
 write.table(file, file = D:/mylocations/fileofinterest.txt)
 Sometimes I will try to add sep = , or sep = ; but these don't make a 
 difference (which I didn't figure they would).

 The other program I use is ArcGIS and bringing in a txt file from R is really 
 messing things up as 2 new columns of information are typically added and 
 date/time data is usually lost with txt files, but not with csv files.

 For instance - a text file that looks like this in Excel:
      id       x       y                date    R1dmed    R1dmean R1error 
R2error
 1 F07001 1482445 6621768 2007-03-05 10:00:53 2498.2973 2498.2973   FALSE   
 FALSE
 2 F07001 1481274 6619628 2007-03-05 12:00:41  657.1029  657.1029    FALSE   
 FALSE
 3 F07001 1481279 6619630 2007-03-05 14:01:12  660.3569  660.3569    FALSE   
 FALSE
 4 F07001 1481271 6619700

Re: [R] R-help Digest, Vol 121, Issue 5

2013-03-26 Thread Robert Baer

On 3/26/2013 7:21 AM, Kerry wrote:
Thank you Jason! actually, there have been two solutions and one is yours

setting row.names=F works great, additionally what Ive been having problems
with is the European version of Word 2010. Apparently it sets delimiters of
; instead of , as in the English/USA version. This is something that
messes not only with R exporting txt or csv files, but will effect Office
products, ArcGIS and several more. So having to go into the Region and
Language settings of the computer and changing the defaults has cleared up
quite a bit of the issues I was having with R.

~K
?read.csv2
This is a wrapper to read.table(). read.csv2 expects the delimiter to
be '; by default and the decimal to be , by default.

Not all who wander are lost
K. L. Nicholson PhD
Associate Wildlife Biologist®

From: Law, Jason jason@portlandoregon.gov

Sent: Tuesday, March 5, 2013 5:31 PM
Subject: RE: R-help Digest, Vol 121, Issue 5

On R 2.15.2 and ArcGIS 9.3.1, it works for me in ArcCatalog but you have to
follow the particulars here:

http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Accessing_delimited_text_file_data

For example:

write.table(test, '***.tab', sep = '\t', row.names = F)

The extension .tab and sep = '\t' are required for text files. Didn't test
row.names=T but I wouldn't count on that working either.

Jason Law
Statistician
City of Portland, Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason@portlandoregon.gov

-Original Message-
Date: Mon, 04 Mar 2013 10:48:39 -0500
From: Duncan Murdoch murdoch.dun...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org
Subject: Re: [R] Mysterious issues with reading text files from R in
ArcGIS and Excel
Message-ID: 5134c257.6020...@gmail.com
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 04/03/2013 10:09 AM, Kerry wrote:
It seems within the last ~3 months Ive been having issues with writing text
or csv files from a R data frame. The problem is multifold and it is hard
to filter out what is going on and where the problem is. So, Im hoping
someone else has come across this and may provide insight.
I think you need to provide a simple example for us to try, either by
putting a small example of one of your files online for us to download,
or (better) by giving us self-contained code to duplicate the problem.

You might also get better help (especially about ArcGIS) on the
R-sig-Geo mailing list: https://stat.ethz.ch/mailman/listinfo/r-sig-geo.

Duncan Murdoch

My current settings for R:
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:

[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252
LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
[5] LC_TIME=Swedish_Sweden.1252

attached base packages:
[1] tcltk stats graphics grDevices utils datasets methods
base

other attached packages:
[1] adehabitat_1.8.11 shapefiles_0.6foreign_0.8-51tkrplot_0.0-23
ade4_1.5-1

loaded via a namespace (and not attached):
[1] tools_2.15.2

I am using Microsoft Excel 2010 and ArcGIS 10.1sp1 for Desktop

Basically, no matter what data frame I am working on, when I export it to a
text file to be use in Excel or ArcGIS problems arise. Im not sure if it is
R or these other programs, maybe forums for ArcGIS might be more
appropriate, but this problem only occurs when I use tables that have been
produced from an R session.

When I try to open a text file in Excel, either I get an error message
stating
The file you are trying to open is in a different format than specified by
the file extension. Verify that the file is not corrupted and is from a
trusted source.
Followed by
Excel has detected that 'file.txt' is a SYLK file, but cannot load it.
Either the file has errors or is not a SYLK file format. Click OK to open
the file in a different format
Then the file opens

Otherwise, the file opens fine the first time through - and looks ok. I
can't figure out what Im doing different between the two commands of
write.table as they are always written the same:
write.csv(file, file = D:/mylocations/fileofinterest.csv) or
write.table(file, file = D:/mylocations/fileofinterest.txt)
Sometimes I will try to add sep = , or sep = ; but these don't make a
difference (which I didn't figure they would).

The other program I use is ArcGIS and bringing in a txt file from R is
really messing things up as 2 new columns of information are typically added
and date/time data is usually lost with txt files, but not with csv files.

For instance - a text file that looks like this in Excel:
id x ydateR1dmedR1dmean R1error
R2error
1 F07001 1482445 6621768 2007-03-05 10:00:53 2498.2973

Re: [R] R-help Digest, Vol 121, Issue 5

2013-03-05 Thread Law, Jason

On R 2.15.2 and ArcGIS 9.3.1, it works for me in ArcCatalog but you have to 
follow the particulars here:

http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Accessing_delimited_text_file_data

For example:

write.table(test, '***.tab', sep = '\t', row.names = F)

The extension .tab and sep = '\t' are required for text files.  Didn't test 
row.names=T but I wouldn't count on that working either.

Jason Law
Statistician
City of Portland, Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason@portlandoregon.gov

-Original Message-
Date: Mon, 04 Mar 2013 10:48:39 -0500
From: Duncan Murdoch murdoch.dun...@gmail.com
To: Kerry kernichol...@yahoo.com
Cc: r-help@r-project.org r-help@r-project.org
Subject: Re: [R] Mysterious issues with reading text files from R in
ArcGIS and Excel
Message-ID: 5134c257.6020...@gmail.com
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 04/03/2013 10:09 AM, Kerry wrote:
 It seems within the last ~3 months Ive been having issues with writing text 
 or csv files from a R data frame.  The problem is multifold and it is hard to 
 filter  out what is going on and where the problem is.  So, Im hoping someone 
 else has come across this and may provide insight.

I think you need to provide a simple example for us to try, either by 
putting a small example of one of your files online for us to download, 
or (better) by giving us self-contained code to duplicate the problem.

You might also get better help (especially about ArcGIS) on the 
R-sig-Geo mailing list: https://stat.ethz.ch/mailman/listinfo/r-sig-geo.

Duncan Murdoch




 My current settings for R:
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 locale:

 [1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252
 LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
 [5] LC_TIME=Swedish_Sweden.1252

 attached base packages:
 [1] tcltk stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] adehabitat_1.8.11 shapefiles_0.6foreign_0.8-51tkrplot_0.0-23
 ade4_1.5-1

 loaded via a namespace (and not attached):
 [1] tools_2.15.2

 I am using Microsoft Excel 2010 and ArcGIS 10.1sp1 for Desktop

 Basically, no matter what data frame I am working on, when I export it to a 
 text file to be use in Excel or ArcGIS problems arise.  Im not sure if it is 
 R or these other programs, maybe forums for ArcGIS might be more appropriate, 
 but this problem only occurs when I use tables that have been produced from 
 an R session.

 When I try to open a text file in Excel, either I get an error message stating
 The file you are trying to open is in a different format than specified by 
 the file extension.  Verify that the file is not corrupted and is from a 
 trusted source.
 Followed by
 Excel has detected that 'file.txt' is a SYLK file, but cannot load it.  
 Either the file has errors or is not a SYLK file format.  Click OK to open 
 the file in a different format
 Then the file opens


 Otherwise, the file opens fine the first time through - and looks ok. I 
 can't figure out what Im doing different between the two commands of 
 write.table as they are always written the same:
 write.csv(file, file = D:/mylocations/fileofinterest.csv) or 
 write.table(file, file = D:/mylocations/fileofinterest.txt)
 Sometimes I will try to add sep = , or sep = ; but these don't make a 
 difference (which I didn't figure they would).

 The other program I use is ArcGIS and bringing in a txt file from R is really 
 messing things up as 2 new columns of information are typically added and 
 date/time data is usually lost with txt files, but not with csv files.

 For instance - a text file that looks like this in Excel:
  id   x   ydateR1dmedR1dmean R1error 
 R2error
 1 F07001 1482445 6621768 2007-03-05 10:00:53 2498.2973 2498.2973   FALSE   
 FALSE
 2 F07001 1481274 6619628 2007-03-05 12:00:41  657.1029  657.1029FALSE   
 FALSE
 3 F07001 1481279 6619630 2007-03-05 14:01:12  660.3569  660.3569FALSE   
 FALSE
 4 F07001 1481271 6619700 2007-03-05 16:00:39  620.1397  620.1397FALSE   
 FALSE

   in ArcGIS now looks like this:

 Field1idid_Xid_YxydateR1dmedR1dmean R1errorR2errorOBJECTID *
 1F07001118.0818119.485541e+01514824456621768NA2498.297272498.29727FALSEFALSE1
 2F07001118.0818119.485541e+01514812746619628NA657.102922657.102922FALSEFALSE2
 3F07001118.0818119.485541e+01514812796619630NA660.356911660.356911FALSEFALSE3
 4F07001118.0818119.485541e+01514812716619700NA620.139702620.139702FALSEFALSE4
 5F07001118.0818119.485541e+01514808496620321NA378.186792378.186792FALSEFALSE5

 Where did id_X and id_Y come from?? What are they??
 What happened to the Date column???  Why does the date column show up when I 
 use write.csv but not write.table?

 Thank you for your help.

 ~K
   [[alternative HTML version deleted]]

[R] R -HELP REQUEST

2013-02-05 Thread Mahmoud Coker

Good morning to you all,
Sorry for taking your time from your research and
teaching schedules.
 
If you have a non-stationary univariate time Series
data that has the transformation:
Say; l.dat-log (series)
d.ldat-diff (l.dat, differences=1)
and you fit say arima model.
predit.arima-predict (fit.series, n.ahead=10,
xregnew= (n+1) :( n+10))
How could I re-transform
prediction$pred to the level data since it has been differenced once? I know 
exp (prediction$pred) will bring the inverse of the log
transform but what about the differenced transform? This is my question.
I would be very grateful if you could help me with
this.Thank you very much in anticipation

Mr. Mahmoud Coker
Senior Manager 
Bank of Sierra Leone 
(Sam-Bangura Building) 
Freetown-Sierra Leone
West Africa
Email: cokiest...@yahoo.com
Phone: +232 78 625967 / +232 77 440143
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R -HELP REQUEST

2013-02-05 Thread Rolf Turner



If you just want point forecasts, it's simple:

Let your original series be X_t, t=1, ..., N.
Let Y_t = log(X_t).
Let Z_t = Y_t - Y_{t-1}, t = 2, ..., N.
Fit your model and forecast, obtaining Z-hat__1, ..., Z-hat_10.

Then Y-hat_{N+1} = Y_N + Z-hat_1, Y-hat_{N+2} = Y-hat_{N+1} + Z-hat_2,
., Y-hat_{N+10} = Y-hat_{N+9} + Z-hat_10.

In R, let your forecast values be the vector Zhat (a vector of length 10).
Then do:

Yhat - cumsum(c(Y[N],Zhat))[-1]
Xhat - exp(Yhat)

Get error bounds on the forecasts is more problematic.

cheers,

Rolf Turner

On 02/05/2013 11:49 PM, Mahmoud Coker wrote:

Good morning to you all,
Sorry for taking your time from your research and
teaching schedules.
  
If you have a non-stationary univariate time Series

data that has the transformation:
Say; l.dat-log (series)
d.ldat-diff (l.dat, differences=1)
and you fit say arima model.
predit.arima-predict (fit.series, n.ahead=10,
xregnew= (n+1) :( n+10))
How could I re-transform
prediction$pred to the level data since it has been differenced once? I know 
exp (prediction$pred) will bring the inverse of the log
transform but what about the differenced transform? This is my question.
I would be very grateful if you could help me with
this.Thank you very much in anticipation.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-help archives --- are they up-to-date?

2013-01-29 Thread Rolf Turner


I just saw a message from David Winsemius, responding to an inquiry
from Carol White:

 On Jan 28, 2013, at 9:06 PM, carol white wrote:

 Should I understand that this message was received?

 It's always possible to check the Archives for this question.

This prompted me to ask about a problem that has been bothering me
for a while:  When I go to Search on the R web page and then click on

 Searchable mail archives http://tolstoy.newcastle.edu.au/%7Erking/R/ 
 of the three mailing lists are provided by Robert King at the 
 University of Newcastle, Australia.

I find that the archives appear to end at 31 January 2012.  If I click 
on, say
 2012: April to June http://tolstoy.newcastle.edu.au/R/e18/help/

I get the response The future isn't here yet.  Well, yes, it isn't.  
But in my
limited understanding April to June 2012 is in the past, not the future.  Am
I doing something wrong, or is something broken in my system, or does
this happen to others as well?  Just in case it's of any relevance:
  sessionInfo()
 R version 2.15.2 Patched (2013-01-10 r61627)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods base

 other attached packages:
 [1] misc_0.0-15

 loaded via a namespace (and not attached):
 [1] grid_2.15.2 lattice_0.20-13 lme4_0.99-0 Matrix_1.0-10
 [5] nlme_3.1-107stats4_2.15.2

Thanks for any insight.

 cheers,

 Rolf


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help archives --- are they up-to-date?

2013-01-29 Thread Sarah Goslee

The searchable archives may lag, and apparently do. The main list
archive is here:

https://stat.ethz.ch/pipermail/r-help/

and is complete. That's the one to check if you wish to know whether
something made it to the list.

If you go to the r-help listinfo link in the mailing list footer, you
are directed to the link I posted, and the searchable archive is also
mentioned.

Sarah

On Tue, Jan 29, 2013 at 3:28 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:

 I just saw a message from David Winsemius, responding to an inquiry
 from Carol White:

 On Jan 28, 2013, at 9:06 PM, carol white wrote:

 Should I understand that this message was received?

 It's always possible to check the Archives for this question.

 This prompted me to ask about a problem that has been bothering me
 for a while:  When I go to Search on the R web page and then click on

 Searchable mail archives http://tolstoy.newcastle.edu.au/%7Erking/R/
 of the three mailing lists are provided by Robert King at the
 University of Newcastle, Australia.

 I find that the archives appear to end at 31 January 2012.  If I click
 on, say
 2012: April to June http://tolstoy.newcastle.edu.au/R/e18/help/

 I get the response The future isn't here yet.  Well, yes, it isn't.
 But in my
 limited understanding April to June 2012 is in the past, not the future.  Am
 I doing something wrong, or is something broken in my system, or does
 this happen to others as well?  Just in case it's of any relevance:
  sessionInfo()
 R version 2.15.2 Patched (2013-01-10 r61627)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods base

 other attached packages:
 [1] misc_0.0-15

 loaded via a namespace (and not attached):
 [1] grid_2.15.2 lattice_0.20-13 lme4_0.99-0 Matrix_1.0-10
 [5] nlme_3.1-107stats4_2.15.2

 Thanks for any insight.

  cheers,

  Rolf


 [[alternative HTML version deleted]]



--
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help archives --- are they up-to-date?

2013-01-29 Thread Rolf Turner


On 01/30/2013 10:28 AM, Sarah Goslee wrote:

The searchable archives may lag, and apparently do. The main list
archive is here:

https://stat.ethz.ch/pipermail/r-help/

and is complete. That's the one to check if you wish to know whether
something made it to the list.

If you go to the r-help listinfo link in the mailing list footer, you
are directed to the link I posted, and the searchable archive is also
mentioned.


Gottit!!!  Bewdy, ta.

cheers,

Rolf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] r-help

2013-01-10 Thread Paul Musingila



-- ___ Paul K. Musingila University of 
Hasselt, I-Biostat, Diepenbeek, Belgium Mobile: +254-724-423532, 
+32-48-637-4558 E-mail: paul.musing...@student.uhasselt.be, 
pmusing...@gmail.com Skype: pmusingila When darkness overtakes the 
godly, light will come bursting in Psalm 112:4


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-help post Bayesian CART

2013-01-07 Thread Rachel Plewes

Hi,

I have explored many of the R packages that construct Bayesian trees including 
the tgp, bart, BMA and maptree packages. I have also searched through some 
other packages but they do not seem to be suitable for the type of analysis I 
need to do. I need to construct Bayesian CART that have terminal nodes which 
have bivariate regressions (not multiple regressions like most of the packages 
do). Does anyone know of a package that has this capability?

Thanks,
Rachel
M.Sc Student

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 118, Issue 20

2012-12-20 Thread Terry Therneau

I don't know of one.  If building your own you could use rpart with the maxdepth=1 as 
the tool to find the best split at each node.

Terry Therneau


On 12/20/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hi,

I've searched R-help and haven't found an answer. I have a set of data from 
which I can create a classification tree using
rpart. However, what I'd like to do is predefine the blank structure of the 
binary tree (i.e., which nodes to include) and then use a package like rpart to 
fit for the optimal splitting criteria at each of the predefined nodes.

Does such a package exist?

Thanks,
Lee



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help..subsetting data frame that meeting multiple criteria

2012-11-23 Thread prasmas

Hi,
I am new to R.  I am trying to regroup data frame using multiple constrains.
for example

data frame: data
  value class   percent
15526   36  4.6875
15527   62  85.9375
15527   82  32.4564
15528   36  70.3125
15528   62  9.375
15528   82  74.6875

I need to regroup each class that have greater than or equal to 70 percent
into new group. Similarly, I also need to regroup each class that have less
than 70 percent into new group.

I can do this by using following syntax for each class
class36- data[data$class==36data$percent70,]
class36a- data[data$class==36data$percent=70,]
but I have 100 different classes. In order to do this for all 100 classes, I
have write that syntax 100 times. There would be some way to do dynamically
to regroup for 100 classes (may be using for loop) but I dont know. Can you
please help in this. 
Output should be like
data frame: class36
value   class   percent
15528   36  70.3125

data frame: class36a
value   class   percent
15526   36  4.6875

data frame: class62
15527   62  85.9375

data frame: class62a
15528   62  9.375

data frame: class82
15528   82  74.6875

data frame: class82a
15527   82  32.4564

Thank you very much your help..
P.



--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help..subsetting data frame that meeting multiple criteria

2012-11-23 Thread David Winsemius



On Nov 23, 2012, at 1:14 PM, prasmas wrote:


Hi,
I am new to R.  I am trying to regroup data frame using multiple  
constrains.

for example

data frame: data
 value  class   percent
15526   36  4.6875
15527   62  85.9375
15527   82  32.4564
15528   36  70.3125
15528   62  9.375
15528   82  74.6875

I need to regroup each class that have greater than or equal to 70  
percent
into new group. Similarly, I also need to regroup each class that  
have less

than 70 percent into new group.

I can do this by using following syntax for each class
class36- data[data$class==36data$percent70,]
class36a- data[data$class==36data$percent=70,]
but I have 100 different classes. In order to do this for all 100  
classes, I
have write that syntax 100 times. There would be some way to do  
dynamically
to regroup for 100 classes (may be using for loop) but I dont know.  
Can you

please help in this.
Output should be like
data frame: class36
value   class   percent
15528   36  70.3125

data frame: class36a
value   class   percent
15526   36  4.6875

 dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70,  
class] , ]

 dat.a
  value class percent
1 1552636  4.6875
2 1552762 85.9375
3 1552782 32.4564
4 1552836 70.3125
5 1552862  9.3750
6 1552882 74.6875

 row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class,  
function(x) paste0(x, letters[1:length(x)])))

 dat.a
value class percent
36a 1552636  4.6875
36b 1552762 85.9375
62a 1552782 32.4564
62b 1552836 70.3125
82a 1552862  9.3750
82b 1552882 74.6875

You can split by the NROW of dat.a if you want.

--
David.


data frame: class62
15527   62  85.9375

data frame: class62a
15528   62  9.375

data frame: class82
15528   82  74.6875

data frame: class82a
15527   82  32.4564

Thank you very much your help..
P.



--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help..subsetting data frame that meeting multiple criteria

2012-11-23 Thread Rui Barradas


Hello,

Like the following?

dat - read.table(text=
value class percent
15526 36 4.6875
15527 62 85.9375
15527 82 32.4564
15528 36 70.3125
15528 62 9.375
15528 82 74.6875
, header = TRUE)

per70 - dat$percent  70
split(dat, list(dat$class, per70))

Hope this helps,

Rui Barradas
Em 23-11-2012 21:14, prasmas escreveu:

Hi,
I am new to R.  I am trying to regroup data frame using multiple constrains.
for example

data frame: data
   valueclass   percent
15526   36  4.6875
15527   62  85.9375
15527   82  32.4564
15528   36  70.3125
15528   62  9.375
15528   82  74.6875

I need to regroup each class that have greater than or equal to 70 percent
into new group. Similarly, I also need to regroup each class that have less
than 70 percent into new group.

I can do this by using following syntax for each class
class36- data[data$class==36data$percent70,]
class36a- data[data$class==36data$percent=70,]
but I have 100 different classes. In order to do this for all 100 classes, I
have write that syntax 100 times. There would be some way to do dynamically
to regroup for 100 classes (may be using for loop) but I dont know. Can you
please help in this.
Output should be like
data frame: class36
value   class   percent
15528   36  70.3125

data frame: class36a
value   class   percent
15526   36  4.6875

data frame: class62
15527   62  85.9375

data frame: class62a
15528   62  9.375

data frame: class82
15528   82  74.6875

data frame: class82a
15527   82  32.4564

Thank you very much your help..
P.



--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help..subsetting data frame that meeting multiple criteria

2012-11-23 Thread David Winsemius



On Nov 23, 2012, at 6:32 PM, arun wrote:


Hi David,
Tried the solution on a slightly different data:
dat - read.table(text=
value class percent
15526 36 4.6875
15527 62 85.9375
15527 82 32.4564
15528 36 70.3125
15528 62 9.375
15528 82 74.6875
15529 72 50.
15530 72 50.
, header = TRUE)
dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70,  
class] , ]
 row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class,  
function(x) paste0(x, letters[1:length(x)])))

 dat.a
#value class percent
#36a 1552636  4.6875
#36b 1552762 85.9375
#62a 1552782 32.4564
#62b 1552836 70.3125
#82a 1552862  9.3750
#82b 1552882 74.6875



Great. Seems to be working as requested. Er,  What's your point?

--
David.


- Original Message -
From: David Winsemius dwinsem...@comcast.net
To: prasmas prasad...@gmail.com
Cc: r-help@r-project.org
Sent: Friday, November 23, 2012 7:40 PM
Subject: Re: [R] R help..subsetting data frame that meeting multiple  
criteria



On Nov 23, 2012, at 1:14 PM, prasmas wrote:


Hi,
I am new to R.  I am trying to regroup data frame using multiple  
constrains.

for example

data frame: data
  valueclasspercent
15526364.6875
155276285.9375
155278232.4564
155283670.3125
15528629.375
155288274.6875

I need to regroup each class that have greater than or equal to 70  
percent
into new group. Similarly, I also need to regroup each class that  
have less

than 70 percent into new group.

I can do this by using following syntax for each class
class36- data[data$class==36data$percent70,]
class36a- data[data$class==36data$percent=70,]
but I have 100 different classes. In order to do this for all 100  
classes, I
have write that syntax 100 times. There would be some way to do  
dynamically
to regroup for 100 classes (may be using for loop) but I dont know.  
Can you

please help in this.
Output should be like
data frame: class36
valueclasspercent
155283670.3125

data frame: class36a
valueclasspercent
15526364.6875

dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70,  
class] , ]

dat.a

  value class percent
1 1552636  4.6875
2 1552762 85.9375
3 1552782 32.4564
4 1552836 70.3125
5 1552862  9.3750
6 1552882 74.6875

row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class,  
function(x) paste0(x, letters[1:length(x)])))

dat.a

value class percent
36a 1552636  4.6875
36b 1552762 85.9375
62a 1552782 32.4564
62b 1552836 70.3125
82a 1552862  9.3750
82b 1552882 74.6875

You can split by the NROW of dat.a if you want.

--David.


data frame: class62
155276285.9375

data frame: class62a
15528629.375

data frame: class82
155288274.6875

data frame: class82a
155278232.4564

Thank you very much your help..
P.



--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help..subsetting data frame that meeting multiple criteria

2012-11-23 Thread arun

HI,
You could also try this:
dat - read.table(text=
value class percent
15526 36 4.6875
15527 62 85.9375
15527 82 32.4564
15528 36 70.3125
15528 62 9.375
15528 82 74.6875
15529 72 50.
15530 72 50.
, header = TRUE)
mat1-as.matrix(dat)
row.names(mat1)-paste(dat$class,ave(dat$percent,dat$class,FUN=function(x) 
x70),sep=_)
 mat1
# value class percent
#36_0 15526    36  4.6875
#62_1 15527    62 85.9375
#82_0 15527    82 32.4564
#36_1 15528    36 70.3125
#62_0 15528    62  9.3750
#82_1 15528    82 74.6875
#72_0 15529    72 50.
#72_0 15530    72 50.

If you  want to change 0 and 1 to a and b.
row.names(mat1)-gsub(1,b,gsub(0,a,row.names(mat1)))
 row.names(mat1)
#[1] 36_a 62_b 82_a 36_b 62_a 82_b 72_a 72_a

Hope it helps.
A.K.


- Original Message -
From: prasmas prasad...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, November 23, 2012 4:14 PM
Subject: [R] R help..subsetting data frame that meeting multiple criteria

Hi,
I am new to R.  I am trying to regroup data frame using multiple constrains.
for example

data frame: data
  value    class    percent
15526    36    4.6875
15527    62    85.9375
15527    82    32.4564
15528    36    70.3125
15528    62    9.375
15528    82    74.6875

I need to regroup each class that have greater than or equal to 70 percent
into new group. Similarly, I also need to regroup each class that have less
than 70 percent into new group.

I can do this by using following syntax for each class
class36- data[data$class==36data$percent70,]
class36a- data[data$class==36data$percent=70,]
but I have 100 different classes. In order to do this for all 100 classes, I
have write that syntax 100 times. There would be some way to do dynamically
to regroup for 100 classes (may be using for loop) but I dont know. Can you
please help in this. 
Output should be like
data frame: class36
value    class    percent
15528    36    70.3125

data frame: class36a
value    class    percent
15526    36    4.6875

data frame: class62
15527    62    85.9375

data frame: class62a
15528    62    9.375

data frame: class82
15528    82    74.6875

data frame: class82a
15527    82    32.4564

Thank you very much your help..
P.



--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help..subsetting data frame that meeting multiple criteria

2012-11-23 Thread arun

Hi David,
Tried the solution on a slightly different data:
dat - read.table(text=
value class percent
15526 36 4.6875
15527 62 85.9375
15527 82 32.4564
15528 36 70.3125
15528 62 9.375
15528 82 74.6875
15529 72 50.
15530 72 50.
, header = TRUE)
dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70, class] , ]
 row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class, function(x) 
paste0(x, letters[1:length(x)])))
 dat.a
#    value class percent
#36a 15526    36  4.6875
#36b 15527    62 85.9375
#62a 15527    82 32.4564
#62b 15528    36 70.3125
#82a 15528    62  9.3750
#82b 15528    82 74.6875


A.K.




- Original Message -
From: David Winsemius dwinsem...@comcast.net
To: prasmas prasad...@gmail.com
Cc: r-help@r-project.org
Sent: Friday, November 23, 2012 7:40 PM
Subject: Re: [R] R help..subsetting data frame that meeting multiple criteria


On Nov 23, 2012, at 1:14 PM, prasmas wrote:

 Hi,
 I am new to R.  I am trying to regroup data frame using multiple constrains.
 for example
 
 data frame: data
  value    class    percent
 15526    36    4.6875
 15527    62    85.9375
 15527    82    32.4564
 15528    36    70.3125
 15528    62    9.375
 15528    82    74.6875
 
 I need to regroup each class that have greater than or equal to 70 percent
 into new group. Similarly, I also need to regroup each class that have less
 than 70 percent into new group.
 
 I can do this by using following syntax for each class
 class36- data[data$class==36data$percent70,]
 class36a- data[data$class==36data$percent=70,]
 but I have 100 different classes. In order to do this for all 100 classes, I
 have write that syntax 100 times. There would be some way to do dynamically
 to regroup for 100 classes (may be using for loop) but I dont know. Can you
 please help in this.
 Output should be like
 data frame: class36
 value    class    percent
 15528    36    70.3125
 
 data frame: class36a
 value    class    percent
 15526    36    4.6875
 
 dat.a - dat[ dat[[class]] %in% dat[ dat[[percent]] 70, class] , ]
 dat.a
  value class percent
1 15526    36  4.6875
2 15527    62 85.9375
3 15527    82 32.4564
4 15528    36 70.3125
5 15528    62  9.3750
6 15528    82 74.6875

 row.names(dat.a) - unlist(tapply(dat.a$class, dat.a$class, function(x) 
 paste0(x, letters[1:length(x)])))
 dat.a
    value class percent
36a 15526    36  4.6875
36b 15527    62 85.9375
62a 15527    82 32.4564
62b 15528    36 70.3125
82a 15528    62  9.3750
82b 15528    82 74.6875

You can split by the NROW of dat.a if you want.

--David.

 data frame: class62
 15527    62    85.9375
 
 data frame: class62a
 15528    62    9.375
 
 data frame: class82
 15528    82    74.6875
 
 data frame: class82a
 15527    82    32.4564
 
 Thank you very much your help..
 P.
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/R-help-subsetting-data-frame-that-meeting-multiple-criteria-tp4650601.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] r-help or r-devel ?

2012-11-05 Thread Christophe Genolini

Hi the list,

I have some basic questions about writing a package. On which list shall I 
post them? 
Theoretically, I am supposed to post them on r-devel, but all the questions on 
this list are very 
advance questions, not basic ones... So I don't know what to do.

Christophe

-- 
Christophe Genolini
Maître de conférences en bio-statistique
Vice président Communication interne et animation du campus
Université Paris Ouest Nanterre La Défense


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] r-help or r-devel ?

2012-11-05 Thread Marc Schwartz

On Nov 5, 2012, at 11:24 AM, Christophe Genolini cgeno...@u-paris10.fr wrote:

 Hi the list,
 
 I have some basic questions about writing a package. On which list shall I 
 post them? 
 Theoretically, I am supposed to post them on r-devel, but all the questions 
 on this list are very 
 advance questions, not basic ones... So I don't know what to do.
 
 Christophe



In general, queries pertaining to creating R packages should go to R-Devel, 
especially if it involves the use of C/C++/FORTRAN.

That being said, if your queries are basic, please be sure to read through 
Writing R Extensions:

  http://cran.r-project.org/doc/manuals/R-exts.pdf

and consider searching the list archives via:

  http://www.rseek.org

as there is a reasonable chance that your query has already been posted and 
answered in the past.

If you have already done those things, then post to R-Devel, but please do 
subscribe first:

  https://stat.ethz.ch/mailman/listinfo/r-devel

as it will save the R-Devel list moderators from having to manually approve 
each of your posts, therefore making your posting more expedient.

Thanks and regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help - Adding a column in a data frame with multiple conditions

2012-10-04 Thread Libby M Gertken

Hi,

I am trying to add a column of numbers to a data frame in R with multiple
conditions.

Here is a simplified example df:

[A]  [B] [C] [D] [E]
[1] 1 X 90 88
[2] 1 Y 72 70
[3] 1 Z 67 41
[4] 2 X 74 49
[5] 2 Y 42 50
[6] 2 Z 81 56
[7] 3 X 92 59
[8] 3 Y 94 80
[9] 3 Z 80 82

I would like column [E] to have a certain value (found either in [C] or
[D]) based on conditions in columns [A] *and* [B].

E.g. :
if [A] = 1 and [B] = X, then [E] = the entry in [C] for that row (i.e., 90)
if [A] = 1 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 70)
if [A] = 1 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 67)

if [A] = 2 and [B] = X, then [E] = the entry in [C] for that row (i.e., 74)
if [A] = 2 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 50)
if [A] = 2 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 81)

and so on.

ATTEMPT TO RESOLVE:

The following code allowed me to add values for column [E] when [A] ==1,
but I can't figure out how to keep the code going in order to get a value
for column [E] based on all of the numbers in column [A] and the secondary
condition for [B] ([A] goes from 1:48).

df$[E] -


ifelse((df$A == 1)  (df$B == X), df$C[df$A == 1],

ifelse((df$A == 1)  (df$B == Y), df$D[df$A == 1],

ifelse((df$A == 1)  (df$B == Z),  df$C[df$A == 1],

NA


Thank you for any advice you can give.


Libby G

libbymg[at]utexas.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help - Adding a column in a data frame with multiple conditions

2012-10-04 Thread Sarah Goslee

Hi Libby,

You had an accumulation of small errors, from an extra ) to an unclear
understanding of how indexing works in R. Also, you shouldn't call
your dataframe df, or use square brackets in column names.

That said, what about:


sampledata - structure(list(A = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L), B = c(X,
Y, Z, X, Y, Z, X, Y, Z), C = c(90L, 72L, 67L,
74L, 42L, 81L, 92L, 94L, 80L), D = c(88L, 70L, 41L, 49L, 50L,
56L, 59L, 80L, 82L)), .Names = c(A, B, C, D), class =
data.frame, row.names = c(NA,
-9L))

E -
ifelse((sampledata$A == 1)  (sampledata$B == X), sampledata$C,
ifelse((sampledata$A == 1)  (sampledata$B == Y), sampledata$D,
ifelse((sampledata$A == 1)  (sampledata$B == Z),  sampledata$C, NA)))

sampledata - data.frame(sampledata, E)

Sarah

On Thu, Oct 4, 2012 at 2:47 PM, Libby M Gertken libb...@utexas.edu wrote:
 Hi,

 I am trying to add a column of numbers to a data frame in R with multiple
 conditions.

 Here is a simplified example df:

 [A]  [B] [C] [D] [E]
 [1] 1 X 90 88
 [2] 1 Y 72 70
 [3] 1 Z 67 41
 [4] 2 X 74 49
 [5] 2 Y 42 50
 [6] 2 Z 81 56
 [7] 3 X 92 59
 [8] 3 Y 94 80
 [9] 3 Z 80 82

 I would like column [E] to have a certain value (found either in [C] or
 [D]) based on conditions in columns [A] *and* [B].

 E.g. :
 if [A] = 1 and [B] = X, then [E] = the entry in [C] for that row (i.e., 90)
 if [A] = 1 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 70)
 if [A] = 1 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 67)

 if [A] = 2 and [B] = X, then [E] = the entry in [C] for that row (i.e., 74)
 if [A] = 2 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 50)
 if [A] = 2 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 81)

 and so on.

 ATTEMPT TO RESOLVE:

 The following code allowed me to add values for column [E] when [A] ==1,
 but I can't figure out how to keep the code going in order to get a value
 for column [E] based on all of the numbers in column [A] and the secondary
 condition for [B] ([A] goes from 1:48).

 df$[E] -


 ifelse((df$A == 1)  (df$B == X), df$C[df$A == 1],

 ifelse((df$A == 1)  (df$B == Y), df$D[df$A == 1],

 ifelse((df$A == 1)  (df$B == Z),  df$C[df$A == 1],

 NA


 Thank you for any advice you can give.


 Libby G

 libbymg[at]utexas.edu

 [[alternative HTML version deleted]]

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help - Adding a column in a data frame with multiple conditions

2012-10-04 Thread arun

Hi,
By extending Sarah's solution to the whole dataset:
dat1-read.table(text=
 A B C  D
 1 X 90 88
 1 Y 72 70
 1 Z 67 41
 2 X 74 49
 2 Y 42 50
 2 Z 81 56
 3 X 92 59
 3 Y 94 80
 3 Z 80 82
,sep=,header=TRUE,stringsAsFactors=FALSE) 

dat1$E-unlist(lapply(split(dat1,dat1$A),function(x) 
ifelse(x$B==X|x$B==Z,x$C,ifelse(x$B==Y,x$D,NA


 dat1
#  A B  C  D  E
#1 1 X 90 88 90
#2 1 Y 72 70 70
#3 1 Z 67 41 67
#4 2 X 74 49 74
#5 2 Y 42 50 50
#6 2 Z 81 56 81
#7 3 X 92 59 92
#8 3 Y 94 80 80
#9 3 Z 80 82 80
A.K.




- Original Message -
From: Sarah Goslee sarah.gos...@gmail.com
To: Libby M Gertken libb...@utexas.edu
Cc: r-help@r-project.org
Sent: Thursday, October 4, 2012 4:03 PM
Subject: Re: [R] R help - Adding a column in a data frame with multiple 
conditions

Hi Libby,

You had an accumulation of small errors, from an extra ) to an unclear
understanding of how indexing works in R. Also, you shouldn't call
your dataframe df, or use square brackets in column names.

That said, what about:


sampledata - structure(list(A = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L), B = c(X,
Y, Z, X, Y, Z, X, Y, Z), C = c(90L, 72L, 67L,
74L, 42L, 81L, 92L, 94L, 80L), D = c(88L, 70L, 41L, 49L, 50L,
56L, 59L, 80L, 82L)), .Names = c(A, B, C, D), class =
data.frame, row.names = c(NA,
-9L))

E -
ifelse((sampledata$A == 1)  (sampledata$B == X), sampledata$C,
ifelse((sampledata$A == 1)  (sampledata$B == Y), sampledata$D,
ifelse((sampledata$A == 1)  (sampledata$B == Z),  sampledata$C, NA)))

sampledata - data.frame(sampledata, E)

Sarah

On Thu, Oct 4, 2012 at 2:47 PM, Libby M Gertken libb...@utexas.edu wrote:
 Hi,

 I am trying to add a column of numbers to a data frame in R with multiple
 conditions.

 Here is a simplified example df:

 [A]  [B] [C] [D] [E]
 [1] 1 X 90 88
 [2] 1 Y 72 70
 [3] 1 Z 67 41
 [4] 2 X 74 49
 [5] 2 Y 42 50
 [6] 2 Z 81 56
 [7] 3 X 92 59
 [8] 3 Y 94 80
 [9] 3 Z 80 82

 I would like column [E] to have a certain value (found either in [C] or
 [D]) based on conditions in columns [A] *and* [B].

 E.g. :
 if [A] = 1 and [B] = X, then [E] = the entry in [C] for that row (i.e., 90)
 if [A] = 1 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 70)
 if [A] = 1 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 67)

 if [A] = 2 and [B] = X, then [E] = the entry in [C] for that row (i.e., 74)
 if [A] = 2 and [B] = Y, then [E] = the entry in [D] for that row (i.e., 50)
 if [A] = 2 and [B] = Z, then [E] = the entry in [C] for that row (i.e., 81)

 and so on.

 ATTEMPT TO RESOLVE:

 The following code allowed me to add values for column [E] when [A] ==1,
 but I can't figure out how to keep the code going in order to get a value
 for column [E] based on all of the numbers in column [A] and the secondary
 condition for [B] ([A] goes from 1:48).

 df$[E] -


 ifelse((df$A == 1)  (df$B == X), df$C[df$A == 1],

 ifelse((df$A == 1)  (df$B == Y), df$D[df$A == 1],

 ifelse((df$A == 1)  (df$B == Z),  df$C[df$A == 1],

 NA


 Thank you for any advice you can give.


 Libby G

 libbymg[at]utexas.edu

         [[alternative HTML version deleted]]

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 115, Issue 12

2012-09-12 Thread Stephen Politzer-Ahles

Hello Amelie,

I don't have an answer to your question, but I just wanted to point out
this page I noticed recently (
http://hlplab.wordpress.com/2009/05/07/multinomial-random-effects-models-in-r/),
which might be helpful.

I'm also interested in figuring out how to do a multinomial glmm, so if you
find out anything I'd be happy to hear more about it! Based on what I've
found so far it seems like doing a multinomial glmm is not very
straightforward, unfortunately...

Best,
Steve Politzer-Ahles


Message: 94
 Date: Wed, 12 Sep 2012 09:47:12 +0200
 From: Vaniscotte vaname...@gmail.com
 To: r-help@r-project.org
 Subject: [R] multinomial MCMCglmm
 Message-ID: 50503e00.4060...@gmail.com
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed

 Dear all,

 I would like to add mixed effects in a multinomial model and I am trying
 to use MCMCglmm for that.

 The main problem I face: my data set consits of a trapping data set,
 where the observation at eah trap (1 or 0 for each species) have been
 aggregated per traplines. Therefore we have a proportion of
 presence/absence for each species per trapline.

 ex:
ID_line mesh habitat Apsy Mygl Crle Crru Miag Miar Mimi Mumu Misu
 Soar Somi
 11  028S6A   28   copse200000000
 00
 12  028S6B   28   copse110000000
 00
 13  028S6C   28   hedge200400000
 00
 14  028S6D   28   hedge100700001
 00
 15  028S6E   28   hedge700100000
 00
empty
 1128
 1228
 1324
 1421
 1522

 When I run the following:

   test1 -

 MCMCglmm(fixed=cbind(Apsy,Mygl,Crle,Crru,Miag,Miar,Mimi,Mumu,Misu,Soar,Somi,empty)~habitat,random=~mesh,family=multinomial12,data=metalSmA[,c(2,9,23:34)],rcov=~us(trait):units)

 I got some error when running regarding the variance structure:

   ill-conditioned G/R structure: use proper priors if you haven't or
 rescale data if you have

 I guess that the problem comes from the nature of my observation whih
 are frequencies rather than 0/1 per unit

 Does someone know if a multinomial model fitted with MCMCglmm can
 handdle those frequencies table and how to specify the good G/R variance
 structures?


 Regards

 Am?lie Vaniscotte


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-help question

2012-08-13 Thread Louise Cowpertwait

Hi there,

I have subscribed to R-help but am not sure how to view or post questions? I 
think this is the right way.

I am planning on doing a multivariate regression investigating the relationship 
between depression (a continuous variable) and social support variables (mostly 
continuous, some categorical) among older people. I have a number of 
demographic and health-related variables that I am including as control 
variables. I have a large dataset from nearly 4,000 individuals. 

I need to check whether my data is 1) Missing at Random (MAR) and 2) Missing 
Completely At Random (MCAR).

Here are three questions that I have related to this:


1) To check whether the data is MAR, I dichotomised a variable into missing and 
not missing, and checked for any significant differences in means (for 
continuous) or proportions (for categorical) of the other variables. I did this 
for each of the variables in my analysis. Is this correct?

2) Because of the size of my dataset, relationships for my MAR analysis are 
coming up as significant when, practically, the differences in means or 
proportions are not meaningful. Is it acceptable for me to argue as such, and 
say that the data is effectively MAR despite statistical significance?

Sorry this is not a question specifically to R (more of a stats question) so no 
problem if no-one can help, though it would be greatly appreciated.

3) I have no idea how to check whether the data is Missing Completely At Random 
in R. I think this involves seeing whether those who had missing data for one 
variable were more likely to have missing data in other variables? If so, I 
don't know how to do this. Or, I need to do an overall test like Little's test 
of missing completely at random. I have spent ages looking online and at 
packages and can't find anything.

Please help! I don't want to use SPSS!

Cheers,

Louise





Louise Cowpertwait
louisecowpertw...@gmail.com
021 258 9795 
Auckland, NZ




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help question

2012-08-13 Thread R. Michael Weylandt

On Sun, Aug 12, 2012 at 10:58 PM, Louise Cowpertwait
louisecowpertw...@gmail.com wrote:
 Hi there,

 I have subscribed to R-help but am not sure how to view or post questions? I 
 think this is the right way.

Indeed!


 I am planning on doing a multivariate regression investigating the 
 relationship between depression (a continuous variable) and social support 
 variables (mostly continuous, some categorical) among older people. I have a 
 number of demographic and health-related variables that I am including as 
 control variables. I have a large dataset from nearly 4,000 individuals.

 I need to check whether my data is 1) Missing at Random (MAR) and 2) Missing 
 Completely At Random (MCAR).

 Here are three questions that I have related to this:


 1) To check whether the data is MAR, I dichotomised a variable into missing 
 and not missing, and checked for any significant differences in means (for 
 continuous) or proportions (for categorical) of the other variables. I did 
 this for each of the variables in my analysis. Is this correct?

Something like a classic chi-sq test or small sample analogue? Seems
somewhat reasonable, though you'll have to worry about making sure you
don't get tripped up by doing too many tests. See ?p.adjust.methods
for some references.


 2) Because of the size of my dataset, relationships for my MAR analysis are 
 coming up as significant when, practically, the differences in means or 
 proportions are not meaningful. Is it acceptable for me to argue as such, and 
 say that the data is effectively MAR despite statistical significance?

I'm not sure there is a statistical answer to that: it's going to
depend much more on the nature of your data set. Let your meta
knowledge of the source of missingness guide things here.


 Sorry this is not a question specifically to R (more of a stats question) so 
 no problem if no-one can help, though it would be greatly appreciated.

 3) I have no idea how to check whether the data is Missing Completely At 
 Random in R. I think this involves seeing whether those who had missing data 
 for one variable were more likely to have missing data in other variables? If 
 so, I don't know how to do this. Or, I need to do an overall test like 
 Little's test of missing completely at random. I have spent ages looking 
 online and at packages and can't find anything.

You might want to check the rms package and the accompanying book
(Regression Modeling Strategies) by Frank Harrell. It has the best
coverage of MAR/MCAR/Imputation/etc that I've read on a practical
basis.

Cheers,
Michael


 Please help! I don't want to use SPSS!

 Cheers,

 Louise





 Louise Cowpertwait
 louisecowpertw...@gmail.com
 021 258 9795
 Auckland, NZ




 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help

2012-08-08 Thread chong hui qi

Hi 

 

I'm trying to increase the memory limit but Im facing this problem 

Error: unexpected symbol in C:\\Program
Files\\R\\R-2.15.0\\bin\\i386\\Rgui.exe --max-mem-size=500M 

 

Hope you can help me out 

Thank You 

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-08-08 Thread Jeff Newmiller

There seem to be too many quotes in your error message... the path and name of 
the program should be quoted, the argument should not (or should be separately 
quoted).

500M isn't especially large these days... be sure to check your existing 
(default) value using the memory.limit() function before changing it this way.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

chong hui qi imgreee...@gmail.com wrote:

Hi 

 

I'm trying to increase the memory limit but Im facing this problem 

Error: unexpected symbol in C:\\Program
Files\\R\\R-2.15.0\\bin\\i386\\Rgui.exe --max-mem-size=500M 

 

Hope you can help me out 

Thank You 

 


   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-08-08 Thread Jeff Newmiller

Please continue to include the r-help list in the conversation... you will most 
likely get your solution quicker.

Why do you have double backslashes? That is needed within R, but not at the 
command line or in a shortcut.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Hui Qi imgreee...@gmail.com wrote:

Hi

Thanks for ur help


 Error: unexpected symbol in C:\\Program
 Files\\R\\R-2.15.0\\bin\\i386\\Rgui.exe   --max-mem-size=1014M

I tried agn but still obtain this



On 9 Aug, 2012, at 12:37 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us
wrote:

 Error: unexpected symbol in C:\\Program
 Files\\R\\R-2.15.0\\bin\\i386\\Rgui.exe --max-mem-size=500M

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R: Help xts object Subset Date by Day of the Week

2012-08-06 Thread R. Michael Weylandt

On Sun, Aug 5, 2012 at 4:49 PM, Douglas Karabasz
doug...@sigmamonster.com wrote:
 I have a xts object made of daily closing prices I have acquired using
 quantmod.



 Here is my code:

 library(xts)

 library(quantmod)

 library(lubridate)



 # Gets SPY data

 getSymbols(SPY)

 # Subset Prices to just closing price

 SP500 - Cl(SPY)

 # Show day of the week for each date using 2-6 for monday-friday

 SP500wd - wday(SP500)

 # Add Price and days of week together

 SP500wd - cbind(SP500, SP500wd)

 # subset Monday into one xts object

 SPmon - subset(SP500wd, SP500wd$..2==2)





 I then used the package lubridate to show the days of the week.   Due to the
 requirement of an xts objects to be numeric you will see each day is
 represented as a number so that Monday is =2, Tuesday=3, Wednesday=4,
 Thursday=5, Friday=6, Saturday=7.   Since this is a financial index you will
 only see the numbers 2-6 or Monday-Friday.

 I want to subset the data by using the day column.  I would like some help
 to figure out the best way to accomplish a few objectives.

 1.   Subset the data so that I only show Monday in sequence.  However, I
 do want to make sure that it shows the date, price and the ..2 colum(which
 is the day of week) after Sub setting the data  (I have it done but not sure
 if it is the best way)


I think what you do works, this might also be a one liner:

SPY[format(index(SPY), %a) == Mon, ]

Alternatively

split.default(SPY, format(index(SPY), %a))

creates a list of xts objects split by day of the week (Note you need
split.default here because split.xts does something different)


 2.   Rearrange the object (hopefully without destroying the xts object)
 so that my data lines up like a weekly calendar.   So it would look like the
 follow.

Unfortunately, your formatting got all chewed up by the R-help server,
which doesn't like HTML so I'm not quite sure what you want here.

Possibly some black magic like this?

SPY.CL - Cl(SPY)

length(SPY.CL) - 7*floor(length(SPY.CL)/7)

dim(SPY.CL) - c(length(SPY.CL)/7, 7)

But note that this looses time stamps because each row can only have a
single time stamp.

You might also try

to.weekly()

Cheers,

Michael







 Long Date Monday

 Monday Price

 Monday Day Index

 Long Date Tuesday

 Tuesday Price

 Tuesday Day Index

 Long Date Wednesday

 Wednesday Price

 Wednesday Index

 Long Date Thursday

 Thursday Price

 Thursday Index

 Friday

 Friday Price

 Friday Index


 1/5/2009

 92.85

 2

 1/6/2009

 93.47

 3

 1/7/2009

 90.67

 4

 1/8/2009

 84.4

 5

 1/9/2009

 89.09

 6


 1/12/2009

 86.95

 2

 1/13/2009

 87.11

 3

 1/14/2009

 84.37

 4

 1/15/2009

 91.04

 5

 1/16/2009

 85.06

 6


 MLK Mondy

 MLK Monday

 MLK Monday

 1/20/2009

 80.57

 3

 1/21/2009

 84.05

 4

 1/22/2009

 82.75

 5

 1/23/2009

 83.11

 6


 1/26/2009

 83.68

 2

 1/27/2009

 84.53

 3

 1/28/2009

 87.39

 4

 1/29/2009

 84.55

 5

 1/30/2009

 82.83

 6










































 Thank you,

 Douglas


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R: Help xts object Subset Date by Day of the Week

2012-08-06 Thread R. Michael Weylandt

On Mon, Aug 6, 2012 at 4:30 PM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:
 On Sun, Aug 5, 2012 at 4:49 PM, Douglas Karabasz
 doug...@sigmamonster.com wrote:
 I have a xts object made of daily closing prices I have acquired using
 quantmod.



 Here is my code:

 library(xts)

 library(quantmod)

 library(lubridate)



 # Gets SPY data

 getSymbols(SPY)

 # Subset Prices to just closing price

 SP500 - Cl(SPY)

 # Show day of the week for each date using 2-6 for monday-friday

 SP500wd - wday(SP500)

 # Add Price and days of week together

 SP500wd - cbind(SP500, SP500wd)

 # subset Monday into one xts object

 SPmon - subset(SP500wd, SP500wd$..2==2)





 I then used the package lubridate to show the days of the week.   Due to the
 requirement of an xts objects to be numeric you will see each day is
 represented as a number so that Monday is =2, Tuesday=3, Wednesday=4,
 Thursday=5, Friday=6, Saturday=7.   Since this is a financial index you will
 only see the numbers 2-6 or Monday-Friday.

 I want to subset the data by using the day column.  I would like some help
 to figure out the best way to accomplish a few objectives.

 1.   Subset the data so that I only show Monday in sequence.  However, I
 do want to make sure that it shows the date, price and the ..2 colum(which
 is the day of week) after Sub setting the data  (I have it done but not sure
 if it is the best way)


 I think what you do works, this might also be a one liner:

 SPY[format(index(SPY), %a) == Mon, ]

 Alternatively

 split.default(SPY, format(index(SPY), %a))

 creates a list of xts objects split by day of the week (Note you need
 split.default here because split.xts does something different)


 2.   Rearrange the object (hopefully without destroying the xts object)
 so that my data lines up like a weekly calendar.   So it would look like the
 follow.

 Unfortunately, your formatting got all chewed up by the R-help server,
 which doesn't like HTML so I'm not quite sure what you want here.

 Possibly some black magic like this?

 SPY.CL - Cl(SPY)

 length(SPY.CL) - 7*floor(length(SPY.CL)/7)

 dim(SPY.CL) - c(length(SPY.CL)/7, 7)

 But note that this looses time stamps because each row can only have a
 single time stamp.

To clarify that's not _why_ that looses the time-stamps (and
xts-ness) but just that it does happen. Technically, it's because
dim-.xts doesn't exist; the reason it doesn't (I'd imagine) is
because of the time stamp thing.

M


 You might also try

 to.weekly()

 Cheers,

 Michael







 Long Date Monday

 Monday Price

 Monday Day Index

 Long Date Tuesday

 Tuesday Price

 Tuesday Day Index

 Long Date Wednesday

 Wednesday Price

 Wednesday Index

 Long Date Thursday

 Thursday Price

 Thursday Index

 Friday

 Friday Price

 Friday Index


 1/5/2009

 92.85

 2

 1/6/2009

 93.47

 3

 1/7/2009

 90.67

 4

 1/8/2009

 84.4

 5

 1/9/2009

 89.09

 6


 1/12/2009

 86.95

 2

 1/13/2009

 87.11

 3

 1/14/2009

 84.37

 4

 1/15/2009

 91.04

 5

 1/16/2009

 85.06

 6


 MLK Mondy

 MLK Monday

 MLK Monday

 1/20/2009

 80.57

 3

 1/21/2009

 84.05

 4

 1/22/2009

 82.75

 5

 1/23/2009

 83.11

 6


 1/26/2009

 83.68

 2

 1/27/2009

 84.53

 3

 1/28/2009

 87.39

 4

 1/29/2009

 84.55

 5

 1/30/2009

 82.83

 6










































 Thank you,

 Douglas


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R: Help xts object Subset Date by Day of the Week

2012-08-05 Thread Douglas Karabasz

I have a xts object made of daily closing prices I have acquired using
quantmod.

 

Here is my code:

library(xts)

library(quantmod)

library(lubridate)

 

# Gets SPY data

getSymbols(SPY)

# Subset Prices to just closing price

SP500 - Cl(SPY)

# Show day of the week for each date using 2-6 for monday-friday

SP500wd - wday(SP500)

# Add Price and days of week together

SP500wd - cbind(SP500, SP500wd)

# subset Monday into one xts object

SPmon - subset(SP500wd, SP500wd$..2==2)

 

 

I then used the package lubridate to show the days of the week.   Due to the
requirement of an xts objects to be numeric you will see each day is
represented as a number so that Monday is =2, Tuesday=3, Wednesday=4,
Thursday=5, Friday=6, Saturday=7.   Since this is a financial index you will
only see the numbers 2-6 or Monday-Friday.

I want to subset the data by using the day column.  I would like some help
to figure out the best way to accomplish a few objectives.  

1.   Subset the data so that I only show Monday in sequence.  However, I
do want to make sure that it shows the date, price and the ..2 colum(which
is the day of week) after Sub setting the data  (I have it done but not sure
if it is the best way)

2.   Rearrange the object (hopefully without destroying the xts object)
so that my data lines up like a weekly calendar.   So it would look like the
follow.  

   


Long Date Monday

Monday Price

Monday Day Index

Long Date Tuesday 

Tuesday Price

Tuesday Day Index

Long Date Wednesday

Wednesday Price

Wednesday Index

Long Date Thursday

Thursday Price

Thursday Index

Friday

Friday Price

Friday Index


1/5/2009

92.85

2

1/6/2009

93.47

3

1/7/2009

90.67

4

1/8/2009

84.4

5

1/9/2009

89.09

6


1/12/2009

86.95

2

1/13/2009

87.11

3

1/14/2009

84.37

4

1/15/2009

91.04

5

1/16/2009

85.06

6


MLK Mondy

MLK Monday

MLK Monday

1/20/2009

80.57

3

1/21/2009

84.05

4

1/22/2009

82.75

5

1/23/2009

83.11

6


1/26/2009

83.68

2

1/27/2009

84.53

3

1/28/2009

87.39

4

1/29/2009

84.55

5

1/30/2009

82.83

6


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Thank you,

Douglas


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R- Help (looping)

2012-07-31 Thread Rui Barradas


Hello,

Inline
Em 31-07-2012 02:59, Wellington Silva escreveu:

Ok,

This really helped.

You've probably noticed, I'm a begginer using R.

And when you said that you tried with rnorm and the counts were not zero,
where did you use the rnorm?


Where the runif is, in its stead use
matrix( rnorm(nc*dds), ...etc... )

But note that this is fake data, what is the distribution of your 
dataset? Or is the purpose of this simulations? If so, there must also 
be a distribution to replicate, no?


Rui Barradas


The point is, I need this count to be different from zero, so I can use the
ARL later on.

I need how many values are out of control (for each column /variable),
store these values in a vector, calculate this vector's mean, and then
calculate the ARL.

I've got to the same point as you did, but I'm stuck in the same problem. I
need this count do be different from zero.

I'm still learning, haha, please be patient.

Best Regards.

2012/7/30 Rui Barradas ruipbarra...@sapo.pt


Hello,

Try the following.

# make up some data
dds - 1e3
nc - 10
xss - data.frame(matrix(runif(nc***dds, min=-1, max=1), ncol=nc))
names(xss) - paste0(xs, 1:10)

# two functions with descriptive names
getControlLimits - function(x, L = 3){
 mu - mean(x)
 sigma - sd(x)
 c(lcl = mu - L*sigma, ucl = mu + L*sigma)
}

countOutOfControl - function(x, L = 3){
 cl - getControlLimits(x, L = L)
 sum( x  cl[lcl] | x  cl[ucl] )
}

sapply(xss, getControlLimits)# To know why the next returns all zeros
sapply(xss, countOutOfControl)# No values outside control limits


The data comes from an uniform in the interval (-1, 1) therefore the sd is
sqrt(1/3) = 0.577. Times 3 gives values for lcl and ucl clearly below and
above -1 and 1, respectively.  (I've tried rnorm and the counts were not
zero.)
But this is just a data example to see if the code works, and it does.


Hope this helps,

Rui Barradas

Em 30-07-2012 22:04, Wellington Silva escreveu:


Ok Rui,

I'll try to explain myself a little bit better.

I have 10 columns, with 1000 rows each. (Each column represents a variable
of the process)

these columns were simulated using runif, with values between 4 and 6.

I need to calculate the control limits for each column (variable), and
verify if they are within the bounds accepted.

For now, I'm reading these columns separately, I'm creating variables to
read only one of the each column, and then check if the values in this
column is within the limits or not.

PS. All these columns originally come from a dataframe. something like
this: (dds = 1000)

xs1-runif(dds, min=-1, max=1)
xs2-runif(dds, min=-1, max=1)
xs3-runif(dds, min=-1, max=1)
xs4-runif(dds, min=-1, max=1)
xs5-runif(dds, min=-1, max=1)
...
xs10-runif(dds, min=-1,max=1)
xs-data.frame(xs1,xs2,xs3,**xs4,xs5,..xs10)

__**___


#Reading columns (first one)
c.n1 - 'xs1'
c1 - with(xss, get(c.n1))
#Mean
mc1 - mean (c1)
#Standard deviation
sdc1 - sd(c1)
#Control Limits
L=3
uclc1 = mc1 + L*sdc1
lclc1 = mc1 - L*sdc1


I need an if inside a loop to do the following:

for (each column)

if (value in c1  lclc1 | value in c1  uclc1) {Count how many values are
outside the bounds}

And I need this process to be repeated to all columns.




2012/7/30 Rui Barradas ruipbarra...@sapo.pt

  Hello,

Your code example doesn't make much sense, it needs decrypting. Let's
see,

for(k in 1:length(c1)){
  if(c1[k]  lcl | c1[k]  ucl) {do something}
}

If this is it, then you can completely avoid the loop:

i1 - c1  lcl | c1  ucl   # create an index vector
out.of.control - c1[ i1 ]# save the values

When you say you have a column, is it a matrix column? data.frame?
Give a data example with

dput( head(myData, 20) )   # paste the output of this in a post

Hope this helps,

Rui Barradas

Em 29-07-2012 21:59, Wellington G. Silva escreveu:

  Hi,


I'm Wellington from Brazil and I have the following issue:


I've been working on a project a for a while, and I'm having trouble in
using the loop (for)


I need to read a column (c1), and for each value of this column, I need
to
check if it's within the control limits


So, I was trying to do this:


For (k in 1: c1)


If (c1 lcl1 | c1  ucl1) {here I need to store the values outside the
limits)


I have 5 columns, need to do the same process in each one of them.


And later on I'm gonna concatenate these 5 vectors and calculate its
mean
for an ARL project.






  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-helphttps://stat.ethz.ch/mailman/**listinfo/r-help
https://stat.**ethz.ch/mailman/listinfo/r-**helphttps://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/**
posting-guide.html 
http://www.R-project.org/**posting-guide.htmlhttp://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,

Re: [R] R- Help (looping)

2012-07-31 Thread Rui Barradas


Hello,

Glad it helped.
E fico à espera do relatório.

Rui Barradas
Em 31-07-2012 14:40, Wellington Silva escreveu:

Rui,

This is a cientific initiation program.

The idea is to develop a code which can simulate data and calculate the ARL
later.

*So, a little bit later yesterday night, after sending the email, I've
figuered how to use the rnorm and then my problems were gone.

Ok, we were working with normal distribution, that's why I needed to use
rnorm. These simulations have just one purpose: calculate ARL at the end of
the project.

And, I want to thank you for the help, thank you so much, you've really
helped me.

As soon as I'm done with the final report, at the special thanks page,
I'm gonna put your name Haha. Because, your help was enormous.

Thanks again, I'll send you a copy of the final report as soon as I finish
it if you wish.

2012/7/31 Rui Barradas ruipbarra...@sapo.pt


Hello,

Inline
Em 31-07-2012 02:59, Wellington Silva escreveu:

  Ok,

This really helped.

You've probably noticed, I'm a begginer using R.

And when you said that you tried with rnorm and the counts were not zero,
where did you use the rnorm?


Where the runif is, in its stead use
matrix( rnorm(nc*dds), ...etc... )

But note that this is fake data, what is the distribution of your dataset?
Or is the purpose of this simulations? If so, there must also be a
distribution to replicate, no?

Rui Barradas


The point is, I need this count to be different from zero, so I can use
the
ARL later on.

I need how many values are out of control (for each column /variable),
store these values in a vector, calculate this vector's mean, and then
calculate the ARL.

I've got to the same point as you did, but I'm stuck in the same problem.
I
need this count do be different from zero.

I'm still learning, haha, please be patient.

Best Regards.

2012/7/30 Rui Barradas ruipbarra...@sapo.pt

  Hello,

Try the following.

# make up some data
dds - 1e3
nc - 10
xss - data.frame(matrix(runif(nc*dds, min=-1, max=1), ncol=nc))

names(xss) - paste0(xs, 1:10)

# two functions with descriptive names
getControlLimits - function(x, L = 3){
  mu - mean(x)
  sigma - sd(x)
  c(lcl = mu - L*sigma, ucl = mu + L*sigma)
}

countOutOfControl - function(x, L = 3){
  cl - getControlLimits(x, L = L)
  sum( x  cl[lcl] | x  cl[ucl] )
}

sapply(xss, getControlLimits)# To know why the next returns all zeros
sapply(xss, countOutOfControl)# No values outside control limits


The data comes from an uniform in the interval (-1, 1) therefore the sd
is
sqrt(1/3) = 0.577. Times 3 gives values for lcl and ucl clearly below and
above -1 and 1, respectively.  (I've tried rnorm and the counts were not
zero.)
But this is just a data example to see if the code works, and it does.


Hope this helps,

Rui Barradas

Em 30-07-2012 22:04, Wellington Silva escreveu:

  Ok Rui,

I'll try to explain myself a little bit better.

I have 10 columns, with 1000 rows each. (Each column represents a
variable
of the process)

these columns were simulated using runif, with values between 4 and 6.

I need to calculate the control limits for each column (variable), and
verify if they are within the bounds accepted.

For now, I'm reading these columns separately, I'm creating variables to
read only one of the each column, and then check if the values in this
column is within the limits or not.

PS. All these columns originally come from a dataframe. something like
this: (dds = 1000)

xs1-runif(dds, min=-1, max=1)
xs2-runif(dds, min=-1, max=1)
xs3-runif(dds, min=-1, max=1)
xs4-runif(dds, min=-1, max=1)
xs5-runif(dds, min=-1, max=1)
...
xs10-runif(dds, min=-1,max=1)
xs-data.frame(xs1,xs2,xs3,xs4,xs5,..xs10)

_____



#Reading columns (first one)
c.n1 - 'xs1'
c1 - with(xss, get(c.n1))
#Mean
mc1 - mean (c1)
#Standard deviation
sdc1 - sd(c1)
#Control Limits
L=3
uclc1 = mc1 + L*sdc1
lclc1 = mc1 - L*sdc1


I need an if inside a loop to do the following:

for (each column)

if (value in c1  lclc1 | value in c1  uclc1) {Count how many values
are
outside the bounds}

And I need this process to be repeated to all columns.




2012/7/30 Rui Barradas ruipbarra...@sapo.pt

   Hello,


Your code example doesn't make much sense, it needs decrypting. Let's
see,

for(k in 1:length(c1)){
   if(c1[k]  lcl | c1[k]  ucl) {do something}
}

If this is it, then you can completely avoid the loop:

i1 - c1  lcl | c1  ucl   # create an index vector
out.of.control - c1[ i1 ]# save the values

When you say you have a column, is it a matrix column? data.frame?
Give a data example with

dput( head(myData, 20) )   # paste the output of this in a post

Hope this helps,

Rui Barradas

Em 29-07-2012 21:59, Wellington G. Silva escreveu:

   Hi,


I'm Wellington from Brazil and I have the following issue:


I've been working on a project a for a while, and I'm having trouble
in
using the loop (for)


I need to read a column (c1), and for each

[R] R- Help (looping)

2012-07-30 Thread Wellington G. Silva

Hi,

 

I'm Wellington from Brazil and I have the following issue:

 

I've been working on a project a for a while, and I'm having trouble in
using the loop (for) 

 

I need to read a column (c1), and for each value of this column, I need to
check if it's within the control limits

 

So, I was trying to do this:

 

For (k in 1: c1)

 

If (c1 lcl1 | c1  ucl1) {here I need to store the values outside the
limits)

 

I have 5 columns, need to do the same process in each one of them.

 

And later on I'm gonna concatenate these 5 vectors and calculate its mean
for an ARL project.

 

 

 

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R- Help (looping)

2012-07-30 Thread Rui Barradas


Hello,

Your code example doesn't make much sense, it needs decrypting. Let's see,

for(k in 1:length(c1)){
if(c1[k]  lcl | c1[k]  ucl) {do something}
}

If this is it, then you can completely avoid the loop:

i1 - c1  lcl | c1  ucl   # create an index vector
out.of.control - c1[ i1 ]# save the values

When you say you have a column, is it a matrix column? data.frame?
Give a data example with

dput( head(myData, 20) )   # paste the output of this in a post

Hope this helps,

Rui Barradas

Em 29-07-2012 21:59, Wellington G. Silva escreveu:

Hi,

  


I'm Wellington from Brazil and I have the following issue:

  


I've been working on a project a for a while, and I'm having trouble in
using the loop (for)

  


I need to read a column (c1), and for each value of this column, I need to
check if it's within the control limits

  


So, I was trying to do this:

  


For (k in 1: c1)

  


If (c1 lcl1 | c1  ucl1) {here I need to store the values outside the
limits)

  


I have 5 columns, need to do the same process in each one of them.

  


And later on I'm gonna concatenate these 5 vectors and calculate its mean
for an ARL project.

  

  

  

  



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R- Help (looping)

2012-07-30 Thread Rui Barradas


Hello,

Try the following.

# make up some data
dds - 1e3
nc - 10
xss - data.frame(matrix(runif(nc*dds, min=-1, max=1), ncol=nc))
names(xss) - paste0(xs, 1:10)

# two functions with descriptive names
getControlLimits - function(x, L = 3){
mu - mean(x)
sigma - sd(x)
c(lcl = mu - L*sigma, ucl = mu + L*sigma)
}

countOutOfControl - function(x, L = 3){
cl - getControlLimits(x, L = L)
sum( x  cl[lcl] | x  cl[ucl] )
}

sapply(xss, getControlLimits)# To know why the next returns all zeros
sapply(xss, countOutOfControl)# No values outside control limits


The data comes from an uniform in the interval (-1, 1) therefore the sd 
is sqrt(1/3) = 0.577. Times 3 gives values for lcl and ucl clearly below 
and above -1 and 1, respectively.  (I've tried rnorm and the counts were 
not zero.)

But this is just a data example to see if the code works, and it does.

Hope this helps,

Rui Barradas

Em 30-07-2012 22:04, Wellington Silva escreveu:

Ok Rui,

I'll try to explain myself a little bit better.

I have 10 columns, with 1000 rows each. (Each column represents a variable
of the process)

these columns were simulated using runif, with values between 4 and 6.

I need to calculate the control limits for each column (variable), and
verify if they are within the bounds accepted.

For now, I'm reading these columns separately, I'm creating variables to
read only one of the each column, and then check if the values in this
column is within the limits or not.

PS. All these columns originally come from a dataframe. something like
this: (dds = 1000)

xs1-runif(dds, min=-1, max=1)
xs2-runif(dds, min=-1, max=1)
xs3-runif(dds, min=-1, max=1)
xs4-runif(dds, min=-1, max=1)
xs5-runif(dds, min=-1, max=1)
...
xs10-runif(dds, min=-1,max=1)
xs-data.frame(xs1,xs2,xs3,xs4,xs5,..xs10)

_


#Reading columns (first one)
c.n1 - 'xs1'
c1 - with(xss, get(c.n1))
#Mean
mc1 - mean (c1)
#Standard deviation
sdc1 - sd(c1)
#Control Limits
L=3
uclc1 = mc1 + L*sdc1
lclc1 = mc1 - L*sdc1


I need an if inside a loop to do the following:

for (each column)

if (value in c1  lclc1 | value in c1  uclc1) {Count how many values are
outside the bounds}

And I need this process to be repeated to all columns.




2012/7/30 Rui Barradas ruipbarra...@sapo.pt


Hello,

Your code example doesn't make much sense, it needs decrypting. Let's see,

for(k in 1:length(c1)){
 if(c1[k]  lcl | c1[k]  ucl) {do something}
}

If this is it, then you can completely avoid the loop:

i1 - c1  lcl | c1  ucl   # create an index vector
out.of.control - c1[ i1 ]# save the values

When you say you have a column, is it a matrix column? data.frame?
Give a data example with

dput( head(myData, 20) )   # paste the output of this in a post

Hope this helps,

Rui Barradas

Em 29-07-2012 21:59, Wellington G. Silva escreveu:


Hi,


I'm Wellington from Brazil and I have the following issue:


I've been working on a project a for a while, and I'm having trouble in
using the loop (for)


I need to read a column (c1), and for each value of this column, I need to
check if it's within the control limits


So, I was trying to do this:


For (k in 1: c1)


If (c1 lcl1 | c1  ucl1) {here I need to store the values outside the
limits)


I have 5 columns, need to do the same process in each one of them.


And later on I'm gonna concatenate these 5 vectors and calculate its mean
for an ARL project.






 [[alternative HTML version deleted]]

__**
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/**
posting-guide.html http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 113, Issue 20

2012-07-17 Thread Francois Maurice

Hi,
 
I'm using write.table() to export dataframe to Excel. All works fine except 
that I want to export the variable labels instead of variable names.
 
 I can see the labels in the R consol using attr(), but I just don't know how 
to use the labels instead of the names.
 
Thanks,
 
François Maurice
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help

2012-07-16 Thread Sébastien Morant

hi
would you please remove my email from the mailing list.
thanks in advance
Sebastien

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-07-16 Thread Jessica Streicher

https://stat.ethz.ch/mailman/listinfo/r-help

at the bottom…

On 16.07.2012, at 10:52, Sébastien Morant wrote:

 hi
 would you please remove my email from the mailing list.
 thanks in advance
 Sebastien
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 113, Issue 19

2012-07-16 Thread e-letter

On 16/07/2012, r-help-requ...@r-project.org
r-help-requ...@r-project.org wrote:
 --

 Message: 77
 Date: Mon, 16 Jul 2012 10:48:39 +0100
 From: Rui Barradas ruipbarra...@sapo.pt
 To: e-letter inp...@gmail.com
 Cc: r-help@r-project.org
 Subject: Re: [R] histogram of time-stamp data
 Message-ID: 5003e377.3000...@sapo.pt
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed


 timestamps - as.POSIXct(Sys.Date()) + sample(24*60*60, 1e3, TRUE)


Why is it necessary to apply the sample to the current date? Looking
at the dataframe, I noticed that values have been changed:

file 'test.txt':
12:19:00
09:30:00
16:56:00
01:56:00
10:44:00
10:31:00
02:14:00
05:05:00
12:52:00
21:50:00

R command terminal input:
 timestamps-read.table(test.txt)
 timestamps
 V1
1  12:19:00
2  09:30:00
3  16:56:00
4  01:56:00
5  10:44:00
6  10:31:00
7  02:14:00
8  05:05:00
9  12:52:00
10 21:50:00
 timestamps - as.POSIXct(Sys.Date()) + sample(24*60*60, 1e3, TRUE)
 write.csv(timestamps,file=test1.txt)

test1.txt:
,x
1,2012-07-16 04:52:48
2,2012-07-16 21:21:28
3,2012-07-16 18:58:27
4,2012-07-16 22:17:25
5,2012-07-16 11:13:52
6,2012-07-16 03:17:35
7,2012-07-16 02:14:17
8,2012-07-16 14:18:27
9,2012-07-16 14:39:16

Why is this happening?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 113, Issue 13

2012-07-10 Thread Terry Therneau


http://www.ncbi.nlm.nih.gov/pubmed/21418051  for the full reference.
I don't have an electronic copy, but I do have that issue of Biometrics 
in my office.  I'll have a copy sent over.


Terry


On 07/10/2012 04:08 PM, r-help-requ...@r-project.org wrote:

Send R-help mailing list submissions to
r-help@r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/r-help
or, via email, send a message with subject or body 'help' to
r-help-requ...@r-project.org

You can reach the person managing the list at
r-help-ow...@r-project.org

When replying, please edit your Subject line so it is more specific
than Re: Contents of R-help digest...


Today's Topics:

1. how can I show the xlab and ylab information while using
   layout (Jie Tang)
2. Re: how can I show the xlab and ylab information while using
   layout (Sarah Goslee)
3. Re: Extracting arithmetic mean for specific values from
   multiple .txt-files (Rui Barradas)
4. Re: How to use external image with R plot? (Michael Sumner)
5. Re: is it possible to insert a figure into into another new
   figure by r script (Thomas Adams)
6. image.plot transparent? (Chris82)
7. Re: How to add marker in Stacked bar plot? (Jorge I Velez)
8. Re: Predicted values for zero-inflated Poisson (Laura Lee)
9. Re: how can I show the xlab and ylab information while using
   layout (Sarah Goslee)
   10. Re: image.plot transparent? (Sarah Goslee)
   11. RGB components of plot() colours ( (Ted Harding))
   12. Re: number of decimal places in a number? (Martin Ivanov)
   13. define stuff to be only usable in the same file
   (Jessica Streicher)
   14. fill 0-row data.frame with 1 line of NAs (Liviu Andronic)
   15. Re: RGB components of plot() colours (Duncan Murdoch)
   16. Re: define stuff to be only usable in the same file
   (Duncan Murdoch)
   17. Re: RGB components of plot() colours (Sarah Goslee)
   18. Re: Predicted values for zero-inflated Poisson (Achim Zeileis)
   19. Re: fill 0-row data.frame with 1 line of NAs (Rui Barradas)
   20. Re: Plotting rpart trees with long list of class members
   (Jean V Adams)
   21. Re: image.plot transparent? (Prof Brian Ripley)
   22. customize packages' help index ( 00index.html file )
   (Damien Georges)
   23. identify.hclust() doesn't cut tree at the vertical position
   of the mouse pointer (WATSON Mick)
   24. Use of Sappy and Tappy for Mathematical Calculation (Rantony)
   25. fitting power growth (Thomas Hoffmann)
   26. Mac OS X R uninstallation question (Alastair)
   27. Count of elements in coulmns of a matrix (Rantony)
   28. Re: Questions about doing analysis based on time (APOCooter)
   29. gdata: Problem reading excel document containing non-US
   characters (=?iso-8859-1?Q?Rolf_Marvin_B=F8e_Lindgren?=)
   30. Re: Count of elements in coulmns of a matrix (Sarah Goslee)
   31. Re: define stuff to be only usable in the same file
   (Jessica Streicher)
   32. Re: Mac OS X R uninstallation question (Prof Brian Ripley)
   33. Re: fill 0-row data.frame with 1 line of NAs (Peter Ehlers)
   34. Re: define stuff to be only usable in the same file
   (Jessica Streicher)
   35. estimation of NA by predict command (eliza botto)
   36. Re: fill 0-row data.frame with 1 line of NAs (Brian Diggs)
   37. Help with vectors and rollapply (Raghuraman Ramachandran)
   38. Re: multiple comparisons with generalised least squares (Ariel)
   39. RGL 3D curvilinear shapes (PatGauthier)
   40. Re: estimation of NA by predict command (arun)
   41. Re: Count of elements in coulmns of a matrix (arun)
   42. Re: Help with vectors and rollapply (William Dunlap)
   43. Re: Predicted values for zero-inflated Poisson (Laura Lee)
   44. Re: Use of Sappy and Tappy for Mathematical Calculation
   (Nordlund, Dan (DSHS/RDA))
   45. Re: Questions about doing analysis based on time (Rui Barradas)
   46. Re: Extracting arithmetic mean for specific values from
   multiple .txt-files (vimmster)
   47. Revolutions blog: June Roundup (David Smith)
   48. calculating the difference between days? (C W)
   49. Re: R to winbugs interface (Uwe Ligges)
   50. Re: Predicted values for zero-inflated Poisson
   (Highland Statistics Ltd)
   51. Re: Extracting arithmetic mean for specific values from
   multiple .txt-files (Rui Barradas)
   52. Re: fill 0-row data.frame with 1 line of NAs (Peter Ehlers)
   53. Re: fill 0-row data.frame with 1 line of NAs (Rui Barradas)
   54. Re: Predicted values for zero-inflated Poisson (Laura Lee)
   55. Re: Specify model with polynomial interaction terms up to
   degree n (YTP)
   56. Re: Use of Sappy and Tappy for Mathematical Calculation (arun)
   57. problem for installing rgdal (stanislas rebaudet)
   58. HELP me please with import of csv to R (F86)
   59. R code help to change table format (peziza)
   60. Re: Skipping lines and incomplete rows (arun)

[R] R help, using R to build choropleth

2012-06-28 Thread iverson

Hi guys

i need some help to build choropleth.

Basically i am trying to colour regions on the map by population.

I possess the shape file of the country, and also the population data,
however, i am having trouble to create the plot, below is my code:



population=read.csv(nz2.csv)
population
names(population)

nz=readShapeSpatial((sprintf('nz-geodetic-marks',tmp_dir)))

nz$land_distr

mapnames = nz$land_distr

mapnamesnew=as.character(mapnames)

popnames =as.character(mapnamesnew)
pop =population$People

pop2=as.numeric(pop)

popdich = ifelse(pop2  10, red, blue)
popnames.lower = tolower(popnames)

cols=popdich[match(mapnames,popnames.lower)]

plot(nz,fill=TRUE,col=cols,proj=GCS_NZGD_2000)




thanks in advance for any assistance

Kind regards

Iverson



--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-using-R-to-build-choropleth-tp4634686.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help, using R to build choropleth

2012-06-28 Thread Pascal Oettli


Hello,

You can find some hints there:
http://geography.uoregon.edu/geogr/topics/index.html

Regards


Le 12/06/28 14:26, iverson a écrit :

Hi guys

i need some help to build choropleth.

Basically i am trying to colour regions on the map by population.

I possess the shape file of the country, and also the population data,
however, i am having trouble to create the plot, below is my code:



population=read.csv(nz2.csv)
population
names(population)

nz=readShapeSpatial((sprintf('nz-geodetic-marks',tmp_dir)))

nz$land_distr

mapnames = nz$land_distr

mapnamesnew=as.character(mapnames)

popnames =as.character(mapnamesnew)
pop =population$People

pop2=as.numeric(pop)

popdich = ifelse(pop2  10, red, blue)
popnames.lower = tolower(popnames)

cols=popdich[match(mapnames,popnames.lower)]

plot(nz,fill=TRUE,col=cols,proj=GCS_NZGD_2000)




thanks in advance for any assistance

Kind regards

Iverson



--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-using-R-to-build-choropleth-tp4634686.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help, using R to build choropleth

2012-06-28 Thread Jim Lemon


On 06/28/2012 03:26 PM, iverson wrote:

Hi guys

i need some help to build choropleth.

Basically i am trying to colour regions on the map by population.

I possess the shape file of the country, and also the population data,
however, i am having trouble to create the plot, below is my code:



population=read.csv(nz2.csv)
population
names(population)

nz=readShapeSpatial((sprintf('nz-geodetic-marks',tmp_dir)))

nz$land_distr

mapnames = nz$land_distr

mapnamesnew=as.character(mapnames)

popnames =as.character(mapnamesnew)
pop =population$People

pop2=as.numeric(pop)

popdich = ifelse(pop2  10, red, blue)
popnames.lower = tolower(popnames)

cols=popdich[match(mapnames,popnames.lower)]

plot(nz,fill=TRUE,col=cols,proj=GCS_NZGD_2000)


Hi Iverson,
Here is some code that I have used to draw a choropleth map of NSW where 
the colors represent the average Index of Relative Social Disadvantage 
for Statistical Local Areas in New South Wales. I've cut out the bits 
that map the locations of low speed vehicle runovers as you won't need 
that code.


# load the RGDAL package
require(rgdal)
# read in the SLA boundaries for Australia
SLAmap-readOGR(.,SLA10aAust)
# pick out the bits in NSW
NSWmap-SLAmap[1:199,]
# read in the SEIFA (including IRSD) values
AU_SEIFA-read.csv(AU_SEIFA_SLA_2006.csv)
require(plotrix)
# generate the colors for the IRSD scores
NSW_SEIFAcol-color.scale(AU_SEIFA$SEIFA1dec[1:199],
 c(1,0.9,0.8,0.8),c(0.8,0.9,0.9,0.8),c(0.8,0.8,0.9,1),xrange=c(0,10))
# open the graphics device
png(lsvro_NSW_96-10.png,width=700,height=700)
par(mar=c(4,0,3,1))
# plot the map with the colors
plot(NSWmap,xlim=c(140,max(lsvro$longitude,na.rm=TRUE)),
 col=NSW_SEIFAcol)
legend_values-c(0:10)
# generate the colors for the legend
SEIFAlegendcol-color.scale(legend_values,
 c(1,0.9,0.8,0.8),c(0.8,0.9,0.9,0.8),c(0.8,0.8,0.9,1))
# stick a legend on the side
color.legend(151.8,-37.5,152.3,-34.5,as.character(legend_values),SEIFAlegendcol,
 align=rb,gradient=y)
title(main=Low speed vehicle run overs NSW - 1996-2010)
par(xpd=TRUE)
# put a title on the color legend
text(151.8,-34.3,SEIFA decile,adj=0)
par(xpd=FALSE)
dev.off()

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 112, Issue 25

2012-06-25 Thread John C Nash

While lm() is a linear modeling, the constraints make it easier to solve with a 
nonlinear
tool. Both my packages Rvmmin and Rcgmin (I recommend the R-forge versions as 
more
up-to-date) have bounds constraints and masks i.e., fixed parameters.

I am actually looking for example problems of this type that are more recent 
than the ones
that got me into this 30 years ago. Do contact me off-list if you have 
something that
could be shared. I'd also welcome discussion on appropriate tools for such 
constrained
linear modeling problems. They are, I believe, more or less present in most 
linear
modeling situations, but we rarely impose the constraints explicitly, and tend 
to use lm()
and (hopefully) check if the solution obeys the conditions.

Best,

John Nash


On 06/25/2012 06:00 AM, r-help-requ...@r-project.org wrote:
 Message: 5
 Date: Sun, 24 Jun 2012 03:34:10 -0700 (PDT)
 From: rgoodman rosa.good...@gmail.com
 To: r-help@r-project.org
 Subject: Re: [R] Constrained coefficients in lm (correction)
 Message-ID: 1340534050627-4634321.p...@n4.nabble.com
 Content-Type: text/plain; charset=us-ascii
 
 Hi Jorge,
 
 Did you ever figure this out? I want to do the same thing with the
 additional constraint of the coef for x1 = 2. 
 
 lm(Y~offset(2*x1)+x2+x3,data=mydata)
 where b= coeff for x2, c = coeff for x3, b+c=1 and b and c0. 
 
 I've loaded the systemfit package, but the suggestion R*beta0 = q, where R
 is R.restr and q is q.restr in the function call makes no sense to me.
 
 Cheers,
 Rosie

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 112, Issue 6

2012-06-06 Thread David L Lorenz

Rich,
  The documentation for cenboxplot states that the second argument must be 
logical and not integer. the function cenboxplot substitutes synthetic 
values for censored values using ros, hence the error message from the ros 
method.
  I also do not understand how you expect group = 'SO4' to work. It is not 
clear that the function will replicate 'SO4' to make a single group.
Dave

 Date: Tue, 5 Jun 2012 09:58:16 -0700 (PDT)
 From: Rich Shepard rshep...@appl-ecosys.com
 To: r-help@r-project.org
 Subject: [R] NADA Applied to my Data
 Message-ID:
alpine.lnx.2.00.1206050931300.27...@salmo.appl-ecosys.com
 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
 
I need a nudge in the right direction to get started using NADA. I 
bought
 Helsel's second addition and am currently reading it; NADA is installed 
in
 R.
 
My data has been restructured with a couple of awk scripts. The data 
frame
 structure now has a flag if the quantity is censored (ceneq1 column) as 
well
 as a lower and upper limit for censored data. For present purposes, 
interval
 censoring can be ignored. The data frame structure is now:
 
 str(waterchem)
 'data.frame':   46551 obs. of  7 variables:
   $ site: Factor w/ 126 levels BC-0.5,BC-1,..: 22 22 22 13 3 13 
...
   $ sampdate: Date, format: 1996-05-22 1996-07-19 ...
   $ param   : Factor w/ 58 levels -0.100,AGP,..: 47 58 10 16 16 26 
...
   $ quant   : num  0.01 7.69 0.02 63.8 120 0.02 399 439 2 433 ...
   $ ceneq1  : int  1 0 0 0 0 0 0 0 0 0 ...
   $ low : num  0 7.69 0.02 63.8 120 0.02 399 439 2 433 ...
   $ high: num  0.01 7.69 0.02 63.8 120 0.02 399 439 2 433 ...
 
What I want to first learn is how to specify a box plot (and whether 
I can
 use the lattice package) for specific chemicals.
 
 ?cenboxplot shows me the arguments, but I'm not entering them correctly, 
or
 there's a prerequisite step I need to take:
 
 cenboxplot(waterchem$quant, waterchem$ceneq1, group='SO4', log=T, 
range=1.5)
 Error in function (classes, fdef, mtable)  :
unable to find an inherited method for function ros, for signature
 numeric, integer
 
Perhaps cenboxplot is looking for a separate data set and not a data
 frame? Or, perhaps I need to melt and re-cast the data frame to the wide
 format from the current narrow format? Pointers appreciated.
 
 Rich
 
 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help!

2012-05-03 Thread Nutter, Benjamin

So long as x is a character vector, I tend to use the following for this 
problem.

 x - c(12/31/11 23:45, 1/1/12 2:15)
 
 x.split - strsplit(x,  )
 
 x.date - sapply(x.split, function(y) return(y[1]))
 x.time - sapply(x.split, function(y) if (length(y)  1) return(y[2]) else NA)
 
 x.date
[1] 12/31/11 1/1/12  
 x.time
[1] 23:45 2:15 


  Benjamin Nutter |  Biostatistician     |  Quantitative Health Sciences
  Cleveland Clinic    |  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 
445-1365


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Alex Roth
Sent: Wednesday, May 02, 2012 4:01 PM
To: r-help@r-project.org
Subject: [R] R help!

Hello there, I was wondering if you could help me with a quick R issue.

I have a data set where one of the columns has both date and time in it, e.g. 
12/31/11 23:45 in one cell. I want to use R to split this column into two new 
columns: date and time.

One of the problems with splitting here is that when the dates go into single 
digits there are no 0's in front of months January-September (e.g., January is 
represented by 1 as opposed to 01), so every entry is a different length. 
Therefore, splitting by the space is the only option, I think.

Here's the coding I've developed thus far:

z$dt - z$Date#time and date is all under z$Date
foo - strsplit( , z$dt) #attempted split based on the space

And then if that were to work, I would proceed use the coding:

foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE) z$Date - foo[ ,1] z$Time - 
foo[ ,2]

However, foo - strsplit( , z$dt) isn't working. Do you know what the problem 
is? If you could respond soon, that would be greatly appreciated!

Thanks so much!
Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


===

 Please consider the environment before printing this e-mail

Cleveland Clinic is ranked one of the top hospitals
in America by U.S.News  World Report (2010).  
Visit us online at http://www.clevelandclinic.org for
a complete listing of our services, staff and
locations.


Confidentiality Note:  This message is intended for use\...{{dropped:13}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help!

2012-05-03 Thread peter dalgaard


On May 3, 2012, at 14:05 , Nutter, Benjamin wrote:

 So long as x is a character vector, I tend to use the following for this 
 problem.
 
 x - c(12/31/11 23:45, 1/1/12 2:15)
 
 x.split - strsplit(x,  )
 
 x.date - sapply(x.split, function(y) return(y[1]))
 x.time - sapply(x.split, function(y) if (length(y)  1) return(y[2]) else 
 NA)
 
 x.date
 [1] 12/31/11 1/1/12  
 x.time
 [1] 23:45 2:15 
 
 
   Benjamin Nutter |  Biostatistician |  Quantitative Health Sciences
   Cleveland Clinic|  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 
 445-1365
 


I think this can be simplified to

 sapply(x.split,`[`,1)
[1] 12/31/11 1/1/12  
 sapply(x.split,`[`,2)
[1] 23:45 2:15 

It's a bit inefficient, though. Other ideas:

 sub( .*$, , x)
[1] 12/31/11 1/1/12  
 sub(^.* , , x)
[1] 23:45 2:15 
 read.table(text=x)
V1V2
1 12/31/11 23:45
2   1/1/12  2:15


-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help!

2012-05-03 Thread Petr PIKAL

Hi

I would convert it to propper date format and then you can extract 
anything.

dat-strptime(12/31/11 23:45, format=%m/%d/%y %H:%M)
as.Date(dat)
[1] 2011-12-31
format(dat, %H:%M)
[1] 23:45

Regards
Petr
 

 
 Hello there, I was wondering if you could help me with a quick R issue.
 
 I have a data set where one of the columns has both date and time in
 it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this
 column into two new columns: date and time.
 
 One of the problems with splitting here is that when the dates go into
 single digits there are no 0's in front of months January-September
 (e.g., January is represented by 1 as opposed to 01), so every entry
 is a different length. Therefore, splitting by the space is the only
 option, I think.
 
 Here's the coding I've developed thus far:
 
 z$dt - z$Date#time and date is all under z$Date
 foo - strsplit( , z$dt) #attempted split based on the space
 
 And then if that were to work, I would proceed use the coding:
 
 foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE)
 z$Date - foo[ ,1]
 z$Time - foo[ ,2]
 
 However, foo - strsplit( , z$dt) isn't working. Do you know what
 the problem is? If you could respond soon, that would be greatly
 appreciated!
 
 Thanks so much!
 Alex
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help!

2012-05-02 Thread Alex Roth

Hello there, I was wondering if you could help me with a quick R issue.

I have a data set where one of the columns has both date and time in
it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this
column into two new columns: date and time.

One of the problems with splitting here is that when the dates go into
single digits there are no 0's in front of months January-September
(e.g., January is represented by 1 as opposed to 01), so every entry
is a different length. Therefore, splitting by the space is the only
option, I think.

Here's the coding I've developed thus far:

z$dt - z$Date#time and date is all under z$Date
foo - strsplit( , z$dt) #attempted split based on the space

And then if that were to work, I would proceed use the coding:

foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE)
z$Date - foo[ ,1]
z$Time - foo[ ,2]

However, foo - strsplit( , z$dt) isn't working. Do you know what
the problem is? If you could respond soon, that would be greatly
appreciated!

Thanks so much!
Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help!

2012-05-02 Thread Jorge I Velez

Hi Alex,

Here is one way:

s - 12/31/11 23:45
strsplit(s,  )[[1]]
# [1] 12/31/11 23:45
*

HTH,
Jorge.-

*

On Wed, May 2, 2012 at 4:00 PM, Alex Roth  wrote:

 Hello there, I was wondering if you could help me with a quick R issue.

 I have a data set where one of the columns has both date and time in
 it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this
 column into two new columns: date and time.

 One of the problems with splitting here is that when the dates go into
 single digits there are no 0's in front of months January-September
 (e.g., January is represented by 1 as opposed to 01), so every entry
 is a different length. Therefore, splitting by the space is the only
 option, I think.

 Here's the coding I've developed thus far:

 z$dt - z$Date#time and date is all under z$Date
 foo - strsplit( , z$dt) #attempted split based on the space

 And then if that were to work, I would proceed use the coding:

 foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE)
 z$Date - foo[ ,1]
 z$Time - foo[ ,2]

 However, foo - strsplit( , z$dt) isn't working. Do you know what
 the problem is? If you could respond soon, that would be greatly
 appreciated!

 Thanks so much!
 Alex

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help!

2012-05-02 Thread Berend Hasselman


On 02-05-2012, at 22:00, Alex Roth wrote:

 Hello there, I was wondering if you could help me with a quick R issue.
 
 I have a data set where one of the columns has both date and time in
 it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this
 column into two new columns: date and time.
 
 One of the problems with splitting here is that when the dates go into
 single digits there are no 0's in front of months January-September
 (e.g., January is represented by 1 as opposed to 01), so every entry
 is a different length. Therefore, splitting by the space is the only
 option, I think.
 
 Here's the coding I've developed thus far:
 
 z$dt - z$Date#time and date is all under z$Date
 foo - strsplit( , z$dt) #attempted split based on the space
 
 And then if that were to work, I would proceed use the coding:
 
 foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE)
 z$Date - foo[ ,1]
 z$Time - foo[ ,2]
 
 However, foo - strsplit( , z$dt) isn't working. Do you know what
 the problem is? If you could respond soon, that would be greatly
 appreciated!

Have a look at the help for strsplit.
You seem to have the arguments of strsplit the wrong way around.

strsplit(z$dt,  )

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help!

2012-05-02 Thread Jeff Newmiller

Hard to say.. your example is not reproducible. (Study the Posting Guide 
mentioned at the end of every message on this list.)

A stab in the dark might be that z$dt is a factor and needs to be converted to 
character first. Use the str function to study your data.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Alex Roth alexsro...@gmail.com wrote:

Hello there, I was wondering if you could help me with a quick R issue.

I have a data set where one of the columns has both date and time in
it, e.g. 12/31/11 23:45 in one cell. I want to use R to split this
column into two new columns: date and time.

One of the problems with splitting here is that when the dates go into
single digits there are no 0's in front of months January-September
(e.g., January is represented by 1 as opposed to 01), so every entry
is a different length. Therefore, splitting by the space is the only
option, I think.

Here's the coding I've developed thus far:

z$dt - z$Date#time and date is all under z$Date
foo - strsplit( , z$dt) #attempted split based on the space

And then if that were to work, I would proceed use the coding:

foo2 - matrix(unlist(foo), ncol = 2, byrow=TRUE)
z$Date - foo[ ,1]
z$Time - foo[ ,2]

However, foo - strsplit( , z$dt) isn't working. Do you know what
the problem is? If you could respond soon, that would be greatly
appreciated!

Thanks so much!
Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 110, Issue 28

2012-04-26 Thread Richard Valliant

I will be out of the office Apr 26-27, 2012 with limited or no email
access.
For immediate help, please call the JPSM main number,
301-314-7911.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 110, Issue 23

2012-04-23 Thread Terry Therneau

Yes, the (start, stop] formalism is the easiest way to deal with time
dependent data.

Each individual only needs to have sufficient data to describe them, so
for if id number 4 is in house 1, their housemate #1 was eaten at time
2, and the were eaten at time 10, the following is sufficient data for
that subject:

id house time1 time2 status discovered
4 10 20 false
4 12 10 1 true

We don't need observations for each intermediate time, only that from
0-2 they were not yet discovered and that from 2-10 they were. The
status variable tells whether an interval ended in disaster. Use
Surv((time1, time2, status) on the left side of the equation.

Since the time scale is discrete you should technically use
method='exact' in a Cox model, but the default Efron approximation will
be very close.

Interval censoring isn't necessary. You will have a model of time to
discovery instead of time to eaten, but with a fixed examination
schedule such as you have there is no information in the data to help
you move from one to the other. The standard interval approach would
just assume deaths happened at the midpoint between examinations.

Terry T.

On 04/21/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Dear R users,

I fear this is terribly trivial but I'm struggling to get my head around it.

First of all, I'm using the survival package in R 2.12.2 on Windows Vista with the
RExcel plugin. You probably only need to know that I'm using survival for this.

I have data collected from 180 or so individuals that were checked 7 times
throughout a trial with set start and end times. Once the event happens (death
by predator) there are no more checks for that individual. This means that I
check on each individual up to 7 times with either an event recorded or the
final time being censored.

At the moment, I have a data sheet with one observation per individual; that is
either the event time (the observation time when the individual had had an
event) or the censored time. However, I'd like to add a time dependent factor
and I also wonder if this data should be treated as interval censored.

The time dependent factor is like this. The individuals are grouped in houses and once one individual in a group has
an event, it makes biological sense that the rest of them should be at greater risk, as the predator is likely to have discovered
the others in the house as well (the predator is able to consume many individuals). At the moment I'm coding this as
a normal two level factor (discovered) where all individuals alive after the first event in that house are TRUE and
the first individuals in a house to be eaten are FALSE. All individuals in houses that were not discovered at al are
also FALSEl. Obviously, all individuals that were eaten, were first discovered, then eaten. However, the first
individuals in a house to be eaten, had not been previously discovered by the predator (not observably so, anyway).

Should I write up this data set with a start and stop time for every check I
made so each individual has up to 7 records, one for each time I checked?

Is there a quick and easy way to do this in R or would I have to go through the
data set manually?

Does coding the discovered factor the way I have, make statistical sense?

Should I worry about proportional hazards of the discovered factor? It seems
to me that it would often turn out not proportional because of its nature.

Sorry, lots of stats questions. I don't mind if you don't answer all of these.
Just knowing how to best feed this data into R would help me no end. The rest I
can probably glean from the millions of survival analysis books I have lying
about.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R: Help; error in optim

2012-04-15 Thread Christopher Kelvin

Hello,
When i run the code below from Weibull distribution with 30% censoring by using 
optim i get an error form R, which states that


Error in optim(start, fn = z, data = q, hessian = T) : 
  objective function in optim evaluates to length 25 not 1

can somebody help me remove this error. Is my censoring approach correct.

n=25;rr=1000
p=1.5;b=1.2
for (i in 1:rr){
q-c(t,cen)
t-rweibull(25,shape=p,scale=b)
meantrue-gamma(1+(1/p))*b
meantrue
d-meantrue/0.30
cen- runif(25,min=0,max=d)
cen
s-ifelse(t=cen,1,0)

z-function(data,p){ 
beta-p[1]
eta-p[2]
log1-(n*cen*log(p[1])-n*cen*(p[1])*log(p[2])+cen*(p[1]-1)*sum(log(t))-n*sum((t/(p[2]))^(p[1])))
return(-log1)
}
start -c(0.5,0.5)
zz-optim(start,fn=z,data=q,hessian=T)

m1-zz$par[2]
p-zz$par[1]

}
m1
p

Thank you

Chris Guure
Researcher
Institute for Mathematical Research
UPM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R: Help; error in optim

2012-04-15 Thread Berend Hasselman


On 16-04-2012, at 07:04, Christopher Kelvin wrote:

 Hello,
 When i run the code below from Weibull distribution with 30% censoring by 
 using optim i get an error form R, which states that
 
 
 Error in optim(start, fn = z, data = q, hessian = T) : 
   objective function in optim evaluates to length 25 not 1
 
 can somebody help me remove this error. Is my censoring approach correct.
 
 n=25;rr=1000
 p=1.5;b=1.2
 for (i in 1:rr){
 q-c(t,cen)
 t-rweibull(25,shape=p,scale=b)
 meantrue-gamma(1+(1/p))*b
 meantrue
 d-meantrue/0.30
 cen- runif(25,min=0,max=d)
 cen
 s-ifelse(t=cen,1,0)
 
 z-function(data,p){ 
 beta-p[1]
 eta-p[2]
 log1-(n*cen*log(p[1])-n*cen*(p[1])*log(p[2])+cen*(p[1]-1)*sum(log(t))-n*sum((t/(p[2]))^(p[1])))
 return(-log1)
 }
 start -c(0.5,0.5)
 zz-optim(start,fn=z,data=q,hessian=T)
 
 m1-zz$par[2]
 p-zz$par[1]
 
 }
 m1
 p

The example as given doesn't run.

The first assignment after the start of the for loop ( q - ...) gives an error 
message: object 'cen' not found.
The assignment needs to moved to after the line with ifelse.

In function z object 'cen' (with length 25) is used in the calculation of log1, 
which becomes a vector of length 25.
You need to review the definition of log1 in function z.

Finally: why are you assigning p[1] and p[2] to beta and eta and nor using 
these variables?

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help: Censoring data (actually an optim issue

2012-04-14 Thread John C Nash

Your function is giving NaN's during the optimization.

The R-forge version of optimx() has functionality specifically intended to deal 
with this.
NOTE: the CRAN version does not, and the R-forge version still has some 
glitches!

However, I easily ran the code you supplied by changing optim to optimx in the 
penultimate
line. Here's the final output.

KKT condition testing
Number of parameters = 2
max abs gradient element = 0.004032794   test tol =  0.1018794
KKT1 result =  TRUE
Hessian eigenvalues:
[1] 7.138974e+02 9.931285e-04
KKT2 result =  TRUE
KKT results: gmax= 0.004032794   evratio= 1.391136e-06   KKT1  2:  TRUE TRUE
[1] 7.138974e+02 9.931285e-04
Save results from method  BFGS

 zz
  method   par  fvalues fns grs hes rs conv KKT1 KKT2
2   BFGS 0.2832468, 52.6096161 100.8794  10   1   0 -23 TRUE TRUE
1 NM 0.2827563, 54.3551491 100.8779  53   0   0  10 TRUE TRUE
 mtilt xtimes meths
2   NA   0.11  BFGS
1 0.0005130808   0.57NM
There were 50 or more warnings (use warnings() to see the first 50)


Note the Hessian eigenvalues are pretty bad. And the warnings are the NaN's 
produced.
This is with just the 2 default methods. When I tried all.methods, one 
(uobyqa) seemed
to lock up. This is a fairly ill-conditioned problem.


Best, JN


On 04/14/2012 06:00 AM, r-help-requ...@r-project.org wrote:
 Message: 13
 Date: Fri, 13 Apr 2012 03:54:43 -0700 (PDT)
 From: Christopher Kelvin chris_kelvin2...@yahoo.com
 To: r-help@r-project.org r-help@r-project.org
 Subject: [R] R-Help: Censoring data
 Message-ID:
   1334314483.27693.yahoomail...@web65408.mail.ac4.yahoo.com
 Content-Type: text/plain; charset=iso-8859-1
 
 Hello,
 ?I want to estimate weibull parameters with 30% censored data. I have below 
 the code for the censoring
 ?but how it must be put into the likelihood equation to obtain the desire 
 estimate is where i have a problem with,
 ?can some body help?
 ?My likelihood equation is for a random type-I censoring where time for the 
 censored units is different for each unit.
 ?
 n=50;r=35
 p=0.8;b=1.5
 t-rweibull(50,shape=p,scale=b)
 meantrue-gamma(1+(1/p))*b
 meantrue
 d-meantrue/0.30
 
 cen- runif(50,min=0,max=d)
 cen
 s-ifelse(t=cen,1,0)
 s
 
 z-function(p){?
 shape-p[1]
 scale-p[2]
 log1-(r*log(p[1])-r*(p[1])*log(p[2])+(p[1]-1)*sum(log(t))-sum((t/(p[2]))^(p[1])
 )-((n-r)*(sum(cen)/(p[2]))^(p[1])))
 return(-log1)
 }
 
 start - c(1,1)
 zz-optim(start,fn=z,hessian=T)
 zz
 
 Thanks in anticipation
 
 Chris Guure
 Researcher
 Institute for Mathematical Research
 UPM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-Help: Censoring data

2012-04-13 Thread Christopher Kelvin

Hello,
 I want to estimate weibull parameters with 30% censored data. I have below the 
code for the censoring
 but how it must be put into the likelihood equation to obtain the desire 
estimate is where i have a problem with,
 can some body help?
 My likelihood equation is for a random type-I censoring where time for the 
censored units is different for each unit.
 
n=50;r=35
p=0.8;b=1.5
t-rweibull(50,shape=p,scale=b)
meantrue-gamma(1+(1/p))*b
meantrue
d-meantrue/0.30

cen- runif(50,min=0,max=d)
cen
s-ifelse(t=cen,1,0)
s

z-function(p){ 
shape-p[1]
scale-p[2]
log1-(r*log(p[1])-r*(p[1])*log(p[2])+(p[1]-1)*sum(log(t))-sum((t/(p[2]))^(p[1])
)-((n-r)*(sum(cen)/(p[2]))^(p[1])))
return(-log1)
}

start - c(1,1)
zz-optim(start,fn=z,hessian=T)
zz

Thanks in anticipation

Chris Guure
Researcher
Institute for Mathematical Research
UPM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-help; generating censored data

2012-04-11 Thread Christopher Kelvin

Hello,
 can i implement this as 10% censored data where t gives me failure and x 
censored.
Thank you

p=2;b=120
n=50

set.seed(132);
r-sample(1:50,45)
t-rweibull(r,shape=p,scale=b)
t
set.seed(123); 
cens - sample(1:50, 5) 
x-runif(cens,shape=p,scale=b) 
x

Chris Guure
Researcher,
Institute for Mathematical Research
UPM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help; generating censored data

2012-04-11 Thread Ted Harding

On 11-Apr-2012 16:28:31 Christopher Kelvin wrote:
 Hello,
 can i implement this as 10% censored data where t gives me
 failure and x censored.
 Thank you
 
 p=2;b=120
 n=50
 
 set.seed(132);
 r-sample(1:50,45)
 t-rweibull(r,shape=p,scale=b)
 t
 set.seed(123);_
 cens - sample(1:50, 5)_
 x-runif(cens,shape=p,scale=b)_
 x
 
 Chris Guure
 Researcher,
 Institute for Mathematical Research
 UPM

This query is obscure!

First, its approach does not seem to conform to the standard
notion of censored data. This refers to a situation where,
for each item observed, either (a) it value falls within a
certain range (which may itself depend on the item), in which
case its value is recorded as a value in the data; or (b) its
value falls outside that range, in which case that fact is
recorded but the value is not recorded (thus being censored).

Eaxmple: Patients who have been admitted to hospital for a
particular disease are subsequently monitored for a period
of time (days/months/years) which may vary from patient to
patient. The reason for the time limitation may be that the
design of the investigation set a limit, or may be haphazard
as a result of the patient moving away and no longer being
accessible. The value recorded (if available) is the time
from admission to death. If not available, then all that can
be recorded is that the event occurred later than the upper
time limit for thaqt patient.

As far as I can see, no element of your code above corresponds
to this notion of censored.

Next, your r-sample(1:50,45) selects 45 different values
from (1:50), and then your t-rweibull(r,shape=p,scale=b)
generates 45 values sampled from the Weibull distribution,
** regardless of the 45 values from (1:50) in r ** -- See
under '?rweibull' where it says:

  n: number of observations. If 'length(n)  1', the length
   is taken to be the number required.

So it would seem that your r-sample(1:50,45) is superfluous,
and you could simply have written t-rweibull(45,shape=p,scale=b).

Similar comments apply to your

  cens - sample(1:50, 5)
  x-runif(cens,shape=p,scale=b)

where you could have equivalently written x-runif(5,shape=p,scale=b).
Also, the parameters shape and scale would not be recognised
by runif(), whose parameters are as in runif(n, min=..., max=...).
Maybe you meant to write x-rweibull(cens,shape=p,scale=b),
but then you would simply be sampling a further 5 values from the
same Weibull distribution, along with your original 45.

So how does censoring come into this?

If you would explain, in plain words, what you are seeking to do,
it would help to remove this obscurity and confusion!

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 11-Apr-2012  Time: 23:23:36
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-help; Censoring

2012-04-10 Thread Christopher Kelvin

Hello,
I wish to censor 10% of my sample units of 50 from a Weibull distribution. 
Below is the code for it.
I will need to know whether what i have done is correct and if not, can i have 
any suggestion to improve it?
Thank you

 p=2;b=120
n=50
r=45

t-rweibull(r,shape=p,scale=b)
meantrue-gamma(1+(1/p))*b
meantrue

cen- runif(n-r,min=0,max=meantrue)
cen


Chris Guure
Researcher,
Institute for Mathematical Research
UPM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help; Censoring

2012-04-10 Thread David Winsemius



On Apr 10, 2012, at 11:48 PM, Christopher Kelvin wrote:


Hello,
I wish to censor 10% of my sample units of 50 from a Weibull  
distribution. Below is the code for it.
I will need to know whether what i have done is correct and if not,  
can i have any suggestion to improve it?

Thank you

 p=2;b=120
n=50
r=45

t-rweibull(r,shape=p,scale=b)
meantrue-gamma(1+(1/p))*b
meantrue

cen- runif(n-r,min=0,max=meantrue)
cen


I don't see how that works. You have five random draws from a uniform  
distribution with support on some interval, not necessarily related to  
the t vector. How are you deciding which of the variates should be  
considered censored? Why wouldn't you just use set.seed(n); cens -  
sample(1:50, 5)) and use as an index?



--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help

2012-04-07 Thread kebrab67

I have a 11*1562 set of data. I want to regress using ols all combination of
the ten variables on the eleventh. Store the results of the t test of each
regression and the r2 to compare which combination is better. the
combination can use any amount o variables. How can I do that in R?

--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-tp4539571p4539571.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-04-07 Thread R. Michael Weylandt

This is, more often that not, statistically speaking a bad idea:
you're likely to get a false positive (https://xkcd.com/882/)

But it sounds like something along these lines should work for you to
get the models :

x - data.frame(rnorm(50), rnorm(50), rnorm(50), rnorm(50))
m - vector(list, ncol(x)) # Make a list to hold all the resulting models

for(i in seq_along(x)) m[[i]] - lm(as.formula(paste(names(x)[i],
~.)), data = x)

To parse that, loop over the columns of x (that's what seq_along will
give you if x is a data frame) -- take the i-th name and model is
based on everything else (~.) -- coerce that thing to a formula and
pass it to lm() to do the linear model.

Hope this helps,
Michael

On Sat, Apr 7, 2012 at 11:27 AM, kebrab67 selamgetac...@gmail.com wrote:
 I have a 11*1562 set of data. I want to regress using ols all combination of
 the ten variables on the eleventh. Store the results of the t test of each
 regression and the r2 to compare which combination is better. the
 combination can use any amount o variables. How can I do that in R?

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/R-help-tp4539571p4539571.html
 Sent from the R help mailing list archive at Nabble.com.
        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-04-07 Thread Peter Ehlers


On 2012-04-07 08:27, kebrab67 wrote:

I have a 11*1562 set of data. I want to regress using ols all combination of
the ten variables on the eleventh. Store the results of the t test of each
regression and the r2 to compare which combination is better. the
combination can use any amount o variables. How can I do that in R?


Personally, I think that what you propose is scientific nonsense.
But if you must do this, you might check out the leaps package.

Peter Ehlers



--
View this message in context: 
http://r.789695.n4.nabble.com/R-help-tp4539571p4539571.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R Help

2012-03-31 Thread Jaymin Shah

Hi, 

I have a polynomial of 2n^2-5n+3 and I have my n values going up in powers of 2 
i.e. n=2,4,8,16…..

I wanted to fit this curve to the function A*n*log2(n) +B*n where A and B are 
to be found.

How would i do this?

Thank you


Jaymin
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R Help

2012-03-31 Thread R. Michael Weylandt

This sounds a little bit like homework (since you know the actual
model), so I'll just point you in the right direction:

I'd do something like:

n - 2^(1:10)
y - 2*n^2 - 5*n + 3
dtf - data.frame(y = y, n=n)

lm( **, data = dtf)

The * should be replaced by a formula object: to find the syntax
for those, just google around or read ?formula and the examples of
?lm. The help pages are a little opaque (very good, just not super
beginner friendly) but the easiest way to set up formulas is

+ includes a term
: includes just a cross term
* includes a cross term and the individual terms (i.e., A*B = A + B + A:B)
- removes a term.

That should help you set things up.

Hope this helps,
Michael


On Sat, Mar 31, 2012 at 10:03 AM, Jaymin Shah jayminsh...@hotmail.com wrote:
 Hi,

 I have a polynomial of 2n^2-5n+3 and I have my n values going up in powers of 
 2 i.e. n=2,4,8,16…..

 I wanted to fit this curve to the function A*n*log2(n) +B*n where A and B are 
 to be found.

 How would i do this?

 Thank you


 Jaymin

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 109, Issue 16

2012-03-16 Thread Gerald Lindsly

Q #1: yesterday:

A: 1. Restructure your database with NoSQL.  See MongoDB.org.
    2. Choose application language with which to work (and get Driver)
and write your code.
    3. R package - rmongodb.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-help-es have reached 500 members!

2012-03-13 Thread Kjetil Halvorsen

This posting is only to celebrate that R-help-es (R-help for Spanish
Speakers) have reached 500
members!

(and to thank Patricia for doing the bulk of admin work).

Kjetil

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help-es have reached 500 members!

2012-03-13 Thread Igor Sosa Mayor

:)

congratulations!

On Tue, Mar 13, 2012 at 08:59:56AM -0600, Kjetil Halvorsen wrote:
 This posting is only to celebrate that R-help-es (R-help for Spanish
 Speakers) have reached 500
 members!
 
 (and to thank Patricia for doing the bulk of admin work).
 
 Kjetil
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
:: Igor Sosa Mayor   :: joseleopoldo1...@gmail.com ::
:: GnuPG: 0x69804897 :: http://www.gnupg.org/  ::

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 109, Issue 9

2012-03-09 Thread Terry Therneau


On Fri, 2012-03-09 at 12:00 +0100, r-help-requ...@r-project.org wrote:
  A note on standard errors: ?S(t) +- std is a terrible confidence
  interval. ?You will be much more accurate if you use log
 scale. ?(Some
  argue for logit or log-log, in truth they all work well.) ? If n is
 large
  enough, however, you should be ok.
 
 Very true, but if one really wants a confidence interval for
 S_1(t)-S_2(t)  (not for S_1(t)/S_2(t)) then one is pretty much forced
 to use the raw probability scale.
 
 -thomas

 I agree, and stand corrected.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-02-19 Thread Petr Savicky

On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
 Dear all,
   I need to generate numbers from multivariate normal with large dimensions
 (5,000,000).
 Below is my code and the error I got from R. Sigma in the code is the
 covariance
 matrix. Can anyone give some idea on how to take care of this error. Thank
 you.
 Hannah
 
  m - 500
  m1 - 0.5*m
  rho - 0.5
  Sigma - rho* matrix(1, m, m)+diag(1-rho, m)
 Error in matrix(1, m, m) : too many elements specified

Hi.

The matrix of dimension m times m does not fit into memory,
since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB.

Generating a multivariate normal with a covariance matrix
with 1 on the diagonal and rho outside of the diagonal may
be done also as follows.

  m - 10 # can be 500
  rho - 0.5
  # single vector
  x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

  # several vectors
  a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho

  # check the sample covariance matrix if m is not too large
  sigma - cov(a)
  range(diag(sigma)) # elements on the diagonal

  [1] 0.9963445 1.0196015

  diag(sigma) - NA 
  range(sigma, na.rm=TRUE) # elements outside of the diagonal

  [1] 0.4935129 0.5162836

Generating several vectors using replicate() may not be efficient.
The following can be used instead.

  n - 1
  a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, 
sd=sqrt(rho))

Note that the size of a is n times m and it should fit into the memory.

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-02-19 Thread li li

Petr,
   Thanks for the help. That certainly makes sense and solves my current
problem. But, in general, for arbitrary covariance matrix (instead of the
special equi-correlation case), it there a way to generate numbers from
 multivariate normal when the dimension is very high?
   Thanks.
 Hannah

ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky savi...@cs.cas.czÐ´µÀ£º

 On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
  Dear all,
I need to generate numbers from multivariate normal with large
 dimensions
  (5,000,000).
  Below is my code and the error I got from R. Sigma in the code is the
  covariance
  matrix. Can anyone give some idea on how to take care of this error.
 Thank
  you.
  Hannah
 
   m - 500
   m1 - 0.5*m
   rho - 0.5
   Sigma - rho* matrix(1, m, m)+diag(1-rho, m)
  Error in matrix(1, m, m) : too many elements specified

 Hi.

 The matrix of dimension m times m does not fit into memory,
 since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB.

 Generating a multivariate normal with a covariance matrix
 with 1 on the diagonal and rho outside of the diagonal may
 be done also as follows.

  m - 10 # can be 500
  rho - 0.5
  # single vector
  x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

  # several vectors
  a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 -
 rho

  # check the sample covariance matrix if m is not too large
  sigma - cov(a)
  range(diag(sigma)) # elements on the diagonal

  [1] 0.9963445 1.0196015

  diag(sigma) - NA
  range(sigma, na.rm=TRUE) # elements outside of the diagonal

  [1] 0.4935129 0.5162836

 Generating several vectors using replicate() may not be efficient.
 The following can be used instead.

  n - 1
  a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n,
 sd=sqrt(rho))

 Note that the size of a is n times m and it should fit into the memory.

 Hope this helps.

 Petr Savicky.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-02-19 Thread li li

Actually I still get an error for the case of equal correlation.
Below is the code

 m - 10
 n - 500
 m1 - 0.5*m
 mu - c(rep(2, m1), rep(0, m-m1))
 rho - 0.5
 x.mod1 - matrix(rnorm(n*m, sd=sqrt(1-rho)), nrow=n, ncol=m)+rnorm(n,
sd=sqrt(rho))+t(replicate(n,mu))
Error: cannot allocate vector of size 381.5 Mb


R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug






ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç10:53£¬li li hannah@gmail.comÐ´µÀ£º


 Petr,
Thanks for the help. That certainly makes sense and solves my current
 problem. But, in general, for arbitrary covariance matrix (instead of the
 special equi-correlation case), it there a way to generate numbers from
  multivariate normal when the dimension is very high?
Thanks.
  Hannah

 ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky savi...@cs.cas.czÐ´µÀ£º

 On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
  Dear all,
I need to generate numbers from multivariate normal with large
 dimensions
  (5,000,000).
  Below is my code and the error I got from R. Sigma in the code is the
  covariance
  matrix. Can anyone give some idea on how to take care of this error.
 Thank
  you.
  Hannah
 
   m - 500
   m1 - 0.5*m
   rho - 0.5
   Sigma - rho* matrix(1, m, m)+diag(1-rho, m)
  Error in matrix(1, m, m) : too many elements specified

 Hi.

 The matrix of dimension m times m does not fit into memory,
 since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB.

 Generating a multivariate normal with a covariance matrix
 with 1 on the diagonal and rho outside of the diagonal may
 be done also as follows.

  m - 10 # can be 500
  rho - 0.5
  # single vector
  x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

  # several vectors
  a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 -
 rho

  # check the sample covariance matrix if m is not too large
  sigma - cov(a)
  range(diag(sigma)) # elements on the diagonal

  [1] 0.9963445 1.0196015

  diag(sigma) - NA
  range(sigma, na.rm=TRUE) # elements outside of the diagonal

  [1] 0.4935129 0.5162836

 Generating several vectors using replicate() may not be efficient.
 The following can be used instead.

  n - 1
  a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n,
 sd=sqrt(rho))

 Note that the size of a is n times m and it should fit into the memory.

 Hope this helps.

 Petr Savicky.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-02-19 Thread peter dalgaard


On Feb 19, 2012, at 16:53 , li li wrote:

 Petr,
   Thanks for the help. That certainly makes sense and solves my current
 problem. But, in general, for arbitrary covariance matrix (instead of the
 special equi-correlation case), it there a way to generate numbers from
 multivariate normal when the dimension is very high?


In the general case, there is really no alternative to an algorithm on the 
order of p^2 in space and p^2 per replication in time. The covariance matrix is 
of size p*(p+1)/2 and you will have to multiply by its square root for each 
replicate. If you run out of space, you are out of luck. Any potential speedups 
will have to rely on special structure of the covariance.

   Thanks.
 Hannah
 
 ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky savi...@cs.cas.cz‹´µÀ£º
 
 On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
 Dear all,
  I need to generate numbers from multivariate normal with large
 dimensions
 (5,000,000).
 Below is my code and the error I got from R. Sigma in the code is the
 covariance
 matrix. Can anyone give some idea on how to take care of this error.
 Thank
 you.
Hannah
 
 m - 500
 m1 - 0.5*m
 rho - 0.5
 Sigma - rho* matrix(1, m, m)+diag(1-rho, m)
 Error in matrix(1, m, m) : too many elements specified
 
 Hi.
 
 The matrix of dimension m times m does not fit into memory,
 since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB.
 
 Generating a multivariate normal with a covariance matrix
 with 1 on the diagonal and rho outside of the diagonal may
 be done also as follows.
 
 m - 10 # can be 500
 rho - 0.5
 # single vector
 x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))
 
 # several vectors
 a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 -
 rho
 
 # check the sample covariance matrix if m is not too large
 sigma - cov(a)
 range(diag(sigma)) # elements on the diagonal
 
 [1] 0.9963445 1.0196015
 
 diag(sigma) - NA
 range(sigma, na.rm=TRUE) # elements outside of the diagonal
 
 [1] 0.4935129 0.5162836
 
 Generating several vectors using replicate() may not be efficient.
 The following can be used instead.
 
 n - 1
 a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n,
 sd=sqrt(rho))
 
 Note that the size of a is n times m and it should fit into the memory.
 
 Hope this helps.
 
 Petr Savicky.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-02-19 Thread David Winsemius



On Feb 19, 2012, at 11:46 AM, li li wrote:


Actually I still get an error for the case of equal correlation.


No. You need to read the error message for meaning.


Below is the code


m - 10
n - 500
m1 - 0.5*m
mu - c(rep(2, m1), rep(0, m-m1))
rho - 0.5
x.mod1 - matrix(rnorm(n*m, sd=sqrt(1-rho)), nrow=n, ncol=m)+rnorm(n,

sd=sqrt(rho))+t(replicate(n,mu))
Error: cannot allocate vector of size 381.5 Mb


You don't have enough space in your machine (whatever it might be).

--  
David.



R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error  
code=12)

*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error  
code=12)

*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(173,0xa0a7b540) malloc: *** mmap(size=43072) failed (error  
code=12)

*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug






ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç10:53£¬li li hannah@gmail.comÐ´µÀ£º



Petr,
  Thanks for the help. That certainly makes sense and solves my  
current
problem. But, in general, for arbitrary covariance matrix (instead  
of the
special equi-correlation case), it there a way to generate numbers  
from

multivariate normal when the dimension is very high?
  Thanks.
Hannah

ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky savi...@cs.cas.czÐ´µÀ£º

On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:

Dear all,
 I need to generate numbers from multivariate normal with large

dimensions

(5,000,000).
Below is my code and the error I got from R. Sigma in the code is  
the

covariance
matrix. Can anyone give some idea on how to take care of this  
error.

Thank

you.
   Hannah


m - 500
m1 - 0.5*m
rho - 0.5
Sigma - rho* matrix(1, m, m)+diag(1-rho, m)

Error in matrix(1, m, m) : too many elements specified


Hi.

The matrix of dimension m times m does not fit into memory,
since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB.

Generating a multivariate normal with a covariance matrix
with 1 on the diagonal and rho outside of the diagonal may
be done also as follows.

m - 10 # can be 500
rho - 0.5
# single vector
x - rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

# several vectors
a - t(replicate(1, rnorm(1, sd=sqrt(rho)) + rnorm(m,  
sd=sqrt(1 -

rho

# check the sample covariance matrix if m is not too large
sigma - cov(a)
range(diag(sigma)) # elements on the diagonal

[1] 0.9963445 1.0196015

diag(sigma) - NA
range(sigma, na.rm=TRUE) # elements outside of the diagonal

[1] 0.4935129 0.5162836

Generating several vectors using replicate() may not be efficient.
The following can be used instead.

n - 1
a - matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n,
sd=sqrt(rho))

Note that the size of a is n times m and it should fit into the  
memory.


Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-02-19 Thread Petr Savicky

On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
 Dear all,
   I need to generate numbers from multivariate normal with large dimensions
 (5,000,000).

Hi.

I am replying to your first email, since the other did not arrive
to my folder, possibly filtered out by a spam filter. I see them
at the web interface.

1. Error: cannot allocate vector of size 381.5 Mb

The error message makes sense. The matrix requires m*n*8/2^20 MB,
which is in your case

  m - 10
  n - 500
  m*n*8/2^20

  [1] 381.4697

May be, you already have other large objects in the memory. Try to
minimize the number and size of objects, which you need simultaneously
in an R session.

2. Generating a multivariate normal distribution.

As Peter Dalgaard pointed out, a speed up is possible only
for special types of the covariance matrix Sigma. A general
way is to find a matrix A such that A A^t = Sigma. Then,
the vector A X, where X is from N(0,I) and I is an identity
matrix of an appropriate dimension, has covariance Sigma.
This is also the way, how mvtnorm package works.

A speed up is possible, if computing the product A X does not
require to have matrix A explicitly represented in memory.

The matrix A need not be a square matrix. In particular, the
previous case may be understood as using the matrix A, which
for a small m is as follows.

  m - 5
  rho - 0.5
  A - cbind(sqrt(rho), sqrt(1 - rho)*diag(m))
  A


[,1]  [,2]  [,3]  [,4]  [,5]  [,6]
  [1,] 0.7071068 0.7071068 0.000 0.000 0.000 0.000
  [2,] 0.7071068 0.000 0.7071068 0.000 0.000 0.000
  [3,] 0.7071068 0.000 0.000 0.7071068 0.000 0.000
  [4,] 0.7071068 0.000 0.000 0.000 0.7071068 0.000
  [5,] 0.7071068 0.000 0.000 0.000 0.000 0.7071068

  A %*% t(A)

   [,1] [,2] [,3] [,4] [,5]
  [1,]  1.0  0.5  0.5  0.5  0.5
  [2,]  0.5  1.0  0.5  0.5  0.5
  [3,]  0.5  0.5  1.0  0.5  0.5
  [4,]  0.5  0.5  0.5  1.0  0.5
  [5,]  0.5  0.5  0.5  0.5  1.0

This construction is conceptually possible also for a large m because
the structure of A allows to compute A X by simpler operations with
the vector X than an explicit matrix product. Namely, the expression

  rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

or, more clearly,

  sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m)

is equivalent to the required A X, where X consists of rnorm(1)
and rnorm(m) together.

If you have a specific Sigma, describe it and we can discuss,
whether an appropriate A can be found.

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-02-19 Thread Bert Gunter

Folks:

Perhaps I am naive, ignorant, or foolish, but this whole discussion
seems rather ridiculous. What possible relation to reality could a
multivariate normal of the size requested have? Even for simulation,
it seems like nonsense.

Cheers,
Bert

On Sun, Feb 19, 2012 at 11:35 AM, Petr Savicky savi...@cs.cas.cz wrote:
 On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
 Dear all,
   I need to generate numbers from multivariate normal with large dimensions
 (5,000,000).

 Hi.

 I am replying to your first email, since the other did not arrive
 to my folder, possibly filtered out by a spam filter. I see them
 at the web interface.

 1. Error: cannot allocate vector of size 381.5 Mb

 The error message makes sense. The matrix requires m*n*8/2^20 MB,
 which is in your case

  m - 10
  n - 500
  m*n*8/2^20

  [1] 381.4697

 May be, you already have other large objects in the memory. Try to
 minimize the number and size of objects, which you need simultaneously
 in an R session.

 2. Generating a multivariate normal distribution.

 As Peter Dalgaard pointed out, a speed up is possible only
 for special types of the covariance matrix Sigma. A general
 way is to find a matrix A such that A A^t = Sigma. Then,
 the vector A X, where X is from N(0,I) and I is an identity
 matrix of an appropriate dimension, has covariance Sigma.
 This is also the way, how mvtnorm package works.

 A speed up is possible, if computing the product A X does not
 require to have matrix A explicitly represented in memory.

 The matrix A need not be a square matrix. In particular, the
 previous case may be understood as using the matrix A, which
 for a small m is as follows.

  m - 5
  rho - 0.5
  A - cbind(sqrt(rho), sqrt(1 - rho)*diag(m))
  A


            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
  [1,] 0.7071068 0.7071068 0.000 0.000 0.000 0.000
  [2,] 0.7071068 0.000 0.7071068 0.000 0.000 0.000
  [3,] 0.7071068 0.000 0.000 0.7071068 0.000 0.000
  [4,] 0.7071068 0.000 0.000 0.000 0.7071068 0.000
  [5,] 0.7071068 0.000 0.000 0.000 0.000 0.7071068

  A %*% t(A)

       [,1] [,2] [,3] [,4] [,5]
  [1,]  1.0  0.5  0.5  0.5  0.5
  [2,]  0.5  1.0  0.5  0.5  0.5
  [3,]  0.5  0.5  1.0  0.5  0.5
  [4,]  0.5  0.5  0.5  1.0  0.5
  [5,]  0.5  0.5  0.5  0.5  1.0

 This construction is conceptually possible also for a large m because
 the structure of A allows to compute A X by simpler operations with
 the vector X than an explicit matrix product. Namely, the expression

  rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

 or, more clearly,

  sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m)

 is equivalent to the required A X, where X consists of rnorm(1)
 and rnorm(m) together.

 If you have a specific Sigma, describe it and we can discuss,
 whether an appropriate A can be found.

 Hope this helps.

 Petr Savicky.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help

2012-02-19 Thread Rolf Turner


On 20/02/12 09:55, Bert Gunter wrote:

Folks:

Perhaps I am naive, ignorant, or foolish, but this whole discussion
seems rather ridiculous. What possible relation to reality could a
multivariate normal of the size requested have? Even for simulation,
it seems like nonsense.


Right on, bro'!!!

cheers,

Rolf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R help

2012-02-18 Thread li li

Dear all,
  I need to generate numbers from multivariate normal with large dimensions
(5,000,000).
Below is my code and the error I got from R. Sigma in the code is the
covariance
matrix. Can anyone give some idea on how to take care of this error. Thank
you.
Hannah

 m - 500
 m1 - 0.5*m
 rho - 0.5
 Sigma - rho* matrix(1, m, m)+diag(1-rho, m)
Error in matrix(1, m, m) : too many elements specified
 mu - c(rep(2, m1), rep(0, m-m1))
 x.mod1 - mvrnorm(n = 1, mu, Sigma, tol = 1e-6, empirical = FALSE)
Error in mvrnorm(n = 1, mu, Sigma, tol = 1e-06, empirical = FALSE) :
  incompatible arguments

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 108, Issue 16

2012-02-16 Thread Gian Maria Niccolò Benucci

Hi Petr,

You advice options(scipen=20) gave me the expected result and fix the
problem.
Thanks a lot!

Gian

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 108, Issue 13

2012-02-13 Thread 丁飞

dear:
i want to know how to get a survival curve of the Cox proportional risk 
regression, thank you. 





 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 108, Issue 13

2012-02-13 Thread Milan Bouchet-Valat

Le lundi 13 février 2012 à 19:48 +0800, 丁飞 a écrit :
 dear:
 i want to know how to get a survival curve of the Cox proportional risk 
 regression, thank you. 
See the relevant part of the Survival Task View here:
http://cran.r-project.org/web/views/Survival.html

And more specifically, see the coxph() function in the survival package.
A tutorial is available from:
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf


(But RSiteSearch(coxph) or a simple Google search will give you many
more results.)

Cheers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

< 1 2 3 4 5 6 7 >

201 - 300 of 641 matches

Mail list logo