[R] On the calculation of crossed differences

2013-01-28 Thread Francesca
Dear Contributors,
I am back asking for help with the same type of dataset as in a previous
help request.

I needed to sum data over subsamples of three time series, each made of
100 observations. Various solutions were proposed, among which:

db<-p

dim( db ) <- c(25,4,3)

db2 <- apply(db, c(2,3), sum)

db3 <- t(apply(db2, 1, function(poff) 100-(100*abs(poff/sum(poff)-(1/3))) ))

My request now is about the function at the end of the calculation in db3.

Instead of the difference from a fixed number (here 1/3), I need to calculate
the following differences. Consider that db3 is a 4x3 matrix; I need to
calculate

(db3[1,1] - db3[1,2]) + (db3[1,1] - db3[1,3])*0.5 and store it in a cell,
then

(db3[1,2] - db3[1,1]) + (db3[1,2] - db3[1,3])*0.5 and store it in a cell,
then

(db3[1,3] - db3[1,1]) + (db3[1,3] - db3[1,2])*0.5 and store it in a cell,

and then repeat this for each of the four rows of the same matrix. The resulting
matrix should be composed of these distances.

I need to repeat this for each of the subsamples. I realize that there
are calculations that are repeated, but I did not find a strategy that does
not require repeating them.
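
In R terms, the per-row calculation I have in mind would look something like
this (a minimal sketch, assuming the 0.5 weights pair up exactly as written
above):

crossed <- function(x) c((x[1] - x[2]) + (x[1] - x[3])*0.5,
                         (x[2] - x[1]) + (x[2] - x[3])*0.5,
                         (x[3] - x[1]) + (x[3] - x[2])*0.5)

db4 <- t(apply(db3, 1, crossed))   # 4x3 matrix of these distances

and the same call would then be repeated on the db3 of each subsample.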

Francesca


--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--



Re: [R] Thank you your help and one more question.

2013-01-28 Thread arun
Hi,

How do you want to combine the results?
It looks like the 5 datasets are list elements.

If I take the first three list elements,
imput1_2_3<-list(imp1=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", 
"HM001", 
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9, 
25.2, 25.5, 25.24132, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.1377, 
31.17251, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")),
imp2=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9, 
25.2, 25.5, 25.54828, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 29.8977, 
31.35045, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")),
imp3=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001", 
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9, 
25.2, 25.5, 25.46838, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.88185, 
31.57952, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")))
#It could be combined by:
do.call(rbind, imput1_2_3) # But if you do this, the total number of rows will be
the sum of the numbers of rows of the individual datasets.

I guess you want something like this:

res<-Reduce(function(...) merge(...,by=c("ID","CTIME")),imput1_2_3)
 names(res)[3:5]<- paste("WEIGHT","IMP",1:3,sep="")
 res
#  ID CTIME WEIGHTIMP1 WEIGHTIMP2 WEIGHTIMP3
#1  HM001  1223   24.9   24.9   24.9
#2  HM001  1224   25.2   25.2   25.2
#3  HM001  1225   25.5   25.5   25.5
#4  HM001  1226   25.24132   25.54828   25.46838
#5  HM001  1227   25.7   25.7   25.7
#6  HM001  1228   27.1   27.1   27.1
#7  HM001  1229   27.3   27.3   27.3
#8  HM001  1230   27.4   27.4   27.4
#9  HM001  1231   28.4   28.4   28.4
#10 HM001  1232   29.2   29.2   29.2
#11 HM001  1233   30.13770   29.89770   30.88185
#12 HM001  1234   31.17251   31.35045   31.57952
#13 HM001  1235   32.4   32.4   32.4
#14 HM001  1236   33.7   33.7   33.7
#15 HM001  1237   34.3   34.3   34.3
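
If the goal is a single combined (averaged) imputation, a rough one-liner on the
merged result (a sketch) would be:

res$WEIGHTMEAN <- rowMeans(res[, 3:5])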
A.K.








From: 남윤주 
To: arun  
Sent: Monday, January 28, 2013 7:35 PM
Subject: Thank you your help and one more question.

I deeply appreciate your help. Answering your question, I am a software engineer,
and I am developing a system that accumulates data to draw charts and tables.
For better performance, I have to deal with missing-value treatment, so I use
the Amelia package. Below is the result following your answer.

>temp2    #origin data
 ID CTIME WEIGHT
1  HM001  1223   24.9
2  HM001  1224   25.2
3  HM001  1225   25.5
4  HM001  1226 NA
5  HM001  1227   25.7
6  HM001  1228   27.1
7  HM001  1229   27.3
8  HM001  1230   27.4
9  HM001  1231   28.4
10 HM001  1232   29.2
11 HM001  1233 1221.0
12 HM001  1234 NA
13 HM001  1235   32.4
14 HM001  1236   33.7
15 HM001  1237   34.3 
> temp2$WEIGHT<- ifelse(temp2$WEIGHT>50,NA,temp2$WEIGHT)
 >temp2    # After eliminating strange value
  ID CTIME WEIGHT
1  HM001  1223   24.9
2  HM001  1224   25.2
3  HM001  1225   25.5
4  HM001  1226 NA
5  HM001  1227   25.7
6  HM001  1228   27.1
7  HM001  1229   27.3
8  HM001  1230   27.4
9  HM001  1231   28.4
10 HM001  1232   29.2
11 HM001  1233 NA
12 HM001  1234 NA
13 HM001  1235   32.4
14 HM001  1236   33.7
15 HM001  1237   34.3
-- 
I have one more question. Below are the code and results.
--
> a.out2<-amelia(temp2, m=5, ts="CTIME", cs="ID", polytime=1)
-- Imputation 1 --
 1  2  3  4 
-- Imputation 2 --
 1  2  3 
-- Imputation 3 --
 1  2  3  4 
-- Imputation 4 --
 1  2  3 
-- Imputation 5 --
 1  2  3 

> a.out2$imputations
$imp1
  ID CTIME   WEIGHT
1  HM001  1223 24.9
2  HM001  1224 25.2
3  HM001  1225 25.5
4  HM001  1226 25.24132
5  HM001  1227 25.7
6  HM001  1228 27.1
7  HM001  1229 27.3
8  HM001  1230 27.4
9  HM001  1231 28.4
10 HM001  1232 29.2
11 HM001  1233 30.13770
12 HM001  1234 31.17251
13 HM001  1235 32.4
14 HM001  1236 33.7
15 HM001  1237 34.3
$imp2
  ID CTIME   WEIGHT
1  HM001  1223 24.9
2  HM001  1224 25.2
3  HM001  1225 25.5
4  HM001  1226 25.54828
5  HM001  1227 25.7
6  HM001  1228 2

[R] NA and Character(0) in List Element

2013-01-28 Thread Benjamin Ward (ENV)
Hi, this is probably a small query, but one I'm struggling with: I have a list
in which some elements were NA. I removed them by doing: list2 <-
lapply(list, na.omit).

However, this leaves the element there as 'character(0)', along with its
attributes:

e.g.
[[978]]
character(0)
attr(,"na.action")
[1] 1
attr(,"class")
[1] "omit"


I want to get rid of these elements/positions in the list, since a function is
supposed to sample the list for elements (each element is a collection of about
20 numbers).
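
A minimal sketch of the kind of cleanup intended (drop the now-empty elements
after the na.omit step above):

list2 <- lapply(list, na.omit)
list2 <- list2[sapply(list2, length) > 0]   # keep only the non-empty elements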

Thanks,

Ben W.

UEA (ENV) - b.w...@uea.ac.uk





[R] Pivot

2013-01-28 Thread farnoosh sheikhi
Hi,

I have a data set as follow:

X         Z
x1        102
x2        102
x2        102
x2        77
x3        23
 
I need to pivot these data as follows, filling each cell with the frequency of
the corresponding value of column Z:

X     Z.102   Z.77   Z.23
x1        1      0      0
x2        2      1      0
x3        0      0      1
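
For reference, a minimal sketch that produces these counts (assuming the data
are in a data frame df with columns X and Z; the column order may differ):

tab <- table(df$X, df$Z)                  # frequency of each Z value per X
out <- as.data.frame.matrix(tab)
names(out) <- paste0("Z.", colnames(tab))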

Thanks.

Best,Farnoosh Sheikhi


Re: [R] Variability Plot For Toray Microarray Data

2013-01-28 Thread Jim Lemon

On 01/28/2013 09:42 PM, Peverall Dubois wrote:

Is there any package that allows you to produce an "MA plot"-like graph
for Toray microarray data?


Unlike an Affymetrix CEL file, which contains 2 values (R and G),
Toray raw data contain only 1 value.

The MA plot is Affymetrix-specific and is usually available in the (limma)
package.



Hi Peverall,
Have a look at "plotMA" in the limma package.

Jim



[R] Using relaimpo or relimp with PLM and GLS

2013-01-28 Thread Richard Asturia
Dear all,

Unfortunately, the packages relaimpo and relimp do not seem to work with
the plm function (plm package) or the gls function (in the nlme package). I've
been studying how to adapt one of them for this purpose. In that sense, I
have two questions regarding this work:

1) has anyone heard of any workaround for those incompatibilities, or at
least of any ideas on that - especially for plm?

2) would it actually make theoretical sense to adapt those packages for
plm or gls? It seems to me that it would for plm, but not necessarily for
gls, due to the likelihood framework it works on. Anyone?

Thanks in advance!
Richard A.



Re: [R] Thank you your help and one more question.

2013-01-28 Thread arun
Hi,

I think I understand your mistake.
imput1_2_3<-list(imp1=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", 
"HM001",
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9,
25.2, 25.5, 25.24132, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.1377,
31.17251, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")),
imp2=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9,
25.2, 25.5, 25.54828, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 29.8977,
31.35045, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")),
imp3=structure(list(ID = c("HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001", "HM001", "HM001", "HM001", "HM001",
"HM001", "HM001", "HM001"), CTIME = 1223:1237, WEIGHT = c(24.9,
25.2, 25.5, 25.46838, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.88185,
31.57952, 32.4, 33.7, 34.3)), .Names = c("ID", "CTIME", "WEIGHT"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15")))
imput <- list(imput1_2_3[1], imput1_2_3[2], imput1_2_3[3]) # what you tried.  You
should use [[ ]] instead of [ ].  Here, it is not necessary.
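# ( "[ ]" keeps the enclosing list: class(imput1_2_3[1]) is "list";
#   "[[ ]]" extracts the element itself: class(imput1_2_3[[1]]) is "data.frame" )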
aggregate(.~ID+CTIME,data=imput,mean)
#Error in eval(expr, envir, enclos) : object 'ID' not found

#You don't need the above step.
class(imput1_2_3) #already a list
[1] "list"
 imput<-do.call(rbind,imput1_2_3)
 aggregate(.~ID+CTIME,data=imput,mean)
  #    ID CTIME   WEIGHT
#1  HM001  1223 24.9
#2  HM001  1224 25.2
#3  HM001  1225 25.5
#4  HM001  1226 25.41933
#5  HM001  1227 25.7
#6  HM001  1228 27.1
#7  HM001  1229 27.3
#8  HM001  1230 27.4
#9  HM001  1231 28.4
#10 HM001  1232 29.2
#11 HM001  1233 30.30575
#12 HM001  1234 31.36749
#13 HM001  1235 32.4
#14 HM001  1236 33.7
#15 HM001  1237 34.3
A.K.











From: 남윤주 
To: arun  
Sent: Tuesday, January 29, 2013 12:04 AM
Subject: Re: Thank you your help and one more question.


I decided to follow aggregate(), so I installed and loaded plyr with library(plyr).
But while executing the statement 'res <- aggregate(.~ID+CTIME,
data=imput, mean)', an error occurred.
What should I do next?
 > library(plyr) 
> a.out2$imputations
$imp1
  ID   CTIME ACTIVE_KWH
1  HM001 2.01212e+11   24.2
2  HM001 2.01212e+11   25.5
3  HM001 2.01212e+11   25.6
4  HM001 2.01212e+11   25.90065
5  HM001 2.01212e+11   26.6
6  HM001 2.01212e+11   26.7
7  HM001 2.01212e+11   27.1
8  HM001 2.01212e+11   27.4
9  HM001 2.01212e+11   27.5
10 HM001 2.01212e+11   27.8
11 HM001 2.01212e+11   28.2
12 HM001 2.01212e+11   28.44605
13 HM001 2.01212e+11   28.7
14 HM001 2.01212e+11   28.9
15 HM001 2.01212e+11   29.1
$imp2
  ID   CTIME ACTIVE_KWH
1  HM001 2.01212e+11   24.2
2  HM001 2.01212e+11   25.5
3  HM001 2.01212e+11   25.6
4  HM001 2.01212e+11   25.87163
5  HM001 2.01212e+11   26.6
6  HM001 2.01212e+11   26.7
7  HM001 2.01212e+11   27.1
8  HM001 2.01212e+11   27.4
9  HM001 2.01212e+11   27.5
10 HM001 2.01212e+11   27.8
11 HM001 2.01212e+11   28.2
12 HM001 2.01212e+11   28.68048
13 HM001 2.01212e+11   28.7
14 HM001 2.01212e+11   28.9
15 HM001 2.01212e+11   29.1 
> imput <- list(a.out2$imputations[1], a.out2$imputations[2])
> do.call(rbind, imput)

[[1]]
[[1]]$imp1
  ID   CTIME ACTIVE_KWH
1  HM001 2.01212e+11   24.2
2  HM001 2.01212e+11   25.5
3  HM001 2.01212e+11   25.6
4  HM001 2.01212e+11   25.90065
5  HM001 2.01212e+11   26.6
6  HM001 2.01212e+11   26.7
7  HM001 2.01212e+11   27.1
8  HM001 2.01212e+11   27.4
9  HM001 2.01212e+11   27.5
10 HM001 2.01212e+11   27.8
11 HM001 2.01212e+11   28.2
12 HM001 2.01212e+11   28.44605
13 HM001 2.01212e+11   28.7
14 HM001 2.01212e+11   28.9
15 HM001 2.01212e+11   29.1

[[2]]
[[2]]$imp2
  ID   CTIME ACTIVE_KWH
1  HM001 2.01212e+11   24.2
2  HM001 2.01212e+11   25.5
3  HM001 2.01212e+11   25.6
4  HM001 2.01212e+11   25.87163
5  HM001 2.01212e+11   26.6
6  HM001 2.01212e+11   26.7
7  HM001 2.01212e+11   27.1
8  HM001 2.01212e+11   27.4
9  HM001 2.01212e+11   27.5
10 HM001 2.01212e+11   27.8
11 HM001 2.01212e+11   28.2
12 HM001 2.01212e+11   28.68048
13 HM001 2.01212e+11   28.7
14 HM001 2.01212e+11   28.9
15 HM001 2.01212e+11   29.1
> res <- aggregate(.~ID+CTIME, data=imput,mean)
The following error occurred: eval(expr, envir, enclos) : no element 'ID'  # I translated
this line into English because it was written in my native language.

-Original

[R] Repeated Measures and Dependent pre-test variable

2013-01-28 Thread Camilo Vieira Mejía

Hi,
I have an experiment where I measured the dependent variable (called
Intention) twice, so I have IntentionPre and IntentionPost.
I am working with an ANOVA for an independent variable called
NumberOfModules.
The problem is that if I run aov(IntentionPre ~ NumberOfModules), there
is a significant difference between the means.
I want to know the difference between pre and post intentions based on
NumberOfModules. I've been told that I should run an ANCOVA with the
IntentionPre variable as a covariate, but I don't really know how I
should organize my data in order to get that.
My dataframe currently has the following columns: Email (userId), Intention
(pre+post values), contrast (is it a pre or post value?), and
NumberOfModules.
What I would do if there were no significant difference between the means
for IntentionPre would be:
aov(Intention ~ contrast*NumberOfModules + Error(Email/contrast),
gdataframe)
What I was thinking of doing is adding a column IntentionPre
(duplicated for every pre and post value of the Intention column) and
running it like:
aov(Intention ~ IntentionPre +
contrast*NumberOfModules + Error(Email/contrast), gdataframe)
Is that a correct reasoning?
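
A rough sketch of how the IntentionPre column could be built (an assumption on
my part that the contrast column has a level literally called "pre"):

pre <- subset(gdataframe, contrast == "pre", select = c(Email, Intention))
names(pre)[2] <- "IntentionPre"
gdataframe <- merge(gdataframe, pre, by = "Email")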
Any help would be great!!

Camilo V  


Re: [R] Thank you your help and one more question.

2013-01-28 Thread arun


Hi,

I don't have Amelia package installed.


If you want to get the mean value, you could use either ?aggregate(),  or 
?ddply() from library(plyr)

library(plyr)
imputNew<-do.call(rbind,imput1_2_3)
 res1<-ddply(imputNew,.(ID,CTIME),function(x) mean(x$WEIGHT))
 names(res1)[3]<-"WEIGHT"
 head(res1)
 #    ID CTIME   WEIGHT
#1 HM001  1223 24.9
#2 HM001  1224 25.2
#3 HM001  1225 25.5
#4 HM001  1226 25.41933
#5 HM001  1227 25.7
#6 HM001  1228 27.1


#or
res2<-aggregate(.~ID+CTIME,data=imputNew,mean)
#or
res3<-  do.call(rbind,lapply(split(imputNew,imputNew$CTIME),function(x) 
{x$WEIGHT<-mean(x[,3]);head(x,1)}))
row.names(res3)<-1:nrow(res3)
identical(res1,res2)
#[1] TRUE
 identical(res1,res3)
#[1] TRUE
A.K.



From: 남윤주 
To: arun  
Sent: Monday, January 28, 2013 9:47 PM
Subject: Re: Thank you your help and one more question.


Thank you for replying to my question.
What I want is the matrix below.
I have 3 data sets, named weightimp1, 2 and 3.
To get the matrix below, I have to combine these 3 data sets,
but I don't know how to combine them. It could be the mean of the 3 data sets, or
there might be a value (temp2$imputations$...) in the Amelia package.
I prefer to use the Amelia package method, but if it doesn't exist, can you
recommend how to compute the mean value?

#  ID CTIME WEIGHT  (it represents the 3 data sets weightimp1, 2, 3)
#1  HM001  1223   24.9   
#2  HM001  1224   25.2 
#3  HM001  1225   25.5  
#4  HM001  1226   25.24132  
#5  HM001  1227   25.7   
#6  HM001  1228   27.1   
#7  HM001  1229   27.3   
#8  HM001  1230   27.4  
#9  HM001  1231   28.4   
#10 HM001  1232   29.2  
#11 HM001  1233   30.13770   
#12 HM001  1234   31.17251   
#13 HM001  1235   32.4   
#14 HM001  1236   33.7   
#15 HM001  1237   34.3   

[R] gigFit problems

2013-01-28 Thread Raymond Rogers
Hi, I am having some problems with gigFit and would like confirmation on
other platforms; mine is Mint (basically Debian).
Although I got a good fit for the density function with the GIG equation
in another curve-fitting program, I would really like to use R's tools
for confidence intervals and manipulations, but the problems below make
me uneasy.


Problem one (from examples with parameter change):
Code:
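library(GeneralizedHyperbolic)   # assumed here: the package providing rgig() and gigFit()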
paramStart.X2002<-c(2.044, .622, 2)
param<-paramStart.X2002
dataVector <- rgig(500, param = param)
gigFit(dataVector)
Result
Data:  dataVector
Parameter estimates:
     chi       psi    lambda
0.007543  0.722962  2.535284
Likelihood: -644.6349
Method: Nelder-Mead
Convergence code:   0
Iterations: 104
Notice that the chi parameter is significantly off.
---
Problem two
I have data that I have fit to the GIG equation in a curve fitting 
program with good fits and resulting parameters very close to the above 
paramStart.X2002
The gigFit program on the same data goes completely off into strange 
territory.  I hope that fixing the above problem would fix this.


Problem three:
When I do the above with plot
gigFit(dataVector,plot=TRUE)
I get a warning
Warning message:
"
In incompleteBesselK(x[i], y[i], -lambda, tol = ibfTol, nmax = nmax) :
  Maximum order exceeded
Result may be unreliable
"
Ray
-- This space is for my comments on the world
(Intentionally left blank.)



[R] R plot like candlestick

2013-01-28 Thread Denis Francisci
Hi all,
I'm new on this list so I greet all.
My question is: does there exist in R a plot similar to a candlestick plot but
not based on xts (time series)? I have to plot a range of 4 values: for
every item I have a min value, a max value and 2 intermediate values. I
would like to plot this like a candlestick, i.e. with a box between the 2
intermediate values, 1 segment between the box and the min value, and a
segment between the box and the max value. A candlestick plot is provided by
the quantmod package, but it is very specific to financial purposes and it
uses a time series for the x axis. I need a simpler method for plotting my
data, which are stored in a table like this:

item, min, int_1, int_2, max
a, 2.5, 3, 4, 5.5
b, 2, 3.5, 4, 4.5
c, 3.5, 4, 4.5, 5
.

Does anyone know if there is an R-plot for this purpose?
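
To make it concrete, a rough base-graphics sketch of the kind of figure I am
after (assuming the table above is read into a data frame d with columns item,
min, int_1, int_2, max) would be:

d <- data.frame(item = c("a", "b", "c"),
                min = c(2.5, 2, 3.5), int_1 = c(3, 3.5, 4),
                int_2 = c(4, 4, 4.5), max = c(5.5, 4.5, 5))
x <- seq_len(nrow(d))
plot(x, d$max, ylim = range(d$min, d$max), type = "n",
     xaxt = "n", xlab = "item", ylab = "value")
axis(1, at = x, labels = d$item)
rect(x - 0.2, d$int_1, x + 0.2, d$int_2)   # box between the two intermediate values
segments(x, d$min, x, d$int_1)             # segment down to the minimum
segments(x, d$int_2, x, d$max)             # segment up to the maximum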
Thank you very much,

D.



[R] Can I define a object array in R?

2013-01-28 Thread cuiyan
Here is my problem:
100 decision trees were built (similar to a random forest) and I want to
replace some of them by new trees.
How can I define a tree array holding 100 trees, i.e. t[100], where every
t[n] is a "C5.0" object,
such that
when a new tree comes, I can do
n<-10
t[n]<-C5.0(...)
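
A minimal sketch of a container that behaves this way is a list (assuming C5.0()
from the C50 package; note the [[ ]] indexing):

trees <- vector("list", 100)      # 100 empty slots
# n <- 10
# trees[[n]] <- C5.0(...)         # replace slot n with a newly fitted tree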



--
View this message in context: 
http://r.789695.n4.nabble.com/Can-I-define-a-object-array-in-R-tp4656909.html
Sent from the R help mailing list archive at Nabble.com.



[R] converting XML document to table or dataframe

2013-01-28 Thread Anika Masters
I am a relatively new user of R, and I am trying to learn more about
converting data in an XML document into "2-dimensional format" such as a
table or array.  I might eventually wish to export these data into a
relational database such as SQL, and/or to work with these data within R.

My sample XML document is located at
http://www.sec.gov/Archives/edgar/data/743988/000124636013000561/form.xml

I have successfully imported the XML document and then converted it to a
list.

I am "stuck" trying to convert the document into a "2-dimensional" table or
dataframe.

What is a "good" way to convert the XML document to a 2-dimensional table
or data.frame?  Ideally, I'd like a table with 1 row for each XML document,
and unique fieldnames.  If fieldnames repeat, I'd like the names to be
numbered sequentially

e.g.
$nonDerivativeTable$nonDerivativeTransaction$transactionAmounts$transactionPricePerShare$value_1
$nonDerivativeTable$nonDerivativeTransaction$transactionAmounts$transactionPricePerShare$value_2
$nonDerivativeTable$nonDerivativeTransaction$transactionAmounts$transactionPricePerShare$value_3

etc




library(XML)   # provides xmlParse, xmlToList, xmlToDataFrame
myxml <- xmlParse("http://www.sec.gov/Archives/edgar/data/743988/000124636013000561/form.xml")
mylist <- xmlToList(myxml)
mydf <- xmlToDataFrame(myxml)
mydf2 <- data.frame(mylist)
mytable <- as.table(mylist)
mydf2 <- data.frame(myxml)
mytable <- as.table(myxml)
# (none of these gives the one-row-per-document layout with unique field names)
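
One rough sketch of flattening the parsed list into a single row with
sequentially numbered names (an illustration only; the numbering comes out as
.1, .2, ... rather than _1, _2, ...):

flat <- unlist(mylist)                    # named vector, nested names joined with "."
names(flat) <- make.unique(names(flat))   # make repeated field names unique
onerow <- as.data.frame(t(flat), stringsAsFactors = FALSE)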



Re: [R] Fw: rpart

2013-01-28 Thread David Winsemius


On Jan 28, 2013, at 9:06 PM, carol white wrote:


Should I understand that this message was received?


It's always possible to check the Archives for this question.

--  
David.




David Winsemius, MD
Alameda, CA, USA



[R] Fw: rpart

2013-01-28 Thread carol white
Should I understand that this message was received?

Thanks


- Forwarded Message -
From: carol white 
To: "r-h...@stat.math.ethz.ch"  
Sent: Sunday, January 27, 2013 8:31 PM
Subject: rpart
 

Hi,
When I look at the summary of an rpart object run on my data, I get 7 nodes, but
when I plot the rpart object, I get only 3 nodes. Should the number of nodes
not match between the results of the 2 functions (summary and plot), or is it not
always the same?
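
For reference, a typical way to draw such a tree (not necessarily the exact call
I used) is:

plot(rpart.res)
text(rpart.res)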

Look forward to your reply,

Carol

 summary(rpart.res)
Call:
rpart(formula = mydata$class ~ ., data = as.data.frame(t(mydata)))
  n= 62 

 CP nsplit rel error    xerror  xstd
1 0.6363636  0 1.000 1.000 0.1712469
2 0.1363636  1 0.3636364 0.6818182 0.1532767
3 0.010  2 0.2272727 0.7727273 0.1596659

Variable importance
  Hsa.627   Hsa.692 Hsa.692.2  Hsa.3306   Hsa.601   Hsa.831  Hsa.1832  Hsa.2456 
       19        13        11        10        10         8         6         6 
 Hsa.8147  Hsa.1131 Hsa.692.1 
        6         5         5 

Node number 1: 62 observations,    complexity param=0.6363636
  predicted class=t  expected loss=0.3548387  P(node) =1
    class counts:    22    40
   probabilities: 0.355 0.645 
  left son=2 (14 obs) right son=3 (48 obs)
  Primary splits:
  Hsa.627   < 59.83    to the left,  improve=15.05376, (0 missing)
  Hsa.8147  < 1696.23  to the right, improve=14.46790, (0 missing)
  Hsa.37937 < 379.39   to the right, improve=13.75358, (0 missing)
  Hsa.692.2 < 842.305  to the right, improve=12.38710, (0 missing)
  Hsa.1832  < 735.805  to the right, improve=11.90495, (0 missing)
  Surrogate splits:
  Hsa.692.2 < 1086.655 to the right, agree=0.903, adj=0.571, (0 split)
  Hsa.3306  < 170.515  to the left,  agree=0.887, adj=0.500, (0 split)
  Hsa.601   < 88.065   to the left,  agree=0.887, adj=0.500, (0 split)
  Hsa.692   < 1251.99  to the right, agree=0.871, adj=0.429, (0 split)
  Hsa.831   < 281.54   to the left,  agree=0.871, adj=0.429, (0 split)

Node number 2: 14 observations
  predicted class=n  expected loss=0  P(node) =0.2258065
    class counts:    14 0
   probabilities: 1.000 0.000 

Node number 3: 48 observations,    complexity param=0.1363636
  predicted class=t  expected loss=0.167  P(node) =0.7741935
    class counts: 8    40
   probabilities: 0.167 0.833 
  left son=6 (7 obs) right son=7 (41 obs)
  Primary splits:
  Hsa.8147  < 1722.605 to the right, improve=4.915215, (0 missing)
  Hsa.1832  < 681.145  to the right, improve=4.915215, (0 missing)
  Hsa.1410  < 49.985   to the left,  improve=4.915215, (0 missing)
  Hsa.2456  < 186.195  to the right, improve=4.915215, (0 missing)
  Hsa.11616 < 969.085  to the right, improve=4.915215, (0 missing)
  Surrogate splits:
  Hsa.1832  < 681.145  to the right, agree=1.000, adj=1.000, (0 split)
  Hsa.2456  < 186.195  to the right, agree=1.000, adj=1.000, (0 split)
  Hsa.692   < 1048.375 to the right, agree=0.979, adj=0.857, (0 split)
  Hsa.692.1 < 1136.75  to the right, agree=0.979, adj=0.857, (0 split)
  Hsa.1131  < 1679.54  to the right, agree=0.979, adj=0.857, (0 split)

Node number 6: 7 observations
  predicted class=n  expected loss=0.2857143  P(node) =0.1129032
    class counts: 5 2
   probabilities: 0.714 0.286 

Node number 7: 41 observations
  predicted class=t  expected loss=0.07317073  P(node) =0.6612903
    class counts: 3    38
   probabilities: 0.073 0.927


[R] Best way to plot normalization in R

2013-01-28 Thread Gundala Viswanath
I have data that look like this:

mRNA        Value
-----------------
mRNA1          30
mRNA2         199
...           ...
mRNA1000       13

Then I'll normalize the value based on the sum.
What is the best way to show/plot that my normalization works?


Is there any R package for that?
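
A minimal sketch of what I mean by showing the normalization (assuming the table
is read into a data frame d with columns mRNA and Value):

d$Norm <- d$Value / sum(d$Value)      # normalize to the column sum
op <- par(mfrow = c(1, 2))
hist(d$Value, main = "Raw values")    # distribution before normalization
hist(d$Norm,  main = "Sum-normalized")
par(op)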

- G.V.



Re: [R] confidence / prediction ellipse

2013-01-28 Thread Rolf Turner


I believe that the value of "radius" that you are using is incorrect.
If you have a data matrix X whose columns are jointly distributed N(mu, Sigma),
then a confidence ellipse for mu is determined by

n * (x - Xbar)' S^{-1}(x - Xbar) ~ T^2

where Xbar is the mean vector for X and S is the sample covariance matrix,
and where "T^2" means Hotelling's T-squared distribution, which is equal to

(n-1)*2/(n-2) * F_{2,n-2}

the latter representing the F distribution on 2 and n-2 degrees of freedom.

Thus (I think) your radius should be

radius <- sqrt(2 * (npts-1) * qf(0.95, 2, npts-2)/(npts*(npts-2)))

where npts <- length(a).  Note that it is qf(0.95,2,npts-2) and *NOT*
qf(0.95,2,npts-1).

To get the corresponding *prediction* ellipse simply multiply the foregoing
radius by sqrt(npts+1).  By "prediction ellipse" I mean an ellipse such that
the probability that a new independent observation from the same population
will fall in that ellipse is the given probability (e.g. 0.95). Note that this
does not mean that 95% of the data will fall in the calculated ellipse
(basically because of the *dependence* between S and the individual observations).

These confidence and prediction ellipses are (I'm pretty sure) valid under
the assumption that the data are (two dimensional, independent) Gaussian,
and that you use the sample covariance and sample mean as "shape" and
"centre".  I don't know what impact your robustification procedure of using
cov.trob() will/would have on the properties of these ellipses.

A script which does the ellipses for your toy data, using the sample covariance
and sample mean (rather than output from cov.trob()) is as follows:

#
# Script scr.amatulli
#

require(car)
a <- c(12,12,4,5,63,63,23)
b <- c(13,15,7,10,73,83,43)
npts   <- length(a)
shape  <- var(cbind(a, b))
center <- c(mean(a),mean(b))
rconf  <- sqrt(2 * (npts-1) * qf(0.95, 2, npts-2)/(npts*(npts-2)))
rpred  <- sqrt(npts+1)*rconf

conf.elip <- ellipse(center, shape, rconf,draw = FALSE)
pred.elip <- ellipse(center, shape, rpred,draw = FALSE)
plot(pred.elip, type='l')
points(a,b)
lines(conf.elip,col="red")

cheers,

Rolf Turner

On 01/27/2013 10:12 AM, Giuseppe Amatulli wrote:

Hi,
I'm using the R library(car) to draw confidence/prediction ellipses in a
scatterplot.
From what I understood, the ellipse() function returns an ellipse based on the
parameters: shape, center, radius.
If I read the dataEllipse() function I can see how these parameters are
calculated for a confidence ellipse.

library(car)
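library(MASS)   # cov.trob() used below comes from MASS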

a=c(12,12,4,5,63,63,23)
b=c(13,15,7,10,73,83,43)

v <- cov.trob(cbind(a, b))
shape <- v$cov
center <- v$center

radius <- sqrt(2 * qf(0.95, 2, length(a) - 1))   # radius <- sqrt(dfn *
qf(level, dfn, dfd))

conf.elip = ellipse(center, shape, radius,draw = F)
plot(conf.elip, type='l')
points(a,b)

My question is how I can calculate shape, center and radius  to obtain a
prediction ellipses rather than a confidence ellipse?
Thanks in Advance
Giuseppe





Re: [R] Why are the number of coefficients varying? [mgcv][gam]

2013-01-28 Thread Andrew Crane-Droesch

Hi Simon,

Thanks for replying.

On further investigation, I can't reproduce this error on my local 
machine -- it only occurs when sending to a cluster (to run the multiple 
imputations in parallel) that I've got access to.  I send to a friend's 
web server, and I get the same sort of error (but a different set of 
results!) that the cluster gave me.  The seed is set identically across 
the three machines.  gam.check indicates convergence after 16 iterations 
locally, but 21 iterations on both remote machines.  And both remote 
machines give results that penalize the random effects, and the first, 
second and fourth spline terms effectively to zero (res.df ~1e-7).


I then checked versions.  My local machine has mgcv 1.7.22, the cluster 
has 1.7.3, and the server has 1.7.12.  My local machine has R 2.15.1, 
the cluster has 2.12.2, and the server has 2.14.1.  I updated the 
server's R version, and the result was fixed.  Will see if the people 
who manage the cluster can update the cluster.


The tryCatch is there because my imputation models that feed the gams 
are not bug-free.


Thanks anyway for replying.

Best,
Andrew



Re: [R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Ista Zahn
Hi Nate,

I infer from the stat_density2d documentation that the calculation is
carried out by the kde2d function in the MASS package. Refer to ?kde2d
for details.
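
For the coverage question, one rough possibility (a sketch only, not what
stat_density2d does internally) is to find the kde2d density level that encloses
about 95% of the estimated mass and draw just that contour, using the 'data'
frame from your example:

library(MASS)
dens <- kde2d(data$x, data$y, n = 100)
z <- sort(dens$z, decreasing = TRUE)
cum <- cumsum(z) / sum(z)               # fraction of total estimated mass
level95 <- z[which(cum >= 0.95)[1]]     # density height enclosing ~95% of the mass
plot(data$x, data$y)
contour(dens, levels = level95, add = TRUE)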

Best,
Ista

On Mon, Jan 28, 2013 at 3:56 PM, Nathan Miller  wrote:
> Hi Ista,
>
> Thanks. That does look pretty nice and I hadn't realized that was possible.
> Do you know how to extract information regarding those curves? I'd like to
> be able to report something about what portion of the data they encompass or
> really any other feature about them in a figure legend. I'll look into
> stat_density2d and see if I can determine how they are set.
>
> Thanks for your help,
>
> Nate
>
>
> On Mon, Jan 28, 2013 at 12:37 PM, Ista Zahn  wrote:
>>
>> Hi Nate,
>>
>> You can make it less busy using the bins argument. This is not
>> documented, except in the examples to stat_contour, but try
>>
>> ggplot(data=data, aes(x, y, colour=(factor(level)), fill=level))+
>> geom_point()+
>> stat_density2d(bins=2)
>>
>> HTH,
>> Ista
>>
>> On Mon, Jan 28, 2013 at 2:43 PM, Nathan Miller 
>> wrote:
>> > Thanks Ista,
>> >
>> > I have played a bit with stat_density2d as well. It doesn't completely
>> > capture what I am looking for and ends up being quite busy at the same
>> > time.
>> > I'm looking for a way of helping those looking that the figure to see
>> > the
>> > broad patterns of where in the x/y space the data from different groups
>> > are
>> > distributed. Using the 95% CI type idea is so that I don't end up
>> > arbitrarily drawing circles around each set of points. I appreciate your
>> > direction though.
>> >
>> > Nate
>> >
>> >
>> > On Mon, Jan 28, 2013 at 10:50 AM, Ista Zahn  wrote:
>> >>
>> >> Hi Nathan,
>> >>
>> >> This only fits some of your criteria, but have you looked at
>> >> ?stat_density2d?
>> >>
>> >> Best,
>> >> Ista
>> >>
>> >> On Mon, Jan 28, 2013 at 12:53 PM, Nathan Miller
>> >> 
>> >> wrote:
>> >> > Hi all,
>> >> >
>> >> > I have been looking for means of add a contour around some points in
>> >> > a
>> >> > scatterplot as a means of representing the center of density for of
>> >> > the
>> >> > data. I'm imagining something like a 95% confidence estimate drawn
>> >> > around
>> >> > the data.
>> >> >
>> >> > So far I have found some code for drawing polygons around the data.
>> >> > These
>> >> > look nice, but in some cases the polygons are strongly influenced by
>> >> > outlying points. Does anyone have a thought on how to draw a contour
>> >> > which
>> >> > is more along the lines of a 95% confidence space?
>> >> >
>> >> > I have provided a working example below to illustrate the drawing of
>> >> > the
>> >> > polygons. As I said I would rather have three "ovals"/95% contours
>> >> > drawn
>> >> > around the points by "level" to capture the different density
>> >> > distributions
>> >> > without the visualization being heavily influenced by outliers.
>> >> >
>> >> > I have looked into the code provided here from Hadley
>> >> >
>> >> > https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
>> >> > using the mvtnorm package and the dmvnorm function, but haven't been
>> >> > able
>> >> > to get it work for my data example. The calculated densities are
>> >> > always
>> >> > zero (at this step of Hadley's code: dgrid$dens <-
>> >> > dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )
>> >> >
>> >> > I appreciate any assistance.
>> >> >
>> >> > Thanks,
>> >> > Nate
>> >> >
>> >> > x<-c(seq(0.15,0.4,length.out=30),seq(0.2,0.6,length.out=30),
>> >> > seq(0.4,0.6,length.out=30))
>> >> >
>> >> >
>> >> > y<-c(0.55,x[1:29]+0.2*rnorm(29,0.4,0.3),x[31:60]*rnorm(30,0.3,0.1),x[61:90]*rnorm(30,0.4,0.25))
>> >> > data<-data.frame(level=c(rep(1, 30),rep(2,30), rep(3,30)), x=x,y=y)
>> >> >
>> >> >
>> >> > find_hull <- function(data) data[chull(data$x, data$y), ]
>> >> > hulls <- ddply(data, .(level), find_hull)
>> >> >
>> >> > fig1 <- ggplot(data=data, aes(x, y, colour=(factor(level)),
>> >> > fill=level))+geom_point()
>> >> > fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
>> >> > fig1
>> >> >
>> >> > [[alternative HTML version deleted]]
>> >> >
>> >> > __
>> >> > R-help@r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide
>> >> > http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>
>



Re: [R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Nathan Miller
Hi Ista,

Thanks. That does look pretty nice and I hadn't realized that was possible.
Do you know how to extract information regarding those curves? I'd like to
be able to report something about what portion of the data they encompass
or really any other feature about them in a figure legend. I'll look into
stat_density2d and see if I can determine how they are set.

Thanks for your help,

Nate


On Mon, Jan 28, 2013 at 12:37 PM, Ista Zahn  wrote:

> Hi Nate,
>
> You can make it less busy using the bins argument. This is not
> documented, except in the examples to stat_contour, but try
>
> ggplot(data=data, aes(x, y, colour=(factor(level)), fill=level))+
> geom_point()+
> stat_density2d(bins=2)
>
> HTH,
> Ista
>
> On Mon, Jan 28, 2013 at 2:43 PM, Nathan Miller 
> wrote:
> > Thanks Ista,
> >
> > I have played a bit with stat_density2d as well. It doesn't completely
> > capture what I am looking for and ends up being quite busy at the same
> time.
> > I'm looking for a way of helping those looking that the figure to see the
> > broad patterns of where in the x/y space the data from different groups
> are
> > distributed. Using the 95% CI type idea is so that I don't end up
> > arbitrarily drawing circles around each set of points. I appreciate your
> > direction though.
> >
> > Nate
> >
> >
> > On Mon, Jan 28, 2013 at 10:50 AM, Ista Zahn  wrote:
> >>
> >> Hi Nathan,
> >>
> >> This only fits some of your criteria, but have you looked at
> >> ?stat_density2d?
> >>
> >> Best,
> >> Ista
> >>
> >> On Mon, Jan 28, 2013 at 12:53 PM, Nathan Miller  >
> >> wrote:
> >> > Hi all,
> >> >
> >> > I have been looking for means of add a contour around some points in a
> >> > scatterplot as a means of representing the center of density for of
> the
> >> > data. I'm imagining something like a 95% confidence estimate drawn
> >> > around
> >> > the data.
> >> >
> >> > So far I have found some code for drawing polygons around the data.
> >> > These
> >> > look nice, but in some cases the polygons are strongly influenced by
> >> > outlying points. Does anyone have a thought on how to draw a contour
> >> > which
> >> > is more along the lines of a 95% confidence space?
> >> >
> >> > I have provided a working example below to illustrate the drawing of
> the
> >> > polygons. As I said I would rather have three "ovals"/95% contours
> drawn
> >> > around the points by "level" to capture the different density
> >> > distributions
> >> > without the visualization being heavily influenced by outliers.
> >> >
> >> > I have looked into the code provided here from Hadley
> >> >
> https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
> >> > using the mvtnorm package and the dmvnorm function, but haven't been
> >> > able
> >> > to get it work for my data example. The calculated densities are
> always
> >> > zero (at this step of Hadley's code: dgrid$dens <-
> >> > dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )
> >> >
> >> > I appreciate any assistance.
> >> >
> >> > Thanks,
> >> > Nate
> >> >
> >> > x<-c(seq(0.15,0.4,length.out=30),seq(0.2,0.6,length.out=30),
> >> > seq(0.4,0.6,length.out=30))
> >> >
> >> >
> y<-c(0.55,x[1:29]+0.2*rnorm(29,0.4,0.3),x[31:60]*rnorm(30,0.3,0.1),x[61:90]*rnorm(30,0.4,0.25))
> >> > data<-data.frame(level=c(rep(1, 30),rep(2,30), rep(3,30)), x=x,y=y)
> >> >
> >> >
> >> > find_hull <- function(data) data[chull(data$x, data$y), ]
> >> > hulls <- ddply(data, .(level), find_hull)
> >> >
> >> > fig1 <- ggplot(data=data, aes(x, y, colour=(factor(level)),
> >> > fill=level))+geom_point()
> >> > fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
> >> > fig1
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > __
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> > http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>



Re: [R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Ista Zahn
Hi Nate,

You can make it less busy using the bins argument. This is not
documented, except in the examples to stat_contour, but try

ggplot(data=data, aes(x, y, colour=(factor(level)), fill=level))+
geom_point()+
stat_density2d(bins=2)

HTH,
Ista

On Mon, Jan 28, 2013 at 2:43 PM, Nathan Miller  wrote:
> Thanks Ista,
>
> I have played a bit with stat_density2d as well. It doesn't completely
> capture what I am looking for and ends up being quite busy at the same time.
> I'm looking for a way of helping those looking that the figure to see the
> broad patterns of where in the x/y space the data from different groups are
> distributed. Using the 95% CI type idea is so that I don't end up
> arbitrarily drawing circles around each set of points. I appreciate your
> direction though.
>
> Nate
>
>
> On Mon, Jan 28, 2013 at 10:50 AM, Ista Zahn  wrote:
>>
>> Hi Nathan,
>>
>> This only fits some of your criteria, but have you looked at
>> ?stat_density2d?
>>
>> Best,
>> Ista
>>
>> On Mon, Jan 28, 2013 at 12:53 PM, Nathan Miller 
>> wrote:
>> > Hi all,
>> >
>> > I have been looking for means of add a contour around some points in a
>> > scatterplot as a means of representing the center of density for of the
>> > data. I'm imagining something like a 95% confidence estimate drawn
>> > around
>> > the data.
>> >
>> > So far I have found some code for drawing polygons around the data.
>> > These
>> > look nice, but in some cases the polygons are strongly influenced by
>> > outlying points. Does anyone have a thought on how to draw a contour
>> > which
>> > is more along the lines of a 95% confidence space?
>> >
>> > I have provided a working example below to illustrate the drawing of the
>> > polygons. As I said I would rather have three "ovals"/95% contours drawn
>> > around the points by "level" to capture the different density
>> > distributions
>> > without the visualization being heavily influenced by outliers.
>> >
>> > I have looked into the code provided here from Hadley
>> > https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
>> > using the mvtnorm package and the dmvnorm function, but haven't been
>> > able
>> > to get it work for my data example. The calculated densities are always
>> > zero (at this step of Hadley's code: dgrid$dens <-
>> > dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )
>> >
>> > I appreciate any assistance.
>> >
>> > Thanks,
>> > Nate
>> >
>> > x<-c(seq(0.15,0.4,length.out=30),seq(0.2,0.6,length.out=30),
>> > seq(0.4,0.6,length.out=30))
>> >
>> > y<-c(0.55,x[1:29]+0.2*rnorm(29,0.4,0.3),x[31:60]*rnorm(30,0.3,0.1),x[61:90]*rnorm(30,0.4,0.25))
>> > data<-data.frame(level=c(rep(1, 30),rep(2,30), rep(3,30)), x=x,y=y)
>> >
>> >
>> > find_hull <- function(data) data[chull(data$x, data$y), ]
>> > hulls <- ddply(data, .(level), find_hull)
>> >
>> > fig1 <- ggplot(data=data, aes(x, y, colour=(factor(level)),
>> > fill=level))+geom_point()
>> > fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
>> > fig1
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>



Re: [R] Adjusted R-squared formula in lm()

2013-01-28 Thread Ben Bolker
Nicole Janz writes:

> 
> What is the exact formula used in R lm() for the Adjusted R-squared? How can I
interpret it?

From the code of summary.lm():

    ans$r.squared <- mss/(mss + rss)
    ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)
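
(With an intercept, df.int is 1 and rdf = n - p - 1 for p predictors, so this is
the familiar adjusted R^2 = 1 - (1 - R^2)*(n - 1)/(n - p - 1), i.e. R^2 penalized
for the number of fitted coefficients.)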


  Does that answer your question?

  Ben Bolker



[R] Requirement for Informatica Administrator [REQ:1066499]

2013-01-28 Thread Binh Vu


   Dear Recruiter,

   Here   is   ourDirect  client  requirement  which  can  be  filled
   immediately. Kindly respond to this requirement with your consultant resume,
   contact and current location info to speed up the interview process.

   [2]Click  here  to submit for this position online and to speed up the
   process.
   Job Title:

   Informatica Administrator


   Location:
   Albany, NY


   Rate: 77/hr., # of Positions: 1, Duration: 



   Description:

   Mandatory Qualification
   1. 84 Months Demonstrated experience administering a data warehouse ETL tool
   on a UNIX environment.
   2.  60  Months Demonstrated experience on the installation and upgrade
   Informatica Powercenter 7x, 8x, 9x on HP PARisc/Itanium Unix 11.11, 11.31
   environment having Oracle 9i, 10g , 11g repository
   3.  60  Months  Demonstrated  experience  administering users, groups,
   connections, privileges, repository backups and restores, migrating objects
   using Informatica 7x, 8x, 9x on HP PARisc 11.11, Itanium 11.31 environments
   with Oracle 9x, 10g or 11g repository on Dev, QA and Prod environments.
   4. 84 Months Demonstrated experience as a team lead collaborating with
   developers, subject matter experts, and analysts in designing, developing,
   testing and deploying dimensional star schemas in a data warehouse.
   5. 60 Months Demonstrated experience using a ERWIN or ERSTUDIO data modeling
   tool.
   6. 60 months Demonstrated experience collaborating with developers as a team
   lead in designing Informatica ETL architecture having complex mappings,
   mapplets, sessions, worklets and workflows as well as developing, testing
   and implementation of ETL processes in dev, QA and Prod environments.
   7. 24 Months Demonstrated experience on the installation and upgrade and
   support of Informatica Data Quality 8x, 9x, and Informatica Address Doctor
   8x, 9x on HP PARisc/Itanium Unix 11.11, 11.31 environment having Oracle 9i,
   10g , 11g repository. Support includes demonstrated experience collaborating
   with  developers  in  designing  Informatica  Data Quality plans to be
   implemented throughout the enterprise
   8. 84 Months Demonstrated experience on the installation and upgrade and
   support of Oracle Web Server which includes Oracle Application Server 9 or
   higher or OBIEE 11.1 on IBM AIX/Linux environment and troubleshoot and
   resolve critical problems by working with Oracle customer support.
   9. 84 Months Demonstrated experience with Unix Shell scripting, PL/SQL
   programming,  developing  web reports using Oracle Portal or OBIEE Web
   reports.
   10. 60 Months Experience working with data contained in Patient Medical
   Records, Employee Records and Healthcare Provider Records in a State Agency,
   Federal Agency or Private Organization.
   [3]Click  here  to submit for this position online and to speed up the
   process.
   Please  respond  with you consultant resume, contact, rate and current
   location info to speed up the interview process. I will contact you if I
   need further details.

   Thank you,

   Binh Vu

   Fraank Systems | Recruitments, Sales & Marketing |

   Phone: 408-262- Ext: 106|eMail: [4]b...@fraank.com | Fax: 408-262-6662 |
   [5]www.fraank.com |

Re: [R] sorting/grouping/classification problem?

2013-01-28 Thread Bart Joosen


Hi,
 
after all, brute force seems  the way to go.
 
I will use a simplified example to illustrate what I want (dump of dat4 is
below):
suppose dat4:
ID  rrt Mnd Result
 1 0.45   00.1
 1 0.48   00.3
 1 1.24   00.5
 2 0.45   30.2
 2 0.48   30.6
 2 1.22   30.4
 
I want to generate all possible combinations of Mnd 0 with Mnd 3 to
calculate the sum of the squared differences divided by the number of rrt's
eg:
Mnd 0 rrt 0.45 gives result 0.1 where Mnd 3 rrt 0.45 gives 0.2 
At the same time rrt 0.48 at Mnd 0 gives 0.3 where rrt 0.48 at Mnd 3 gives
0.6
The same for rrt 1.24 at Mnd 0 (0.5), which pairs with rrt 1.22 at Mnd 3 (0.4).
This gives (0.1-0.2)^2 + (0.3-0.6)^2 + ...
 
The permutations should follow this rules:
- rrt's can never differ more than 10%
- rrt's can never switch (e.g. if rrt 0.45 at 0 Mnd is coupled to 0.48 at 3 Mnd,
then 0.48 at 0 Mnd cannot be coupled to 0.45 at 3 Mnd)
- rrt's can be coupled to NA values, i.e. they do not necessarily have to be coupled at all.
 
 
I already played with combn, and expand.grid, but couldn't figure out how to
generate the combinations...
 
The  goal is to minimize the resulting value, but be aware of the fact that
the problem above is simplified, and thus isn't limited to only 2 Mnd
values, but maybe 5 - 10.
 
Thanks
 
Bart
 
 
dat4 <-
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), rrt = c(0.45, 
0.48, 1.24, 0.45, 0.48, 1.22), Mnd = c(0L, 0L, 0L, 3L, 3L, 3L
), Result = c(0.1, 0.3, 0.5, 0.2, 0.6, 0.4)), .Names = c("ID", 
"rrt", "Mnd", "Result"), row.names = c(NA, 6L), class = "data.frame")
 
 
 
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Nathan Miller
Thanks Ista,

I have played a bit with stat_density2d as well. It doesn't completely
capture what I am looking for and ends up being quite busy at the same
time. I'm looking for a way of helping those looking at the figure to see
the broad patterns of where in the x/y space the data from different groups
are distributed. Using the 95% CI type idea is so that I don't end up
arbitrarily drawing circles around each set of points. I appreciate your
direction though.

Nate
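
One more option worth a look: per-group normal-theory data ellipses give exactly this
kind of 95% "oval" and are less dominated by single extreme points than a convex-hull
vertex. stat_ellipse() ships with later ggplot2 releases (at the time of this thread it
existed only as a separate gist), so treat the following as a sketch, reusing the example
data from the original post quoted below:

library(ggplot2)
fig2 <- ggplot(data, aes(x, y, colour = factor(level), fill = factor(level))) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.95, geom = "polygon", alpha = 0.2)
fig2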


On Mon, Jan 28, 2013 at 10:50 AM, Ista Zahn  wrote:

> Hi Nathan,
>
> This only fits some of your criteria, but have you looked at
> ?stat_density2d?
>
> Best,
> Ista
>
> On Mon, Jan 28, 2013 at 12:53 PM, Nathan Miller 
> wrote:
> > Hi all,
> >
> > I have been looking for a means of adding a contour around some points in a
> > scatterplot as a means of representing the center of density of the
> > data. I'm imagining something like a 95% confidence estimate drawn around
> > the data.
> >
> > So far I have found some code for drawing polygons around the data. These
> > look nice, but in some cases the polygons are strongly influenced by
> > outlying points. Does anyone have a thought on how to draw a contour
> which
> > is more along the lines of a 95% confidence space?
> >
> > I have provided a working example below to illustrate the drawing of the
> > polygons. As I said I would rather have three "ovals"/95% contours drawn
> > around the points by "level" to capture the different density
> distributions
> > without the visualization being heavily influenced by outliers.
> >
> > I have looked into the code provided here from Hadley
> > https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
> > using the mvtnorm package and the dmvnorm function, but haven't been able
> > to get it work for my data example. The calculated densities are always
> > zero (at this step of Hadley's code: dgrid$dens <-
> > dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )
> >
> > I appreciate any assistance.
> >
> > Thanks,
> > Nate
> >
> > x<-c(seq(0.15,0.4,length.out=30),seq(0.2,0.6,length.out=30),
> > seq(0.4,0.6,length.out=30))
> >
> y<-c(0.55,x[1:29]+0.2*rnorm(29,0.4,0.3),x[31:60]*rnorm(30,0.3,0.1),x[61:90]*rnorm(30,0.4,0.25))
> > data<-data.frame(level=c(rep(1, 30),rep(2,30), rep(3,30)), x=x,y=y)
> >
> >
> > find_hull <- function(data) data[chull(data$x, data$y), ]
> > hulls <- ddply(data, .(level), find_hull)
> >
> > fig1 <- ggplot(data=data, aes(x, y, colour=(factor(level)),
> > fill=level))+geom_point()
> > fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
> > fig1
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Query on package to use for investment quotes

2013-01-28 Thread R. Michael Weylandt
I'd look into the quantmod package.

Cheers,
MW
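
For example (a minimal sketch; whether Yahoo carries daily NAV history for a given
fund ticker is an assumption you would need to verify):

library(quantmod)
getSymbols("PDMIX", src = "yahoo")   # creates an xts object named PDMIX in the workspace
nav <- Cl(PDMIX)                     # closing value; for a mutual fund this is the NAV
tail(nav)
plot(nav)                            # or convert to a data.frame for plotting with ggplot2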

On Mon, Jan 28, 2013 at 1:52 PM, Bruce Miller  wrote:
> Hi all,
>
> Diverging from my research-based number crunching, I am interested to see
> what, if any, R packages are out there that can access daily market values of
> investment funds (e.g. using
> http://quote.morningstar.com/fund/f.aspx?t=PDMIX) then store the data value
> (i.e. NAV =$11.56 ) with date retrieved so these can be plotted with ggplot2
> on some regular basis.
>
> A web search turned up a confusing and complex array of R prediction models
> etc. but I am looking for a plain-vanilla way to retrieve fund values and
> store the values.
>
> Thanks,
>
> Bruce
>
> --
> Bruce W. Miller, Ph.D.
> Conservation Ecologist
> Neotropical Bat Projects
>
>
> office details
> 11384 Alpine Road
> Stanwood, Mi. 49346
> Phone (231) 679-6059
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Testing continuous zero-inflated response

2013-01-28 Thread Kay Cichini
Many thanks - this was very helpful!

Regards, Kay
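
For reference, a minimal sketch of the two-part approach discussed in the quoted
exchange below (Fisher's exact test on the zero/non-zero indicator plus an exact
two-sample permutation test via coin), using made-up toy data:

library(coin)
set.seed(1)
y <- factor(rep(c("A", "B"), each = 18))              # two balanced groups, N = 36
x <- ifelse(runif(36) < 0.85, 0, rgamma(36, 2, 1))    # zero-inflated continuous response

fisher.test(table(x > 0, y))                          # part 1: probability of a non-zero value
wilcox_test(x ~ y, distribution = "exact")            # part 2: exact two-sample permutation test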
On 28.01.2013 13:19, "Achim Zeileis" wrote:

> On Sun, 27 Jan 2013, Kay Cichini wrote:
>
>  That said,
>>
>>  wilcox_test(x ~ factor(y), distribution = "exact")
>>>
>>
>> or the same with oneway_test, i.e would be ok?
>>
>
> Yep, exactly.
>
> And you could also look at chisq_test(factor(x > 0) ~ factor(y),
> distribtuion = approximate()) or something like that. Or of course
> Fisher's exact test.
>
> Best,
> Z
>
>
>> 2013/1/27 Achim Zeileis 
>>
>>  On Sun, 27 Jan 2013, Kay Cichini wrote:
>>>
>>>  Thanks for the reply!
>>>

 Still, aren't there issues with 2-sample test vs y and excess zeroes
 (->many ties), like for Mann-Whitney-U tests?


>>> If you use the (approximate) exact distribution, that is no problem.
>>>
>>> The problem with the Wilcoxon/Mann-Whitney test and ties is only that the
>>> simple recursion formula for computing the exact distribution only works
>>> without ties. Thus, it's not the exact distribution that is wrong but
>>> only
>>> the standard algorithm for evaluating it.
>>>
>>> Best,
>>> Z
>>>
>>>  Kind regards,
>>>
 Kay


 2013/1/26 Achim Zeileis 

  On Fri, 25 Jan 2013, Kay Cichini wrote:

>
>  Hello,
>
>
>> I'm searching for a test that applies to a dataset (N=36) with a
>> continuous zero-inflated dependent variable
>>
>>
>>  In a regression setup, one can use a regression model with a response
> censored at zero. survreg() in survival fits such models, tobit() in
> AER
> is
> a convenience interface for this special case.
>
> If the effects of a regressor can be different for the probability of a
> zero and the mean of the non-zero observations, then a two-part model
> can
> be used. E.g. a probit fit (via glm) plus a truncated regression (via
> truncreg in the package of the same name).
>
> However:
>
>
>  and only one nominal grouping variable with 2 levels (balanced).
>
>
>>
>>  In that case I would probably use no regression model but two-sample
> permutation tests, e.g. via the "coin" package.
>
>
>  In fact there are 4 response variables of this kind which I plan to
> test
>
>  seperately - the amount of zeroes ranges from 75 to 97%..
>>
>>
>>  That means you have between one (!) and nine non-zero observations.
> In
> the
> former case, it will be hard to model anything. And even in the latter
> case
> it will be hard to investigate the probability of zero and the mean of
> the
> non-zero observations separately.
>
> I would start out with a simple two-way table of (y > 0) vs group and
> conduct Fisher's exact test.
>
> And then you might try also your favorite two sample test of y vs
> group,
> preferably using the approximate exact distribution.
>
> Hope that helps,
> Z
>
>  I searched the web and found several modelling approaches but have the
>
>  feeling that they are overly complex for my very simple dataset.
>>
>> Thanks in advance for any help!
>> Kay
>>
>> --
>>
>> Kay Cichini, MSc Biol
>>
>> Grubenweg 22, 6071 Aldrans
>>
>> Tel.: 0650 9359101
>>
>> E-Mail: kay.cich...@gmail.com
>>
>> Web: http://www.thebiobucket.blogspot.co.at/
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

Re: [R] pROC in "R"

2013-01-28 Thread Ista Zahn
And what exactly is the problem? Your code produced no errors (or if
it did you have not shown them to us...)

Best,
Ista
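
For what it's worth, the messages quoted below are normal start-up notes rather than
errors. A quick check along these lines (a sketch using the example data shipped with
pROC) should confirm that the package loaded fine:

library(pROC)          # note: library(), lower case, not Library()
data(aSAH)             # example data set bundled with pROC
r <- roc(aSAH$outcome, aSAH$s100b)
auc(r)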

On Mon, Jan 28, 2013 at 9:44 AM, Fethi BEZOUBIRI  wrote:
>
>
>  Dear,
>
> I would like to use the "pROC" package for my study, but I could not
> load it in "R". Could you please help me to overcome this problem?
> This is the message when I write "Library (pROC)" :
>
> Le chargement a nécessité le package : plyr
> Type 'citation("pROC")' for a citation.
>
> Attachement du package : ‘pROC’
>
> The following object(s) are masked from ‘package:stats’:
>
>  cov, smooth, var
> Thank you in advance
>
> Best Regards
> Fethi
>
> Student at Geneva university
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pROC in "R"

2013-01-28 Thread Fethi BEZOUBIRI


 Dear,

I would like to use the "pROC" package for my study, but I could not
load it in "R". Could you please help me to overcome this problem?
This is the message when I write "Library (pROC)" :

Le chargement a nécessité le package : plyr
Type 'citation("pROC")' for a citation.

Attachement du package : ‘pROC’

The following object(s) are masked from ‘package:stats’:

     cov, smooth, var
Thank you in advance

Best Regards
Fethi

Student at Geneva university
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Query on package to use for investment quotes

2013-01-28 Thread Bruce Miller

Hi all,

Diverging from my research-based number crunching, I am interested to see
what, if any, R packages are out there that can access daily market
values of investment funds (e.g. using 
http://quote.morningstar.com/fund/f.aspx?t=PDMIX) then store the data 
value (i.e. NAV =$11.56 ) with date retrieved so these can be plotted 
with ggplot2 on some regular basis.


A web search turned up a confusing and complex array of R prediction
models etc. but I am looking for a plain-vanilla way to retrieve fund
values and store the values.


Thanks,

Bruce

--
Bruce W. Miller, Ph.D.
Conservation Ecologist
Neotropical Bat Projects


office details
11384 Alpine Road
Stanwood, Mi. 49346
Phone (231) 679-6059

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Ista Zahn
Hi Nathan,

This only fits some of your criteria, but have you looked at ?stat_density2d?

Best,
Ista

On Mon, Jan 28, 2013 at 12:53 PM, Nathan Miller  wrote:
> Hi all,
>
> I have been looking for a means of adding a contour around some points in a
> scatterplot as a means of representing the center of density of the
> data. I'm imagining something like a 95% confidence estimate drawn around
> the data.
>
> So far I have found some code for drawing polygons around the data. These
> look nice, but in some cases the polygons are strongly influenced by
> outlying points. Does anyone have a thought on how to draw a contour which
> is more along the lines of a 95% confidence space?
>
> I have provided a working example below to illustrate the drawing of the
> polygons. As I said I would rather have three "ovals"/95% contours drawn
> around the points by "level" to capture the different density distributions
> without the visualization being heavily influenced by outliers.
>
> I have looked into the code provided here from Hadley
> https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
> using the mvtnorm package and the dmvnorm function, but haven't been able
> to get it work for my data example. The calculated densities are always
> zero (at this step of Hadley's code: dgrid$dens <-
> dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )
>
> I appreciate any assistance.
>
> Thanks,
> Nate
>
> x<-c(seq(0.15,0.4,length.out=30),seq(0.2,0.6,length.out=30),
> seq(0.4,0.6,length.out=30))
> y<-c(0.55,x[1:29]+0.2*rnorm(29,0.4,0.3),x[31:60]*rnorm(30,0.3,0.1),x[61:90]*rnorm(30,0.4,0.25))
> data<-data.frame(level=c(rep(1, 30),rep(2,30), rep(3,30)), x=x,y=y)
>
>
> find_hull <- function(data) data[chull(data$x, data$y), ]
> hulls <- ddply(data, .(level), find_hull)
>
> fig1 <- ggplot(data=data, aes(x, y, colour=(factor(level)),
> fill=level))+geom_point()
> fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
> fig1
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Nathan Miller
Hi all,

I have been looking for a means of adding a contour around some points in a
scatterplot as a means of representing the center of density of the
data. I'm imagining something like a 95% confidence estimate drawn around
the data.

So far I have found some code for drawing polygons around the data. These
look nice, but in some cases the polygons are strongly influenced by
outlying points. Does anyone have a thought on how to draw a contour which
is more along the lines of a 95% confidence space?

I have provided a working example below to illustrate the drawing of the
polygons. As I said I would rather have three "ovals"/95% contours drawn
around the points by "level" to capture the different density distributions
without the visualization being heavily influenced by outliers.

I have looked into the code provided here from Hadley
https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
using the mvtnorm package and the dmvnorm function, but haven't been able
to get it work for my data example. The calculated densities are always
zero (at this step of Hadley's code: dgrid$dens <-
dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )

I appreciate any assistance.

Thanks,
Nate

x<-c(seq(0.15,0.4,length.out=30),seq(0.2,0.6,length.out=30),
seq(0.4,0.6,length.out=30))
y<-c(0.55,x[1:29]+0.2*rnorm(29,0.4,0.3),x[31:60]*rnorm(30,0.3,0.1),x[61:90]*rnorm(30,0.4,0.25))
data<-data.frame(level=c(rep(1, 30),rep(2,30), rep(3,30)), x=x,y=y)


find_hull <- function(data) data[chull(data$x, data$y), ]
hulls <- ddply(data, .(level), find_hull)

fig1 <- ggplot(data=data, aes(x, y, colour=(factor(level)),
fill=level))+geom_point()
fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
fig1

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Setting inline hook to a function identical to default in knitr turns of exponential formatting

2013-01-28 Thread Yihui Xie
This is a little bit hard to explain because there are two levels of
default hooks (the system default and the document-format-specific
default). The best way to explain it is probably the source code:
https://github.com/yihui/knitr/blob/master/R/output.R#L183-L189

In short, the "default" hooks you mentioned are not really used by
knitr at all; they are only placeholders in the system. If you do not
modify them in advance, knitr will set the appropriate hooks according
to the document format, e.g. the inline hook for LaTeX will use
scientific notation for numbers.

The absolute default inline hook does not do scientific notation, as
you can see from the its source:

## function (x)
## {
##if (is.numeric(x))
##x = round(x, getOption("digits"))
##paste(as.character(x), collapse = ", ")
## }

I recommend you to ignore these system default hooks, unless you want
to study the technical implementation of this package. For the default
hooks according to the output format, see
http://yihui.name/knitr/hooks and ?render_latex in the documentation.
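
As a small illustration (a sketch): if the hooks have been overwritten, the
LaTeX-specific set, including the scientific-notation inline formatting, can be
inspected or re-applied with render_latex():

library(knitr)
knit_hooks$restore()       # back to the bare system defaults
render_latex()             # install the LaTeX-specific hooks (see ?render_latex)
knit_hooks$get("inline")   # now the LaTeX-aware inline hook, not the plain default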

Regards,
Yihui
--
Yihui Xie 
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA


On Mon, Jan 28, 2013 at 4:42 AM, mlell08  wrote:
> Hello List,
>
> while dealing with a questin of 'xiaodao' (
> http://r.789695.n4.nabble.com/Problem-with-large-small-numbers-in-knitr-tp4653986.html)
>
> I noticed that copying the default inline hook function obtained by
> knit_hooks$get("inline")
> into a knit_hook$set(inline = <...>) call turns off exponential
> fomatting in the resulting .tex file.
>
> I used a stripped version of 'xiaodao's example:
> \documentclass{article}
> \begin{document}
>
> <<>>=
> a<-1e-13
> b<-2.5e-10
> ssrr<-123456.12
> ssru<-123400.00
> @
>
> $
> c=\Sexpr{a}/\Sexpr{b}
> f=\Sexpr{ssrr-ssru}/\Sexpr{ssru}
> $
>
> \end{document}
>
> so:
> knit_hooks$restore()
> knit_hooks$get("inline")
>
> ## yields:
> ## function (x)
> ## {
> ##if (is.numeric(x))
> ##x = round(x, getOption("digits"))
> ##paste(as.character(x), collapse = ", ")
> ## }
> ## 
>
> knit("FILENAME.Rnw")
>
> ## .tex-file:
> ## c=$10^{-13}$/$2.5\times 10^{-10}$
> ## f=56.12/$1.234\times 10^{5}$
>
>
> ## then run knit_hooks$set() with exactly the same function:
> knit_hooks$set(inline= function (x)
> {
> if (is.numeric(x))
> x = round(x, getOption("digits"))
> paste(as.character(x), collapse = ", ")
> }
> )
> knit("FILENAME.Rnw")
>
> ## .tex-File
> ## c=0/0
> ## f=56.12/123400
>
> knit_hooks$get("inline")
> ## function (x)
> ## {
> ##  if (is.numeric(x))
> ##  x = round(x, getOption("digits"))
> ##  paste(as.character(x), collapse = ", ")
> ## }
>
>
> The only thing that changed is that no  is
> printed anymore
>
> Why does knitr change its output?
>
> Regards, Moritz
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adjusted R-squared formula in lm()

2013-01-28 Thread JLucke
THE adjusted R^2 is [1-(1-R2)·(n-1)/(n-v-1)], which you call McNemar's
formula.  It was actually proposed first by Fisher in 1924.  Theil's
formula is equal to Fisher's.

Wherry's formula, as you give it,  is correct but was proposed to estimate 
the cross-validated R2, which is different from R2.  Neither Lord nor 
Stein actually proposed their respective formulas.  They were instead 
proposed by Darlington to estimate the CVR2 but are based on a mistaken 
assumption. Neither Wherry's, Lord's, or Stein's formulas estimates what 
they had hoped to estimate, and most likely are not appropriate to your 
problem.  Browne found the correct estimator of CVR2.

R actually uses Fisher's formula but misattributes it to Wherry.  The
adjusted-R2 is a better estimator of the population coefficient of 
determination than is R2 itself.  It has much less bias and, unlike R2, 
its expectation is not a function of v, the number of variables.   In 
particular, if the population coefficient of determination is truly zero, 
R2 can be expected to give the value v/(n-1), whereas the adjR2 will have 
an expected value of 0.
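
A quick numeric check of that formula against summary.lm(), assuming the usual model
with an intercept:

fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)
n <- nrow(mtcars); v <- 2                    # v = number of predictors, intercept not counted
c(reported = s$adj.r.squared,
  by.hand  = 1 - (1 - s$r.squared) * (n - 1) / (n - v - 1))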





Ista Zahn  
Sent by: r-help-boun...@r-project.org
01/28/2013 08:34 AM

To
Nicole Janz , 
cc
r-help@r-project.org
Subject
Re: [R] Adjusted R-squared formula in lm()






Hi Nicole,

One nice thing about R is that it is often easy to see the code for
many functions. For summary.lm just type the name at the command
prompt (no brackets) to see the function definition. There you will
find

ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n -
df.int)/rdf)

Best,
Ista

On Mon, Jan 28, 2013 at 6:03 AM, Nicole Janz  wrote:
> What is the exact formula used in R lm() for the Adjusted R-squared? How 
can I interpret it?
>
> There seem to exist several formula's to calculate Adjusted R-squared.
>
> Wherry's formula [1-(1-R2)·(n-1)/(n-v)]
>
> McNemar's formula [1-(1-R2)·(n-1)/(n-v-1)]
>
> Lord's formula [1-(1-R2)(n+v-1)/(n-v-1)]
>
> Stein's formula [1-(1-R2)·((n-1)/(n-k-1))·((n-2)/(n-k-2))·((n+1)/n)]
>
> Theil's formula (found here: 
http://en.wikipedia.org/wiki/Coefficient_of_determination)
>
> According to the textbook Field, Discovering Statistics Using R (2012, 
p. 273) R uses Wherry's equation which "tells us how much variance in Y 
would be accounted for if the model had been derived from th. population 
from which the sample was taken". He does not give the formula for Wherry. 
He recommends using Stein's formula (by hand) to check how well the model 
cross-validates.
> Kleiber/Zeileis, Applied Econometrics with R (2008,p. 59) claim it's 
"Theil's adjusted R-squared" and don't say exactly how its interpretation 
varies from the multiple R-squared.
>
> Dalgaard, Introductory Statistics with R (2008, p.113) writes that "if 
you multiply [adjusted R-squared] by 100%, it can be interpreted as '% 
variance reduction'. He does not say to which formula this corresponds.
>
> I had previously thought, and read widely, that R-squared penalizes for 
adding additional variables to the model. Now the use of these different 
formulas seems to call for different interpretations?
>
> My two questions in short: Which formula is used by R lm()? How can I 
interpret it?
>
> Thank you!
>
>
>
>
> Nicole Janz, PhD Cand.
> Lecturer at Social Sciences Research Methods Centre 2012/13
> University of Cambridge
> Department of Politics and International Studies
> www.nicolejanz.de | nj...@cam.ac.uk | Mobile: +44 (0) 7905 70 1 69 4
> Skype: nicole.janz
>
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] parse/eval and character encoded expressions: How to deal with non-encoding strings?

2013-01-28 Thread William Dunlap
I said
> you can attach an attribribute called ".Environment" to your object
> that environment(object) will retrieve

You can also use
   environment(object) <- envir
instead of
   object <- structure(object, .Environment = envir)
to set the ".Environment" attribute to envir.  This makes the code more
symmetric and you don't have remember the magic attribute name.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of William Dunlap
> Sent: Monday, January 28, 2013 8:14 AM
> To: Johannes Graumann; r-h...@stat.math.ethz.ch
> Subject: Re: [R] parse/eval and character encoded expressions: How to deal 
> with non-
> encoding strings?
> 
> Instead of saving a string that can be parsed into a language object (and
> later evaluated), I would just save something that that could be evaluated
> directly.  Note that a literal like "MyFile" or 3.14 evaluates to itself so 
> you
> can save a literal string or a language object and use eval() on either.  
> Unless
> you do this there is no good way to know if "TestDir/TestFile*.txt" means to
> do some multiplication and division or to use a glob pattern to find a group 
> of files.
> E.g.,
> 
>   makeFoo <- function(file, ...) {
>   structure(c(...), fileExpr=substitute(file), class="Foo")
>   }
>   getFileFoo <- function(FooObject) {
>   eval(attr(FooObject, "fileExpr"))
>   }
> 
> used as
> 
>   > z <- makeFoo(system.file(package="fpp", "DESCRIPTION"), 1, 2, 3)
>   > z
>   [1] 1 2 3
>   attr(,"fileExpr")
>   system.file(package = "fpp", "DESCRIPTION")
>   attr(,"class")
>   [1] "Foo"
>   > getFileFoo(z)
>   [1] "C:/Program Files/R/R-2.15.2/library/fpp/DESCRIPTION"
> or
>   > z <- makeFoo("C:/Temp/File.txt", 1, 2, 3)
>   > z
>   [1] 1 2 3
>   attr(,"fileExpr")
>   [1] "C:/Temp/File.txt"
>   attr(,"class")
>   [1] "Foo"
>   > getFileFoo(z)
>   [1] "C:/Temp/File.txt"
> 
> One thing you might worry about is in which environment the language object is
> evaluated (the envir= argument to eval()).  If you embed your object in a 
> formula
> then environment(formula) tells you where to evaluate it.  If the formula 
> syntax
> is distracting then you can attach an attribribute called ".Environment" to 
> your object
> that environment(object) will retrieve.  E.g.,
> 
> makeFooEnv <- function(file, ..., envir = parent.frame()) {
>  fileExpr <- structure(substitute(file), .Environment = envir)
>  structure(c(...), fileExpr = fileExpr, class="Foo")
> }
> getFileFoo <- function(FooObject) {
>  fileExpr <- attr(FooObject, "fileExpr")
>  eval(fileExpr, envir=environment(fileExpr))
> }
> 
> used as
>   > fooObjs <- lapply(1:3, function(i)makeFooEnv(paste0("File",i,".txt"), 
> i^2))
>   > fooObjs[[2]]
>   [1] 4
>   attr(,"fileExpr")
>   paste0("File", i, ".txt")
>   attr(,"fileExpr")attr(,".Environment")
>   
>   attr(,"class")
>   [1] "Foo"
>   >
>   > i <- 17
>   > getFileFoo(fooObjs[[2]]) # Note that eval gets value of i as 2, not 17
>   [1] "File2.txt"
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> > Behalf
> > Of Johannes Graumann
> > Sent: Sunday, January 27, 2013 11:29 PM
> > To: r-h...@stat.math.ethz.ch
> > Subject: [R] parse/eval and character encoded expressions: How to deal with 
> > non-
> > encoding strings?
> >
> > Hi,
> >
> > I am intending to save a path-describing character object in a slot of a
> > class I'm working on. In order to have the option to use "system.file" etc
> > in such string-saved path definitions, I wrote this
> >
> > ExpressionEvaluator <- function(x){
> >   x <- tryCatch(
> > expr=base::parse(text=x),
> > error = function(e){return(as.expression(x))},
> > finally=TRUE)
> >   return(x)
> > }
> >
> > This produces
> >
> > > ExpressionEvaluator("system.file(\"INDEX\")")
> > expression(system.file("INDEX"))
> > > eval(ExpressionEvaluator("system.file(\"INDEX\")"))
> > [1] "/usr/lib/R/library/base/INDEX"
> >
> > Which is what I want. However,
> >
> > > eval(ExpressionEvaluator("Test"))
> > Error in eval(expr, envir, enclos) : object 'Test' not found
> >
> > prevents me from general usage (also in cases where "x" does NOT encode an
> > expression).
> >
> > I don't understand why it is that
> > > base::parse(text="Test")
> > will return
> > [1] expression(Test)
> > while
> > > as.expression("Test")
> > produces
> > [1] expression("Test")
> > which would work with the eval call.
> >
> > Can anyone point out to me how to solve this generally? How can I feed the
> > function a character object and get back an eval-able expression independent
> > of whether there was an expression "encoded" in the input or not.
> >
> > Thank you for any hints.
> >
> > Sincerely, Joh
> >
> > __
> > R-help@r-proj

[R] RandomForest and Missing Values

2013-01-28 Thread Lorenzo Isella
Dear All,
I would like to use a randomForest algorithm on a dataset.
The set is not particularly large/difficult to handle, but it has some
missing values (both factors and numerical values).
According to what I found

https://stat.ethz.ch/pipermail/r-help/2005-September/078880.html
https://stat.ethz.ch/pipermail/r-help/2007-January/123117.html

the randomForest package has a problem with missing data (essentially
you have to resort to some "trick" to introduce them into your dataset
--a median value, the most common factor, a linear interpolation
etc...).
Given that I could not find a clear workaround for this (but I cannot
be the only one who wants to run a randomForest on a less-than-perfect
data set), can anyone help me out?
I am concerned about the consequences of introducing such substituted
values into the data set.
The cforest function in the "party" package does not seem to have this
limitation, but on the other hand the randomForest package has passed
the test of time, so should I drop it in this case?
Any suggestion is appreciated.
Cheers

Lorenzo
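
Two options available inside the randomForest package itself, sketched on a toy copy
of iris with holes punched into one predictor (whether either is suitable for the real
data is an open question):

library(randomForest)
set.seed(1)
d <- iris
d[sample(nrow(d), 10), "Sepal.Width"] <- NA   # simulate missing predictor values

d1 <- na.roughfix(d)                          # quick fix: column median / most frequent level
d2 <- rfImpute(Species ~ ., data = d)         # proximity-based imputation (response must be complete)

randomForest(Species ~ ., data = d2)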

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] parse/eval and character encoded expressions: How to deal with non-encoding strings?

2013-01-28 Thread William Dunlap
Instead of saving a string that can be parsed into a language object (and
later evaluated), I would just save something that that could be evaluated
directly.  Note that a literal like "MyFile" or 3.14 evaluates to itself so you
can save a literal string or a language object and use eval() on either.  Unless
you do this there is no good way to know if "TestDir/TestFile*.txt" means to
do some multiplication and division or to use a glob pattern to find a group of 
files.
E.g.,

  makeFoo <- function(file, ...) {
  structure(c(...), fileExpr=substitute(file), class="Foo")
  }
  getFileFoo <- function(FooObject) {
  eval(attr(FooObject, "fileExpr"))
  }

used as

  > z <- makeFoo(system.file(package="fpp", "DESCRIPTION"), 1, 2, 3)
  > z
  [1] 1 2 3
  attr(,"fileExpr")
  system.file(package = "fpp", "DESCRIPTION")
  attr(,"class")
  [1] "Foo"
  > getFileFoo(z)
  [1] "C:/Program Files/R/R-2.15.2/library/fpp/DESCRIPTION"
or
  > z <- makeFoo("C:/Temp/File.txt", 1, 2, 3)
  > z
  [1] 1 2 3
  attr(,"fileExpr")
  [1] "C:/Temp/File.txt"
  attr(,"class")
  [1] "Foo"
  > getFileFoo(z)
  [1] "C:/Temp/File.txt"

One thing you might worry about is in which environment the language object is
evaluated (the envir= argument to eval()).  If you embed your object in a 
formula
then environment(formula) tells you where to evaluate it.  If the formula syntax
is distracting then you can attach an attribribute called ".Environment" to 
your object
that environment(object) will retrieve.  E.g.,

makeFooEnv <- function(file, ..., envir = parent.frame()) {
 fileExpr <- structure(substitute(file), .Environment = envir)
 structure(c(...), fileExpr = fileExpr, class="Foo")
}
getFileFoo <- function(FooObject) {
 fileExpr <- attr(FooObject, "fileExpr")
 eval(fileExpr, envir=environment(fileExpr))
}

used as
  > fooObjs <- lapply(1:3, function(i)makeFooEnv(paste0("File",i,".txt"), i^2))
  > fooObjs[[2]]
  [1] 4
  attr(,"fileExpr")
  paste0("File", i, ".txt")
  attr(,"fileExpr")attr(,".Environment")
  
  attr(,"class")
  [1] "Foo"
  > 
  > i <- 17
  > getFileFoo(fooObjs[[2]]) # Note that eval gets value of i as 2, not 17
  [1] "File2.txt"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Johannes Graumann
> Sent: Sunday, January 27, 2013 11:29 PM
> To: r-h...@stat.math.ethz.ch
> Subject: [R] parse/eval and character encoded expressions: How to deal with 
> non-
> encoding strings?
> 
> Hi,
> 
> I am intending to save a path-describing character object in a slot of a
> class I'm working on. In order to have the option to use "system.file" etc
> in such string-saved path definitions, I wrote this
> 
> ExpressionEvaluator <- function(x){
>   x <- tryCatch(
> expr=base::parse(text=x),
> error = function(e){return(as.expression(x))},
> finally=TRUE)
>   return(x)
> }
> 
> This produces
> 
> > ExpressionEvaluator("system.file(\"INDEX\")")
> expression(system.file("INDEX"))
> > eval(ExpressionEvaluator("system.file(\"INDEX\")"))
> [1] "/usr/lib/R/library/base/INDEX"
> 
> Which is what I want. However,
> 
> > eval(ExpressionEvaluator("Test"))
> Error in eval(expr, envir, enclos) : object 'Test' not found
> 
> prevents me from general usage (also in cases where "x" does NOT encode an
> expression).
> 
> I don't understand why it is that
> > base::parse(text="Test")
> will return
> [1] expression(Test)
> while
> > as.expression("Test")
> produces
> [1] expression("Test")
> which would work with the eval call.
> 
> Can anyone point out to me how to solve this generally? How can I feed the
> function a character object and get back an eval-able expression independent
> of whether there was an expression "encoded" in the input or not.
> 
> Thank you for any hints.
> 
> Sincerely, Joh
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] importing data

2013-01-28 Thread Ivan Calandra
This code is indeed much shorter. About the speed, I guess it should be 
faster, but you should test it with system.time()
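
For instance (a sketch; the directory and the read.table arguments are placeholders):

file_names <- list.files(path = "C:/.../data", pattern = "\\.dat$", full.names = TRUE)
list_of_datasets <- lapply(file_names, read.table, header = FALSE)
names(list_of_datasets) <- basename(file_names)

## rough speed check against the for-loop version, as suggested:
system.time(lapply(file_names, read.table, header = FALSE))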


I'm glad that it helped.
Ivan


--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
ivan.calan...@u-bourgogne.fr
http://biogeosciences.u-bourgogne.fr/calandra

On 28/01/13 16:38, Ray Cheung wrote:

Dear Ivan,
It works perfectly fine now. I love this code more since I need not 
delete the NULL list myself (and it should be faster, right?). Thank 
you very much for your help!

cheers,
Ray

On Mon, Jan 28, 2013 at 5:32 PM, Ivan Calandra <ivan.calan...@u-bourgogne.fr> wrote:


What about this then:
list_of_datasets <- lapply(file_names, read.table,
other_args_to_read.table)

Something that might then be useful is:
names(list_of_datasets) <- file_names

Does it do it now?


Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06 
ivan.calan...@u-bourgogne.fr 
http://biogeosciences.u-bourgogne.fr/calandra

On 28/01/13 07:34, Ray Cheung wrote:

Thanks a million for all help provided!! I can do what I
intend to using the "for loop". However, I'm still eager to
try the list.files approach. Here is the error message that I
got using Ivan's code:

> list_of_dataset <- do.call(read.table, file_names)
Error in do.call(read.table, file_names) : second argument
must be a list

Please advise.

Ray

On Sun, Jan 27, 2013 at 10:57 PM, Ivan Calandra <ivan.calan...@u-bourgogne.fr> wrote:

Hi Ray!

I'm insisting with list.files...!

What about like this (untested)?
file_names <- list.files(path="C:/.../data", pattern=".dat$",
full.names=TRUE)
list_of_dataset <- do.call(read.table, file_names)

Let me know if this helps!
Ivan

-- Ivan CALANDRA Université de Bourgogne UMR CNRS/uB 6282
Biogéosciences 6 Boulevard Gabriel 21000 Dijon, FRANCE
+33(0)3.80.39.63.06 

ivan.calan...@u-bourgogne.fr

>

http://biogeosciences.u-bourgogne.fr/calandra

On 26/01/13 10:03, Ray Cheung wrote:

Thanks for your commands, Ivan and Michael! However, I
am still
not producing the right codes. Would you please help me on
this? I've written the following codes. Please
comment. Thank you
very much.
Task: Reading data1.dat to data1000.dat (with missing
files) into
R. Missing files can be omitted in the list.
###FUNCTION TO READ FILES
little_helpful <- function(n) {
file_name <- paste0("C:/.../data", n, ".dat")
read.table(file_name)
}
###RETURN AN OBJECT WHICH CHECKS FOR THE EXISTENCE OF
FILES
check  <- function(n) {
a <- ifelse(file.exists(paste0("C:/.../data", n,
".dat")), 1, 0)
a
}
###Combining the functions
IMPORT <- function(n) {
   L <- check(1:n)
   for (i in 1:n) {
  if (L[i] == 1)
  list_of_datasets <- lapply(i, little_helpful) else
list_of_datasets <- 0
  }
   list_of_datasets
   }
Thanks for all comments.
Best Regards,
Ray

On Fri, Jan 25, 2013 at 5:48 PM, Ivan Calandra <ivan.calan...@u-bourgogne.fr> wrote:

Hi,

Not sure this is what you need, but what about
list.files()?
It can get you all the files from a given folder,
and you
could then work this list with regular expressions
for example.

HTH,
Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06 

ivan.calan...@u-bourgogne.fr


Re: [R] importing data

2013-01-28 Thread Ray Cheung
Dear Ivan,

It works perfectly fine now. I love this code more since I need not delete
the NULL list myself (and it should be faster, right?). Thank you very much
for your help!

cheers,
Ray

On Mon, Jan 28, 2013 at 5:32 PM, Ivan Calandra  wrote:

> What about this then:
> list_of_datasets <- lapply(file_names, read.table,
> other_args_to_read.table)
>
> Something that might then be useful is:
> names(list_of_datasets) <- file_names
>
> Does it do it now?
>
>
> Ivan
>
> --
> Ivan CALANDRA
> Université de Bourgogne
> UMR CNRS/uB 6282 Biogéosciences
> 6 Boulevard Gabriel
> 21000 Dijon, FRANCE
> +33(0)3.80.39.63.06
> ivan.calan...@u-bourgogne.fr
> http://biogeosciences.u-**bourgogne.fr/calandra
>
> On 28/01/13 07:34, Ray Cheung wrote:
>
>> Thanks a million for all help provided!! I can do what I intend to using
>> the "for loop". However, I'm still eager to try the list.files approach.
>> Here is the error message that I got using Ivan's code:
>>
>> > list_of_dataset <- do.call(read.table, file_names)
>> Error in do.call(read.table, file_names) : second argument must be a list
>>
>> Please advise.
>>
>> Ray
>>
>> On Sun, Jan 27, 2013 at 10:57 PM, Ivan Calandra <
>> ivan.calan...@u-bourgogne.fr 
>> >
>> wrote:
>>
>> Hi Ray!
>>
>> I'm insisting with list.files...!
>>
>> What about like this (untested)?
>> file_names <- list.files(path="C:/.../data", pattern=".dat$",
>> full.names=TRUE)
>> list_of_dataset <- do.call(read.table, file_names)
>>
>> Let me know if this helps!
>> Ivan
>>
>> -- Ivan CALANDRA Université de Bourgogne UMR CNRS/uB 6282
>> Biogéosciences 6 Boulevard Gabriel 21000 Dijon, FRANCE
>> +33(0)3.80.39.63.06 
>> ivan.calan...@u-bourgogne.fr 
>> > >
>>
>> 
>> http://biogeosciences.u-**bourgogne.fr/calandra
>>
>> Le 26/01/13 10:03, Ray Cheung a écrit :
>>
>>> Thanks for your commands, Ivan and Michael! However, I am still
>>> not producing the right codes. Would you please help me on
>>> this? I've written the following codes. Please comment. Thank you
>>> very much.
>>> Task: Reading data1.dat to data1000.dat (with missing files) into
>>> R. Missing files can be omitted in the list.
>>> ###FUNCTION TO READ FILES
>>> little_helpful <- function(n) {
>>> file_name <- paste0("C:/.../data", n, ".dat")
>>> read.table(file_name)
>>> }
>>> ###RETURN AN OBJECT WHICH CHECKS FOR THE EXISTENCE OF FILES
>>> check  <- function(n) {
>>> a <- ifelse(file.exists(paste0("C:/**.../data", n, ".dat")), 1, 0)
>>> a
>>> }
>>> ###Combining the functions
>>> IMPORT <- function(n) {
>>>L <- check(1:n)
>>>for (i in 1:n) {
>>>   if (L[i] == 1)
>>>   list_of_datasets <- lapply(i, little_helpful) else
>>> list_of_datasets <- 0
>>>   }
>>>list_of_datasets
>>>}
>>> Thanks for all comments.
>>> Best Regards,
>>> Ray
>>>
>>> On Fri, Jan 25, 2013 at 5:48 PM, Ivan Calandra
>>> >> >
>>> wrote:
>>>
>>> Hi,
>>>
>>> Not sure this is what you need, but what about list.files()?
>>> It can get you all the files from a given folder, and you
>>> could then work this list with regular expressions for example.
>>>
>>> HTH,
>>> Ivan
>>>
>>> --
>>> Ivan CALANDRA
>>> Université de Bourgogne
>>> UMR CNRS/uB 6282 Biogéosciences
>>> 6 Boulevard Gabriel
>>> 21000 Dijon, FRANCE
>>> +33(0)3.80.39.63.06 
>>> ivan.calan...@u-bourgogne.fr
>>> >> >
>>>
>>> 
>>> http://biogeosciences.u-**bourgogne.fr/calandra
>>>
>>> On 25/01/13 10:00, R. Michael Weylandt wrote:
>>>
>>> On Fri, Jan 25, 2013 at 6:11 AM, Ray Cheung
>>> mailto:ray1...@gmail.com>> wrote:
>>>
>>> Dear Michael,
>>>
>>> Thanks for your codes. However, lapply does not work
>>> in my case since I've
>>> some files missing in the data (say, the file
>>> data101.dat). Do you have any
>>> suggestions on this?? Thank you very much.
>>>
>>> You could simply add a test using file.exists() but I'm
>>> not sure what
>>> you want to do with the M matrix then -- omit the slice
>>> (so the others
>>> are all shifted down one) or fill it entirely with NA's.
>>>
>>> Michael
>>>
>>> __**
>>> R-help@r-project.org 
>>>
>>> mailing list
>>> 
>>> https://stat.ethz.ch/mail

Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression

2013-01-28 Thread Frank Harrell

For lrm fits, predict(fit, type='mean') predicts the mean Y, not a
probability.
Frank
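
A small sketch of the distinction (toy ordinal data assumed, not from the original
thread):

library(rms)
set.seed(1)
x <- rnorm(100)
y <- cut(x + rnorm(100), 3, labels = FALSE)   # ordinal response coded 1, 2, 3
f <- lrm(y ~ x)
head(predict(f, type = "mean"))     # predicted mean of y
head(predict(f, type = "fitted"))   # cumulative probabilities Prob(y >= 2), Prob(y >= 3)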

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Thank you your help.

2013-01-28 Thread arun


Hi,
temp3<- read.table(text="
ID CTIME WEIGHT
HM001 1223 24.0
HM001 1224 25.2
HM001 1225 23.1
HM001 1226 NA
HM001 1227 32.1
HM001 1228 32.4
HM001 1229 1323.2
HM001 1230 27.4
HM001 1231 22.4236 #changed here to test the previous solution
",sep="",header=TRUE,stringsAsFactors=FALSE)
 tempnew<- na.omit(temp3)


 grep("\\d{4}",temp3$WEIGHT) 
#[1] 7 9 #not correct


temp3[,3][grep("\\d{4}..*",temp3$WEIGHT)]<-NA #match 4 digit numbers before the 
decimals
tail(temp3)
# ID CTIME  WEIGHT
#4 HM001  1226  NA
#5 HM001  1227 32.1000
#6 HM001  1228 32.4000
#7 HM001  1229  NA
#8 HM001  1230 27.4000
#9 HM001  1231 22.4236

#Based on the variance,
You could set up some limit, for example 50 and use:
tempnew$WEIGHT<- ifelse(tempnew$WEIGHT>50,NA,tempnew$WEIGHT)
A.K.






From: 남윤주 
To: arun  
Sent: Monday, January 28, 2013 2:20 AM
Subject: Re: Thank you your help.



Thank you for your reply again.  Your understanding is exactly right.
I attached a picture that show dataset.
'weight' is a dependent variable. And CTIME means hour/minute. This data will 
have accumulated for years.
Speaking of accepted variance range, it would be from 10 to 50. 
Actually, I am java programmer. So, I am strange this R Language.
Can u give me some example to use grep function?
-Original Message-
From: "arun" 
To: "jamansymp...@naver.com"; 
Cc: 
Sent: 2013-01-28 (월) 15:27:12
Subject: Re: Thank you your help.

Hi,
Your original post was that 
"...it was evaluated from 20kg -40kg. But By some errors, it is evaluated 2000 
kg".

So, my understanding was that you get values 2000 or 2000-4000 reads in place 
of 20-40 occasionally due to some misreading.

If your dataset contains observed value, strange value and NA and you want to 
replace the strange value to NA, could you mention the range of strange values. 
 If the strange value ranges anywhere between 1000-, it should get replaced 
with the ?grep() solution.  But, if it depends upon something else, you need to 
specify.  Also, regarding the variance, what is your accepted range of variance.
A.K.





- Original Message -
From: "jamansymp...@naver.com" @naver.com>
To: smartpink...@yahoo.com
Cc: 
Sent: Monday, January 28, 2013 1:15 AM
Subject: Thank you your help.

Thank you to answer my question. 
It is not exactly what I want. I should have informed detailed situation. 
There is a sensor get data every minute. And that data will be accumulated and 
be portion of dataset. 
And the dataset contains observed value, strange value and NA. 
Namely, I am not sure where strange value will be occured. 
And I can't expect when strange value will be occured. 

I need the procedure performing like below.  
1. using a method, set the range of variance 
2. using for(i) statement, check whether variance(weihgt) is in the range. 
3. when variance is out of range, impute weight[i] as NA. 

Thank you. 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variability Plot For Toray Microarray Data

2013-01-28 Thread Bert Gunter
Wrong list! You want Bioconductor.

 -- Bert

On Mon, Jan 28, 2013 at 2:42 AM, Peverall Dubois
 wrote:
> Is there any package that allow you to perform "MA plot" like graph
> for Toray microarray data?
>
>
> Unlike Affymetrix CEL file which contain 2 values (R and G),
> Torray raw data only contain 1 value.
>
> MA-plot is Affymetrix specific which usually available for in  (limma)
> package.
>
>
> P. Dubois
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] platform specific "Depends" argument?

2013-01-28 Thread R. Michael Weylandt
I think this is only a legacy question, right? In recent R, you
can/should use "parallel" instead of either multicore or snowfall.

That said, there is no good answer if you want backward compatibility with older versions of R.

MW
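
A sketch of what that single-package route can look like, choosing the backend at run
time instead of via platform-specific Depends:

run_parallel <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "unix") {
    parallel::mclapply(X, FUN, mc.cores = cores)   # fork-based backend on unix-alikes
  } else {
    cl <- parallel::makeCluster(cores)             # socket cluster works on Windows
    on.exit(parallel::stopCluster(cl))
    parallel::parLapply(cl, X, FUN)
  }
}
run_parallel(1:4, function(i) i^2)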

On Mon, Jan 28, 2013 at 10:43 AM, Florian Schwaiger
 wrote:
> Dear R users,
>
> we have a problem when building R packages which depend on platform
> specific packages. The following example will illustrate our problem:
>
> For parallel computing (in our own package) we want to use the multicore
> package. Since multicore is not available for Windows, we substitute it with
> the snowfall package. Currently we create two packages with different
> "Depends" argument in the DESCRIPTION files (one for unix and one for
> Windows).
>
> We considered using the .onLoad function and calling library(multicore)
> resp. library(snowfall) depending on the employed system, but it is said
> in the help of that function that it is not "Good practice" to do so.
>
> Is there a better practice than building two packages (with the same
> name and almost same code) for the different platforms. Any help is
> appreciated.
>
>
> Matthias and Florian
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adjusted R-squared formula in lm()

2013-01-28 Thread Ista Zahn
Hi Nicole,

One nice thing about R is that it is often easy to see the code for
many functions. For summary.lm just type the name at the command
prompt (no brackets) to see the function definition. There you will
find

ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n -
df.int)/rdf)

Best,
Ista

On Mon, Jan 28, 2013 at 6:03 AM, Nicole Janz  wrote:
> What is the exact formula used in R lm() for the Adjusted R-squared? How can 
> I interpret it?
>
> There seem to exist several formula's to calculate Adjusted R-squared.
>
> Wherry's formula [1-(1-R2)·(n-1)/(n-v)]
>
> McNemar's formula [1-(1-R2)·(n-1)/(n-v-1)]
>
> Lord's formula [1-(1-R2)(n+v-1)/(n-v-1)]
>
> Stein's formula [1-(1-R2)·((n-1)/(n-k-1))·((n-2)/(n-k-2))·((n+1)/n)]
>
> Theil's formula (found here: 
> http://en.wikipedia.org/wiki/Coefficient_of_determination)
>
> According to the textbook Field, Discovering Statistics Using R (2012, p. 
> 273) R uses Wherry's equation which "tells us how much variance in Y would be 
> accounted for if the model had been derived from th. population from which 
> the sample was taken". He does not give the formula for Wherry. He recommends 
> using Stein's formula (by hand) to check how well the model cross-validates.
> Kleiber/Zeileis, Applied Econometrics with R (2008,p. 59) claim it's "Theil's 
> adjusted R-squared" and don't say exactly how its interpretation varies from 
> the multiple R-squared.
>
> Dalgaard, Introductory Statistics with R (2008, p.113) writes that "if you 
> multiply [adjusted R-squared] by 100%, it can be interpreted as '% variance 
> reduction'. He does not say to which formula this corresponds.
>
> I had previously thought, and read widely, that R-squared penalizes for 
> adding additional variables to the model. Now the use of these different 
> formulas seems to call for different interpretations?
>
> My two questions in short: Which formula is used by R lm()? How can I 
> interpret it?
>
> Thank you!
>
>
>
>
> Nicole Janz, PhD Cand.
> Lecturer at Social Sciences Research Methods Centre 2012/13
> University of Cambridge
> Department of Politics and International Studies
> www.nicolejanz.de | nj...@cam.ac.uk | Mobile: +44 (0) 7905 70 1 69 4
> Skype: nicole.janz
>
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adjusted R-squared formula in lm()

2013-01-28 Thread Nicole Janz
What is the exact formula used in R lm() for the Adjusted R-squared? How can I 
interpret it?

There seem to exist several formula's to calculate Adjusted R-squared.

Wherry's formula [1-(1-R2)·(n-1)/(n-v)]

McNemar's formula [1-(1-R2)·(n-1)/(n-v-1)]

Lord's formula [1-(1-R2)(n+v-1)/(n-v-1)]

Stein's formula [1-(1-R2)·((n-1)/(n-k-1))·((n-2)/(n-k-2))·((n+1)/n)]

Theil's formula (found here: 
http://en.wikipedia.org/wiki/Coefficient_of_determination)

According to the textbook Field, Discovering Statistics Using R (2012, p. 273) 
R uses Wherry's equation which "tells us how much variance in Y would be 
accounted for if the model had been derived from the population from which the
sample was taken". He does not give the formula for Wherry. He recommends using 
Stein's formula (by hand) to check how well the model cross-validates.
Kleiber/Zeileis, Applied Econometrics with R (2008,p. 59) claim it's "Theil's 
adjusted R-squared" and don't say exactly how its interpretation varies from 
the multiple R-squared.

Dalgaard, Introductory Statistics with R (2008, p.113) writes that "if you 
multiply [adjusted R-squared] by 100%, it can be interpreted as '% variance 
reduction'. He does not say to which formula this corresponds.

I had previously thought, and read widely, that R-squared penalizes for adding 
additional variables to the model. Now the use of these different formulas 
seems to call for different interpretations? 

My two questions in short: Which formula is used by R lm()? How can I interpret 
it? 

Thank you!




Nicole Janz, PhD Cand.
Lecturer at Social Sciences Research Methods Centre 2012/13
University of Cambridge
Department of Politics and International Studies
www.nicolejanz.de | nj...@cam.ac.uk | Mobile: +44 (0) 7905 70 1 69 4
Skype: nicole.janz







[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using loop for a random vector (Montecarlo method)

2013-01-28 Thread mlell08
On 28.01.2013 12:06, mary wrote:
> Hi,
> 
> I would like to replicate a sort of Montecarlo experiment:
> 
> I have to generate a random variable N(0,1) with 100 observations but I have
> to contaminate this values at certain point in order to obtain different
> vectors with different samples:

Hi,

tab<-function(N,n,E,L,a){
for(i in 1:100){
X1<-rnorm(N*(1-a),0,1)
X2<-rnorm(N*(a),E,L)
X_tab<-sample(c(X1,X2),n,replace=T)
}
return(X_tab)
}

l <- list(n = c(5,10,20), a=c(0.1,0.2,0.3), E=c(5,10),L=c(.01, .025, 1))

# translate permutation number in indices for rotating numbers, like in
# a code lock for a bike
#   n a E L
# i.e. Permutation  #0: 0,0,0,0
#   #1: 0,0,0,1
#   #2: 0,0,0,2
#   #3: 0,0,1,0
#   #4: 0,0,1,1
# Note these indices start at 0 for the sake of easier calculation.
# you have to add +1 manually,
# then you get i.e. for permutation #4: n[1], a[1], E[2], L[2]
# n is the permutation number, li is a 'list' with your parameters (l)
permPar <- function(n,li)
{
lengths<-sapply(li,length)  

len.l <- length(lengths)
len.l

ind <- rep(0,len.l)
names(ind) <- names(li)
for(i in seq(len.l,1,-1)){
ind[i] <- n %% lengths[i]
n <-  n %/% lengths[i]
if(n==0) break
}
if(n>0) stop("Index too big")
return(ind)
}
# Number of possible permutations:
tl <- 1
for (i in 1:length(l)){
tl <- tl*length(l[[i]])
}

# for each possible permutation:
# execute permPar(,li=l)
# and swap columns and rows of the resulting matrix
t(
  sapply(1:tl-1, permPar, li=l)
)

This table of parameter indices can be used for accessing the parameters
in the list l, for example:
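## hedged usage sketch (N = 100 taken from your post): turn permutation #4
## into parameter values (add 1 because the indices are 0-based) and call tab()
idx <- permPar(4, li = l) + 1
tab(N = 100, n = l$n[idx["n"]], E = l$E[idx["E"]], L = l$L[idx["L"]], a = l$a[idx["a"]])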

Good Luck,
Moritz

-- 
GnuPG Key: 0x7340821E

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Variability Plot For Toray Microarray Data

2013-01-28 Thread Peverall Dubois
Is there any package that allows you to produce an "MA plot"-like graph
for Toray microarray data?


Unlike an Affymetrix CEL file, which contains 2 values (R and G),
Toray raw data only contains 1 value.

The MA-plot is Affymetrix specific and is usually available in the (limma)
package.


P. Dubois

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] platform specific "Depends" argument?

2013-01-28 Thread Florian Schwaiger
Dear R users,

we have a problem when building R packages which depend on platform
specific packages. The following example will illustrate our problem:

For parallel computing (in our own package) we want to use the multicore
package. Since multicore is not available for Windows, we substitute it with
the snowfall package. Currently we create two packages with different
"Depends" arguments in the DESCRIPTION files (one for unix and one for
Windows).

We considered using the .onLoad function and calling library(multicore) or
library(snowfall) depending on the platform in use (roughly as in the sketch
below), but the help page of that function says it is not "good practice" to
do so.
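
## hypothetical sketch of the .onLoad idea we considered (discouraged per ?.onLoad):
## pick the parallel backend at load time based on the platform
.onLoad <- function(libname, pkgname) {
    if (.Platform$OS.type == "windows") {
        library(snowfall)
    } else {
        library(multicore)
    }
}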

Is there a better practice than building two packages (with the same
name and almost the same code) for the different platforms? Any help is
appreciated.


Matthias and Florian

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Modeling presence only data in R

2013-01-28 Thread Benito, Xavier
Dear R users,

Is the graspeR package currently operative? If so, could anyone tell me how I can 
access the website given by the author? (http://www.cscf.ch/grasp)

Thanks a lot to all for your understanding,

Regards,

Xavier Benito Granell
PhD student
IRTA- Ecosistemes Aquàtics
Ctra. de Poblenou Km. 5,5
43540 Sant Carles de la Ràpita
(Tarragona) Spain
xavier.ben...@irta.cat
www.irta.cat


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] FI 1.0 avaliable on CRAN

2013-01-28 Thread David Valentim Dias
guRus and useRs,

FI 1.0 is a new submission available on CRAN implementing common forest
inventory volume equations and factors (form and stacking).
The package is well documented and also includes a sample dataset to get started.

Development and bug reports happen at http://github.com/dvdscripter/FI

Regards,
David Valentim Dias

-- 
Biostatistics consultant and programmer at Cogito
http://www.facebook.com/CogitoConsultoriaEstatistica
CV: http://lattes.cnpq.br/7541377569511492

[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.csv quotes within fields

2013-01-28 Thread Tim Howard
Yes!  This does this trick. Thank You!
Tim

>>> peter dalgaard  1/26/2013 11:49 AM >>>

On Jan 26, 2013, at 16:32 , Tim Howard wrote:

> Duncan, 
> Good point - I guess I am expecting too much. I'll work on a global replace 
> before import or chopping with strsplit while importing. 
> 
> FYI everyone - these folks with the non-standard csv are the US Feds:
> https://oeaaa.faa.gov/oeaaa/external/public/publicAction.jsp?action=showCaseDownloadForm
> 
> (see off-airport AEA 2005 for a csv with issues.)

Does this do the trick?

dd <- read.csv(text=gsub("\\\"", "\"\"", fixed=TRUE,
   readLines("~/Downloads/OffAirportAEA2005List.csv")))



> 
> Thank you for the help. I really do appreciate the suggested solutions.
> Also, thanks to John Kane for the link to csvEdit. 
> 
> Tim
> 
 Duncan Murdoch  01/25/13 6:37 PM >>>
> On 13-01-25 4:37 PM, Tim Howard wrote:
>> David,
>> Thank you again for the reply. I'll try to make readLines() and strplit() 
>> work.  What bugs me is that I think it would import fine if the folks who 
>> created the csv had used double quotes "" rather than an escaped quote \" 
>> for those pesky internal quotes. Since that's the case, I'd think there 
>> would be a solution within read.csv() ... or perhaps scan()?, I just can't 
>> figure it out.
> 
> What you say doesn't make sense.  Let me paraphrase:
> 
> "The folks who sent me the data created a weird, non-standard .csv file. 
>  You'd think read.csv() could handle it."
> 
> The standard way to handle quotes in strings is to double them.  They 
> didn't, so you've got a weird file on your hands.
> 
> You can probably fix it by doing a global substitution of all "backslash 
> doublequote" pairs with "doublequote doublequote".  Unless they have 
> some "backslash backslash doublequote" triples that they want you to 
> interpret as "backslash doublequote".  Or some other weirdness.
> 
> I'd suggest you try the global subst mentioned above. or ask them for 
> data in a reasonably standardized format.
> 
> Duncan Murdoch
> 
>> best,
>> Tim
>> 
> David Winsemius  1/25/2013 4:16 PM >>>
>> 
>> On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:
>> 
>>> Great point, your fix (quote="") works for the example I gave. 
>>> Unfortunately, these text strings have commas in them as well(!).  Throw a 
>>> few commas in any of the text strings and it breaks again.  Sorry about not 
>>> including those in the example.
>>> 
>>> So, I need to incorporate commas *and* quotes with the escape character 
>>> within a single string.
>> 
>> Well you need to have _some_ delimiter. At the moment it sounds as though 
>> you might end up using readLines() and strsplit( . , split="\\'\\,\\s\\").
>> 
>> 
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com








__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Testing continuous zero-inflated response

2013-01-28 Thread Achim Zeileis

On Sun, 27 Jan 2013, Kay Cichini wrote:


That said,


wilcox_test(x ~ factor(y), distribution = "exact")


or the same with oneway_test, i.e would be ok?


Yep, exactly.

And you could also look at chisq_test(factor(x > 0) ~ factor(y), 
distribution = approximate()) or something like that. Or of course 
Fisher's exact test.
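
Spelled out, a hedged sketch with made-up vectors x (the response) and y (the
grouping variable) would be:

library(coin)
## exact two-sample test on the full response
wilcox_test(x ~ factor(y), distribution = "exact")
## test only the zero vs non-zero part, with a resampled null distribution
chisq_test(factor(x > 0) ~ factor(y), distribution = approximate())
## or base R's exact test on the 2x2 table
fisher.test(table(x > 0, y))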


Best,
Z



2013/1/27 Achim Zeileis 


On Sun, 27 Jan 2013, Kay Cichini wrote:

 Thanks for the reply!


Still, aren't there issues with 2-sample test vs y and excess zeroes
(->many ties), like for Mann-Whitney-U tests?



If you use the (approximate) exact distribution, that is no problem.

The problem with the Wilcoxon/Mann-Whitney test and ties is only that the
simple recursion formula for computing the exact distribution only works
without ties. Thus, it's not the exact distribution that is wrong but only
the standard algorithm for evaluating it.

Best,
Z

 Kind regards,

Kay


2013/1/26 Achim Zeileis 

 On Fri, 25 Jan 2013, Kay Cichini wrote:


 Hello,



I'm searching for a test that applies to a dataset (N=36) with a
continuous zero-inflated dependent variable



In a regression setup, one can use a regression model with a response
censored at zero. survreg() in survival fits such models, tobit() in AER
is
a convenience interface for this special case.

If the effects of a regressor can be different for the probability of a
zero and the mean of the non-zero observations, then a two-part model can
be used. E.g. a probit fit (via glm) plus a truncated regression (via
truncreg in the package of the same name).

However:


 and only one nominal grouping variable with 2 levels (balanced).





In that case I would probably use no regression model but two-sample
permutation tests, e.g. via the "coin" package.


 In fact there are 4 response variables of this kind which I plan to test


seperately - the amount of zeroes ranges from 75 to 97%..



That means you have between one (!) and nine non-zero observations. In
the
former case, it will be hard to model anything. And even in the latter
case
it will be hard to investigate the probability of zero and the mean of
the
non-zero observations separately.

I would start out with a simple two-way table of (y > 0) vs group and
conduct Fisher's exact test.

And then you might try also your favorite two sample test of y vs group,
preferably using the approximate exact distribution.

Hope that helps,
Z

 I searched the web and found several modelling approaches but have the


feeling that they are overly complex for my very simple dataset.

Thanks in advance for any help!
Kay

--

Kay Cichini, MSc Biol

Grubenweg 22, 6071 Aldrans

Tel.: 0650 9359101

E-Mail: kay.cich...@gmail.com

Web: www.theBioBucket.blogspot.co.at











 --


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--

Kay Cichini, MSc Biol

Grubenweg 22, 6071 Aldrans

Tel.: 0650 9359101

E-Mail: kay.cich...@gmail.com

Web: www.theBioBucket.blogspot.co.at





--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--

Kay Cichini, MSc Biol

Grubenweg 22, 6071 Aldrans

Tel.: 0650 9359101

E-Mail: kay.cich...@gmail.com

Web: www.theBioBucket.blogspot.co.at
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] lapply and SpatialGridDataFrame error

2013-01-28 Thread Roger Bivand
Irucka Embry  mail2world.com> writes:

> 
> Hi all, I have a set of 54 files that I need to convert from ASCII grid
> format to .shp files to .bnd files for BayesX.
> 
> I have the following R code to operate on those files:
> 
> library(maptools)
> library(Grid2Polygons)
> library(BayesX)
> library(BayesXsrc)
> library(R2BayesX)
> 
> readfunct <- function(x)
> {
> u <- readAsciiGrid(x)
> }
> 
> modfilesmore <- paste0("MaxFloodDepth_", 1:54, ".txt")
> modeldepthsmore <- lapply(modfilesmore, readfunct)
> 
> maxdepth.plys <- lapply(modeldepthsmore, Grid2Polygons(modeldepthsmore,
> level = FALSE))
> 
...
> 
> This is the error message that I receive:
> > maxdepth.plys <- lapply(modeldepthsmore,
> Grid2Polygons(modeldepthsmore, level = FALSE))
> Error in Grid2Polygons(modeldepthsmore, level = FALSE) : Grid object not
> of class SpatialGridDataFrame
> 
> Can someone assist me in modifying the R code so that I can convert the
> set of files to .shp files and then to .bnd files for BayesX?

You also posted on R-sig-geo a few hours after posting here - certainly a more
relevant choice of list, but you are rather impatient.

I'm assuming that you have read up on how lapply() works, and realised what is
wrong with your understanding. But just in case, 

> maxdepth.plys <- lapply(modeldepthsmore, Grid2Polygons(modeldepthsmore,
> level = FALSE))

does not pass the list component from modeldepthsmore anywhere, but tries to run
Grid2Polygons on the whole list. Something like (untried):

maxdepth.plys <- lapply(modeldepthsmore, function(x) Grid2Polygons(x, level =
FALSE))

should do that. Please summarise to R-sig-geo.

Roger


> 
> Thank-you.
> 
> Irucka Embry 
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to extract values from a raster according to Lat and long of the values?

2013-01-28 Thread Jonsson
extract() will extract values if you provide the x, y coordinates, but then how
do you know which lat and long correspond to which x and y?
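
For illustration, a hedged sketch assuming a raster r and a data.frame pts with
columns lon and lat (in the raster's coordinate system); extract() returns values
in the same row order as the coordinates, so binding them back keeps the
lat/long-to-value correspondence:

library(raster)
## x = lon, y = lat; values come back in the same row order as pts
vals <- extract(r, pts[, c("lon", "lat")])
result <- cbind(pts, value = vals)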



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-extract-values-from-a-raster-according-to-Lat-and-long-of-the-values-tp4656767p4656848.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to extract values from a raster according to Lat and long of the values?

2013-01-28 Thread Jonsson
jholtman: I do not understand your question.



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-extract-values-from-a-raster-according-to-Lat-and-long-of-the-values-tp4656767p4656847.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dummy encoding in metafor

2013-01-28 Thread Alma Wilflinger
Dear Wolfgang,

Thank you very much for answering!

1) No, I am doing a meta-analysis and take the ds from existing studies. To be 
exact, I use D (means of the D-measure, which is not exactly d, but very similar).
3) Yes, CMA makes a new column in which the st. error and variance differ from 1.

For example, my mean is 0.6, I take 1 for the st. dev and 154 is my N; the new 
values are 0.080582 for the st. error and 6.49E-03 for the variance.

So did I make a mistake by taking the parameters for R?
vi is my mean
yi is my variance. I took the uncalculated variance, which is 1. Is this wrong? 
Do I have to take the variances from the new column from CMA?

kind regards,
alma



 From: Viechtbauer Wolfgang (STAT) 


t.co.uk>; "r-help@r-project.org"  
Sent: Monday, January 28, 2013 10:38 AM
Subject: RE: [R] dummy encoding in metafor

Dear Alma,

either there is a whole lot of miscommunication here, or you (and your 
supervisor) are way in over your head.

You say that you are working with Cohen's d values. And you mentioned CMA. So, 
let me ask you some questions:

1) Has CMA computed those d values for you?
2) If yes, what information did you supply to CMA for the computation of those 
d values? (means, standard deviations, and the sample sizes of the two groups?)
3) Did CMA then provide you with a column of values that are the corresponding 
sampling variances or standard errors? (those values should NOT all be equal to 
1!)

Best,
Wolfgang

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Alma Wilflinger
> Sent: Sunday, January 27, 2013 21:52
> To: Michael Dewey; r-help@r-project.org
> Subject: Re: [R] dummy encoding in metafor
> 
> Hi Michael!
> 
> Yes, I do use Cohens d. As a matter of fact my thesis supervisor told me
> to use 1 as the value for standard deviation for all of my studies.
> Unfortunately I am not totally sure myself why to do this have you ever
> used such an approach?
> 
> kind regards,
> Alma
> 
> 
>  From: Michael Dewey 
> 
> t.co.uk>; "r-help@r-project.org" 
> Sent: Thursday, January 24, 2013 6:57 PM
> Subject: Re: [R] dummy encoding in metafor
> 
> At 22:06 23/01/2013, Alma Wilflinger wrote:
> 
> > Hi Michael,
> >
> > The supervisor for my Master's Thesis told me that my means are the
> effect size and cause of this I have to take figure 1 for all standard
> deviations. So I hope that was the right information.
> 
> Alma
> There is a fairly comprehensive list of all the things which might be an
> effect size on
> http://en.wikipedia.org/wiki/Effect_size
> Is what you call Mean one of them?
> 
> 
> >
> > From: Michael Dewey 
> 
> wolfgang.viechtba...@maastrichtuniversity.nl>; Michael Dewey
> ; "r-help@r-project.org" 
> > Sent: Wednesday, January 23, 2013 10:22 AM
> > Subject: Re: [R] dummy encoding in metafor
> >
> > At 08:30 23/01/2013, Alma Wilflinger wrote:
> > > Dear Wolfgang and Michael,
> > >
> [[elided Yahoo spam]]
> > >
> > > Concerning the Variance: I took the variance I used for CMA (which
> is always 1), so I think it should be the right one.
> >
> > It seems unlikely to me that the variance from each study would be the
> same although I suppose it could be possible. Are you sure you are
> supplying the right values to CMA?
> >
> >
> > > Thank you for noticing and mentioning though :)
> > >
> > > I really appreciate how helpful you both are.
> > >
> > > best,
> > > Alma
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Using loop for a random vector (Montecarlo method)

2013-01-28 Thread mary
Hi,

I would like to replicate a sort of Montecarlo experiment:

I have to generate a random variable N(0,1) with 100 observations but I have
to contaminate this values at certain point in order to obtain different
vectors with different samples:

tab<-function(N,n,E,L){
for(i in 1:100){
X1<-rnorm(N*(1-a),0,1)
X2<-rnorm(N*(a),E,L)
X_tab<-sample(c(X1,X2),n,replace=T)
}
return(X_tab)}

N=100 ,n=5, a= 0.1, E=5 , L=0.01

I have to replicate it for different values of this parameters, but i
wouldn't rewrite each times values, i would like a loop...
for n=5
N=100 
a=c(0.1,0.2,0.3)
E=5
L=0.01

and repeat 
for n=5
N=100 
a=c(0.1,0.2,0.3)
E=5
L=0.025

for n=5
N=100 
a=c(0.1,0.2,0.3)
E=5
L=1

I have to do this for n=5,10,20
and for E=5,10

how to do??
Thanks
M.

 



--
View this message in context: 
http://r.789695.n4.nabble.com/Using-loop-for-a-random-vector-Montecarlo-method-tp4656845.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Setting inline hook to a function identical to default in knitr turns off exponential formatting

2013-01-28 Thread mlell08
Hello List,

while dealing with a question of 'xiaodao' (
http://r.789695.n4.nabble.com/Problem-with-large-small-numbers-in-knitr-tp4653986.html)

I noticed that copying the default inline hook function obtained by
knit_hooks$get("inline")
into a knit_hooks$set(inline = <...>) call turns off exponential
formatting in the resulting .tex file.

I used a stripped version of 'xiaodao's example:
\documentclass{article}
\begin{document}

<<>>=
a<-1e-13
b<-2.5e-10
ssrr<-123456.12
ssru<-123400.00
@

$
c=\Sexpr{a}/\Sexpr{b}
f=\Sexpr{ssrr-ssru}/\Sexpr{ssru}
$

\end{document}

so:
knit_hooks$restore()
knit_hooks$get("inline")

## yields:
## function (x)
## {
##if (is.numeric(x))
##x = round(x, getOption("digits"))
##paste(as.character(x), collapse = ", ")
## }
## 

knit("FILENAME.Rnw")

## .tex-file:
## c=$10^{-13}$/$2.5\times 10^{-10}$
## f=56.12/$1.234\times 10^{5}$


## then run knit_hooks$set() with exactly the same function:
knit_hooks$set(inline= function (x)
{
if (is.numeric(x))
x = round(x, getOption("digits"))
paste(as.character(x), collapse = ", ")
}
)
knit("FILENAME.Rnw")

## .tex-File
## c=0/0
## f=56.12/123400

knit_hooks$get("inline")
## function (x)
## {
##  if (is.numeric(x))
##  x = round(x, getOption("digits"))
##  paste(as.character(x), collapse = ", ")
## }


The only thing that changed is that no <environment: ...> line is
printed anymore.

Why does knitr change its output?

Regards, Moritz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with large/small numbers in knitr

2013-01-28 Thread mlell08
On 26.12.2012 23:28, xiaodao wrote:
> I have problems with very large numbers using knitr. In the following, my a
> and b are extremely small and ssrr and ssru are extremely large. 

> 
> \documentclass{article}
> \begin{document}
> 
> <>=
> ## numbers >= 10^5 will be denoted in scientific notation,
> ## and rounded to 2 digits
> options(scipen = 1, digits = 2)
> 
> <<>>=
> a<-1e-13
> b<-2.5e-10
> ssrr<-123456.12
> ssru<-123400.00
> @
> 
> $
> c=\Sexpr{a}/\Sexpr{b} % either this formula or the following formula will
> has error message "missing $" after click "complie" in Rstudio.
> f=\Sexpr{ssrr-ssru}/\Sexpr{ssru}
> $
> 
> \end{document}
> 

I copied your file to "abc.Rnw", then opened an R session and ran
setwd("")
library(knitr)
knit("abc.Rnw")

I got the following text in the resulting abc.tex file:


$
c=$10^{-13}$/$2.5\times 10^{-10}$ % either this formula or ...
f=56.12/$1.23\times 10^{5}$
$

knitr converts large numbers in Sexpr to the exponential notation, so
you've got a nested math environment here.

You can convert your numbers to strings with as.character()
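
For example (a hedged, untested variant of your inline expressions):

$
f=\Sexpr{as.character(ssrr - ssru)}/\Sexpr{as.character(ssru)}
$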

Alternative:
if you run knit_hooks$get("inline") you get the function used for inline
output processing


https://github.com/yihui/knitr/issues/33


-> if you write

knit_hooks$set(inline = function (x)
{
if (is.numeric(x))
x = round(x, getOption("digits"))
paste(as.character(x), collapse = ", ")
})

in your setup chunk (did you forget the closing @ in your setup chunk?),
it also works (which is strange, because this function is exactly what
is predefined by knitr, but that's another question).

Regards, Moritz
-- 
GnuPG Key: 0x7340821E

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression

2013-01-28 Thread Olivier Collignon

Dear Dr Harrell,

About the mean probabilities, I was referring to the ones computed with the 
command predict(..., type="mean").
I tried to set the binwidth in SAS to 0.0001 as you suggested. 
After having negated the predictors, I found a C index of 0.968, which is 
exactly the same as rcorr.cens in R and almost the same as lrm, as you 
explained.
This solves the problem.
For information, I tried to change the binwidth value several times before 
posting the message. The problem was that I found the same results every time, 
whatever the binwidth, which I couldn't understand.
I just discovered that Enterprise Guide did not take these changes into account, 
whereas SAS did. This enabled me to find the correct results, thanks to your 
advice.

Thank you again for your help,
With best wishes,
Olivier

> Date: Thu, 24 Jan 2013 13:09:46 -0600
> From: f.harr...@vanderbilt.edu
> To: r-h...@stat.math.ethz.ch
> Subject: Re: [R] Difference between R and SAS in Corcordance index in ordinal 
> logistic regression
> 
> Please define 'mean probabilities'.
> 
> To compute the C-index or Dxy you need anything that is monotonically 
> related to the prediction of interest, including the linear combination 
> of covariates ignoring all intercepts.   In other words you don't need 
> to go to the trouble of computing probabilities unless you are binning, 
> as the binning is usually done on a controllable 0-1 scale.   When I bin 
> I just choose the middle intercept, I seem to recall.  Also try running 
> SAS with a very tiny BINWIDTH and see if you get 1 - .968 as the answer 
> for C.  [I wrote the original algorithm SAS uses for this in the old SAS 
> PROC LOGIST.  Binning was just for speed.]
> 
> You might also re-run SAS after negating the response variable.
> Frank
> 
> blackscorpio wrote
> > Dear Dr Harrell,
> > Thank you very much for your answer. Actually I also tried to found the C
> > index by hand on these data using the mean probabilities and I found
> > 0.968, as you just showed.
> > I understand now why I had a slight difference with the outpout of lrm. I
> > am thus convinced that this result is correct.
> >
> > I read on the SAS help that the procedure logistic also proceed to some
> > binning (BINWIDTH option) :
> >
> > http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect010.htm
> >
> > But I cannot explain why the difference between the two softwares is that
> > huge, especially since the class probabilities are the same.
> >
> > Do you think it could be due to the fact that mean probabilities are
> > computed differently ?
> >
> > Thank for your help and best regards,
> > OC
> >
> >
> >> Date: Thu, 24 Jan 2013 05:28:13 -0800
> >> From:
> 
> > f.harrell@
> 
> >> To:
> 
> > r-help@
> 
> >> Subject: Re: [R] Difference between R and SAS in Corcordance index in
> >> ordinal logistic regression
> >>
> >> lrm does some binning to make the calculations faster.  The exact
> >> calculation
> >> is obtained by running
> >>
> >> f <- lrm(...)
> >> rcorr.cens(predict(f), DA), which results in:
> >>
> >>        C Index            Dxy           S.D.              n        missing
> >>     0.96814404     0.93628809     0.03808336             32              0
> >>     uncensored Relevant Pairs     Concordant      Uncertain
> >>             32            722            699              0
> >>
> >> I.e., C=.968 instead of .963.  But this is even farther away than the
> >> value
> >> from SAS you reported.
> >>
> >> If you don't believe the rcorr.cens result, create a tiny example and do
> >> the
> >> calculations by hand.
> >> Frank
> >>
> >>
> >> blackscorpio81 wrote
> >> > Dear R users,
> >> >
> >> > Please allow to me ask for your help.
> >> >  I am currently using Frank Harrell Jr package "rms" to model ordinal
> >> > logistic regression with proportional odds. In order to assess model
> >> > predictive ability, C concordance index is displayed and equals to
> >> 0.963.
> >> >
> >> > This is the code I used with the data attached
> >> > data.csv ;
> >> >  :
> >> >
> >> >>require(rms)
> >> >>a<-read.csv2("/data.csv",row.names =,na.strings = c(""," "),dec=".")
> >> >>lrm(DA~SJ+TJ,data=a)
> >> >
> >> > Logistic Regression Model
> >> >
> >> > lrm(formula = DA ~ SJ + TJ, data = a)
> >> >
> >> > Frequencies of Responses
> >> >
> >> >  1  2  3  4
> >> >  6 13  9  4
> >> >
> >> >        Model Likelihood     Discrimination    Rank Discrim.
> >> >           Ratio Test            Indexes           Indexes
> >> > Obs             32    LR chi2      53.14    R2     0.875    C     0.963
> >> > max |deriv|  6e-06    d.f.             2    g      8.690    Dxy   0.925
> >> >                      Pr(> chi2) <0.0001    gr  5942.469

Re: [R] dummy encoding in metafor

2013-01-28 Thread Viechtbauer Wolfgang (STAT)
Dear Alma,

either there is a whole lot of miscommunication here, or you (and your 
supervisor) are way in over your head.

You say that you are working with Cohen's d values. And you mentioned CMA. So, 
let me ask you some questions:

1) Has CMA computed those d values for you?
2) If yes, what information did you supply to CMA for the computation of those 
d values? (means, standard deviations, and the sample sizes of the two groups?)
3) Did CMA then provide you with a column of values that are the corresponding 
sampling variances or standard errors? (those values should NOT all be equal to 
1!)
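
For reference, a hedged sketch of how d values and their sampling variances are 
typically obtained directly in metafor (the column names for the group means, 
SDs and sample sizes below are hypothetical):

library(metafor)
## compute standardized mean differences (yi) and their sampling variances (vi)
dat <- escalc(measure = "SMD",
              m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i,
              data = mydata)
res <- rma(yi, vi, data = dat)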

Best,
Wolfgang

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Alma Wilflinger
> Sent: Sunday, January 27, 2013 21:52
> To: Michael Dewey; r-help@r-project.org
> Subject: Re: [R] dummy encoding in metafor
> 
> Hi Michael!
> 
> Yes, I do use Cohens d. As a matter of fact my thesis supervisor told me
> to use 1 as the value for standard deviation for all of my studies.
> Unfortunately I am not totally sure myself why to do this have you ever
> used such an approach?
> 
> kind regards,
> Alma
> 
> 
>  From: Michael Dewey 
> 
> t.co.uk>; "r-help@r-project.org" 
> Sent: Thursday, January 24, 2013 6:57 PM
> Subject: Re: [R] dummy encoding in metafor
> 
> At 22:06 23/01/2013, Alma Wilflinger wrote:
> 
> > Hi Michael,
> >
> > The supervisor for my Master's Thesis told me that my means are the
> effect size and cause of this I have to take figure 1 for all standard
> deviations. So I hope that was the right information.
> 
> Alma
> There is a fairly comprehensive list of all the things which might be an
> effect size on
> http://en.wikipedia.org/wiki/Effect_size
> Is what you call Mean one of them?
> 
> 
> >
> > From: Michael Dewey 
> 
> wolfgang.viechtba...@maastrichtuniversity.nl>; Michael Dewey
> ; "r-help@r-project.org" 
> > Sent: Wednesday, January 23, 2013 10:22 AM
> > Subject: Re: [R] dummy encoding in metafor
> >
> > At 08:30 23/01/2013, Alma Wilflinger wrote:
> > > Dear Wolfgang and Michael,
> > >
> [[elided Yahoo spam]]
> > >
> > > Concerning the Variance: I took the variance I used for CMA (which
> is always 1), so I think it should be the right one.
> >
> > It seems unlikely to me that the variance from each study would be the
> same although I suppose it could be possible. Are you sure you are
> supplying the right values to CMA?
> >
> >
> > > Thank you for noticing and mentioning though :)
> > >
> > > I really appreciate how helpful you both are.
> > >
> > > best,
> > > Alma

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] importing data

2013-01-28 Thread Ivan Calandra

What about this then:
list_of_datasets <- lapply(file_names, read.table, other_args_to_read.table)

Something that might then be useful is:
names(list_of_datasets) <- file_names

Does it do it now?
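
Putting the pieces together, a hedged, untested sketch (adjust the path, pattern 
and read.table() arguments to your files):

file_names <- list.files(path = "C:/.../data", pattern = "\\.dat$", full.names = TRUE)
list_of_datasets <- lapply(file_names, read.table, header = TRUE)  # header is an assumption
names(list_of_datasets) <- basename(file_names)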

Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
ivan.calan...@u-bourgogne.fr
http://biogeosciences.u-bourgogne.fr/calandra

Le 28/01/13 07:34, Ray Cheung a écrit :
Thanks a million for all the help provided!! I can do what I intend to do 
using the "for loop". However, I'm still eager to try the list.files 
approach. Here is the error message that I got using Ivan's code:


> list_of_dataset <- do.call(read.table, file_names)
Error in do.call(read.table, file_names) : second argument must be a list

Please advise.

Ray

On Sun, Jan 27, 2013 at 10:57 PM, Ivan Calandra <ivan.calan...@u-bourgogne.fr> wrote:


Hi Ray!

I'm insisting with list.files...!

What about like this (untested)?
file_names <- list.files(path="C:/.../data", pattern=".dat$",
full.names=TRUE)
list_of_dataset <- do.call(read.table, file_names)

Let me know if this helps!
Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
ivan.calan...@u-bourgogne.fr
http://biogeosciences.u-bourgogne.fr/calandra

Le 26/01/13 10:03, Ray Cheung a écrit :

Thanks for your commands, Ivan and Michael! However, I am still
not producing the right codes. Would you please help me on
this? I've written the following codes. Please comment. Thank you
very much.
Task: Reading data1.dat to data1000.dat (with missing files) into
R. Missing files can be omitted in the list.
###FUNCTION TO READ FILES
little_helpful <- function(n) {
file_name <- paste0("C:/.../data", n, ".dat")
read.table(file_name)
}
###RETURN AN OBJECT WHICH CHECKS FOR THE EXISTENCE OF FILES
check  <- function(n) {
a <- ifelse(file.exists(paste0("C:/.../data", n, ".dat")), 1, 0)
a
}
###Combining the functions
IMPORT <- function(n) {
   L <- check(1:n)
   for (i in 1:n) {
  if (L[i] == 1)
  list_of_datasets <- lapply(i, little_helpful) else
list_of_datasets <- 0
  }
   list_of_datasets
   }
Thanks for all comments.
Best Regards,
Ray

On Fri, Jan 25, 2013 at 5:48 PM, Ivan Calandra <ivan.calan...@u-bourgogne.fr> wrote:

Hi,

Not sure this is what you need, but what about list.files()?
It can get you all the files from a given folder, and you
could then work this list with regular expressions for example.

HTH,
Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06 
ivan.calan...@u-bourgogne.fr

http://biogeosciences.u-bourgogne.fr/calandra

Le 25/01/13 10:00, R. Michael Weylandt a écrit :

On Fri, Jan 25, 2013 at 6:11 AM, Ray Cheung <ray1...@gmail.com> wrote:

Dear Michael,

Thanks for your codes. However, lapply does not work
in my case since I've
some files missing in the data (say, the file
data101.dat). Do you have any
suggestions on this?? Thank you very much.

You could simply add a test using file.exists() but I'm
not sure what
you want to do with the M matrix then -- omit the slice
(so the others
are all shifted down one) or fill it entirely with NA's.

Michael

__
R-help@r-project.org 
mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,
reproducible code.


__
R-help@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible
code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Attempting to confirm a program i wrote in C (normalize 2 datasets, transform into histogram, transform into CDF, perform KS test)

2013-01-28 Thread Tarskin
The C program takes 2 mzML files, from which the binary strings (the X data and 
Y data) are uncompressed/decoded; it then examines spectral (xy data) similarity 
and combines both datasets into a new one, and finally, after all similar spectra 
have been merged, writes everything back into 1 new mzML file. I am assuming that 
people do not want to get several complete mzML files; I will, however, include 2 
spectra from different sources that were extracted from the total mzML files that 
I have been using to test this with.

IgG2_G1F.data   
IgG2_G1F.data   

The steps i performed in R:

experimental<-read.table(, header=T)
predicted<-read.table(,header=T)
hist_e<-hist(rep(experimental[,1],experimental[,2]), breaks=seq(0,2500,1))
hist_p<-hist(rep(predicted[,1],predicted[,2]),breaks=seq(0,2500,1))
-- up to here it does what I expect, my own C program would now transform
the GSL hist object to a GSL cdf object --
ecdf(hist_p) seemed the logical choice to me; however, it complains about an
error in rank, and my attempts to figure out what this means haven't been
very clear so far.

I would appreciate any pointer as to why this doesn't do what I'm
expecting.







--
View this message in context: 
http://r.789695.n4.nabble.com/Attempting-to-confirm-a-program-i-wrote-in-C-normalize-2-datasets-transform-into-histogram-transform-tp4656704p4656833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to create a random matrix with conditions

2013-01-28 Thread Jeff Newmiller
First you must learn to be more specific in your description of what you want. 
That will allow others to understand what you actually want rather than 
guessing.

Perhaps try creating a small (3 var by 10 values) example, and describe the 
actual correlations you have created.

If your problem is that you don't know how to create what you want with any 
tool, then you should figure that out before posting here. 
stats.stackexchange.com might be helpful if that is the case.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Simon Givoli  wrote:

>Hi!
>
>I want to create a random matrix with 15 variables, each variable
>having
>1000 observations.
>Between each two variables, I want to define a specific (*not *random)
>correlations between them, but still saving the "randomness" of each
>variable (mean=zero, s.d=1).
>How can I do this in R?
>
>thanks,
>Simon
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loops

2013-01-28 Thread Francesca
Thanks to you all,
they are very useful and I am learning a lot.
Best,
Francesca

On 27 January 2013 19:20, arun  wrote:

>
>
> Hi,
>
> You could use library(plyr) as well
> library(plyr)
> pnew<-colSums(aaply(laply(split(as.data.frame(p),((1:nrow(as.data.frame(p))-1)%/%
> 25)+1),as.matrix),c(2,3),function(x) x))
> res<-rbind(t(pnew),colSums(p))
> row.names(res)<-1:nrow(res)
> res<- 100-100*abs(res/rowSums(res)-(1/3))
> A.K.
>
>
> - Original Message -
> From: Rui Barradas 
> To: Francesca 
> Cc: r-help@r-project.org
> Sent: Sunday, January 27, 2013 6:17 AM
> Subject: Re: [R] Loops
>
> Hello,
>
> I think there is an error in the expression
>
> 100-(100*abs(fa1[i]/sum(fa1[i])-(1/3)))
>
> Note that fa1[i]/sum(fa1[i]) is always 1. If it's fa1[i]/sum(fa1), try
> the following, using lists to hold the results.
>
>
> # Make up some data
> set.seed(6628)
> p <- matrix(runif(300), nrow = 100)
>
> idx <- seq(1, 100, by = 25)
> fa <- lapply(idx, function(i) colSums(p[i:(i + 24), ]))
> fa[[5]] <- colSums(p)
>
> fab <- lapply(fa, function(x) 100 - 100*abs(x/sum(x) - 1/3))
> fab
>
> You can give names to the lists elements, if you want to.
>
>
> names(fa) <- paste0("fa", 1:5)
> names(fab) <- paste0("fa", 1:5, "b")
>
>
> Hope this helps,
>
> Rui Barradas
>
> Em 27-01-2013 08:02, Francesca escreveu:
> > Dear Contributors,
> > I am asking help on the way how to solve a problem related to loops for
> > that I always get confused with.
> > I would like to perform the following procedure in a compact way.
> >
> > Consider that p is a matrix composed of 100 rows and three columns. I
> need
> > to calculate the sum over some rows of each
> > column separately, as follows:
> >
> > fa1<-(colSums(p[1:25,]))
> >
> > fa2<-(colSums(p[26:50,]))
> >
> > fa3<-(colSums(p[51:75,]))
> >
> > fa4<-(colSums(p[76:100,]))
> >
> > fa5<-(colSums(p[1:100,]))
> >
> >
> >
> > and then I need to  apply to each of them the following:
> >
> >
> > fa1b<-c()
> >
> > for (i in 1:3){
> >
> > fa1b[i]<-(100-(100*abs(fa1[i]/sum(fa1[i])-(1/3
> >
> > }
> >
> >
> > fa2b<-c()
> >
> > for (i in 1:3){
> >
> > fa2b[i]<-(100-(100*abs(fa2[i]/sum(fa2[i])-(1/3
> >
> > }
> >
> >
> > and so on.
> >
> > Is there a more efficient way to do this?
> >
> > Thanks for your time!
> >
> > Francesca
> >
> > --
> > Francesca Pancotto, PhD
> > Università di Modena e Reggio Emilia
> > Viale A. Allegri, 9
> > 40121 Reggio Emilia
> > Office: +39 0522 523264
> > Web: https://sites.google.com/site/francescapancotto/
> > --
> >
> > [[alternative HTML version deleted]]
> >
> >
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 

Francesca

--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to create a random matrix with conditions

2013-01-28 Thread D. Rizopoulos
You didn't say how you want these variables to be distributed, but in 
case you want a multivariate normal, then have a look at function 
mvrnorm() from package MASS, and especially at the 'empirical' argument, 
e.g.,

library(MASS)

# assumed covariance matrix
V <- cbind(c(2, 1), c(1, 1.2))
V

x1 <- mvrnorm(1000, c(0,0), V, empirical = FALSE)
var(x1)

x2 <- mvrnorm(1000, c(0,0), V, empirical = TRUE)
var(x2)


I hope it helps.

Best,
Dimitris


On 1/28/2013 7:11 AM, Simon Givoli wrote:
> Hi!
>
> I want to create a random matrix with 15 variables, each variable having
> 1000 observations.
> Between each two variables, I want to define a specific (*not *random)
> correlations between them, but still saving the "randomness" of each
> variable (mean=zero, s.d=1).
> How can I do this in R?
>
> thanks,
> Simon
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.