date:20111202

Re: [R] how to get inflection point in binomial glm

2011-12-02 Thread Rubén Roa

René,

Yes, to fit a re-parameterized logistic model I think you'd have to code the 
whole enchilada yourself, without relying on glm (and not with nls() either, 
since nls() does least-squares minimization, whereas here we want to minimize 
the negative binomial log-likelihood).

I did that, and I have the re-parameterized logistic model in a package I wrote 
for a colleague (the logistic fit in this package is fully functional and 
documented).
My code, though, only handles one continuous predictor.

If you want, I can email you this package, and you can figure out how to deal 
with the categorical predictor.
One thing I believe at this point is that you'd have to do the inference on the 
continuous predictor _conditional_ on certain level(s) of the categorical 
predictor.
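A minimal sketch of the approach Rubén describes, for a single continuous predictor on simulated data. This is not the code from his package; the names `negll`, `x50` and `slope` are illustrative. The logistic curve is written directly in terms of its inflection point, logit(p) = slope * (d - x50), and the negative binomial log-likelihood is minimized with optim(), so the inflection point and a Hessian-based standard error come straight out of the fit.

```r
# Simulated data (hypothetical): one continuous predictor d, binary response y
set.seed(1)
n <- 500
d <- runif(n, 0, 10)
y <- rbinom(n, size = 1, prob = plogis(1.5 * (d - 4)))   # true x50 = 4

# Negative log-likelihood with the inflection point as an explicit parameter:
# logit(p) = slope * (d - x50); par[1] = x50, par[2] = slope
negll <- function(par, d, y) {
  eta <- par[2] * (d - par[1])
  # log(p) and log(1 - p) computed stably via plogis(..., log.p = TRUE)
  -sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

fit <- optim(c(median(d), 1), negll, d = d, y = y,
             method = "BFGS", hessian = TRUE)
est <- setNames(fit$par, c("x50", "slope"))
se  <- setNames(sqrt(diag(solve(fit$hessian))), c("x50", "slope"))

# Cross-check against the usual glm parameterization: x50 = -b0 / b1
b <- coef(glm(y ~ d, family = binomial))
x50_glm <- unname(-b[1] / b[2])
```

`est["x50"]` should agree with `x50_glm` up to optimizer tolerance. Extending this to the factor in René's model would mean giving each level its own `x50` and `slope` inside `negll`.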

Rubén

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of René Mayer
Sent: Thursday, December 01, 2011 20:34
To: David Winsemius
Cc: r-help Help
Subject: Re: [R] how to get inflection point in binomial glm

Thanks David and Rubén!

@David: indeed, 15 betas; I forgot the interaction terms. Thanks for correcting me!

@Rubén: would the re-parameterization be done within nls()? How would I do this 
in practice when including the factor predictor?

And yes, within each group we can set the linear predictor Y to 0 (i.e., P = 0.5) 
and solve:

0 = b0 + b1*X      | subtract b0
-b0 = b1*X         | divide by b1
-b0/b1 = X

but I was hoping there might be a more general solution for the case of multiple 
logistic regression.
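A minimal sketch of the per-group calculation for the interaction model, on simulated data with a hypothetical three-level factor `g`. It assumes R's default treatment contrasts, so the coefficients are named `(Intercept)`, `d`, `gb`, `gc`, `d:gb`, `d:gc`. Each group's intercept is b0 plus that group's `g` offset, its slope is the `d` coefficient plus its `d:g` offset, and the 50% point is -intercept/slope, exactly as in the derivation above.

```r
set.seed(42)
n <- 900
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
d <- runif(n, 0, 10)
true_x50 <- c(a = 3, b = 5, c = 7)                      # hypothetical truth
y <- rbinom(n, 1, plogis(1.2 * (d - true_x50[as.character(g)])))

fit <- glm(y ~ d * g, family = binomial)
cf  <- coef(fit)
lev <- levels(g)

# Group-specific intercepts and slopes (the reference level has zero offsets)
int <- cf[["(Intercept)"]] + c(0, cf[paste0("g", lev[-1])])
slp <- cf[["d"]] + c(0, cf[paste0("d:g", lev[-1])])
x50 <- setNames(-int / slp, lev)
x50
```

The same numbers come from the equivalent fit `glm(y ~ 0 + g + g:d, family = binomial)`, which estimates each group's intercept and slope directly. For a standard error of such a ratio, `MASS::dose.p()` computes it by the delta method for one intercept/slope pair, selected through its `cf` argument.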


HTH

René

Quoting David Winsemius dwinsem...@comcast.net:


 On Dec 1, 2011, at 8:24 AM, René Mayer wrote:

 Dear All,

 I have a binomial response with one continuous predictor (d) and one 
 factor (g) (8 levels dummy-coded).

 glm(resp~d*g, data, family=binomial)

 Y=b0+b1*X1+b2*X2 ... b7*X7

 Dear Dr Mayer;

 I think it might be a bit more complex than that. I think you should 
 get 15 betas rather than 8. Have you done it?


 how can I get the inflection point per group, i.e., the d at which P(d) = .5?

 Wouldn't that just be at d=1/beta in each group? (Thinking, perhaps 
 naively, in the case of X=X1 that

 Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta*d*(X==X1) )  # all other 
 terms = 0

 Then take the log of both sides, and use middle-school math to solve.

 Oh, wait. Muffed my first try on that for sure.  Need to add back both 
 the constant intercept and the baseline d coefficient for the
 non-b0 levels.

 Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta_0 + beta_d_0*d +
 beta_n + beta_d_n*d*(X==Xn) )

 And just

 Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta_0 + beta_d_0*d )  # for the 
 reference level.
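Setting the odds to 1 is the same as setting the log-odds to 0. In David's notation, with beta_n and beta_d_n the level-n offsets to the reference intercept beta_0 and reference slope beta_d_0 (as under treatment contrasts), the d at which Pr[y==1] = 0.5 works out per level as follows. This is René's -b0/b1 applied to each group's own intercept and slope.

$$
\beta_0 + \beta_n + (\beta_{d,0} + \beta_{d,n})\,d = 0
\quad\Longrightarrow\quad
d_{50,n} = -\frac{\beta_0 + \beta_n}{\beta_{d,0} + \beta_{d,n}},
\qquad
d_{50,0} = -\frac{\beta_0}{\beta_{d,0}} \ \text{(reference level)}
$$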

 This felt like an exam question in my categorical analysis course 25 
 years ago. (Might have gotten partial credit for my first stab, 
 depending on how forgiving the TA was that night.)


 I would be grateful for any help.

 Thanks in advance,
 René

 --

 David Winsemius, MD
 West Hartford, CT



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R2Cuba package, failed with message ‘Dimension out of range’

2011-12-02 Thread Paul Hiemstra

Hi Sachin,

In this mail there is not enough context to provide you with advice.
Please read the posting guide:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


regards,
Paul

On 12/02/2011 05:09 AM, Sachinthaka Abeywardana wrote:
 Hi All,

 I get the message "failed with message ‘Dimension out of range’" when using
 cuhre in package R2Cuba. Does anyone know what this means? Or would I need
 to email the package author?

 The odd thing is that it does give a result, and when I compare it with
 adaptIntegrate in package cubature, the two numbers are very close.

 Thanks,
 Sachin






-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770



Re: [R] help in dbWriteTable

2011-12-02 Thread Andreas Borg


Hello,

you can fetch the column names of a table with dbListFields and then 
reorder or rename the data frame according to those.
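A minimal sketch of Andreas's suggestion, using RSQLite with an in-memory database as he recommends. The table `scores` and its columns are hypothetical, DBI and RSQLite are assumed to be installed, and `dbExecute()` assumes a reasonably recent DBI. The idea: read the table's column order with `dbListFields()`, reorder the data frame to match, then append.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# A table created "externally", with its own column order (hypothetical schema)
dbExecute(con, "CREATE TABLE scores (id INTEGER, name TEXT, score REAL)")

# The data frame has the same columns, but in a different order
df <- data.frame(score = c(1.5, 2.5), id = c(1L, 2L), name = c("a", "b"),
                 stringsAsFactors = FALSE)

fields <- dbListFields(con, "scores")
stopifnot(setequal(fields, names(df)))   # every column must exist on both sides

# Reorder the data frame to match the table, then append to the existing table
dbWriteTable(con, "scores", df[, fields], append = TRUE, row.names = FALSE)

res <- dbReadTable(con, "scores")
dbDisconnect(con)
res
```

If the names differ rather than the order, renaming `df` with `names(df) <- ...` before the reordering step plays the same role.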


If you want more specific help, provide an example (RSQLite would be a 
good choice as database engine to make it easily reproducible for others).


Best regards,

Andreas

arunkumar wrote:
hi 

I need some help with dbWriteTable. 
I'm not able to insert rows into the table if the column order is not
the same in the database and in the data frame I'm inserting. I'm also
having trouble when the table has already been created externally and I
insert into it through dbWriteTable.

Is there some way to specify the rownames in dbWriteTable, or any other
method that would solve my problem?




  



--
Andreas Borg
Abteilung Medizinische Informatik

Universitätsmedizin der Johannes Gutenberg-Universität Mainz
Institut für Med. Biometrie, Epidemiologie und Informatik (IMBEI)
Obere Zahlbacher Straße 69, 55131 Mainz

Tel:  +49 (0) 6131 17-5062

E-Mail: andreas.b...@uni-mainz.de


[R] Problem subsetting: undefined columns

2011-12-02 Thread Aurélien PHILIPPOT

Dear R-users,
-I am new to R, and I am struggling with the following problem.

-I am repeating the following operations hundreds of times, within a loop:
I want to subset a data frame by columns. The column names I am interested
in are given by the rows of another data frame that was built in parallel.
My current solution works well as long as the elements of the second data
frame are all column names of the first data frame, but if an element of
the second object is not a column name of the first one, then it breaks.


-More concretely, I have the following data frames d and v:
mmdd <- c(19720601, 19720602, 19720605)
sret.10006 <- c(1,2,3)
sret.10014 <- c(5,9,7)
sret.10065 <- c(10,2,11)


d <- data.frame(mmdd=mmdd, sret.10006=sret.10006,
sret.10014=sret.10014, sret.10065=sret.10065)

v <- data.frame(V1="sret.10006", V2="sret.10090", stringsAsFactors=TRUE)
v <- sapply(v, function(x) levels(x)[x])

-I want to do the following subsetting:
sub <- subset(d, select=c(v))


and I get the following error message:
Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected



Any help would be very much appreciated,

Best,
Aurelien
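One way to avoid the error, in the spirit of validating the names before subsetting: keep only the requested names that actually exist as columns, and report the rest. A minimal sketch using the example data from the post, with the name vector typed in directly rather than built through the factor/levels step:

```r
d <- data.frame(mmdd = c(19720601, 19720602, 19720605),
                sret.10006 = c(1, 2, 3),
                sret.10014 = c(5, 9, 7),
                sret.10065 = c(10, 2, 11))
v <- c(V1 = "sret.10006", V2 = "sret.10090")   # requested names; one does not exist

present <- intersect(v, names(d))   # names that really are columns of d
absent  <- setdiff(v, names(d))     # names that are not
if (length(absent) > 0) {
  warning("columns not found, skipped: ", paste(absent, collapse = ", "))
}
sub <- d[, present, drop = FALSE]
sub
```

`drop = FALSE` keeps the result a data frame even when only one column survives.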


Re: [R] What's the baseline model when using coxph with factor variables?

2011-12-02 Thread Andreas Schlicker


William and David, thanks for your help.
The contrasts option was indeed what I was looking for but didn't find.

andi

On 01.12.2011 20:56, David Winsemius wrote:


On Dec 1, 2011, at 1:00 PM, William Dunlap wrote:


Terry will correct me if I'm wrong, but I don't think the
answer to this question is specific to the coxph function.


It depends on our interpretation of the questioner's intent. My answer
was predicated on the assumption that the phrase "baseline model"
meant "baseline survival function", ... S_0(t) in survival analysis
notation.



For all the [well-written] formula-based modelling functions
(essentially, those that call model.frame and model.matrix to
interpret the formula), the option "contrasts" controls how factor
variables are parameterized in the model matrix: contr.treatment
makes the baseline the first factor level, contr.SAS makes
the baseline the last, contr.sum makes the baseline the mean,
etc.  E.g.,


library(survival)
df <- data.frame(time=sin(1:20)+2,
                 cens=rep(c(0,0,1), len=20),
                 var1=factor(rep(0:1, each=10)),
                 var2=factor(rep(0:1, 10)))

options(contrasts=c("contr.treatment", "contr.treatment"))
coxph(Surv(time, cens) ~ var1 + var2, data=df)

Call:
coxph(formula = Surv(time, cens) ~ var1 + var2, data = df)

        coef exp(coef) se(coef)      z    p
var11 0.1640      1.18    0.822 0.1995 0.84
var21 0.0806      1.08    0.830 0.0971 0.92

Likelihood ratio test=0.05  on 2 df, p=0.974  n= 20, number of events= 6

options(contrasts=c("contr.SAS", "contr.SAS"))
coxph(Surv(time, cens) ~ var1 + var2, data=df)

Call:
coxph(formula = Surv(time, cens) ~ var1 + var2, data = df)

         coef exp(coef) se(coef)       z    p
var10 -0.1640     0.849    0.822 -0.1995 0.84
var20 -0.0806     0.923    0.830 -0.0971 0.92

Likelihood ratio test=0.05  on 2 df, p=0.974  n= 20, number of events= 6

options(contrasts=c("contr.sum", "contr.sum"))
coxph(Surv(time, cens) ~ var1 + var2, data=df)

Call:
coxph(formula = Surv(time, cens) ~ var1 + var2, data = df)

         coef exp(coef) se(coef)       z    p
var11 -0.0820     0.921    0.411 -0.1995 0.84
var21 -0.0403     0.960    0.415 -0.0971 0.92

Likelihood ratio test=0.05  on 2 df, p=0.974  n= 20, number of events= 6

(lm() has a contrasts argument that can override getOption("contrasts")
and set different contrasts for each variable, but coxph() does not
have that argument.)
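Bill's remark about lm() can be illustrated with a small simulated example; the data and level names are made up. lm()'s `contrasts` argument sets the coding per variable, while `relevel()` changes which level comes first, and hence the baseline under the default treatment contrasts, for any formula-based function, coxph() included. The coefficient names in the comments assume the default `options(contrasts)`.

```r
set.seed(1)
dat <- data.frame(y  = rnorm(20),
                  f1 = factor(rep(c("a", "b"), each = 10)),
                  f2 = factor(rep(c("x", "y"), 10)))

# Per-variable contrasts via lm()'s 'contrasts' argument
fit1 <- lm(y ~ f1 + f2, data = dat,
           contrasts = list(f1 = "contr.treatment",   # baseline = first level "a"
                            f2 = "contr.SAS"))        # baseline = last level "y"
names(coef(fit1))   # "(Intercept)" "f1b" "f2x"

# Alternative for any modelling function: reorder the levels with relevel()
dat$f1 <- relevel(dat$f1, ref = "b")
fit2 <- lm(y ~ f1 + f2, data = dat)
names(coef(fit2))   # "(Intercept)" "f1a" "f2y"
```

Both fits describe the same model, so their fitted values agree; only the meaning of the individual coefficients changes.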

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org
] On Behalf Of David Winsemius
Sent: Thursday, December 01, 2011 9:36 AM
To: a.schlic...@nki.nl
Cc: r-help@r-project.org
Subject: Re: [R] What's the baseline model when using coxph with
factor variables?


On Dec 1, 2011, at 12:00 PM, Andreas Schlicker wrote:


Hi all,

I'm trying to fit a Cox regression model with two factor variables
but have some problems with the interpretation of the results.
Considering the following model, where var1 and var2 can assume
value 0 and 1:

coxph(Surv(time, cens) ~ factor(var1) * factor(var2),  data=temp)

What is the baseline model? Is that considering the whole population
or the case when both var1 and var2 = 0?


This has been discussed several times in the past on rhelp. My
suggestion would be to search your favorite rhelp archive using
"baseline hazard Therneau", since Terry Therneau is the author of
survival. (The answer is closer to the first than to the second.)



Kind regards,
andi



David Winsemius, MD
West Hartford, CT



David Winsemius, MD
West Hartford, CT




Re: [R] Resampling with replacement on a binary (0, 1) dataset to get Cis

2011-12-02 Thread lincoln

Thanks.
Anyway, it is not homework, and I was not told to do this. My question has
not been answered yet, so I'll try to reformulate it:
Does it make (statistical) sense to resample with replacement in this
situation to get an estimate of the CIs? If it does, how could I do it
in R?

Some further details on my real case study:
10 independent samples were taken from a population in ten sessions. Each
sample consists of a (somewhat variable) number of random individuals that
are classified as 0 or 1 depending on one specific state (presence or
absence of a disease).
I can calculate, for each session, the percentage of diseased individuals,
but I have nothing on the CIs. Any suggestions?
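Resampling with replacement is a legitimate way to get a CI for a proportion, but for a 0/1 variable the bootstrap distribution of the mean is essentially binomial, so an exact binomial interval from `binom.test()` is a direct alternative to compare against. A minimal sketch for one session on simulated data; the sample size of 40 and prevalence of 0.3 are made up:

```r
set.seed(123)
x <- rbinom(40, size = 1, prob = 0.3)    # hypothetical session: 1 = diseased
p_hat <- mean(x)

# Nonparametric bootstrap: resample individuals with replacement
B <- 10000
boot_p <- replicate(B, mean(sample(x, replace = TRUE)))
ci_boot <- quantile(boot_p, c(0.025, 0.975))         # percentile interval

# Exact (Clopper-Pearson) binomial interval for comparison
ci_exact <- binom.test(sum(x), length(x))$conf.int

round(c(estimate = p_hat, boot = ci_boot, exact = ci_exact), 3)
```

Running the same steps for each of the ten sessions (for example in a loop over sessions) gives one interval per session. With small samples or prevalences near 0 or 1, the exact interval is usually the safer of the two.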



[R] export array

2011-12-02 Thread Ana

What is the best way to export an array?

The array I am trying to export has 3 dimensions (long, lat, observations).

How can I export each dimension independently?
E.g., one CSV file with only the long.
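A minimal sketch of two common ways to write such an array to CSV, using a small hypothetical 4 x 3 x 2 array with named dimensions. Writing each coordinate vector to its own file answers the literal question. Writing the whole array in "long" format (one row per lon/lat/observation cell) keeps the coordinates attached to every value, which makes the export easier to check later.

```r
# Hypothetical 3-D array: 4 longitudes x 3 latitudes x 2 observations
lon <- c(-10, -9, -8, -7)
lat <- c(35, 36, 37)
obs <- c("t1", "t2")
a <- array(seq_len(4 * 3 * 2), dim = c(4, 3, 2),
           dimnames = list(lon = lon, lat = lat, obs = obs))

out_dir <- tempdir()

# One CSV per dimension, holding only that dimension's coordinate values
write.csv(data.frame(lon = lon), file.path(out_dir, "lon.csv"), row.names = FALSE)
write.csv(data.frame(lat = lat), file.path(out_dir, "lat.csv"), row.names = FALSE)

# The whole array in "long" format: one row per (lon, lat, obs) cell
long <- as.data.frame.table(a, responseName = "value", stringsAsFactors = FALSE)
write.csv(long, file.path(out_dir, "array_long.csv"), row.names = FALSE)
head(long, 3)
```

`as.data.frame.table()` takes the column names from the names of `dimnames`, so naming the dimensions up front gives self-describing columns.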


Re: [R] export array

2011-12-02 Thread Jim Holtman

What do you want to do with it after you export it? That will probably determine 
what the data format should look like. Why would you want each dimension 
separately? How would you correlate them later? Is it really 3 dimensions, or 
is your data just three columns where each row is long, lat and observation? 
A small subset of the data would be helpful. Are you going to read it back into 
R, or send it somewhere else? More information would be useful, because you can 
create almost any output format that you want.

Sent from my iPad

On Dec 2, 2011, at 4:27, Ana rrast...@gmail.com wrote:

 What is the best way to export an array?
 
 The array I am trying to export has 3 dimensions (long, lat, observations).
 
 How can I export each dimension independently?
 E.g., one CSV file with only the long.
 

Re: [R] Problem subsetting: undefined columns

2011-12-02 Thread Jim Holtman

?try

If you know that you might have a problem with undefined columns, or whatever, 
then trap the error with 'try' so your program can recover. You could also 
validate the data that you are going to use before entering the loop. This is 
standard defensive programming: errors are always going to happen, so guard 
against them.
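A minimal sketch of Jim's `try()` suggestion inside a loop, with hypothetical column sets (one per iteration, the second containing a column that does not exist). `try()` catches the "undefined columns selected" error; the loop reports it and moves on to the next iteration instead of stopping.

```r
d <- data.frame(sret.10006 = c(1, 2, 3), sret.10014 = c(5, 9, 7))
wanted <- list(c("sret.10006", "sret.10014"),   # hypothetical column sets,
               c("sret.10006", "sret.10090"),   # one per loop iteration;
               "sret.10014")                    # the second has a missing column

results <- vector("list", length(wanted))
for (i in seq_along(wanted)) {
  res <- try(d[, wanted[[i]], drop = FALSE], silent = TRUE)
  if (inherits(res, "try-error")) {
    # report the problem and move on instead of stopping the whole loop
    message("iteration ", i, " skipped: ",
            conditionMessage(attr(res, "condition")))
    next
  }
  results[[i]] <- res
}
```

Skipped iterations leave `NULL` in `results`, so they are easy to find afterwards with `vapply(results, is.null, logical(1))`.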

Sent from my iPad

On Dec 2, 2011, at 2:20, Aurélien PHILIPPOT aurelien.philip...@gmail.com 
wrote:

 Dear R-users,
 -I am new to R, and I am struggling with the following problem.
 
 -I am repeating the following operations hundreds of times, within a loop:
 I want to subset a data frame by columns. The column names I am interested
 in are given by the rows of another data frame that was built in parallel.
 My current solution works well as long as the elements of the second data
 frame are all column names of the first data frame, but if an element of
 the second object is not a column name of the first one, then it breaks.
 
 
 -More concretely, I have the following data frames d and v:
 mmdd <- c(19720601, 19720602, 19720605)
 sret.10006 <- c(1,2,3)
 sret.10014 <- c(5,9,7)
 sret.10065 <- c(10,2,11)
 
 
 d <- data.frame(mmdd=mmdd, sret.10006=sret.10006,
 sret.10014=sret.10014, sret.10065=sret.10065)
 
 v <- data.frame(V1="sret.10006", V2="sret.10090", stringsAsFactors=TRUE)
 v <- sapply(v, function(x) levels(x)[x])
 
 -I want to do the following subsetting:
 sub <- subset(d, select=c(v))
 
 
 and I get the following error message:
 Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected
 
 
 
 Any help would be very much appreciated,
 
 Best,
 Aurelien
 

Re: [R] Problem subsetting: undefined columns

2011-12-02 Thread Paul Hiemstra

On 12/02/2011 07:20 AM, Aurélien PHILIPPOT wrote:
 Dear R-users,
 -I am new to R, and I am struggling with the following problem.

 -I am repeating the following operations hundreds of times, within a loop:
 I want to subset a data frame by columns. The column names I am interested
 in are given by the rows of another data frame that was built in parallel.
 My current solution works well as long as the elements of the second data
 frame are all column names of the first data frame, but if an element of
 the second object is not a column name of the first one, then it breaks.

Hi Aurelien,

I would call this a feature, not a bug. I think R does what it should
do: you request a non-existent column and it throws an error. What kind
of behavior are you looking for instead of this error?

regards,
Paul


 -More concretely, I have the following data frames d and v:
 mmdd <- c(19720601, 19720602, 19720605)
 sret.10006 <- c(1,2,3)
 sret.10014 <- c(5,9,7)
 sret.10065 <- c(10,2,11)


 d <- data.frame(mmdd=mmdd, sret.10006=sret.10006,
 sret.10014=sret.10014, sret.10065=sret.10065)

 v <- data.frame(V1="sret.10006", V2="sret.10090", stringsAsFactors=TRUE)
 v <- sapply(v, function(x) levels(x)[x])

 -I want to do the following subsetting:
 sub <- subset(d, select=c(v))


 and I get the following error message:
 Error in `[.data.frame`(x, r, vars, drop = drop) :
   undefined columns selected



 Any help would be very much appreciated,

 Best,
 Aurelien



-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


Re: [R] export array

2011-12-02 Thread Ana

Hi!

I would just like to have a way to check that my functions are working OK,
i.e., that the subset I am extracting is correct (both coordinates and data).

The files are in netCDF format, which I import into R (I only import a
small geographic subset).
Is there other software that would let me do this, just to check
that my code is OK?




On Fri, Dec 2, 2011 at 11:00 AM, Jim Holtman jholt...@gmail.com wrote:
 What do you want to do with it after you export it? That will probably 
 determine what the data format should look like. Why would you want each 
 dimension separately? How would you correlate them later? Is it really 3 
 dimensions, or is your data just three columns where each row is long, lat 
 and observation? A small subset of the data would be helpful. Are you going 
 to read it back into R, or send it somewhere else? More information would be 
 useful, because you can create almost any output format that you want.

 Sent from my iPad

 On Dec 2, 2011, at 4:27, Ana rrast...@gmail.com wrote:

 What is the best way to export an array?

 The array I am trying to export has 3 dimensions (long, lat, observations).

 How can I export each dimension independently?
 E.g., one CSV file with only the long.


Re: [R] export array

2011-12-02 Thread Jim Holtman

It depends on how you want to 'check'. I usually use 'View' to see if the data 
looks OK. You could write some more code to check the 'reasonableness' of the 
data. It sounds like you need to learn some ways of 'debugging' your code. 
Checking your data depends on what the criteria are for determining correctness. 
I also export to Excel to let other people see if it is reasonable, but 
again it depends on the problem you are trying to solve.
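One way to write the extra "reasonableness" code Jim mentions is a small checking function that fails loudly when an extracted subset does not match what was requested. A minimal sketch, assuming the subset is held as a lon x lat x observation array plus two coordinate vectors. The function `check_subset()`, the bounding box and the data below are all hypothetical.

```r
check_subset <- function(vals, lon, lat, bbox) {
  stopifnot(
    length(dim(vals)) == 3,                          # lon x lat x observations
    dim(vals)[1] == length(lon),                     # array matches coordinates
    dim(vals)[2] == length(lat),
    all(lon >= bbox$lon[1] & lon <= bbox$lon[2]),    # inside the requested box
    all(lat >= bbox$lat[1] & lat <= bbox$lat[2]),
    !is.unsorted(lon), !is.unsorted(lat),            # coordinates in order
    !anyNA(vals)                                     # drop if NA cells are expected
  )
  invisible(TRUE)
}

# Hypothetical subset extracted from a netCDF file
lon  <- seq(-10, -7, by = 0.5)
lat  <- seq(35, 37, by = 0.5)
vals <- array(runif(length(lon) * length(lat) * 3),
              dim = c(length(lon), length(lat), 3))
bbox <- list(lon = c(-10, -7), lat = c(35, 37))

check_subset(vals, lon, lat, bbox)      # passes silently
summary(as.vector(vals))                # quick look at the value range
```

Because it stops with an error naming the failed condition, the same call can be left in the import code as a guard rather than only used once by hand.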

Sent from my iPad

On Dec 2, 2011, at 5:07, Ana rrast...@gmail.com wrote:

 Hi!
 
 I would just like to have a way to check that my functions are working OK,
 i.e., that the subset I am extracting is correct (both coordinates and data).
 
 The files are in netCDF format, which I import into R (I only import a
 small geographic subset).
 Is there other software that would let me do this, just to check
 that my code is OK?
 
 
 
 
 On Fri, Dec 2, 2011 at 11:00 AM, Jim Holtman jholt...@gmail.com wrote:
 What do you want to do with it after you export it? That will probably 
 determine what the data format should look like. Why would you want each 
 dimension separately? How would you correlate them later? Is it really 3 
 dimensions, or is your data just three columns where each row is long, lat 
 and observation? A small subset of the data would be helpful. Are you going 
 to read it back into R, or send it somewhere else? More information would be 
 useful, because you can create almost any output format that you want.
 
 Sent from my iPad
 
 On Dec 2, 2011, at 4:27, Ana rrast...@gmail.com wrote:
 
 What is the best way to export an array?
 
 The array I am trying to export has 3 dimensions (long, lat, observations).
 
 How can I export each dimension independently?
 E.g., one CSV file with only the long.
 

Re: [R] question about plsr() results

2011-12-02 Thread Bjørn-Helge Mevik

Vytautas Rakeviius vytautas1...@yahoo.com writes:

 But I still have a question about interpreting the results. In the end I
 want to construct a prediction function of the form:
 Y = a1*x1 + a2*x2

The predict() function does the prediction for you.  If you want to
construct the prediction _equation_, you can extract the coefficients
from the model with

coef(yourmodel, ncomp = thenumberofcomponents, intercept = TRUE)

See ?coef.mvr for details.
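A minimal sketch of building the prediction equation by hand on simulated data (the names `dat`, `x1`, `x2` are hypothetical; the pls package is assumed to be installed). With `intercept = TRUE`, `coef()` returns the intercept and the a1, a2 of the equation above, and plugging them in reproduces what `predict()` gives.

```r
library(pls)

set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 2 + 1.5 * dat$x1 - 0.7 * dat$x2 + rnorm(50, sd = 0.1)

fit <- plsr(y ~ x1 + x2, ncomp = 2, data = dat)

# coef() returns an array (terms x responses x ncomp); drop() turns it into
# a named vector: (Intercept), x1, x2
b <- drop(coef(fit, ncomp = 2, intercept = TRUE))

# Y = a0 + a1*x1 + a2*x2, evaluated by hand
manual <- b[1] + b[2] * dat$x1 + b[3] * dat$x2
auto   <- drop(predict(fit, ncomp = 2))
```

With as many components as predictors, PLS coincides with ordinary least squares, so `b` should also be close to what `lm(y ~ x1 + x2, data = dat)` returns here. With fewer components the coefficients differ, which is the point of choosing `ncomp`.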

 The documentation does not describe this.

The pls package is designed to work as much as possible like the lm()
function and its methods and helpers. So read any introduction to linear
models in R, and you will get a long way.

There is also a paper in JSS about the pls package: 
http://www.jstatsoft.org/v18/i02/

-- 
Cheers,
Bjørn-Helge Mevik


[R] find and replace string

2011-12-02 Thread Alaios

Dear all,
I would like to search a string for the second occurrence of a symbol and 
replace the character that follows it.

For example my strings look like

sta_+1+0_field2ndtry_$01.cfg

I want to find the digit that comes after the second +, in this case zero,
and then create the strings below in a loop:

sta_+1+0_field2ndtry_$01.cfg

sta_+1+1_field2ndtry_$01.cfg

sta_+1+2_field2ndtry_$01.cfg

sta_+1+3_field2ndtry_$01.cfg

and so on..
I have already tried strsplit, but that makes things more complicated... 

Could you please help me with that?

B.R
Alex


Re: [R] find and replace string

2011-12-02 Thread jim holtman

try this:

> x <- c('sta_+1+0_field2ndtry_$01.cfg'
+  , 'sta_+1+0_field2ndtry_$01.cfg'
+  , 'sta_+1-0_field2ndtry_$01.cfg'
+  , 'sta_+1+0_field2ndtry_$01.cfg'
+  )
> # find matching fields
> values <- grep("[^+]*\\+[^+]*\\+0", x, value = TRUE)
> # split into two pieces
> splitValues <- sub("([^+]*\\+[^+]*\\+)0(.*)", "\\1^\\2", values)
> for (i in splitValues){
+ for (j in 0:3){
+ print(sub("\\^", j, i))
+ }
+ }
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
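A variation on Jim's approach that skips the placeholder step: with `perl = TRUE`, the PCRE escape `\K` discards everything matched so far from the match, so `sub()` replaces only the digit after the second "+". This is a minimal sketch on the file names from the question; note that the pattern accepts any digit there, not only 0.

```r
x <- c("sta_+1+0_field2ndtry_$01.cfg", "sta_+1-0_field2ndtry_$01.cfg")

# Keep only names that have a digit right after the second "+"
x <- grep("^[^+]*\\+[^+]*\\+[0-9]", x, value = TRUE)

# \K (PCRE) drops the prefix from the match, so only the digit itself is replaced
variants <- lapply(x, function(s)
  vapply(0:3, function(j)
           sub("^[^+]*\\+[^+]*\\+\\K[0-9]", as.character(j), s, perl = TRUE),
         character(1)))
unlist(variants)
```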

On Fri, Dec 2, 2011 at 6:30 AM, Alaios ala...@yahoo.com wrote:
 Dear all,
 I would like to search a string for the second occurrence of a symbol and 
 replace the character that follows it.

 For example my strings look like

 sta_+1+0_field2ndtry_$01.cfg

 I want to find the digit that comes after the second +, in this case zero,
 and then create the strings below in a loop:

 sta_+1+0_field2ndtry_$01.cfg

 sta_+1+1_field2ndtry_$01.cfg

 sta_+1+2_field2ndtry_$01.cfg

 sta_+1+3_field2ndtry_$01.cfg

 and so on..
 I have already tried strsplit, but that makes things more complicated...

 Could you please help me with that?

 B.R
 Alex




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


[R] Intersection of 2 matrices

2011-12-02 Thread oluwole oyebamiji

Hi all,
    I have a matrix A of 67420 by 2 and another matrix B of 59199 by 2. I would 
like to find the number of rows of matrix B that I can find in matrix A (rows 
that are common to both matrices, with or without sorting).

I have tried the intersect and is.element functions in R, but they only 
work for vectors, not matrices,
i.e., intersect(A,B) and is.element(A,B).

Any suggestions please.
 
 Oluwole
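One simple way to compare matrices row by row is to turn each row into a single string key and then use `%in%` on the keys. A minimal sketch on tiny made-up matrices; `row_keys()` is a hypothetical helper. The `unordered = TRUE` option sorts the two values in each row first, for the "with or without sorting" case, and the approach assumes the values compare exactly as printed (true for integer codes, not guaranteed for arbitrary doubles).

```r
# Build one string key per row
row_keys <- function(M, unordered = FALSE) {
  if (unordered) {
    # treat (a, b) and (b, a) as the same row
    M <- cbind(pmin(M[, 1], M[, 2]), pmax(M[, 1], M[, 2]))
  }
  paste(M[, 1], M[, 2], sep = ",")
}

A <- matrix(c(1, 2,  3, 4,  5, 6,  7, 8), ncol = 2, byrow = TRUE)   # toy data
B <- matrix(c(3, 4,  6, 5,  9, 9),        ncol = 2, byrow = TRUE)

n_exact     <- sum(row_keys(B) %in% row_keys(A))              # rows of B found in A
n_unordered <- sum(row_keys(B, TRUE) %in% row_keys(A, TRUE))  # ignoring column order
c(n_exact, n_unordered)   # 1 2
```

`%in%` is built on `match()`, which uses hashing, so this should scale to matrices with tens of thousands of rows. If B contains duplicated rows, apply `unique()` to its keys first to count distinct common rows.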

[R] Rmpi installation problems

2011-12-02 Thread Jos Elkink

Hi all,

I am trying to install the Rmpi package in R and, while the installation
itself works, it breaks down when trying to load the library. I think it
has something to do with shared vs static loading of helper libraries,
or the order in which shared libraries are loaded, but I am not sure.

R version: 2.14.0
Linux version: Gentoo, i686-pc-linux-gnu (32-bit)
GCC version: 4.5.3 (Gentoo 4.5.3-r1 p1.0)
OpenMPI version: 1.5.4

Output from R CMD INSTALL . in Rmpi source directory:

* installing to library ‘/home/jos/R/i686-pc-linux-gnu-library/2.14’
* installing *source* package ‘Rmpi’ ...
checking for gcc... i686-pc-linux-gnu-gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether i686-pc-linux-gnu-gcc -std=gnu99 accepts -g... yes
checking for i686-pc-linux-gnu-gcc -std=gnu99 option to accept ISO C89... none needed
I am here /usr and it is OpenMPI
Trying to find mpi.h ...
Found in /usr/include
Trying to find libmpi.so or libmpich.a ...
Found libmpi in /usr/lib
checking for openpty in -lutil... yes
checking for main in -lpthread... yes
configure: creating ./config.status
config.status: creating src/Makevars
** Creating default NAMESPACE file
** libs
make: Nothing to be done for `all'.
installing to /home/jos/R/i686-pc-linux-gnu-library/2.14/Rmpi/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
/usr/lib/R/bin/exec/R: symbol lookup error: /usr/lib/openmpi/mca_paffinity_hwloc.so: undefined symbol: mca_base_param_reg_int
ERROR: loading failed
* removing ‘/home/jos/R/i686-pc-linux-gnu-library/2.14/Rmpi’

Any help would be greatly appreciated!

Jos


[R] about quantreg() package loading

2011-12-02 Thread narendarreddy kalam

Hi all,
I installed the quantreg package using the "Install package(s) from local
zip files" option, and got the following message:
utils:::menuInstallLocal()
package ‘quantreg’ successfully unpacked and MD5 sums checked
Does that mean the quantreg package was installed on my machine?
If so, why do I get the following error when loading the quantreg package
with the library(quantreg) command?
Loading required package: SparseM
Error: package ‘SparseM’ could not be loaded
In addition: Warning message:
In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
  there is no package called ‘SparseM’
What is the reason for this?
Please reply as soon as possible.
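The error says that quantreg's dependency SparseM is not installed; installing a single package from a local zip file does not pull in its dependencies. A minimal sketch that lists which of the needed packages are missing and installs only those. The vector `needed` is written by hand from the error message, and the install step assumes an interactive session with access to a CRAN mirror.

```r
needed <- c("SparseM", "quantreg")

# requireNamespace() returns FALSE (quietly) for packages that cannot be loaded
have <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!have]
missing_pkgs

# Install only what is missing; guarded so it does nothing in a non-interactive run
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

Alternatively, `install.packages("quantreg", dependencies = TRUE)` installs quantreg from CRAN together with the packages it depends on in one step.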


Re: [R] about quantreg() package loading

2011-12-02 Thread narendarreddy kalam

Hi all, 
My OS is Windows 7, the R version is 2.14, and I used the quantreg zip
file (binary version for Windows) to install it.

--
View this message in context: 
http://r.789695.n4.nabble.com/about-quantreg-package-loading-tp4146366p4146390.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] find and replace string

2011-12-02 Thread christiaan pauw

If the length of the first part is constant (the sta_+1+ part), then
you can use substr().
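A minimal sketch of the substr() idea, assuming every file name starts with the same fixed-length prefix "sta_+1+" (7 characters), so the digit to change is always the 8th character. The variable names are illustrative only.

```r
x <- "sta_+1+0_field2ndtry_$01.cfg"

# Assumption: the prefix "sta_+1+" always has the same length (7 characters)
k <- nchar("sta_+1+")

# Keep the prefix, insert the new digit, keep everything after the old digit;
# paste0() recycles the prefix and suffix over the four digits 0:3
variants <- paste0(substr(x, 1, k), 0:3, substr(x, k + 2, nchar(x)))
print(variants)
```

This only works while the replaced field is a single character at a fixed position; the regular-expression answers in this thread do not need that assumption.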




On 2 December 2011 13:30, Alaios ala...@yahoo.com wrote:

 Dear all,
 I would like to search in a string for the second occurrence of a symbol and 
 replace the symbol after it

 For example my strings look like

 sta_+1+0_field2ndtry_$01.cfg

 I want to find the digit that comes after the second +, in this case zero,
 and then, in a loop, create the strings below

 sta_+1+0_field2ndtry_$01.cfg

 sta_+1+1_field2ndtry_$01.cfg

 sta_+1+2_field2ndtry_$01.cfg

 sta_+1+3_field2ndtry_$01.cfg

 and so on..
 I have already tried strsplit but this will make things more complex...

 Could you please help me with that?

 B.R
 Alex




--
Christiaan Pauw
Nova Institute
www.nova.org.za


[R] Error message in Genetic Matching

2011-12-02 Thread shyam basnet

Dear R Users,

I am a novice R user. I am working with Genetic Matching
(GenMatch()), but I am getting the following error message:

Error in GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix = 
BalanceMatrix.binarynp,  : 
  GenMatch(): input includes NAs

Could you please suggest how to correct this problem?

My GenMatch command is:

 gen1 <- GenMatch(Tr = Tr, X = X.binarynp,
                  BalanceMatrix = BalanceMatrix.binarynp, popsize = 1000)
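The error says that at least one of the inputs contains missing values. A minimal sketch of one way to find and drop incomplete units before the call, assuming Tr is a vector and X.binarynp and BalanceMatrix.binarynp are matrices with one row per unit; the small data below are made up for illustration.

```r
# Hypothetical stand-ins for Tr, X.binarynp and BalanceMatrix.binarynp
Tr <- c(1, 0, 1, 0, 1)
X.binarynp <- cbind(age = c(30, NA, 45, 50, 28), educ = c(1, 0, 1, NA, 0))
BalanceMatrix.binarynp <- X.binarynp

# TRUE for units with no missing value in any of the three inputs
ok <- complete.cases(Tr, X.binarynp, BalanceMatrix.binarynp)

# Keep only the complete units, preserving the matrix shape
Tr.cc <- Tr[ok]
X.cc  <- X.binarynp[ok, , drop = FALSE]
BM.cc <- BalanceMatrix.binarynp[ok, , drop = FALSE]

# gen1 <- GenMatch(Tr = Tr.cc, X = X.cc, BalanceMatrix = BM.cc, popsize = 1000)
```

Dropping units changes the sample being matched, so it may be worth checking first which variables carry the NAs, e.g. with colSums(is.na(X.binarynp)).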

Thanking you,

Sincerely Yours,

Shyam Kumar Basnet
SLU, Uppsala
Sweden

Re: [R] find and replace string

2011-12-02 Thread Alaios

You are too good :)
Thanks a lot, and have a nice weekend.

B.R
Alex




 From: jim holtman jholt...@gmail.com

Cc: R-help@r-project.org R-help@r-project.org 
Sent: Friday, December 2, 2011 1:51 PM
Subject: Re: [R] find and replace string

try this:

> x <- c('sta_+1+0_field2ndtry_$01.cfg'
+      , 'sta_+1+0_field2ndtry_$01.cfg'
+      , 'sta_+1-0_field2ndtry_$01.cfg'
+      , 'sta_+1+0_field2ndtry_$01.cfg'
+      )
> # find matching fields
> values <- grep("[^+]*\\+[^+]*\\+0", x, value = TRUE)
> # split into two pieces
> splitValues <- sub("([^+]*\\+[^+]*\\+)0(.*)", "\\1^\\2", values)
> for (i in splitValues){
+     for (j in 0:3){
+         print(sub("\\^", j, i))
+     }
+ }
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"


 Dear all,
 I would like to search in a string for the second occurrence of a symbol and 
 replace the symbol after it

 For example my strings look like

 sta_+1+0_field2ndtry_$01.cfg

 I want to find the digit that comes after the second +, in this case zero,
 and then, in a loop, create the strings below

 sta_+1+0_field2ndtry_$01.cfg

 sta_+1+1_field2ndtry_$01.cfg

 sta_+1+2_field2ndtry_$01.cfg

 sta_+1+3_field2ndtry_$01.cfg

 and so on..
 I have already tried strsplit but this will make things more complex...

 Could you please help me with that?

 B.R
 Alex




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

Re: [R] about quantreg() package loading

2011-12-02 Thread R. Michael Weylandt michael.weyla...@gmail.com

It means you also need to install SparseM, on which quantreg depends. This can 
be done in exactly the same way, either by direct download using 
install.packages() or by a local install. 
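A small sketch of checking which packages are still missing before calling library(quantreg). The helper missing_pkgs() is hypothetical, and the package list c("SparseM", "quantreg") is taken from the error message above; the install step is left commented out because it needs network access.

```r
# Hypothetical helper: which of `pkgs` cannot be loaded from the current libraries?
missing_pkgs <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Usage, with network access for the install step:
# todo <- missing_pkgs(c("SparseM", "quantreg"))
# if (length(todo) > 0) install.packages(todo, dependencies = TRUE)
# library(quantreg)
```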

Michael

On Dec 2, 2011, at 6:30 AM, narendarreddy kalam narendarcse...@gmail.com 
wrote:

 Hi all, 
 My OS is Windows 7, the R version is 2.14, and I used the quantreg zip
 file (binary version for Windows) to install it.
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/about-quantreg-package-loading-tp4146366p4146390.html
 Sent from the R help mailing list archive at Nabble.com.
 

Re: [R] Intersection of 2 matrices

2011-12-02 Thread David Winsemius



On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:


Hi all,
I have a matrix A of 67420 by 2 and another matrix B of 59199 by  
2. I would like to find the number of rows of matrix B that I can  
find in matrix A (rows that are common to both matrices, with or  
without sorting).


I have tried the intersect() and is.element() functions in R, but  
they only work for vectors, not matrices, i.e. intersect(A,B) and
is.element(A,B).


Have you considered the 'duplicated' function?
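A minimal sketch of the duplicated() idea on two small made-up matrices (A and B here are hypothetical stand-ins for the 67420 x 2 and 59199 x 2 matrices). duplicated() works row by row on a matrix, so stacking the distinct rows of A above the distinct rows of B flags exactly those B rows that already occurred in A.

```r
# Hypothetical small matrices standing in for A and B
A <- matrix(c(1, 2,
              3, 4,
              5, 6,
              7, 8), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 4,
              9, 9,
              7, 8,
              3, 4), ncol = 2, byrow = TRUE)

# Distinct rows of each matrix
uA <- unique(A)
uB <- unique(B)

# After stacking, a row of uB can only be a duplicate of a row of uA
flag <- duplicated(rbind(uA, uB))[-seq_len(nrow(uA))]
n_distinct_common <- sum(flag)   # distinct rows of B that also occur in A

# Alternative if repeated rows of B should each be counted: compare row keys
row_key <- function(m) paste(m[, 1], m[, 2], sep = "\r")
n_all_common <- sum(row_key(B) %in% row_key(A))
```

Here n_distinct_common is 2 (rows (3,4) and (7,8)) and n_all_common is 3, because (3,4) appears twice in B.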

--
David Winsemius, MD
West Hartford, CT


Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-02 Thread Patrick Breheny


On 12/01/2011 08:00 PM, Ben quant wrote:

The data I am using is the last file called l_yx.RData at this link (the
second file contains the plots from earlier):
http://scientia.crescat.net/static/ben/


The logistic regression model you are fitting assumes a linear 
relationship between x and the log odds of y; that does not seem to be 
the case for your data.  To illustrate:


x <- l_yx[,"x"]
y <- l_yx[,"y"]
ind1 <- x <= .002
ind2 <- (x > .002 & x <= .0065)
ind3 <- (x > .0065 & x <= .13)
ind4 <- (x > .13)

> summary(glm(y[ind1]~x[ind1],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.79174    0.02633 -106.03   <2e-16 ***
x[ind1]     354.98852   22.78190   15.58   <2e-16 ***

> summary(glm(y[ind2]~x[ind2],family=binomial))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.15805    0.02966 -72.766   <2e-16 ***
x[ind2]    -59.92934    6.51650  -9.197   <2e-16 ***

> summary(glm(y[ind3]~x[ind3],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.367206   0.007781 -304.22   <2e-16 ***
x[ind3]     18.104314   0.346562   52.24   <2e-16 ***

> summary(glm(y[ind4]~x[ind4],family=binomial))
...
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.31511    0.08549 -15.383   <2e-16 ***
x[ind4]      0.06261    0.08784   0.713    0.476

To summarize, the relationship between x and the log odds of y appears 
to vary dramatically in both magnitude and direction depending on which 
interval of x's range we're looking at.  Trying to summarize this 
complicated pattern with a single line is leading to the fitted 
probabilities near 0 and 1 you are observing (note that only 0.1% of the 
data is in region 4 above, although region 4 accounts for 99.1% of the 
range of x).
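A sketch of the same region-by-region check, written with cut() and split() so all four fits and the data/range shares come out together. It uses the cut points .002, .0065 and .13 from the analysis above, but the data are simulated (a hypothetical stand-in for l_yx, with no real relationship between x and y); substituting the real x and y columns would be needed to reproduce the figures quoted.

```r
# Hypothetical stand-in for l_yx: most x values are small, a few are large
set.seed(1)
x <- c(runif(5000, 0, 0.13), runif(50, 0.13, 15))
y <- rbinom(length(x), 1, 0.3)

# Regions (-Inf,.002], (.002,.0065], (.0065,.13], (.13,Inf)
breaks <- c(-Inf, 0.002, 0.0065, 0.13, Inf)
region <- cut(x, breaks)

# Fraction of the observations that falls in each region
share_of_data <- prop.table(table(region))

# Fraction of the observed range of x that each region spans
lims <- pmin(pmax(breaks, min(x)), max(x))
share_of_range <- diff(lims) / diff(range(x))

# One logistic fit per region: rows are intercept and slope, columns are regions
coefs <- sapply(split(data.frame(x, y), region),
                function(d) coef(glm(y ~ x, family = binomial, data = d)))

round(rbind(share_of_data, share_of_range), 3)
coefs
```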


--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky


Re: [R] Resampling with replacement on a binary (0, 1) dataset to get Cis

2011-12-02 Thread David Winsemius



On Dec 2, 2011, at 3:55 AM, lincoln wrote:


Thanks.
Anyway, it is not homework and I was not told to do that. My question has
not been answered yet, so I'll try to reformulate it:
Does it make (statistical) sense to resample with replacement in this
situation to get an estimate of the CIs? If it does, how could I do it
in R?



Some further details on my real case study:
10 independent samples were taken from a population in ten sessions. Each
sample consists of a (somewhat variable) number of random individuals that
are classified as 0 or 1 depending on one specific state (presence or
absence of a disease).
I can calculate, for each session, the percentage of individuals diseased,
but I have nothing about the CIs. Any suggestions?


I do not see much advantage to using resampling in this instance. The  
variance of a proportion is not theoretically complicated and you have  
introduced no further complicating factors that would call into  
question the validity of the estimates you would get from prop.test.
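A minimal sketch of the prop.test() route, one interval per session. The counts below are made up for illustration; in the real data, diseased would be the number of 1s and sampled the number of individuals in each session. binom.test() would give exact (Clopper-Pearson) intervals instead, if preferred.

```r
# Hypothetical data: number diseased and number sampled in each of 10 sessions
diseased <- c(3, 5, 2, 8, 4, 6, 1, 7, 5, 3)
sampled  <- c(40, 38, 45, 50, 42, 39, 41, 47, 44, 43)

# prop.test(k, n)$conf.int is a 95% interval for a single proportion
ci <- t(mapply(function(k, n) prop.test(k, n)$conf.int, diseased, sampled))
colnames(ci) <- c("lower", "upper")

result <- data.frame(session = 1:10, prop = diseased / sampled, ci)
print(result)
```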


--

David Winsemius, MD
West Hartford, CT


Re: [R] find and replace string

2011-12-02 Thread Sarah Goslee

You've been given a workable solution already, but here's a one-liner:

> x <- c('sta_+1+0_field2ndtry_$01.cfg' , 
+ 'sta_+B+0_field2ndtry_$01.cfg' , 'sta_+1+0_field2ndtry_$01.cfg' , 
+ 'sta_+9+0_field2ndtry_$01.cfg')
> sapply(1:length(x), function(i)gsub("\\+(.*)\\+.", paste("\\+\\1\\+", i, 
+ sep=""), x[i]))
[1] "sta_+1+1_field2ndtry_$01.cfg" "sta_+B+2_field2ndtry_$01.cfg"
[3] "sta_+1+3_field2ndtry_$01.cfg" "sta_+9+4_field2ndtry_$01.cfg"


Sarah, fan of regular expressions

On Fri, Dec 2, 2011 at 6:30 AM, Alaios ala...@yahoo.com wrote:
 Dear all,
 I would like to search in a string for the second occurrence of a symbol and 
 replace the symbol after it

 For example my strings look like

 sta_+1+0_field2ndtry_$01.cfg

 I want to find the digit that comes after the second +, in this case zero,
 and then, in a loop, create the strings below

 sta_+1+0_field2ndtry_$01.cfg

 sta_+1+1_field2ndtry_$01.cfg

 sta_+1+2_field2ndtry_$01.cfg

 sta_+1+3_field2ndtry_$01.cfg

 and so on..
 I have already tried strsplit but this will make things more complex...

 Could you please help me with that?

 B.R
 Alex


-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] Fw: calculate mean of multiple rows in a data frame

2011-12-02 Thread Jean V Adams

It's easier for folks to help you if you put your example data in a format 
that can be readily read in R.  See, for example, the dput() function, 
which you can use to provide us with something like this:

DF <- structure(list(NAME = c("Control_1", "Control_2", "Control_1", 
"Control_3", "MM0289~RFU:11810.15", "MM0289~RFU:9238.41", 
"MM16597~RFU:36765.38", "MM16597~RFU:41258.94"),
ID = c("probe~B01R01C01", "probe~B01R01C02", "probe~B01R09C01",
"probe~B01R09C02", "probe~B29R13C06", "probe~B29R13C05", 
"probe~B44R15C20", "probe~B44R15C19"), a = c(3L, 712L, 937L, 
464L, 99L, 605L, 700L, 132L), b = c(22L, 13L, 824L, 836L, 544L, 
603L, 923L, 777L), c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L, 
497L), d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)), .Names = 
c("NAME", "ID", "a", "b", "c", "d"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

If I understand what you're after, you want to summarize data within 
groups, but your NAME variable is not as general as you would like.  You 
can get around this by creating a new variable which is a shorter and more 
general version of the NAME variable.  I did this by saving just the part 
of the NAME before the colon, ":".

shortname <- sapply(strsplit(DF$NAME, ":"), "[", 1)
aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean) 

    shortname   a     b     c     d
1   Control_1 470 423.0 912.0 721.0
2   Control_2 712  13.0  32.0 179.0
3   Control_3 464 836.0 508.0  53.0
4  MM0289~RFU 352 573.5 734.5 779.5
5 MM16597~RFU 416 850.0 358.0 788.5

Jean


 Jabez Wilson wrote on 12/01/2011 03:15:39 PM:

                   NAME              ID   a   b   c   d
 1            Control_1 probe~B01R01C01 381 213 345 653
 2            Control_2 probe~B01R01C02 574 629 563 783
 3            Control_1 probe~B01R09C01 673 511 521 967
 4            Control_3 probe~B01R09C02  53 809 999  50
 5  MM0289~RFU:11810.15 probe~B29R13C06 681  34 115 587
 6   MM0289~RFU:9238.41 probe~B29R13C05 784 443  20 784
 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686
 
 
 
 Sorry, that should look like this:
 
 
 
 
                   NAME              ID   a   b   c   d
 1            Control_1 probe~B01R01C01   3  22 926 774
 2            Control_2 probe~B01R01C02 712  13  32 179
 3            Control_1 probe~B01R09C01 937 824 898 668
 4            Control_3 probe~B01R09C02 464 836 508  53
 5  MM0289~RFU:11810.15 probe~B29R13C06  99 544 607 984
 6   MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575
 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582
 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995
 
 --- On Thu, 1/12/11, Jabez Wilson jabez...@yahoo.co.uk wrote:
 
 
 From: Jabez Wilson jabez...@yahoo.co.uk
 Subject: calculate mean of multiple rows in a data frame
 To: R-Help r-h...@stat.math.ethz.ch
 Date: Thursday, 1 December, 2011, 20:45
 
 Dear all, I have a data frame (DF) in the following format:
 
                   NAME              ID   a   b   c   d
 1            Control_1 probe~B01R01C01 381 213 345 653
 2            Control_2 probe~B01R01C02 574 629 563 783
 3            Control_1 probe~B01R09C01 673 511 521 967
 4            Control_3 probe~B01R09C02  53 809 999  50
 5  MM0289~RFU:11810.15 probe~B29R13C06 681  34 115 587
 6   MM0289~RFU:9238.41 probe~B29R13C05 784 443  20 784
 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686
 I would like to consolidate the data frame by parsing through the 
 rows, and where the NAME is identical, consolidate into one row and 
 return the mean.
 I can do this for the first lines (Control_1 etc) by using aggregate()
 aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean)
 but since aggregate looks for unique lines it won't consolidate e.g.
 lines 5/6 and 7/8.
 Is there a way of telling aggregate to grep just the first part of 
 the name (i.e. up to ~) and consolidate those?
 I could pre-grep the file before importing into R, but I'd like to 
 do it within R if possible.
 Thanks for any suggestions

[R] Unexplained behavior of level names when using ordered factors in lm?

2011-12-02 Thread Tal Galili

Hello dear all,

I am unable to understand why when I run the following three lines:

set.seed(4254)
a <- data.frame(y = rnorm(40), x=ordered(sample(1:5, 40, T)))
summary(lm(y ~ x, a))


the output I get includes level names (x.L, x.Q, ...) that do not match the
levels I am actually using:

Call:
lm(formula = y ~ x, data = a)
Residuals:
    Min      1Q  Median      3Q     Max
-1.4096 -0.6400 -0.1244  0.5886  2.1891
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03276    0.15169  -0.216    0.830
x.L         -0.28968    0.33866  -0.855    0.398
x.Q         -0.38813    0.33851  -1.147    0.259
x.C         -0.27183    0.34027  -0.799    0.430
x^4          0.25993    0.33935   0.766    0.449
Residual standard error: 0.9564 on 35 degrees of freedom
Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
F-statistic: 0.8202 on 4 and 35 DF,  p-value: 0.5211


I am guessing that this has something to do with the contrast matrix
that is used, but this is not clear to me.
Can anyone suggest a good read, or an explanation?

Thanks.


Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--


[R] what is used as height in hclust for ward linkage?

2011-12-02 Thread james.foadi

Dear R community,
I am trying to understand how the ward linkage works from a quantitative point 
of view.
To test it, I have devised a simple 3-member set:

   G = c(0,2,10)

The distances between all pairs are:

d(0,2)  =  2
d(0,10) = 10
d(2,10) =  8

The smallest distance corresponds to merging 0 and 2. The corresponding ESS values are:

ESS(0,2) = 2*var(c(0,2)) = 4
ESS(0,10) = 2*var(c(0,10)) = 100
ESS(2,10) = 2*var(c(2,10)) = 64

and, indeed, the smallest ESS corresponds to merging 0 and 2. The next element 
that should be added
to 0 and 2 is obviously 10. This is where I don't understand how the hclust 
algorithm in R works. We have

> G <- c(0,2,10)
> G.dist <- dist(G)
> G.hc <- hclust(G.dist,method="ward")
> G.hc$merge
     [,1] [,2]
[1,]   -1   -2
[2,]   -3    1
> G.hc$height
[1]  2.00000 11.33333

Now, according to standard definitions, the distance between two clusters
Rr and Rs with Nr and Ns elements is:

  d(Rs,Rr) = sqrt(2*Nr*Ns/(Nr+Ns))*||<Rs> - <Rr>||

where <> in the last expression indicates averages (centroids). If I carry out
this operation to merge cluster c(0,2) with 10, I get:

  d(c(0,2),10) = sqrt(2*2*1/(2+1))*|1-10| = 10.3923

This is different from the 11.33333 in the R output.

Does anyone know what's the exact value for the ward linkage, as displayed in 
the hclust height output?
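A worked check of both numbers, as a sketch. It assumes that method = "ward" in this R version applies the Lance-Williams update for Ward's method directly to the unsquared distances from dist(). Under that assumption the 34/3 = 11.33 height is reproduced, while applying the same update to squared distances and taking the square root gives the centroid-formula value of about 10.39. The method names "ward.D" and "ward.D2" at the end come from later R releases (3.1.0 onward), not from the R 2.14 used here; lw_ward() is a helper written for this illustration.

```r
G <- c(0, 2, 10)

# Lance-Williams update for Ward's method: distance from cluster k to the
# merged cluster (i, j), given pairwise distances and cluster sizes
lw_ward <- function(d_ki, d_kj, d_ij, n_i, n_j, n_k) {
  ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / (n_i + n_j + n_k)
}

# Update on the plain distances d(10,0) = 10, d(10,2) = 8, d(0,2) = 2
h_unsquared <- lw_ward(10, 8, 2, 1, 1, 1)             # 34/3, about 11.33

# Update on the squared distances, then square-rooted
h_squared <- sqrt(lw_ward(10^2, 8^2, 2^2, 1, 1, 1))   # sqrt(108), about 10.39

# Centroid formula from the question: centroids are mean(c(0, 2)) = 1 and 10
h_centroid <- sqrt(2 * 2 * 1 / (2 + 1)) * abs(mean(c(0, 2)) - 10)

# In R >= 3.1.0 the two variants are exposed as "ward.D" and "ward.D2"
hclust(dist(G), method = "ward.D")$height
hclust(dist(G), method = "ward.D2")$height
```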

Thanks in advance for any help!

J



Re: [R] Unexplained behavior of level names when using ordered factors in lm?

2011-12-02 Thread Bert Gunter

?ordered
?C
?contr.poly

If you don't know what polynomial contrasts are, consult any good
linear models text. MASS has a good, though a bit terse, section on
this.
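A short sketch of what the pointers above lead to: ordered factors get polynomial contrasts by default, so x.L, x.Q, x.C and x^4 are the linear, quadratic, cubic and quartic trend terms, not factor levels. Refitting with treatment contrasts (the choice of contr.treatment here is just for comparison) gives level-named coefficients and the same fitted values. Note that sample() results can differ across R versions, so the numbers may not match the output above.

```r
# Default contrasts: treatment for unordered factors, polynomial for ordered ones
getOption("contrasts")

# Orthogonal polynomial contrasts for 5 ordered levels
contr.poly(5)

set.seed(4254)
a <- data.frame(y = rnorm(40), x = ordered(sample(1:5, 40, TRUE)))

fit_poly <- lm(y ~ x, a)                                            # x.L, x.Q, x.C, x^4
fit_trt  <- lm(y ~ x, a, contrasts = list(x = "contr.treatment"))   # x2 ... x5

names(coef(fit_poly))
names(coef(fit_trt))

# Same model, different parameterisation: the fitted values agree
all.equal(fitted(fit_poly), fitted(fit_trt))
```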

-- Bert

On Fri, Dec 2, 2011 at 6:51 AM, Tal Galili tal.gal...@gmail.com wrote:
 Hello dear all,

 I am unable to understand why when I run the following three lines:

 set.seed(4254)
 a <- data.frame(y = rnorm(40), x=ordered(sample(1:5, 40, T)))
 summary(lm(y ~ x, a))


 The output I get includes factor levels which are not relevant to what I am
 actually using:

 Call:
 lm(formula = y ~ x, data = a)
 Residuals:
     Min      1Q  Median      3Q     Max
 -1.4096 -0.6400 -0.1244  0.5886  2.1891
 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept) -0.03276    0.15169  -0.216    0.830
 x.L         -0.28968    0.33866  -0.855    0.398
 x.Q         -0.38813    0.33851  -1.147    0.259
 x.C         -0.27183    0.34027  -0.799    0.430
 x^4          0.25993    0.33935   0.766    0.449
 Residual standard error: 0.9564 on 35 degrees of freedom
 Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
 F-statistic: 0.8202 on 4 and 35 DF,  p-value: 0.5211


 I am guessing that this has something to do with the contrast matrix
 that is used, but this is not clear to me.
 Can anyone suggest a good read, or an explanation?

 Thanks.


 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Unexplained behavior of level names when using ordered factors in lm?

2011-12-02 Thread Bert Gunter

Maybe should have explicitly said:

> C(ordered(1:5))
[1] 1 2 3 4 5
attr(,"contrasts")
     ordered
"contr.poly"
Levels: 1 < 2 < 3 < 4 < 5

-- Bert

On Fri, Dec 2, 2011 at 7:06 AM, Bert Gunter bgun...@gene.com wrote:
 ?ordered
 ?C
 ?contr.poly

 If you don't know what polynomial contrasts are, consult any good
 linear models text. MASS has a good, though a bit terse, section on
 this.

 -- Bert

 On Fri, Dec 2, 2011 at 6:51 AM, Tal Galili tal.gal...@gmail.com wrote:
 Hello dear all,

 I am unable to understand why when I run the following three lines:

 set.seed(4254)
 a <- data.frame(y = rnorm(40), x=ordered(sample(1:5, 40, T)))
 summary(lm(y ~ x, a))


 The output I get includes factor levels which are not relevant to what I am
 actually using:

 Call:
 lm(formula = y ~ x, data = a)
 Residuals:
     Min      1Q  Median      3Q     Max
 -1.4096 -0.6400 -0.1244  0.5886  2.1891
 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept) -0.03276    0.15169  -0.216    0.830
 x.L         -0.28968    0.33866  -0.855    0.398
 x.Q         -0.38813    0.33851  -1.147    0.259
 x.C         -0.27183    0.34027  -0.799    0.430
 x^4          0.25993    0.33935   0.766    0.449
 Residual standard error: 0.9564 on 35 degrees of freedom
 Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
 F-statistic: 0.8202 on 4 and 35 DF,  p-value: 0.5211


 I am guessing that this has something to do with the contrast matrix
 that is used, but this is not clear to me.
 Can anyone suggest a good read, or an explanation?

 Thanks.


 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --




 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[R] Save Venn-diagram (Vennerable) together with table and plot in single pdf page

2011-12-02 Thread Sonja Braaker


Dear R-users

I want to save a character list, a point plot and a Venn diagram on
a single PDF page.


I can do this successfully when I use a character list and two point plots.
However, when I try to replace the first point plot with my Venn diagram
(built with the Vennerable package, compute.Venn() and plot.Venn()), the Venn
plot is not placed in the right position in the PDF.


I guess there are some parameters in the plot.Venn function that I need
to change, but I did not find out which ones.


Here is an example of the PDF with two point plots, as I want it:

library(gplots)
library(Vennerable)

Varnames <- c("A","B","C")
x <- Venn(SetNames = Varnames, Weight = 
c(`100`=2,`010`=6,`001`=10,`110`=1, `011`=0.2, `101`=0.5,`111`=1))

cx <- compute.Venn(x, doWeights = TRUE)
tabletext <- paste("Variable: ", letters[1:8], sep="")

pdf("path/plot_test.pdf", fillOddEven=TRUE, paper="a4", 
onefile=TRUE, width=7, height=10)

layout(matrix(c(1,2,2,1,2,2,3,3,3), 3, 3, byrow = TRUE), heights=c(1,1,2))
par(mar=c(6,2,2,4))
textplot(tabletext, valign="top", halign="left", cex=2)
plot(rnorm(100), main="Random 1")
plot(rnorm(100), col="red", main="Random2")
dev.off()


And here is the example where I try to replace the
"Random 1" point plot with a Venn diagram (wrong size and position of the
Venn diagram):


pdf("path/venn_test.pdf", fillOddEven=TRUE, paper="a4", 
onefile=TRUE, width=7, height=10)

layout(matrix(c(1,2,2,1,2,2,3,3,3), 3, 3, byrow = TRUE), heights=c(1,1,2))
par(mar=c(6,2,2,4))
textplot(tabletext, valign="top", halign="left", cex=2)
plot(cx)
plot(rnorm(100), col="red", main="Random2")
dev.off()

I would be thankful for any hints.

Sonja

--
Sonja Braaker
Swiss Federal Research Institute WSL
Community Ecology
Zürcherstrasse 111
CH-8903 Birmensdorf

Tel. +41 44 7392 230
Fax  +41 44 7392 215
sonja.braa...@wsl.ch
http://www.wsl.ch/info/mitarbeitende/braaker/index_EN


Re: [R] R, PostgresSQL and poor performance

2011-12-02 Thread Berry, David I.

On 01/12/2011 17:01, Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Thu, Dec 1, 2011 at 10:02 AM, Berry, David I. d...@noc.ac.uk wrote:
 Hi List

 Apologies if this isn't the correct place for this query (I've tried a
search of the mail archives but not had much joy).

 I'm running R (2.14.0)  on a Mac (OSX v 10.5.8, 2.66GHz, 4GB memory)
and am having a few performance issues with reading data in from a
Postgres database (using RPostgreSQL). My query and code are below:

 # -
 library('RPostgreSQL')

 drv <- dbDriver("PostgreSQL")

 dbh <- dbConnect(drv,user="…",password="…",dbname="…",host="…")

 sql <- "select id, date, lon, lat, date_trunc('day' , date) as jday,
extract('hour' from date) as hour, extract('year' from date) as year
from observations where pt = 6 and date >= '1990-01-01' and date <
'1995-01-01' and lon > 180 and lon < 290 and lat > -30 and lat < 30 and
sst is not null"

 dataIn <- dbGetQuery(dbh,sql)

If this is a large table of which the desired rows are a small
fraction of all rows then be sure there are indexes on the variables in
your where clause.

You can also try it with the RpgSQL driver although there is no reason
to think that that would be faster.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Thanks for the reply and suggestions. I've tried the RpgSQL drivers and
the results are pretty similar in terms of performance.

The ~1.5M records I'm trying to read into R are being extracted from a
table with ~300M rows (and ~60 columns) that has been indexed on the
relevant columns and horizontally partitioned (with constraint checking
on). I do need to try and optimize the database a bit more, but I don't
think this is the cause of the performance issues.

As an example, when I run the query purely in R it takes 273s to run
(using system.time() to time it). When I extract the data via psql and
system() and then import it into R using read.table() it takes 32s. The
code I've used for both are below. The second way of doing it (psql and
read.table()) is less than ideal but does seem to have a big performance
advantage at the moment; the only difference in the results is that the
date variables are stored as strings in the second example.

# Query purely in R
# 
dbh <- dbConnect(drv,user="…",password="…", dbname="…",host="…")

sql <- "select id, date, lon, lat, date_trunc('day' , date) as jday,
extract('hour' from date) as hour, extract('year' from date) as year from
observations where pt = 6 and date >= '1990-01-01' and date < '1995-01-01'
and lon > 180 and lon < 290 and lat > -30 and lat < 30 and sst is not
null;"

dataIn <- dbGetQuery(dbh,sql)



# Query via command line
# --
system('psql -h myhost -d mydb -U myuid -f getData.sql')

system('cat tmp.csv | sed 's/^,//g;s/^[0-9a-zA-Z]\+//g'  tmp2.csv')
# This just ensures the first column is quoted

dataIn <- read.table('tmp2.csv',sep=',' ,col.names=c(
"id","date","lon","lat","jday","hour","year") )


# Contents of getData.sql
# -
\o ./tmp.csv
\pset format unaligned
\pset fieldsep ','
\pset tuples_only
select 
id, date, lon, lat, date_trunc('day' , date) as jday,
extract('hour' from date) as hour, extract('year' from date) as year
from 
observations 
where 
pt = 6 and date >= '1990-01-01' and date < '1995-01-01' and lon > 180
and lon < 290 and lat > -30 and lat < 30 and sst is not null;
\q


--
David Berry
National Oceanography Centre, UK




[R] Error in Genetic Matching

2011-12-02 Thread shyam basnet

Dear R Users,

I am a novice R user. I am working with Genetic Matching
(GenMatch()), but I am getting the following error message:

Error in GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix = 
BalanceMatrix.binarynp,  : 
  GenMatch(): input includes NAs

Could you please suggest me correcting the above problem?

My GenMatch command is,

 gen1 <- GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix = 
 BalanceMatrix.binarynp, popsize = 1000)

Thanking you,

Sincerely Yours,

Shyam Kumar Basnet
SLU, Uppsala
Sweden 
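The error says that at least one of the inputs contains missing values. A minimal sketch, assuming that dropping incomplete observations (listwise deletion) is acceptable for the analysis; the toy Tr, X and BalanceMatrix below are made-up stand-ins for Tr, X.binarynp and BalanceMatrix.binarynp, and the GenMatch() call is left commented out because it needs the Matching package:

```r
# Hypothetical toy inputs standing in for Tr, X.binarynp and BalanceMatrix.binarynp
Tr <- c(1, 0, 1, 0, 1)
X <- cbind(age  = c(30, NA, 45, 50, 28),
           educ = c(12, 16, NA, 10, 14))
BalanceMatrix <- X

# Count missing values per column to see where the NAs are
colSums(is.na(X))

# Keep only the rows that are complete in every input, so all three stay aligned
ok <- complete.cases(Tr, X, BalanceMatrix)
Tr.cc <- Tr[ok]
X.cc <- X[ok, , drop = FALSE]
BalanceMatrix.cc <- BalanceMatrix[ok, , drop = FALSE]

# With the Matching package installed, the call would then become:
# library(Matching)
# gen1 <- GenMatch(Tr = Tr.cc, X = X.cc, BalanceMatrix = BalanceMatrix.cc, popsize = 1000)
```

Whether to drop or impute the incomplete rows is a substantive choice; the sketch only shows the mechanics of keeping the treatment vector and both covariate matrices aligned.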


[R] How to test for Poisson?

2011-12-02 Thread RToss

Hi!

I am working on a school assignment, but I got stuck on this one.
I am supposed to test whether my data are Poisson-distributed.
The data I'm using is the study Bids, found in the Ecdat package, and the
variable of interest is the dependent variable numbids.
How do I practically perform a test for this?

Kind regards/ Richard

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-test-for-Poisson-tp4147356p4147356.html
Sent from the R help mailing list archive at Nabble.com.


[R] breaking up n object into random groups

2011-12-02 Thread statfan

say n = 100
I want to partition this into 4 random groups, where n1 + n2 + n3 + n4 = n
and ni is the number of elements in group i.

Thank you for your help

--
View this message in context: 
http://r.789695.n4.nabble.com/breaking-up-n-object-into-random-groups-tp4147476p4147476.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Unexplained behavior of level names when using ordered factors in lm?

2011-12-02 Thread David Winsemius



On Dec 2, 2011, at 9:51 AM, Tal Galili wrote:


Hello dear all,

I am unable to understand why when I run the following three lines:

set.seed(4254)

a <- data.frame(y = rnorm(40), x = ordered(sample(1:5, 40, TRUE)))
summary(lm(y ~ x, a))



The output I get includes factor levels which are not relevant to what I am
actually using:

Call:
lm(formula = y ~ x, data = a)

Residuals:
    Min      1Q  Median      3Q     Max
-1.4096 -0.6400 -0.1244  0.5886  2.1891

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03276    0.15169  -0.216    0.830
x.L         -0.28968    0.33866  -0.855    0.398
x.Q         -0.38813    0.33851  -1.147    0.259
x.C         -0.27183    0.34027  -0.799    0.430
x^4          0.25993    0.33935   0.766    0.449


Those are polynomial contrasts: linear, quadratic, cubic and quartic.  
If you don't want contrasts based on ordered factors then just use  
regular factors. You should probably be looking at:


?C

(...yet another function whose name should be avoided when naming data objects.)
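A small sketch of the point above, using a balanced stand-in for Tal's data (an assumption, so that all five levels are guaranteed to occur): the ordered factor is coded with contr.poly, and requesting treatment contrasts through lm()'s contrasts argument gives per-level coefficients for the same model.

```r
set.seed(4254)
# Balanced stand-in so that all five levels are present (hypothetical data)
a <- data.frame(y = rnorm(40), x = ordered(sample(rep(1:5, 8))))

# Ordered factors get contr.poly by default: columns .L, .Q, .C, ^4
contrasts(a$x)

fit.poly <- lm(y ~ x, data = a)
# Same model with treatment (dummy) contrasts: x2..x5 compare each level with level 1
fit.trt <- lm(y ~ x, data = a, contrasts = list(x = "contr.treatment"))

coef(fit.poly)
coef(fit.trt)
# Only the parameterisation differs; the fitted values agree
all.equal(fitted(fit.poly), fitted(fit.trt))
```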


--
David.



Residual standard error: 0.9564 on 35 degrees of freedom
Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
F-statistic: 0.8202 on 4 and 35 DF,  p-value: 0.5211



I am guessing that this is having something to do with the contrast matrix
that is used, but this is not clear to me.
Can anyone suggest a good read, or an explanation?

Thanks.


Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)



David Winsemius, MD
West Hartford, CT


Re: [R] How to test for Poisson?

2011-12-02 Thread B77S

A simple way to determine whether it is NOT Poisson is to check whether the mean (the single
parameter of a Poisson distribution, lambda) and the variance are the same.
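One way to formalise that mean/variance comparison is the index-of-dispersion test, sketched below. dispersion_test() is a hypothetical helper written for this example, not a function from Ecdat or base R, and the data are simulated:

```r
# Index-of-dispersion test: if x is Poisson, (n - 1) * var(x) / mean(x)
# is approximately chi-squared with n - 1 degrees of freedom
dispersion_test <- function(x) {
  n <- length(x)
  ratio <- var(x) / mean(x)      # close to 1 under a Poisson model
  stat <- (n - 1) * ratio
  # two-sided p-value: over- and under-dispersion both count against Poisson
  p <- 2 * min(pchisq(stat, df = n - 1),
               pchisq(stat, df = n - 1, lower.tail = FALSE))
  list(ratio = ratio, statistic = stat, df = n - 1, p.value = p)
}

# Simulated stand-in for Bids$numbids
set.seed(1)
x <- rpois(200, lambda = 2)
dispersion_test(x)
```

With the assignment data this would be library(Ecdat); data(Bids); dispersion_test(Bids$numbids). A ratio well above 1 points to overdispersion relative to the Poisson; a chi-squared goodness-of-fit test on the observed counts is another common option.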

This really has nothing to do with R (other than the data source), and since
it is homework, you will likely get no further help here.
Good luck.




--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-test-for-Poisson-tp4147356p4147519.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] breaking up n object into random groups

2011-12-02 Thread David Winsemius



On Dec 2, 2011, at 10:09 AM, statfan wrote:


say n = 100
I want to partition this into 4 random groups, where n1 + n2 + n3 + n4 = n
and ni is the number of elements in group i.



Try assigning with a sample() from:

unlist(mapply(rep, c(1:4), each=c(n1,n2,n3,n4)))
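Spelled out for fixed group sizes (the sizes below are hypothetical and must add up to n): rep(1:4, times = sizes) produces the same label vector as the unlist(mapply(...)) expression above, and sample() then shuffles it.

```r
n <- 100
sizes <- c(25, 30, 20, 25)   # hypothetical n1..n4; they must add up to n
stopifnot(sum(sizes) == n)

set.seed(42)
# rep() builds 25 ones, 30 twos, ...; sample() then shuffles the labels at random
group <- sample(rep(1:4, times = sizes))
table(group)
```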

--

David Winsemius, MD
West Hartford, CT


Re: [R] breaking up n object into random groups

2011-12-02 Thread Bert Gunter

There are a million ways to do this, probably.

brks <- c(1,sort(sample(seq_len(99),3)),100)  ##  4 random groups

and then use brks as the breaks parameter in cut() with include.lowest = TRUE

?cut
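A runnable version of the cut() idea, for when the group sizes themselves should be random. In this sketch the three interior breaks are drawn from 2:(n - 1) rather than seq_len(99), since drawing 1 would repeat the lower break and cut() stops with an error about non-unique breaks:

```r
n <- 100
set.seed(7)
# three distinct interior break points, so no break repeats 1 or n
brks <- c(1, sort(sample(2:(n - 1), 3)), n)
# intervals [1, b1], (b1, b2], (b2, b3], (b3, n] define the four groups
grp <- cut(seq_len(n), breaks = brks, include.lowest = TRUE, labels = FALSE)
sizes <- tabulate(grp, nbins = 4)   # random n1..n4, each at least 1
sizes
# shuffle if group membership, not just group size, should be random
grp <- sample(grp)
```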

-- Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[R] CART with rpart

2011-12-02 Thread kende jan

Dear all,

I want to keep in my data file the terminal node (group) that each observation falls into after a CART analysis, so that I can perform other statistical analyses by these groups.

Can you help me, please?

Thanks.

Jan.


Re: [R] Unexplained behavior of level names when using ordered factors in lm?

2011-12-02 Thread Tal Galili

Thank you both Bert and David, for the quick reply.
I will look further into this.

With regards,
Tal





On Fri, Dec 2, 2011 at 5:08 PM, Bert Gunter gunter.ber...@gene.com wrote:

 Maybe should have explicitly said:

  C(ordered(1:5))
  [1] 1 2 3 4 5
  attr(,"contrasts")
     ordered
  contr.poly
  Levels: 1 < 2 < 3 < 4 < 5

 -- Bert

 On Fri, Dec 2, 2011 at 7:06 AM, Bert Gunter bgun...@gene.com wrote:
  ?ordered
  ?C
  ?contr.poly
 
  If you don't know what polynomial contrasts are, consult any good
  linear models text. MASS has a good, though a bit terse, section on
  this.
 
  -- Bert
 






Re: [R] CART with rpart

2011-12-02 Thread Tal Galili

Hi Jan,
You can probably simply use ?predict (e.g. predict.rpart).

Are you using a classification or a regression tree?
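If what you need is the terminal node itself rather than the predicted value, an rpart fit also stores it in its where component. A minimal sketch using the kyphosis data that ships with rpart as a stand-in for your own data file:

```r
library(rpart)

# kyphosis ships with rpart and stands in for your own data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where gives, for each observation, the row of fit$frame for its terminal node;
# the row names of fit$frame are the node numbers shown by print(fit)
kyph <- kyphosis
kyph$node <- factor(rownames(fit$frame)[fit$where])

table(kyph$node)
# further analyses by group, e.g. mean Age within each terminal node
aggregate(Age ~ node, data = kyph, FUN = mean)
```

This labels the observations used to grow the tree; for new data, predict() returns fitted values or classes rather than node numbers.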










Re: [R] Welcome to the R-help mailing list

2011-12-02 Thread karajamu


Hello everybody,

I am new to this mailing list and hope to find some help.
I'm trying to get into the spatstat package and have encountered two problems. First, a graphical one:
There is an example dataset called finpines which has several marks
(http://www.oga-lab.net/RGM2/func.php?rd_id=spatstat:finpines).
When I run the example code from that page in R



data(finpines)

plot(unmark(finpines), main = "Finnish pines: locations")
plot(finpines, which.marks = "height", main = "heights")

plot(finpines, which.marks = "diameter", main = "diameters")
I get the warning

Warning message:
In symbols(c(-1.993875, -1.019901, -4.914071, -4.469962, -4.303847,  :
  "which.marks" is not a graphical parameter

That is, which.marks is not recognized as a graphical parameter, and the plots for
height and diameter show no differences.

Furthermore, I want to create a ppp with several marks, but I have not figured
out how to do this.
Trying

X <- as.ppp(mydata, owin(c(174, 178), c(29, 33)))

just gives the error

Error in as.ppp(mydata, owin(c(174, 178), c(29, 33))) : 
  X must be a two-column or three-column data frame

The data set looks something like

Date  X  Y  Mar1  Mar2  Mar3
1.1.  4  3  50    6     A
2.1.  2  1  40    9     A
3.1.  5  8  35    12    B

But how can I integrate two or more marks in a three-column data frame, when 
two columns are already needed for the X and Y coordinates?

I hope you can help me with this.

Cheers
sina




  

[R] R + memory of objects

2011-12-02 Thread Marc Jekel


Dear R community,

I am still struggling a bit with how R allocates memory and how to optimize my code to minimize
working-memory load. Simon (thanks!) and others suggested using gc()
to clean up memory, which works quite nicely but seems to me more like a patch than a
solution.

To give you an idea of what I am talking about, here is a short code example, along with
a rough measure (from a system monitoring app) of the working memory used after
each computational step (latest 64-bit R on 64-bit Windows 7, 2 cores,
approx. 4 GB RAM):

##

# example 1:

y= matrix(rep(1,5000), nrow = 5000/2 , ncol = 2)

# used working memory increases from 1044 --> 1808 MB

# (same command again, i.e.)

y= matrix(rep(1,5000), nrow = 5000/2 , ncol = 2)

# 1808 MB --> 2178 MB Why does memory increase?

# (give the matrix column names)

colnames(y) = c("col1", "col2")

# 2178 MB --> 1781 MB Why does the size of an object decrease if I assign
# column labels?

###

# example 2:

y= matrix(rep(1,5000), nrow = 5000/2 , ncol = 2)

# 1016 --> 1780 MB

y = data.frame(y)

# increase from 1780 MB --> 3315 MB

##

Why does it take so much extra memory to store this matrix as a data.frame?

It is not the object per se (i.e. it is not that data.frames need more memory), because if
I call gc() the memory size drops to 1387 MB. Does this mean that it may be more
memory-efficient to avoid data.frames and use only matrices?

This puzzles me a lot. In my experience these effects are even more pronounced
for larger objects.

As an anecdotal comparison: because of these memory problems I also used Stata in my last
project, and I could do a lot of variable manipulations of the same (!)
data with significantly less memory (I am talking about GB).

Best,

Marc


[R] Graphics - Axis Labels overlap window edges

2011-12-02 Thread robgriffin247

Hi,

I am trying to put larger axis labels on my graphs (using cex.axis and
cex.lab), but when I do this the top of the text on the Y axis goes outside
of the window, as you can see in this picture:
http://twitter.com/#!/robgriffin247/status/142642881436450816/photo/1 (if
you click on the picture it opens a larger version, so it is easier to see
the problem). Is there any way I can get R to not cut the top off the letters?

Thanks,
Rob

--
View this message in context: 
http://r.789695.n4.nabble.com/Graphics-Axis-Labels-overlap-window-edges-tp4147897p4147897.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Problem subsetting: undefined columns

2011-12-02 Thread Aurélien PHILIPPOT

Hi Paul and Jim,
Thanks for your messages.

I just want R to give me the columns of my data frame d whose names
appear in v. I do not care about the names in v that are not in d. In
addition, there will always be at least one element of v that has a
corresponding column in d, so I know there is at least one match
between the two.

Initially, I tried something along these lines:
sub <- subset(d, colnames(d) %in% v)

but I could not make it work properly.


Best,
Aurelien

2011/12/2 Paul Hiemstra paul.hiems...@knmi.nl

 On 12/02/2011 07:20 AM, Aurélien PHILIPPOT wrote:
  Dear R-users,
  -I am new to R, and I am struggling with the following problem.
 
  -I am repeating the following  operations hundreds of times, within a
 loop:
  I want to subset a data frame by columns. I am interested in the columns
  names that are given by the rows of another data frame that was built in
  parallel. The solution I have so far works well as long as the elements
 of
  the second data frame are included in the column names of the first data
  frame but if an element from the second object is not a column name of
 the
  first one, then it bugs.

 Hi Aurelien,

 I would call this a feature, not a bug. I think R does what it should
 do, you request a non-existent column and it throws an error. What kind
 of behavior are you looking for instead of this error?

 regards,
 Paul

 
  -More concretely, I have the following data frames d and v:
  mmdd <- c(19720601, 19720602, 19720605)
  sret.10006 <- c(1,2,3)
  sret.10014 <- c(5,9,7)
  sret.10065 <- c(10,2,11)
 
 
  d <- data.frame(mmdd=mmdd, sret.10006=sret.10006,
  sret.10014=sret.10014, sret.10065=sret.10065)
 
  v <- data.frame(V1="sret.10006", V2="sret.10090")
  v <- sapply(v, function(x) levels(x)[x])
 
  -I want to do the following subsetting:
  sub <- subset(d, select=c(v))
 
 
  and I get the following error message:
  Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected
 
 
 
  Any help would be very much appreciated,
 
  Best,
  Aurelien
 


 --
 Paul Hiemstra, Ph.D.
 Global Climate Division
 Royal Netherlands Meteorological Institute (KNMI)
 Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
 P.O. Box 201 | 3730 AE | De Bilt
 tel: +31 30 2206 494

 http://intamap.geo.uu.nl/~paul
 http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770




Re: [R] R + memory of objects

2011-12-02 Thread Uwe Ligges


I guess the numbers you report are what your OS shows you?

R runs garbage collection (which can be triggered manually by gc()) according to
certain fuzzy rules. So what you report below is not always the memory currently
required, but what was allocated and not yet garbage collected.


See ?object.size to get the memory consumption of objects.
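A small sketch of the difference between an object's own size and what the operating system reports, reusing the matrix from Marc's example (the exact sizes printed depend on the R version and platform):

```r
y <- matrix(rep(1, 5000), nrow = 5000 / 2, ncol = 2)
print(object.size(y), units = "Kb")    # memory held by the matrix itself

colnames(y) <- c("col1", "col2")
print(object.size(y), units = "Kb")    # adds only the small dimnames attribute

df <- data.frame(y)
print(object.size(df), units = "Kb")   # the same numbers stored as a data frame

# gc() runs a garbage collection now and reports R's memory use afterwards;
# OS-level figures can stay higher until such a collection frees old copies
gc()
```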

Uwe Ligges






Re: [R] Graphics - Axis Labels overlap window edges

2011-12-02 Thread Uwe Ligges




On 02.12.2011 17:41, robgriffin247 wrote:

Hi,

I am trying to put larger axis labels on my graphs (using cex.axis and
cex.lab), but when I do this the top of the text on the Y axis goes outside
of the window, as you can see in this picture:
http://twitter.com/#!/robgriffin247/status/142642881436450816/photo/1 (if
you click on the picture it opens a larger version, so it is easier to see
the problem). Is there any way I can get R to not cut the top off the letters?



Increase the margins. See ?par and its mar argument.
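A minimal sketch of that suggestion; the margin values, the mgp setting (which moves the axis titles further out) and the labels are illustrative assumptions to tune by eye. pdf(NULL) only opens an off-screen device so the example runs anywhere; on screen you would just call par() before plot():

```r
pdf(NULL)   # off-screen device for this sketch
# mar = c(bottom, left, top, right) in lines of text; the default is c(5, 4, 4, 2) + 0.1
# mgp[1] is the line on which the axis title is drawn (default 3)
old <- par(mar = c(6, 7, 2, 2) + 0.1, mgp = c(4, 1, 0))
plot(1:10, cex.axis = 1.8, cex.lab = 2,
     xlab = "x label", ylab = "y label")   # hypothetical labels
par(old)    # restore the previous settings
dev.off()
```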

Uwe Ligges




Re: [R] Random Forests in R

2011-12-02 Thread Axel Urbiz

Thanks for this!

Axel.

On Thu, Dec 1, 2011 at 11:29 AM, Liaw, Andy andy_l...@merck.com wrote:

 The first version of the package was created by re-writing the main
 program in the original Fortran as C, and calls other Fortran subroutines
 that were mostly untouched, so dynamic memory allocation can be done.
  Later versions have most of the Fortran code translated/re-written in C.
  Currently the only Fortran part is the node splitting in classification
 trees.

 Andy

  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Langfelder
  Sent: Thursday, December 01, 2011 12:33 AM
  To: Axel Urbiz
  Cc: R-help@r-project.org
  Subject: Re: [R] Random Forests in R
 
  On Wed, Nov 30, 2011 at 7:48 PM, Axel Urbiz
  axel.ur...@gmail.com wrote:
   I understand the original implementation of Random Forest was done in
   Fortran code. In the source files of the R implementation there is a note
   "C wrapper for random forests: get input from R and drive the Fortran
   routines.". I'm far from an expert on this... does that mean that the
   implementation in R is through calls to C functions only (not Fortran)?

   So, would knowing C be enough to understand this code, or is Fortran also
   necessary?
 
  I haven't seen the C and Fortran code for Random Forest but I
  understand the note to say that R code calls some C functions that
  pre-process (possibly re-format etc) the data, then call the actual
  Random Forest method that's written in Fortran, then possibly
  post-process the output and return it to R. It would imply that to
  understand the actual Random Forest code, you will have to read the
  Fortran source code.
 
  Best,
 
  Peter
 

Re: [R] Unexplained behavior of level names when using ordered factors in lm?

2011-12-02 Thread Steve Lianoglou

Hi Bert,

Since you opened the door ...

On Fri, Dec 2, 2011 at 10:06 AM, Bert Gunter gunter.ber...@gene.com wrote:
 ?ordered
 ?C
 ?contr.poly

 If you don't know what polynomial contrasts are, consult any good
 linear models text. MASS has a good, though a bit terse, section on
 this.

Do you have a favorite linear model text with a bit more exposition than MASS?

Even though this list isn't for teaching stats, whenever I can catch
some of the tried and true statisticians talking about texts on
specific subject matter, I like to take advantage of it to see what I
need to add to my amazon wish list to help sharpen the old saw :-)

Thanks,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [R] Problem subsetting: undefined columns

2011-12-02 Thread R. Michael Weylandt michael.weyla...@gmail.com

How about this?

d[, v[v %in% colnames(d)]]

Michael
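A slightly more defensive variant of the same idea: intersect() keeps only the names present in d, and drop = FALSE keeps the result a data frame even when only one column matches. The data are rebuilt from Aurelien's example, with v assumed to be a plain character vector:

```r
d <- data.frame(mmdd = c(19720601, 19720602, 19720605),
                sret.10006 = c(1, 2, 3),
                sret.10014 = c(5, 9, 7),
                sret.10065 = c(10, 2, 11))
v <- c("sret.10006", "sret.10090")   # "sret.10090" is not a column of d

# keep only the requested names that exist; drop = FALSE avoids collapsing to a vector
sub <- d[, intersect(v, colnames(d)), drop = FALSE]
# equivalently: subset(d, select = intersect(v, colnames(d)))
sub
```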


[R] Partitioning Around Medoids then rpart to follow - is this sensible?

2011-12-02 Thread Stephen Sefick


The problem:  There are no a priori groupings to run a classification on

My solution:

This is a non-R-code question, so I appreciate any thoughts. I have
used pam in the cluster package, followed by silhouette, to find the
optimum number of clusters on scaled and centered data. I have followed
this with a classification tree analysis with rpart to discern which
variables drive the clustering on the original data. Is this a sensible
approach?
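One way to code the workflow described above, using the iris measurements as stand-in data (an assumption; substitute your own variables). Whether the approach is statistically sensible is a separate question; the sketch only shows the mechanics:

```r
library(cluster)
library(rpart)

X <- iris[, 1:4]   # stand-in data; substitute your own numeric variables
Xs <- scale(X)     # centre and scale before clustering

# average silhouette width for k = 2..6; pick the k that maximises it
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(Xs, k)$silinfo$avg.width)
k_best <- ks[which.max(avg_sil)]
cl <- pam(Xs, k_best)$clustering

# classification tree on the original (unscaled) variables, PAM cluster as response
dat <- data.frame(cluster = factor(cl), X)
tree <- rpart(cluster ~ ., data = dat)
print(tree)

# share of observations whose PAM cluster the tree reproduces
mean(predict(tree, type = "class") == dat$cluster)
```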

many thanks,

Stephen Sefick

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Problem with loop

2011-12-02 Thread Komine

Hi,
I am having difficulty building a loop.
I have several .csv files in a folder called Matrices, named Mat2002273,
Mat2002274, ..., Mat2002361.
I want to calculate, for each file, the mean of the column called Pixelvalues.
I tried this code, but as a result I get this message: Mat2002273 not found

essai <- read.table("C:\\Users\\Desktop\\Matrices\\Mat2002273.csv", sep = ";", dec = ",", header = TRUE)
essai
a <- NULL
for(i in Mat2002273:Mat2002361){
paste(mean(essai$Pixelvalues))
a[i] <- paste(mean(essai$Pixelvalues))
print(a[i])
}

Thank you for your help 





Re: [R] Intersection of 2 matrices

2011-12-02 Thread Michael Kao


On 2/12/2011 2:48 p.m., David Winsemius wrote:


On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:


Hi all,
I have a matrix A of 67420 by 2 and another matrix B of 59199 by 2.
I would like to find the number of rows of matrix B that can also be found
in matrix A (rows that are common to both matrices, with or without
sorting).


I have tried the intersect and is.element functions in R, i.e.
intersect(A,B) and is.element(A,B), but they only work for vectors,
not matrices.


Have you considered the 'duplicated' function?



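Here is an example based on the duplicated function:

test.mat1 <- matrix(1:20, ncol = 5)  # 4 rows x 5 columns

# two random rows of test.mat1 (it has only 4 rows) plus 4 rows not in it
test.mat2 <- rbind(test.mat1[sample(nrow(test.mat1), 2), ], matrix(101:120, ncol = 5))

compMat <- function(mat1, mat2){
  nr1 <- nrow(mat1)
  nr2 <- nrow(mat2)
  # rows of mat2 flagged as duplicates once stacked below mat1
  mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
}

compMat(test.mat1, test.mat2)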


Re: [R] Problem with loop

2011-12-02 Thread R. Michael Weylandt

You never create a variable called Mat2002273 or Mat2002361, so you
can't ask R to loop over all the values between them.

If I were you, I'd code something like this:

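lf <- list.files()

# PUT IN SOME CODE TO REMOVE FILES YOU DON'T WANT TO USE

pv <- vector("numeric", length(lf))
names(pv) <- lf  # so that pv[i] fills the slot named after file i

# read.csv2() matches the sep = ";", dec = "," format used in the question
for(i in lf) pv[i] <- mean(read.csv2(i, header = TRUE)[, "Pixelvalues"])

print(pv)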
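For the naming scheme in the question, a sketch that builds the file names from the day numbers (rather than treating Mat2002273 as a variable) could look like this. It assumes semicolon-separated files with comma decimals, as in the original read.table() call, hence read.csv2(); the two small files written to tempdir() are hypothetical test data standing in for the Matrices folder.

```r
# Hypothetical test data: two files in the question's format in a temporary folder
dir <- file.path(tempdir(), "Matrices")
dir.create(dir, showWarnings = FALSE)
write.csv2(data.frame(Pixelvalues = c(1.5, 2.5)),
           file.path(dir, "Mat2002273.csv"), row.names = FALSE)
write.csv2(data.frame(Pixelvalues = c(4, 6)),
           file.path(dir, "Mat2002274.csv"), row.names = FALSE)

# Build the file names from the day numbers, then keep only files that exist
days <- 2002273:2002361
files <- file.path(dir, paste0("Mat", days, ".csv"))
files <- files[file.exists(files)]

# read.csv2() defaults to sep = ";" and dec = ","
pv <- sapply(files, function(f) mean(read.csv2(f)$Pixelvalues))
names(pv) <- basename(files)
print(pv)
```

Using sapply() avoids growing a vector inside the loop and labels each mean with its file name.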

Michael

On Fri, Dec 2, 2011 at 12:15 PM, Komine moma...@yahoo.fr wrote:
 Hi,
 I am having difficulty building a loop.
 I have several .csv files in a folder called Matrices, named Mat2002273,
 Mat2002274, and so on up to Mat2002361.
 I want to calculate, for each file, the mean of the column called Pixelvalues.
 I tried this code, but I get the message "Mat2002273 not found":

essai <- read.table("C:\\Users\\Desktop\\Matrices\\Mat2002273.csv", sep = ";", dec = ",", header = TRUE)
essai
a <- NULL
for(i in Mat2002273:Mat2002361){
  paste(mean(essai$Pixelvalues))
  a[i] <- paste(mean(essai$Pixelvalues))
  print(a[i])
}

 Thank you for your help





Re: [R] References for book R In Action by Kabacoff

2011-12-02 Thread Pablo Domínguez Vaselli

The references are here: http://manning.com/kabacoff/excerpt_references.pdf

(They were omitted by mistake and will be included in the next printing.)

Regards,

Pablo


Re: [R] help in dbWriteTable

2011-12-02 Thread David M. Schruth


Hi,

The following code should work:

fields <- dbListFields(con, db.table.name)
reordered.names <- names(df)[match(fields, names(df))]
df <- df[ , reordered.names]
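The reordering can be checked without a database connection. In this minimal sketch the hypothetical `fields` vector stands in for what dbListFields(con, db.table.name) would return; adding absent fields as NA columns imitates one of the extra steps listed for dbWriteTable2 below, but it is an illustration, not that function's code.

```r
# Hypothetical stand-in for dbListFields(con, db.table.name)
fields <- c("id", "name", "score", "comment")

df <- data.frame(score = c(9.5, 7), id = 1:2, name = c("a", "b"),
                 stringsAsFactors = FALSE)

# Add any table fields missing from the data frame as NA columns
missing_fields <- setdiff(fields, names(df))
df[missing_fields] <- NA

# Reorder the columns to match the table definition
reordered.names <- names(df)[match(fields, names(df))]
df <- df[, reordered.names]
print(names(df))
```

Without the NA step, a field absent from the data frame would give an NA name from match() and the subsetting would fail with "undefined columns selected".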

But you might want to try the function 'dbWriteTable2' in the
'caroline' package (in fact, the three lines above were copied
verbatim from that function). It works much like the original
dbWriteTable but also addresses the column-reordering frustration you
mention, and more: NAs in NOT NULL columns, length mismatches, adding NA
columns for missing fields, type checking, and primary key support
for PostgreSQL.


I use it mainly with Postgres, so I can't say for sure whether it will work
for you. But let me know if it doesn't!


-Dave Schruth

On 12/1/2011 8:53 PM, arunkumar wrote:

Hi,

I need some help with dbWriteTable.
I'm not able to insert rows into a table if the column order is not the
same in the database and in the data frame I'm inserting. I also have
problems when the table has already been created externally and I insert
into it with dbWriteTable.

Is there some way to specify the row names in dbWriteTable, or any other
method that will solve my problem?




[R] Compiling R using Solaris Studio

2011-12-02 Thread RogerP

I am trying to compile R using Solaris Studio, but it keeps trying to use the
GNU compiler!  I've tried editing all the Makeconf files I can find, but
configure keeps changing them back!  I tried to rename the GNU directory so
it could not find gcc, but then I got a missing lib error.

How does one change the compiler used to compile R?

Thanks!

Roger


Re: [R] Error in Genetic Matching

2011-12-02 Thread R. Michael Weylandt

The error message is pretty explicit: your problem is that one of your
inputs has NAs (missing values) in it, and the GenMatch() function is not
prepared to handle them. You can find which one by running:

any(is.na(Tr))
any(is.na(X.binarynp))
any(is.na(BalanceMatrix.binarynp))

and then use View() on the object with NAs to take a look and see
where they are coming from.
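A small sketch of locating incomplete rows and, if that is appropriate for the analysis, dropping them from every input before calling GenMatch(). The objects here are hypothetical stand-ins for Tr and X.binarynp; whether listwise deletion is acceptable is a substantive choice the code does not make for you.

```r
# Hypothetical stand-ins for the treatment indicator and covariate matrix
Tr <- c(1, 0, 1, 0, 1)
X <- cbind(age  = c(30, NA, 45, 50, 28),
           educ = c(12, 16, NA, 10, 14))

colSums(is.na(X))              # number of NAs per covariate
which(!complete.cases(Tr, X))  # rows with an NA in Tr or X

# Keep only complete rows, applying the same subset to every input
ok <- complete.cases(Tr, X)
Tr_cc <- Tr[ok]
X_cc <- X[ok, , drop = FALSE]
```

The same `ok` vector would also need to subset the balance matrix so that all inputs stay aligned.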

Michael

On Fri, Dec 2, 2011 at 9:16 AM, shyam basnet shyamabc2...@yahoo.com wrote:
 Dear R Users,

 I am new to R. I am working with genetic matching, using GenMatch(), but I
 am getting the following error message:

 Error in GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix = 
 BalanceMatrix.binarynp,  :
   GenMatch(): input includes NAs

 Could you please suggest how I can correct this problem?

 My GenMatch command is,

 gen1 <- GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix =
 BalanceMatrix.binarynp, popsize = 1000)

 Thanking you,

 Sincerely Yours,

 Shyam Kumar Basnet
 SLU, Uppsala
 Sweden


[R] Project local libraries (reproducible research)

2011-12-02 Thread Hadley Wickham

Hi all,

I was wondering if anyone had scripts they could share for
capturing the current versions of the R packages used for a project. I'm
interested in creating a project-local library so that you're safe if
someone (e.g. the ggplot2 author) updates a package you're relying on
and breaks your code. I could fairly easily hack something together, but I
was wondering if anyone had any neat scripts they'd care to share.
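One possible starting point, sketched under assumptions: record the version of every loaded package to a file, and put a project-local library first on the search path with .libPaths(). The project directory, file name, and library folder name are hypothetical, and reinstalling exactly those versions later is not covered here.

```r
# Hypothetical project location; a temporary directory keeps the sketch self-contained
project <- tempdir()

# Record the name and version of every currently loaded package
pkgs <- sort(loadedNamespaces())
versions <- vapply(pkgs, function(p) as.character(packageVersion(p)), character(1))
out <- file.path(project, "package-versions.csv")
write.csv(data.frame(package = pkgs, version = versions), out, row.names = FALSE)

# Create a project-local library and search it before the other libraries
lib <- file.path(project, "project-library")
dir.create(lib, showWarnings = FALSE)
.libPaths(c(lib, .libPaths()))
```

Packages installed afterwards with install.packages(..., lib = lib) would then be found in the project library first.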

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: [R] Intersection of 2 matrices

2011-12-02 Thread Hans W Borchers

Michael Kao mkao006rmail at gmail.com writes:

 
Your solution is fast, but not completely correct, because it also
counts possible duplicates within the second matrix. A revised
function could look as follows:

compMat2 <- function(A, B) {  # rows of B present in A
  B0 <- B[!duplicated(B), ]
  na <- nrow(A); nb <- nrow(B0)
  AB <- rbind(A, B0)
  ab <- duplicated(AB)[(na+1):(na+nb)]
  return(sum(ab))
}

and testing an example of the size the OP was asking for:

set.seed(8237)
A  <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
B  <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)

system.time(n <- compMat2(A, B))  # n = 3790

while compMat() returns 5522 rows, 1732 of which are duplicates within B!
A 3.06 GHz iMac needs about 2 to 2.5 seconds.

Hans Werner


 On 2/12/2011 2:48 p.m., David Winsemius wrote:
 
  On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:
 
  Hi all,
  I have a matrix A of 67420 by 2 and another matrix B of 59199 by 2.
  I would like to find the number of rows of matrix B that can also be found
  in matrix A (rows that are common to both matrices, with or without
  sorting).
 
  I have tried the intersect and is.element functions in R, i.e.
  intersect(A,B) and is.element(A,B), but they only work for vectors,
  not matrices.
 
  Have you considered the 'duplicated' function?
 
 
 Here is an example based on the duplicated function:
 
 test.mat1 <- matrix(1:20, ncol = 5)
 
 test.mat2 <- rbind(test.mat1[sample(nrow(test.mat1), 2), ], matrix(101:120, ncol = 5))
 
 compMat <- function(mat1, mat2){
  nr1 <- nrow(mat1)
  nr2 <- nrow(mat2)
  mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
 }
 
 compMat(test.mat1, test.mat2)
 



[R] RFE: vectorized behavior for as.POSIXct tz argument

2011-12-02 Thread Jack Tanner

x <- 1472562988 + 1:10; tz <- rep("EST", 10)

# Case 1: Works as documented
ct <- as.POSIXct(x, tz = tz[1], origin = "1960-01-01")

# Case 2: Fails
ct <- as.POSIXct(x, tz = tz, origin = "1960-01-01")

If case 2 worked, it'd be a little easier to process paired (time, time zone)
vectors from different time zones.
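One workaround, sketched here as an assumption rather than a fix to as.POSIXct(): keep a single POSIXct vector and format each element in its own zone. A POSIXct vector stores its display zone once, in the "tzone" attribute, so per-element zones end up as text. The mixed EST/UTC zones below are hypothetical.

```r
x <- 1472562988 + 1:10
tz <- rep(c("EST", "UTC"), each = 5)  # hypothetical per-element time zones

# A single POSIXct vector; its display zone is stored once, in "tzone"
ct <- as.POSIXct(x, tz = "UTC", origin = "1960-01-01")

# Format element i in zone tz[i]; the result is a character vector
local_time <- vapply(seq_along(ct),
                     function(i) format(ct[i], tz = tz[i], usetz = TRUE),
                     character(1))
print(local_time)
```

If date-time objects rather than strings are needed, a list of POSIXct values (as in the sapply(..., simplify = FALSE) reply in this thread) keeps one zone per element.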


Re: [R] Intersection of 2 matrices

2011-12-02 Thread Hans W Borchers

Michael Kao mkao006rmail at gmail.com writes:

 
Well, taking a second look, I'd say it depends on the exact formulation.

In the applications I have in mind, I would like to count each occurrence
in B only once. Perhaps the OP never thought about duplicates in B.

Hans Werner

 
 Here is an example based on the duplicated function:
 
 test.mat1 <- matrix(1:20, ncol = 5)
 
 test.mat2 <- rbind(test.mat1[sample(nrow(test.mat1), 2), ], matrix(101:120, ncol = 5))
 
 compMat <- function(mat1, mat2){
  nr1 <- nrow(mat1)
  nr2 <- nrow(mat2)
  mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
 }
 
 compMat(test.mat1, test.mat2)



Re: [R] Problem subsetting: undefined columns

2011-12-02 Thread Aurélien PHILIPPOT

Thanks, Michael.
I played with your suggestion to get the output in the format I wanted, and
I found the following, which works fine:

sub <- d[, which(colnames(d) %in% v)]

Aurelien
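A self-contained version of the two selections in this thread, with d rebuilt from the question and v assumed to be a plain character vector (in the question it comes from a data frame). drop = FALSE is added because only one name matches here; without it, a single selected column comes back as a vector rather than a data frame.

```r
d <- data.frame(mmdd = c(19720601, 19720602, 19720605),
                sret.10006 = c(1, 2, 3),
                sret.10014 = c(5, 9, 7),
                sret.10065 = c(10, 2, 11))
v <- c("sret.10006", "sret.10090")  # sret.10090 is not a column of d

# Keep only the requested names that exist; drop = FALSE keeps a data frame
sub1 <- d[, v[v %in% colnames(d)], drop = FALSE]
sub2 <- d[, intersect(colnames(d), v), drop = FALSE]
print(sub1)
```

The two forms differ only in column order when several names match: the first follows v, the second follows d.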

2011/12/2 R. Michael Weylandt michael.weyla...@gmail.com

 How about this?

 d[, v[v %in% colnames(d)]]

 Michael

 On Dec 2, 2011, at 12:01 PM, Aurélien PHILIPPOT
 aurelien.philip...@gmail.com wrote:

  Hi Paul and Jim,
  Thanks for your messages.
 
  I just wanted R to give me the columns of my data frame d whose names
  appear in v. I do not care about the names in v that are not in d. In
  addition, there will always be at least one element of v that has a
  corresponding column in d, so I know there is at least one match
  between the two.
 
  Initially, I tried something in this spirit:
  sub <- subset(d, colnames(d) %in% v)
 
  but I could not make it work properly.
 
 
  Best,
  Aurelien
 
  2011/12/2 Paul Hiemstra paul.hiems...@knmi.nl
 
  On 12/02/2011 07:20 AM, Aurélien PHILIPPOT wrote:
  Dear R-users,
  -I am new to R, and I am struggling with the following problem.
 
  -I am repeating the following operations hundreds of times, within a
  loop:
  I want to subset a data frame by columns. I am interested in the columns
  whose names are given by the rows of another data frame that was built in
  parallel. The solution I have so far works well as long as the elements of
  the second data frame are included in the column names of the first data
  frame, but if an element from the second object is not a column name of
  the first one, then it fails with an error.
 
  Hi Aurelien,
 
  I would call this a feature, not a bug. I think R does what it should
  do, you request a non-existent column and it throws an error. What kind
  of behavior are you looking for instead of this error?
 
  regards,
  Paul
 
 
  -More concretely, I have the following data frames d and v:
  mmdd <- c(19720601, 19720602, 19720605)
  sret.10006 <- c(1,2,3)
  sret.10014 <- c(5,9,7)
  sret.10065 <- c(10,2,11)
 
  d <- data.frame(mmdd=mmdd, sret.10006=sret.10006,
                  sret.10014=sret.10014, sret.10065=sret.10065)
 
  v <- data.frame(V1="sret.10006", V2="sret.10090")
  v <- sapply(v, function(x) levels(x)[x])
 
  -I want to do the following subsetting:
  sub <- subset(d, select=c(v))
 
 
  and I get the following error message:
  Error in `[.data.frame`(x, r, vars, drop = drop) :
   undefined columns selected
 
 
 
  Any help would be very much appreciated,
 
  Best,
  Aurelien
 
 
 
  --
  Paul Hiemstra, Ph.D.
  Global Climate Division
  Royal Netherlands Meteorological Institute (KNMI)
  Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
  P.O. Box 201 | 3730 AE | De Bilt
  tel: +31 30 2206 494
 
  http://intamap.geo.uu.nl/~paul
  http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
 
 
 

Re: [R] Summarizing elements of a list

2011-12-02 Thread LCOG1

Thank you for the help; I knew it could be done with a member of the apply
family. I struggle with the apply functions, though; they are not always
intuitive for me.

Cheers,
 JR

From: Sarah Goslee [via R] [mailto:ml-node+s789695n414453...@n4.nabble.com]
Sent: Thursday, December 01, 2011 6:44 PM
To: ROLL Josh F
Subject: Re: Summarizing elements of a list

How about:

lapply(Version1_, subset, subset=c(TRUE, FALSE))
or sapply() depending on what you want the result to look like.

Thanks for the reproducible example.

Sarah

On Thu, Dec 1, 2011 at 5:17 PM, LCOG1 [hidden email] wrote:

 Hi everyone,
   I looked around the list for a while but couldn't find a solution to my
 problem. I am storing the results of a simulation in a list, and for each
 element I have two separate vectors (is that what they are called? Correct my
 vocab if necessary). See below:

 Version1_ <- list()
 for(i in 1:5){
   Version1_[[i]] <- list(First=rnorm(1), Second=rnorm(1))
 }

 What I want is to put all of the elements' 'First' vectors into a single
 list to box plot. But what's a more elegant solution than the one below?

 c(Version1_[[1]]$First,Version1_[[2]]$First,Version1_[[3]]$First,Version1_[[4]]$First,Version1_[[5]]$First)

 Since I have 50 or more simulations, this is impractical and sloppy. Do I
 need to store my data differently, or is there a solution on the back end?
 Thanks all.

 Josh

--
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] Summarizing elements of a list

2011-12-02 Thread LCOG1

Great, this worked the fastest of all the suggestions.  Cheers,

Josh

From: Michael Weylandt [via R] [mailto:ml-node+s789695n414494...@n4.nabble.com]
Sent: Thursday, December 01, 2011 8:11 PM
To: ROLL Josh F
Subject: Re: Summarizing elements of a list

Similarly, this might work:

unlist(lapply(Version1_, `[`, "First"))

Michael

On Thu, Dec 1, 2011 at 9:41 PM, Sarah Goslee [hidden email] wrote:

 How about:

 lapply(Version1_, subset, subset=c(TRUE, FALSE))
 or sapply() depending on what you want the result to look like.

 Thanks for the reproducible example.

 Sarah

 On Thu, Dec 1, 2011 at 5:17 PM, LCOG1 [hidden email] wrote:
 Hi everyone,
   I looked around the list for a while but couldn't find a solution to my
 problem. I am storing the results of a simulation in a list, and for each
 element I have two separate vectors (is that what they are called? Correct my
 vocab if necessary). See below:

 Version1_ <- list()
 for(i in 1:5){
   Version1_[[i]] <- list(First=rnorm(1), Second=rnorm(1))
 }

 What I want is to put all of the elements' 'First' vectors into a single
 list to box plot. But what's a more elegant solution than the one below?

 c(Version1_[[1]]$First,Version1_[[2]]$First,Version1_[[3]]$First,Version1_[[4]]$First,Version1_[[5]]$First)

 Since I have 50 or more simulations, this is impractical and sloppy. Do I
 need to store my data differently, or is there a solution on the back end?
 Thanks all.

 Josh

 --
 Sarah Goslee
 http://www.functionaldiversity.org


[R] order function give back row name

2011-12-02 Thread Martin Bauer

Hello,


I have a 1x9 double matrix called results:

       XLB      XLE      XLF      XLI
1  53.3089 55.77923 37.64458 83.08646

I'm trying to order this matrix:

 print(order(results))
[1] 3 1 2 4 

How can I make order() return the column names XLF XLB XLE XLI instead
of 3 1 2 4?

Any ideas?

Thank you in advance

Re: [R] Intersection of 2 matrices

2011-12-02 Thread jim holtman

Here is one way of doing it:

compMat2 <- function(A, B) {  # rows of B present in A
  B0 <- B[!duplicated(B), ]
  na <- nrow(A); nb <- nrow(B0)
  AB <- rbind(A, B0)
  ab <- duplicated(AB)[(na+1):(na+nb)]
  return(sum(ab))
}


set.seed(8237)
A  <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
B  <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)

system.time({
  # convert each row to a single string for comparison
  A.1 <- apply(A, 1, function(x) paste(x, collapse = ' '))
  B.1 <- apply(B, 1, function(x) paste(x, collapse = ' '))
  count <- sum(B.1 %in% A.1)
})
   user  system elapsed
   1.77    0.00    1.79


 count
[1] 3905
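The counts reported in this thread (5522 from compMat(), 3790 from compMat2(), 3905 here) differ because the three approaches count different things. A tiny hypothetical example, small enough to check by hand, makes the difference visible; the helper key() is introduced only for illustration and, like the paste() approach above, assumes values that paste() renders unambiguously (e.g. integers).

```r
# Hypothetical small matrices: B repeats one row that is in A and one that is not
A <- rbind(c(1, 2), c(3, 4), c(5, 6))
B <- rbind(c(1, 2), c(1, 2), c(7, 8), c(7, 8), c(9, 9))

key <- function(m) apply(m, 1, paste, collapse = " ")  # one string per row
kA <- key(A)
kB <- key(B)

# Rows of B found in A, counting repeated rows of B each time (the %in% count)
n_with_repeats <- sum(kB %in% kA)
# Distinct rows of B found in A (what compMat2() counts)
n_distinct <- sum(unique(kB) %in% kA)
# Rows of B flagged by duplicated(rbind(A, B)) (what compMat() returns);
# this also flags the second c(7, 8) although that row is not in A
n_flagged <- sum(duplicated(rbind(A, B))[nrow(A) + seq_len(nrow(B))])

c(with_repeats = n_with_repeats, distinct = n_distinct, flagged = n_flagged)
```

Here the three counts are 2, 1 and 3 respectively, so which one is "right" depends on whether repeated rows of B should count once, every time, or not at all.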


On Fri, Dec 2, 2011 at 2:46 PM, Hans W Borchers
hwborch...@googlemail.com wrote:
 Michael Kao mkao006rmail at gmail.com writes:


 Well, taking a second look, I'd say it depends on the exact formulation.

 In the applications I have in mind, I would like to count each occurrence
 in B only once. Perhaps the OP never thought about duplicates in B.

 Hans Werner


 Here is an example based on the duplicated function:

 test.mat1 <- matrix(1:20, ncol = 5)

 test.mat2 <- rbind(test.mat1[sample(nrow(test.mat1), 2), ], matrix(101:120, ncol = 5))

 compMat <- function(mat1, mat2){
      nr1 <- nrow(mat1)
      nr2 <- nrow(mat2)
      mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
 }

 compMat(test.mat1, test.mat2)





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] order function give back row name

2011-12-02 Thread R. Michael Weylandt

colnames(results)[order(results)]

Michael

On Fri, Dec 2, 2011 at 2:45 PM, Martin Bauer bauermar...@gmx.at wrote:
 Hello,


 I have a 1x9 double matrix called results:

        XLB      XLE      XLF      XLI
 1  53.3089 55.77923 37.64458 83.08646

 I'm trying to order this matrix:

 print(order(results))
 [1] 3 1 2 4

 How can I make order() return the column names XLF XLB XLE XLI
 instead of 3 1 2 4?

 Any ideas?

 Thank you in advance

Re: [R] order function give back row name

2011-12-02 Thread Sarah Goslee

With similar data, since you didn't include a reproducible example of your own:

> results <- matrix(c(53, 55, 37, 83), nrow=1)
> colnames(results) <- letters[1:4]
> results
      a  b  c  d
[1,] 53 55 37 83
> order(results)
[1] 3 1 2 4
> colnames(results)[order(results)]
[1] "c" "a" "b" "d"
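Using the values from the question, this sketch shows why colnames() rather than names() is needed for a one-row matrix, plus an alternative through a named vector. The matrix is rebuilt by hand from the printed output, so its exact construction is an assumption.

```r
results <- matrix(c(53.3089, 55.77923, 37.64458, 83.08646), nrow = 1,
                  dimnames = list("1", c("XLB", "XLE", "XLF", "XLI")))

names(results)  # NULL: a matrix keeps its labels in dimnames, not names
by_col <- colnames(results)[order(results)]  # "XLF" "XLB" "XLE" "XLI"

# drop() turns the 1 x 4 matrix into a named vector, so sort() keeps the labels
by_vec <- names(sort(drop(results)))
```

The drop() route is handy when the sorted values are wanted alongside their labels.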


On Fri, Dec 2, 2011 at 2:45 PM, Martin Bauer bauermar...@gmx.at wrote:
 Hello,


 I have a 1x9 double matrix called results:

        XLB      XLE      XLF      XLI
 1  53.3089 55.77923 37.64458 83.08646

 I'm trying to order this matrix:

 print(order(results))
 [1] 3 1 2 4

 How can I make order() return the column names XLF XLB XLE XLI
 instead of 3 1 2 4?

 Any ideas?

 Thank you in advance


-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] Summarizing elements of a list

2011-12-02 Thread R. Michael Weylandt

Here's a slight modification that is even faster if speed is a consideration:

sapply(Version1_, `[[`, "First")

The thought process is to go through the list Version1_ and apply
the operation `[[` to each element individually. This requires a
second argument (here the element name "First"), which we pass through
the ... of sapply(); I hope that helps you get a sense of the
mechanics. We use sapply() instead of lapply() because it does some
internal simplification for us to get one big vector back, effectively
cutting out the unlist() of the first solution I gave you.
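A self-contained comparison of the approaches from this thread on the simulated list from the question; set.seed() is added only so the run is repeatable. vapply() is an extra option not mentioned in the thread: it additionally checks that every element returns exactly one number.

```r
set.seed(42)
Version1_ <- list()
for (i in 1:5) {
  Version1_[[i]] <- list(First = rnorm(1), Second = rnorm(1))
}

firsts_sapply <- sapply(Version1_, `[[`, "First")              # plain numeric vector
firsts_unlist <- unlist(lapply(Version1_, `[`, "First"))       # same values, named "First"
firsts_vapply <- vapply(Version1_, `[[`, numeric(1), "First")  # type-checked
```

Any of these vectors can be passed straight to boxplot().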

Michael

On Fri, Dec 2, 2011 at 2:04 PM, LCOG1 jr...@lcog.org wrote:
 Great, this worked the fastest of all the suggestions.  Cheers,

 Josh

 
 From: Michael Weylandt [via R] 
 [mailto:ml-node+s789695n414494...@n4.nabble.com]
 Sent: Thursday, December 01, 2011 8:11 PM
 To: ROLL Josh F
 Subject: Re: Summarizing elements of a list

 Similarly, this might work:

 unlist(lapply(Version1_, `[`, "First"))

 Michael

 On Thu, Dec 1, 2011 at 9:41 PM, Sarah Goslee [hidden email] wrote:

 How about:

 lapply(Version1_, subset, subset=c(TRUE, FALSE))
 or sapply() depending on what you want the result to look like.

 Thanks for the reproducible example.

 Sarah

 On Thu, Dec 1, 2011 at 5:17 PM, LCOG1 [hidden email] wrote:
 Hi everyone,
   I looked around the list for a while but couldn't find a solution to my
 problem. I am storing the results of a simulation in a list, and for each
 element I have two separate vectors (is that what they are called? Correct my
 vocab if necessary). See below:

 Version1_ <- list()
 for(i in 1:5){
        Version1_[[i]] <- list(First=rnorm(1), Second=rnorm(1))
 }

 What I want is to put all of the elements' 'First' vectors into a single
 list to box plot. But what's a more elegant solution than the one below?

 c(Version1_[[1]]$First,Version1_[[2]]$First,Version1_[[3]]$First,Version1_[[4]]$First,Version1_[[5]]$First)

 Since I have 50 or more simulations, this is impractical and sloppy. Do I
 need to store my data differently, or is there a solution on the back end?
 Thanks all.

 Josh

 --
 Sarah Goslee
 http://www.functionaldiversity.org


Re: [R] RFE: vectorized behavior for as.POSIXct tz argument

2011-12-02 Thread David Winsemius



On Dec 2, 2011, at 2:28 PM, Jack Tanner wrote:


x <- 1472562988 + 1:10; tz <- rep("EST", 10)

# Case 1: Works as documented
ct <- as.POSIXct(x, tz = tz[1], origin = "1960-01-01")

# Case 2: Fails
ct <- as.POSIXct(x, tz = tz, origin = "1960-01-01")


sapply(tz, function(ttt) as.POSIXct(x = x, tz = ttt, origin = "1960-01-01"),
       simplify = FALSE)




If case 2 worked, it'd be a little easier to process paired (time, time zone)
vectors from different time zones.



David Winsemius, MD
West Hartford, CT


[R] Plot and polygon in log scale

2011-12-02 Thread Santosh

Dear Experts,

When using plot and polygon, I can change the density and angle of the
shaded area lines when plotting is done in regular scale. It does not seem
to work in 'log' scale. Any suggestions would be highly appreciated!

Below is an example:

plot(1:10,c(1:10)^2*20,log="y")
polygon(c(3:7,7:3),c((3:7)^2*20,c(7:3)^2*10),col='grey',angle=45,dens=30)

Warning message:
In polygon.fullhatch(xy$x[start:(end - 1)], xy$y[start:(end - 1)],  :
  cannot hatch with logarithmic scale active

Regards,
Santosh



Re: [R] RFE: vectorized behavior for as.POSIXct tz argument

2011-12-02 Thread Jack Tanner

David Winsemius dwinsemius at comcast.net writes:

 sapply(tz, function(ttt) as.POSIXct(x=x, tz=ttt,
 origin="1960-01-01"),simplify=FALSE)

Sure, there's no end of workarounds. It would just be consistent to treat both
the x and the tz arguments as vectors.


Re: [R] RFE: vectorized behavior for as.POSIXct tz argument

2011-12-02 Thread David Winsemius



On Dec 2, 2011, at 4:06 PM, Jack Tanner wrote:


David Winsemius dwinsemius at comcast.net writes:


sapply(tz, function(ttt) as.POSIXct(x=x, tz=ttt,
origin="1960-01-01"),simplify=FALSE)


Sure, there's no end of workarounds. It would just be consistent to
treat both the x and the tz arguments as vectors.


I've wondered about that too. The function where I would like to see a
dual vectorized application is 'rep'. In cases where the x argument is
the same length as the 'times' or 'each' arguments, I would like to see
it produce a vector that is sum(each) or sum(times) long.


The problem is most likely in the ambiguity of how to apply the  
arguments:


 unlist(sapply(1:5, function(tt)  rep(1:5, each=tt)))
 [1] 1 2 3 4 5 1 1 2 2 3 3 4 4 5 5 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 1 1 1 1 2 2 2 2 3
[40] 3 3 3 4 4 4 4 5 5 5 5 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5



 mapply(rep, x=1:5, each=1:5)
[[1]]
[1] 1

[[2]]
[1] 2 2

[[3]]
[1] 3 3 3

[[4]]
[1] 4 4 4 4

[[5]]
[1] 5 5 5 5 5
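For comparison: as far as I can tell from ?rep, base R already supports the paired behaviour for 'times', since when length(times) equals length(x), x[i] is repeated times[i] times. 'each' uses only its first element, so the per-element version still needs mapply() as above. A quick check with the same 1:5 inputs:

```r
x <- 1:5

# 'times' is vectorised: x[i] is repeated times[i] times.
by_times <- rep(x, times = 1:5)

# A per-element 'each' has to be built with mapply() and then flattened.
by_each <- unlist(mapply(rep, x = x, each = 1:5))

print(by_times)
print(identical(by_times, by_each))
```

Both routes give a vector of length sum(1:5), i.e. 15.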





David Winsemius, MD
West Hartford, CT


[R] Sweave problem on Mac OS when using umlauts and summary()

2011-12-02 Thread Mark Heckmann

I have the following Sweave file which gets sweaved correctly.

<<>>=
m <- lm(y1 ~x1, anscombe)
summary(m)
@ 

I include the sweaved .tex file into another .tex file via \include.
When I use a single umlaut in the .snw file a warning occurs.
As a result part of the summary output is not contained in the .tex file.

ä
<<>>=
m <- lm(y1 ~x1, anscombe)
summary(m)
@   

You can now run (pdf)latex on 'ch1.tex'
Warning messages:
1: ch1.Snw has unknown encoding: assuming Latin-1
2: invalid character string in output conversion

Interestingly, this error does NOT occur, when I omit the summary(m) statement.

ä
<<>>=
m <- lm(y1 ~x1, anscombe)
#summary(m)
@   

You can now run (pdf)latex on 'ch1.tex'
Warning message:
ch1.Snw has unknown encoding: assuming Latin-1 

I know that I can prevent this by adding a line at the beginning of the .snw 
file:

\usepackage[utf8]{inputenc}

ä
<<>>=
m <- lm(y1 ~x1, anscombe)
summary(m)
@ 

This is sweaved correctly, without warnings.

But this is not a good solution: the .Snw file is not where the preamble
of the master .tex document lives, so the \usepackage line ends up in the
wrong place. This causes an error when the entire document is processed with TeX.

How can I achieve the last result in another way?
I tried: 

Sweave('/Users/markheckmann/Desktop/test_sweave/ch1.Snw', encoding="UTF-8")

But this does not work either when the usepackage line is omitted.

I am stuck here. Can anyone help?

TIA 
Mark



Mark Heckmann
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com













Re: [R] Fw: calculate mean of multiple rows in a data frame

2011-12-02 Thread Jabez Wilson

Thank you, I copied the data from the R environment, but it came out wrong. You 
understood exactly what I wanted, and your solution is admirable: I clearly 
need to address the naming convention. Thanks for your help.

--- On Fri, 2/12/11, Jean V Adams jvad...@usgs.gov wrote:


From: Jean V Adams jvad...@usgs.gov
Subject: Re: [R] Fw: calculate mean of multiple rows in a data frame
To: Jabez Wilson jabez...@yahoo.co.uk
Cc: R-Help r-h...@stat.math.ethz.ch
Date: Friday, 2 December, 2011, 14:29



It's easier for folks to help you if you put your example data in a format that 
can be readily read in R.  See, for example, the dput() function, which you can 
use to provide us with something like this: 

DF <- structure(list(NAME = c("Control_1", "Control_2", "Control_1",
"Control_3", "MM0289~RFU:11810.15", "MM0289~RFU:9238.41",
"MM16597~RFU:36765.38",
"MM16597~RFU:41258.94"), ID = c("probe~B01R01C01", "probe~B01R01C02",
"probe~B01R09C01", "probe~B01R09C02", "probe~B29R13C06", "probe~B29R13C05",
"probe~B44R15C20", "probe~B44R15C19"), a = c(3L, 712L, 937L,
464L, 99L, 605L, 700L, 132L), b = c(22L, 13L, 824L, 836L, 544L,
603L, 923L, 777L), c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L,
497L), d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)), .Names =
c("NAME",
"ID", "a", "b", "c", "d"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
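To illustrate the dput() round trip Jean describes: dput() prints R code that rebuilds an object, so anyone can paste it straight into their session. The data frame `small` below is a hypothetical two-row stand-in for DF, not data from the thread.

```r
# A tiny hypothetical data frame in the same layout as DF.
small <- data.frame(NAME = c("Control_1", "MM0289~RFU:11810.15"),
                    a = c(3L, 99L),
                    stringsAsFactors = FALSE)

# dput() writes an R expression that recreates the object.
dput(small)

# Round trip: capture that text, parse it back, and compare.
txt <- capture.output(dput(small))
rebuilt <- eval(parse(text = txt))
print(identical(small, rebuilt))
```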

If I understand what you're after, you want to summarize data within groups, 
but your NAME variable is not as general as you would like.  You can get around 
this by creating a new variable which is a shorter and more general version of 
the NAME variable.  I did this by saving just the part of the NAME before the 
colon, ":".

shortname <- sapply(strsplit(DF$NAME, ":"), "[", 1)
aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean) 

    shortname   a     b     c     d 
1   Control_1 470 423.0 912.0 721.0 
2   Control_2 712  13.0  32.0 179.0 
3   Control_3 464 836.0 508.0  53.0 
4  MM0289~RFU 352 573.5 734.5 779.5 
5 MM16597~RFU 416 850.0 358.0 788.5 
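An equivalent way to build the short name, sketched here as an alternative rather than Jean's method, is a regular expression with sub() that drops everything from the first colon onward. The four-row DF below is a hypothetical subset of the data above (rows 1, 3, 5 and 6, columns a and b only).

```r
# Hypothetical subset of DF: two Control_1 rows and two MM0289 rows.
DF <- data.frame(
  NAME = c("Control_1", "Control_1", "MM0289~RFU:11810.15", "MM0289~RFU:9238.41"),
  ID   = c("probe~B01R01C01", "probe~B01R09C01", "probe~B29R13C06", "probe~B29R13C05"),
  a    = c(3L, 937L, 99L, 605L),
  b    = c(22L, 824L, 544L, 603L),
  stringsAsFactors = FALSE)

# "MM0289~RFU:9238.41" -> "MM0289~RFU"; names without a colon are unchanged.
shortname <- sub(":.*$", "", DF$NAME)

# Average the numeric columns within each short name.
res <- aggregate(DF[, -(1:2)], by = list(shortname = shortname), FUN = mean)
print(res)
```

The means agree with the corresponding rows of Jean's output (470 and 423 for Control_1, 352 and 573.5 for MM0289~RFU).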

Jean 


 Jabez Wilson wrote on 12/01/2011 03:15:39 PM:

 Sorry, that should look like this:
 
 
 
 
    NAME                 ID                a   b   c   d
 1 Control_1            probe~B01R01C01   3  22 926 774
 2 Control_2            probe~B01R01C02 712  13  32 179
 3 Control_1            probe~B01R09C01 937 824 898 668
 4 Control_3            probe~B01R09C02 464 836 508  53
 5 MM0289~RFU:11810.15  probe~B29R13C06  99 544 607 984
 6 MM0289~RFU:9238.41   probe~B29R13C05 605 603 862 575
 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582
 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995
 
 --- On Thu, 1/12/11, Jabez Wilson jabez...@yahoo.co.uk wrote:
 
 
 From: Jabez Wilson jabez...@yahoo.co.uk
 Subject: calculate mean of multiple rows in a data frame
 To: R-Help r-h...@stat.math.ethz.ch
 Date: Thursday, 1 December, 2011, 20:45
 
 Dear all, I have a data frame (DF) in the following format:
 
    NAME                 ID                a   b   c   d
 1 Control_1            probe~B01R01C01 381 213 345 653
 2 Control_2            probe~B01R01C02 574 629 563 783
 3 Control_1            probe~B01R09C01 673 511 521 967
 4 Control_3            probe~B01R09C02  53 809 999  50
 5 MM0289~RFU:11810.15  probe~B29R13C06 681  34 115 587
 6 MM0289~RFU:9238.41   probe~B29R13C05 784 443  20 784
 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686
 I would like to consolidate the data frame by parsing through the 
 rows, and where the NAME is identical, consolidate into one row and 
 return the mean.
 I can do this for the first lines (Control_1 etc) by using aggregate()

[R] Imputing data

2011-12-02 Thread khlam

So I have a very big matrix of about 900 by 400, and there are a couple of NAs
in it. I have used the following functions to impute the missing data:

data(pc)
pc.na <- pc
pc.roughfix <- na.roughfix(pc.na)
pc.narf <- randomForest(pc.na, na.action=na.roughfix)


yet the NAs are not replaced. I would now like to replace each NA with,
say, the mean of its row or column, or with some correlation-based estimate.

Any help would be appreciated. 
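One thing worth checking first: na.roughfix() returns a repaired copy (per the randomForest documentation, numeric NAs become column medians) and leaves pc.na itself untouched, so the NAs should be gone from pc.roughfix but not from pc.na. For plain column-mean imputation, a minimal base-R sketch is below; the 3 x 3 matrix is a hypothetical stand-in for the 900 x 400 one.

```r
# Hypothetical small matrix (filled column by column); two entries missing.
m <- matrix(c(1, NA, 3, 4, 5, NA, 7, 8, 9), nrow = 3)

# Mean of each column, ignoring the missing entries.
col_means <- colMeans(m, na.rm = TRUE)

# (row, col) positions of every NA.
idx <- which(is.na(m), arr.ind = TRUE)

# Fill each NA with the mean of its own column.
m[idx] <- col_means[idx[, "col"]]
print(m)
```

For row means instead, use rowMeans() and index with idx[, "row"].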



[R] Writing data including NAs to access using RODBC

2011-12-02 Thread Matthew Johnson

Hi,

I have run into a problem writing data using RODBC. The data frame I
have read in from Access includes some NAs. I have put the data into
an xts object, manipulated the data, and would now like to append two
columns of the manipulated data to the original table in Access.

I can neither append the data nor write a new table. After some fiddling
about, I think the problem is that the vectors I wish to append to the
original data frame (or write as a new table) include some NAs.

Is there a work around?

Thanks

Matt Johnson


Re: [R] Sweave problem on Mac OS when using umlauts and summary()

2011-12-02 Thread Yihui Xie

This problem comes up so frequently that I have made
options(useFancyQuotes=FALSE) by default in my knitr package:
http://yihui.github.com/knitr/

You can also use options(useFancyQuotes='TeX').
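My understanding (not stated in the thread) is that summary() for an lm fit prints its "Signif. codes" legend through sQuote(), which emits curly Unicode quotes in a UTF-8 locale; those are what fail to convert to Latin-1. The sketch below only shows how the two option values change sQuote()'s output; putting the options() call in the first code chunk of ch1.Snw is an assumption about how you would apply it.

```r
# Plain ASCII apostrophes: these survive conversion to Latin-1.
old <- options(useFancyQuotes = FALSE)
plain <- sQuote("***")

# TeX-style quoting: a backquote before, an apostrophe after.
options(useFancyQuotes = "TeX")
tex <- sQuote("***")

# Put back whatever setting was active before.
options(old)
print(c(plain, tex))
```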

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Fri, Dec 2, 2011 at 4:08 PM, Mark Heckmann mark.heckm...@gmx.de wrote:



Re: [R] Imputing data

2011-12-02 Thread Peter Langfelder

On Fri, Dec 2, 2011 at 2:16 PM, khlam kh...@ucsc.edu wrote:
 So I have a very big matrix of about 900 by 400 and there are a couple of NA
 in the list. I have used the following functions to impute the missing data

 data(pc)
 pc.na <- pc
 pc.roughfix <- na.roughfix(pc.na)
 pc.narf <- randomForest(pc.na, na.action=na.roughfix)


 yet it does not replace the NA in the list.  Presently I want to replace the
 NA with maybe the mean of the rows or columns or some type of correlation.

 Any help would be appreciated.

There are several imputation functions available in the various
packages - for example, packages Hmisc and e1071 both contain a
function called impute, and the package impute contains the function
impute.knn for nearest neighbor imputation.

HTH,

Peter


Re: [R] R, PostgresSQL and poor performance

2011-12-02 Thread Joe Conway

On 12/02/2011 09:46 PM, Berry, David I. wrote:
 Thanks for the reply and suggestions. I've tried the RpgSQL drivers and
 the results are pretty similar in terms of performance.
 
 The ~1.5M records I'm trying to read into R are being extracted from a
 table with ~300M rows (and ~60 columns) that has been indexed on the
 relevant columns and horizontally partitioned (with constraint checking
 on). I do need to try and optimize the database a bit more but I don't
 think this is the cause of the performance issues.

With that much data you might want to consider PL/R:
  http://www.joeconway.com/plr/

HTH,

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support


Re: [R] Imputing data

2011-12-02 Thread Weidong Gu

Hi,

For imputation using randomForest package, check

?rfImpute

Weidong

On Fri, Dec 2, 2011 at 6:00 PM, Peter Langfelder
peter.langfel...@gmail.com wrote:


Re: [R] Plot and polygon in log scale

2011-12-02 Thread Peter Ehlers


On 2011-12-02 13:03, Santosh wrote:

Dear Experts,

When using plot and polygon, I can change the density and angle of the
shaded area lines when plotting is done in regular scale. It does not seem
to work in 'log' scale. Any suggestions would be highly appreciated!

below is an example:

plot(1:10,c(1:10)^2*20,log="y")
polygon(c(3:7,7:3),c((3:7)^2*20,c(7:3)^2*10),col='grey',angle=45,dens=30)

Warning message:
In polygon.fullhatch(xy$x[start:(end - 1)], xy$y[start:(end - 1)],  :
   cannot hatch with logarithmic scale active

Regards,
Santosh


It looks like density is not implemented for log scales. (There is a
comment in the source file.) Perhaps a note in the help file might be
useful, but I would think that nowadays most users would want colour
or shades of gray anyway.

Of course, you can always do the logging on the data before plotting.
You'll just have to use the axis() function to print appropriate axis
labels.
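A minimal sketch of that workaround, reusing Santosh's data: take log10() of y before plotting, suppress the default y axis, and label the axis in original units with axis(). The tick values are my own choice, not from the thread.

```r
# Santosh's data; y spans roughly 20 to 2000.
x  <- 1:10
y  <- x^2 * 20
px <- c(3:7, 7:3)
py <- c((3:7)^2 * 20, (7:3)^2 * 10)

# Plot log10(y) on an ordinary linear scale, without the default y axis.
plot(x, log10(y), yaxt = "n", ylab = "y")

# Hatching works because the device scale is no longer logarithmic.
polygon(px, log10(py), col = "grey", angle = 45, density = 30)

# Label the axis in the original units at their log10 positions.
ticks <- c(20, 50, 100, 200, 500, 1000, 2000)
axis(2, at = log10(ticks), labels = ticks)
```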

Peter Ehlers




[R] Help! Big problem when using browser() to do R debugging?

2011-12-02 Thread Michael

Hi all,

Could you please help me?

I am having the following weird problem when debugging R programs
using browser():

In my function, I've inserted a browser() in front of Step 1. My
function has 3 steps, and at the end of each step it prints out
the message "Step i is done..".

However, after I hit ENTER when the program stopped before Step 1
and entered the debugging mode, it executed not only the next
line (i.e. Step 1) but also all the (many) remaining lines in that
function, as shown below:


Browse[1]
[1] Step 1 is done..
[1] Step 2 is done..
[1] Step 3 is done..

Then it automatically quit the debugging mode, and when I tried to
check the value of myobj, I got the following error message:

 names(myobj)
Error: object 'myobj' not found
No suitable frames for recover()



So my question is: why did a single ENTER keystroke execute
all the remaining lines in that function, return from the
function and quit the debugging mode?

Thanks a lot!


Re: [R] Help! Big problem when using browser() to do R debugging?

2011-12-02 Thread Duncan Murdoch


On 11-12-02 8:38 PM, Michael wrote:

Hi all,

Could you please help me?

I am having the following weird problem when debugging R programs
using browser():

In my function, I've inserted a browser() in front of Step 1. My
function has 3 steps and at the end of each step, it will print out
the message Step i is done...

However, after I hit ENTER when the program stopped before Step 1
and entered the debugging mode, it executed not only the next
line (i.e. Step 1) but also all the (many) remaining lines in that
function, as shown below:


Browse[1]
[1] Step 1 is done..
[1] Step 2 is done..
[1] Step 3 is done..

Then it automatically quit the debugging mode, and when I tried to
check the value of myobj, I got the following error message:


names(myobj)

Error: object 'myobj' not found
No suitable frames for recover()



So my question is: why did a single ENTER keystroke execute
all the remaining lines in that function, return from the
function and quit the debugging mode?


See ?browser.

Duncan Murdoch


[R] side-by-side map with different geographies using spplot

2011-12-02 Thread David Epstein


Hello,

I want to create side-by-side maps of similar attribute data in two 
different cities using a single legend.


To simply display side-by-side census block group boundary
(non-thematic) maps for Minneapolis & Cleveland, I do the following:


library(rgdal)
library(sp)
Minneapolis=readOGR("../Minneapolis/Census/2010/Census_BlockGroup_GEO/","tl_2010_27053_bg10")
Cleveland=readOGR("../Cleveland/Census/2010/Census_BlockGroup_GEO/","tl_2010_39035_bg10")
par(mfrow=c(1,2))
plot(Minneapolis)
plot(Cleveland)

I can display a single thematic map for a city using spplot as follows:

spplot(Minneapolis,"Thematic_Data_Column")

But, calling the function again for Cleveland just overwrites the 
window. I am unsure how to use spplot's layout tools with two different 
geographies. Most examples use a single geography and multiple attribute 
columns. Alternatively, is there a way to use par together with spplot 
to allow for multiple spplot calls?


thank you,
-david
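One possible approach, sketched under assumptions: spplot() returns a lattice "trellis" object rather than drawing through base graphics, which is why par(mfrow=) has no effect. Lattice's own print(..., split=, more=) mechanism can place two such objects side by side. The sketch uses plain lattice levelplot() on made-up grids as hypothetical stand-ins for the two cities; shared at= breaks keep the colours comparable, and colorkey = FALSE on the first plot leaves a single legend. Passing the same at= and colorkey= arguments to spplot(Minneapolis, "Thematic_Data_Column", ...) and spplot(Cleveland, ...) is an untested assumption.

```r
library(lattice)
set.seed(42)

# Hypothetical attribute grids for two "cities" of different extent.
cityA <- expand.grid(x = 1:10, y = 1:10)
cityA$z <- runif(nrow(cityA), 0, 60)
cityB <- expand.grid(x = 1:8, y = 1:12)
cityB$z <- runif(nrow(cityB), 30, 100)

# Common break points: the same colour means the same value in both maps.
brks <- seq(0, 100, by = 10)
pA <- levelplot(z ~ x * y, data = cityA, at = brks, colorkey = FALSE,
                main = "City A")
pB <- levelplot(z ~ x * y, data = cityB, at = brks, main = "City B")

# split = c(column, row, ncolumns, nrows); more = TRUE keeps the page open.
print(pA, split = c(1, 1, 2, 1), more = TRUE)
print(pB, split = c(2, 1, 2, 1), more = FALSE)
```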


Re: [R] simple lm question

2011-12-02 Thread Worik R


 Use `lm` the way it is designed to be used, with a data argument:

  l2 <- lm(e~. , data=as.data.frame(M))
  summary(l2)

 Call:
 lm(formula = e ~ ., data = as.data.frame(M))


And what is the regression being done in this case?  How are the
independent  variables used?

It looks like M[,5]~M[,1]+M[,2]+M[,3]+M[,4] as those are the
coefficients.   But the results are different when I do that explicitly:

 M <- matrix(runif(5*20), nrow=20)
 colnames(M) <- c('a', 'b', 'c', 'd', 'e')
 l1 <- lm(df[,'e']~., data=df)
 summary(l1)

Call:
lm(formula = df[, "e"] ~ ., data = df)

Residuals:
       Min         1Q     Median         3Q        Max
-9.580e-17 -3.360e-17 -8.596e-18  9.114e-18  2.032e-16

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) -7.505e-17  7.158e-17 -1.048e+00    0.312
a           -1.653e-17  7.117e-17 -2.320e-01    0.820
b           -5.042e-17  5.480e-17 -9.200e-01    0.373
c            4.236e-17  5.774e-17  7.340e-01    0.475
d           -3.878e-17  4.946e-17 -7.840e-01    0.446
e            1.000e+00  6.083e-17  1.644e+16   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.763e-17 on 14 degrees of freedom
Multiple R-squared:     1,    Adjusted R-squared:     1
F-statistic: 6.435e+31 on 5 and 14 DF,  p-value: < 2.2e-16

 l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
 summary(l3)

Call:
lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

Residuals:
     Min       1Q   Median       3Q      Max
-0.49398 -0.14203  0.01588  0.14157  0.31335

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.6681     0.1859   3.594  0.00266 **
M[, 1]       -0.1767     0.2419  -0.730  0.47644
M[, 2]       -0.3874     0.2135  -1.814  0.08970 .
M[, 3]        0.3695     0.2180   1.695  0.11078
M[, 4]        0.1361     0.2366   0.575  0.57360
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2449 on 15 degrees of freedom
Multiple R-squared: 0.2988,    Adjusted R-squared: 0.1119
F-statistic: 1.598 on 4 and 15 DF,  p-value: 0.2261


cheers
Worik


Re: [R] simple lm question

2011-12-02 Thread R. Michael Weylandt

In your code, by supplying a vector M[,"e"] you are regressing e
against all the variables provided in the data argument, including e
itself -- this gives the very strange regression coefficients you
observe. R has no way to know that that's somehow related to the e
it sees in the data argument.

In the suggested way,

lm(formula = e ~ ., data = as.data.frame(M))

e is regressed against everything that is not e and sensible results are given.
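A small self-contained check of this explanation, built so that df and M come from the same random draw (the seed is arbitrary). The first pair of fits should give identical coefficients; the last fit reproduces the pitfall, where "." pulls e in as a predictor because the response was written as df[, "e"] rather than the column name e.

```r
set.seed(1)
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
df <- as.data.frame(M)   # built from the same M, so the fits are comparable

# '.' expands to every column of 'data' not already in the formula,
# so with the response named e the predictors are a, b, c and d.
fit_dot      <- lm(e ~ ., data = df)
fit_explicit <- lm(M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])
print(all.equal(unname(coef(fit_dot)), unname(coef(fit_explicit))))

# Pitfall: df[, "e"] is not the name e, so '.' also includes e itself.
fit_bad <- lm(df[, "e"] ~ ., data = df)
print(names(coef(fit_bad)))
```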

Michael

On Fri, Dec 2, 2011 at 11:03 PM, Worik R wor...@gmail.com wrote:


Re: [R] simple lm question

2011-12-02 Thread Worik R

Duh!  Silly me!  But my confusion persists:  What is the regression being
done?  See below

On Sat, Dec 3, 2011 at 5:10 PM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 In your code, by supplying a vector M[,"e"] you are regressing e
 against all the variables provided in the data argument, including e
 itself -- this gives the very strange regression coefficients you
 observe. R has no way to know that that's somehow related to the e
 it sees in the data argument.


 In the suggested way,

 lm(formula = e ~ ., data = as.data.frame(M))

 e is regressed against everything that is not e and sensible results are
 given.


But still 'l1 <- lm(e~., data=df)' is not the same as 'l3 <-
lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])'

 M <- matrix(runif(5*20), nrow=20)
 colnames(M) <- c('a', 'b', 'c', 'd', 'e')
 l1 <- lm(e~., data=df)
 summary(l1)

Call:
lm(formula = e ~ ., data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-0.38343 -0.21367  0.03067  0.13757  0.49080

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.28521    0.29477   0.968    0.349
a            0.09283    0.30112   0.308    0.762
b            0.23921    0.22425   1.067    0.303
c           -0.16027    0.24154  -0.664    0.517
d            0.24025    0.20054   1.198    0.250

Residual standard error: 0.2871 on 15 degrees of freedom
Multiple R-squared: 0.1602,    Adjusted R-squared: -0.06375
F-statistic: 0.7153 on 4 and 15 DF,  p-value: 0.5943

 l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
 summary(l3)

Call:
lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

Residuals:
     Min       1Q   Median       3Q      Max
-0.36355 -0.22679 -0.01202  0.18462  0.37377

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.76972    0.24501   3.142  0.00672 **
M[, 1]      -0.23830    0.24123  -0.988  0.33890
M[, 2]      -0.02046    0.21958  -0.093  0.92699
M[, 3]      -0.29518    0.22559  -1.308  0.21040
M[, 4]      -0.31545    0.24570  -1.284  0.21866
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2668 on 15 degrees of freedom
Multiple R-squared: 0.2762,    Adjusted R-squared: 0.08317
F-statistic: 1.431 on 4 and 15 DF,  p-value: 0.272




Re: [R] simple lm question

2011-12-02 Thread David Winsemius



On Dec 2, 2011, at 11:20 PM, Worik R wrote:

Duh!  Silly me!  But my confusion persists:  What is the regression being
done?  See below


Sigh. Please note that your df and M are undoubtedly different
objects by now:


 M - matrix(runif(5*20), nrow=20)
 colnames(M) - c('a', 'b', 'c', 'd', 'e')
 l1 - lm(e~., data=as.data.frame(M))
 l1

Call:
lm(formula = e ~ ., data = as.data.frame(M))

Coefficients:
(Intercept)            a            b            c            d
    0.40139     -0.15032     -0.06242      0.13139      0.23905

> l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
> l3

Call:
lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

Coefficients:
(Intercept)       M[, 1]       M[, 2]       M[, 3]       M[, 4]
    0.40139     -0.15032     -0.06242      0.13139      0.23905

As expected.
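A minimal reproducible sketch of the point above, assuming a fixed seed (set.seed(42) is added here only so the run is repeatable; it is not in the original posts). It fits both formula styles to the same M and compares the coefficients with the names stripped, then rebuilds M after snapshotting it into df, which mimics the stale-df situation in the original question. The names same_fit, stale_same, l1_stale and l3_new are illustrative only.

```r
# Assumed seed, for repeatability only
set.seed(42)
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")

# e regressed on every other column of a data frame built from M
l1 <- lm(e ~ ., data = as.data.frame(M))
# The same model written with explicit matrix columns
l3 <- lm(M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

# Coefficient names differ ("a" vs "M[, 1]"), so compare bare numbers
same_fit <- isTRUE(all.equal(unname(coef(l1)), unname(coef(l3))))
print(same_fit)

# Stale snapshot: df keeps the old numbers, M gets fresh ones
df <- as.data.frame(M)
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
l1_stale <- lm(e ~ ., data = df)
l3_new <- lm(M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])
stale_same <- isTRUE(all.equal(unname(coef(l1_stale)), unname(coef(l3_new))))
print(stale_same)
```

With the same data the two calls agree; once df and M hold different random draws, they no longer do.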

--
David.



On Sat, Dec 3, 2011 at 5:10 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:


In your code, by supplying a vector M[,e] you are regressing e against all the variables provided in the data argument, including e itself; this gives the very strange regression coefficients you observe. R has no way to know that the vector is somehow related to the e it sees in the data argument.
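A sketch of the pitfall described above. The exact call from the earlier post is not shown in this thread, so lm(M[, "e"] ~ ., data = as.data.frame(M)) below is a hypothetical reconstruction: the response is an outside vector, while `.` still expands to every column of the data frame, e included. The seed and the name bad are assumptions for illustration.

```r
# Assumed seed, for repeatability only
set.seed(42)
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")

# Hypothetical reconstruction: the response is the vector M[, "e"],
# which R does not recognise as the column e, so `.` keeps e
# among the predictors.
bad <- lm(M[, "e"] ~ ., data = as.data.frame(M))

# e now predicts itself exactly: expect a coefficient of 1 on e
# and (numerically) 0 for the intercept and the other slopes.
print(round(coef(bad), 6))
```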




In the suggested way,

lm(formula = e ~ ., data = as.data.frame(M))

e is regressed against everything that is not e, and sensible results are given.
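One way to check what `.` expanded to is to read the term labels stored in the fitted model. This is a small sketch assuming the same kind of random M (the seed and the names good and rhs are illustrative); it confirms that with `e ~ .` the response is left out of the predictors, since `.` means "all columns not otherwise in the formula".

```r
# Assumed seed, for repeatability only
set.seed(42)
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")

good <- lm(e ~ ., data = as.data.frame(M))

# The terms object records the expanded right-hand side
rhs <- attr(terms(good), "term.labels")
print(rhs)  # the four non-response columns
```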



But still 'l1 <- lm(e~., data=df)' is not the same as 'l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])'









David Winsemius, MD
West Hartford, CT

