Re: [R] p-values < 2.2e-16 not reported

2010-05-19 Thread Shi, Tao
Will,

 I'm wondering if you have any 
insights after looking at the cor.test source code.  It seems to be fine to me, 
as the p value is either calculated by "your first method" or a 
.C code.

...Tao



- Original Message 
> From: Will Eagle 
> To: r-help@r-project.org
> Sent: Wed, May 19, 2010 3:31:26 PM
> Subject: Re: [R] p-values < 2.2e-16 not reported
> 
> Dear all,

thanks for your feedback so far. With the help of a colleague I 
> think I found the solution to my problem:

> 
> pt(10,100,lower=FALSE)
[1] 4.950844e-17

IS *NOT* EQUAL TO

> 
> 1-pt(10,100,lower=TRUE)
[1] 0

This means that R is capable of 
> providing p-values < 2.2e-16, however, if the value is used in a substraction 
> or addition then the default value of the machine epsilon .Machine$double.eps 
> =  2.220446e-16 is applied. This causes that all p-values smaller than this 
> threshold are set to zero. This problem applies also to other distribution 
> functions like pnorm() and others. For your information I would also like to 
> quote the relevant part of the R manual on .Machine$double.eps:
"the smallest 
> positive floating-point number x such that 1 + x != 1. It equals 
> base^ulp.digits 
> if either base is 2 or rounding is 0; otherwise, it is (base^ulp.digits)/ 
> 2.  Normally 2.220446e-16."

Although different opinions were 
> expressed on whether it makes sense to differentiate p-values below the 
> machine 
> epsilon, in my opinion different effect sizes should correspond with 
> different 
> p-values when reporting statistical results. Additionally, in certain 
> scientific 
> fields, eg genetics, where usually many tests are performed and simple 
> methods, 
> eg Bonferroni method, are used to adjust for multiple testing, it is 
> important 
> to know the exact size of the p-value.

Therefore, I would like to suggest 
> that operations of the 2nd variant (ie 1-pt(10,100,lower=TRUE)) should be 
> deprecated to calculate p-values and operations of the 1st variant (ie 
> pt(10,100,lower=FALSE)) should be used  instead. Since I have seen the 2nd 
> variant being frequently used (also by very experienced R users) and I assume 
> that it is hidden in many statistical test functions, eg cor.test(), this 
> issue 
> seems to me quite important.

Best 
> regards,

Will

__

> ymailto="mailto:R-help@r-project.org"; 
> href="mailto:R-help@r-project.org";>R-help@r-project.org mailing list

> href="https://stat.ethz.ch/mailman/listinfo/r-help"; target=_blank 
> >https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting 
> guide http://www.R-project.org/posting-guide.html
and provide commented, 
> minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : Re : Nomogram with multiple interactions (package rms)

2010-05-19 Thread Marc Carpentier
Thank you for your responses, but I don't think you're right about the doc...
I carefully looked at it before posting and ran the examples, looked in 
Vanderbilt Biostat doc, and just looked again example(nomogram) :
1st example : categorical*continous : two axes for each sex
f <- lrm(y ~ lsp(age,50)+sex*rcs(cholesterol,4)+blood.pressure)


2nd : continous*continous : one "age" axe for each specified value of 
cholesterol
g <- lrm(y ~ sex + rcs(age,3)*rcs(cholesterol,3))

3rd and 4th : categorical*continous : two axes for each sex (4th with fun)
f <- psm(Surv(d.time,death) ~ sex*age, dist='lognormal')

5th : categorical*continous : two axes for each sex (with fun)
g <- lrm(Y ~ age+rcs(cholesterol,4)*sex)

I'm desperately trying to represent a case of categorical*(continous+continous) 
:
f2 <- cph(Surv(d.time,death) ~ sex*(rcs(cholesterol,4)+blood.pressure)
The best solution I can think of is to draw one nomogram for each sex :
Assuming 'male' is the ref level of sex :
1st nomogram : one axe for rcs(cholesterol,4), one axe for blood.pressure
2nd nomogram : one axe for sex:rcs(cholesterol,4), one axe for 
sex:blood.pressure, both shifted because of the sex own effect.
(I badly draw it in my previous mail)
I didn't see any example of this "adjustement" of nomogram to 'male' or 
'female'...

I hope I gave a clearer explanation and I'm not wrong about this unmentioned 
case.

Marc




- Message d'origine 
De : Frank E Harrell Jr 
À : Marc Carpentier 
Cc : r-help-request Mailing List 
Envoyé le : Jeu 20 mai 2010, 0h 55min 32s
Objet : Re: Re : [R] Nomogram with multiple interactions (package rms)

On 05/19/2010 04:36 PM, Marc Carpentier wrote:
> I'm sorry. I don't understand the "omit" solution, and maybe I mislead you 
> with my explanation.
>
> With the data from the "f" exemple of nomogram() :
> Let's declare :
> f2<- cph(Surv(d.time,death) ~ sex*(age+blood.pressure))
> I guess the best (and maybe the only) way to represent it with a nomogram is 
> to plot two nomograms (I couldn't find better).
> Is there a way to have :
>
> Nomogram1 : "male" :
> - points 1-100 ---
> - age ("men") ---
> - blood.pressure ("men") ---
> - linear predictor ---
>
> And nomogram2 : "female" :
> - points 1-100 ---
> - age ("female") ---
> - blood.pressure ("female") ---
> - linear predictor ---
>
> As I said I tried and failed (nomogram() still wants me to define
> interact=list(...)) with :
> plot(nomorgam(f2, adj.to=list(sex="male")) #and "female" for the other one
>
> Marc

I think the documentation tells you how to do this.  But you failed to 
look at the output from example(nomogram).  In one of the examples two 
continuous predictors have two axes each, with male and female in close 
proximity.  Or maybe I'm just missing your point.

Frank

>
>
>
> - Message d'origine 
> De : Frank E Harrell Jr
> À : Marc Carpentier; r-help-request Mailing 
> List
> Envoyé le : Mer 19 mai 2010, 22h 28min 51s
> Objet : Re: [R] Nomogram with multiple interactions (package rms)
>
> On 05/19/2010 03:17 PM, Marc Carpentier wrote:
>> Dear list, I'm facing the following problem : A cox model with my sex
>> variable interacting with several continuous variables :
>> cph(S~sex*(x1+x2+x3)) And I'd like to make a nomogram. I know it's a
>> bit tricky and one mights argue that nomogram is not a good a
>> choice... I could use the parameter
>> interact=list(sex=("male","female"),x1=c(a,b,c))... but with rcs or
>> pol transformations of x1, x2 and x3, the choice of the
>> categorization (a,b,c,...) is arbitrary and the nomogram not so
>> useful... Considering that sex is the problem, I thought I could draw
>> two nomograms, one for each sex... based on one model. These would be
>> great. Do you think it's possible ?
>
> Yes, you can specify constant predictors not to draw with the omit=
> argument.  But try first to draw everything.  Shouldn't you just get 2
> axes each for x1 x2 x3?
>
>>
>> Taking the exemple of the help of nomogram() (package "rms") : f<-
>> psm(Surv(d.time,death) ~ sex*age, dist=if(.R.)'lognormal' else
>> 'gaussian')
>
> Drop the if(.R.) which was just corrected in the documentation.  Use
> dist='lognormal'
>
> Frank
>
>>
>> Let's add the previously defined blood.pressure effect with an
>> interaction with sex too (with cph) : f2<- cph(Surv(d.time,death) ~
>> sex*(age+blood.pressure))
>>
>> I thought of the parameter adt.to : plot(nomorgam(f2,
>> adj.to=list(sex="male")) #and "female" for the other one
>>
>> But nomogram() still wants me to define interact=list(...) Thanks for
>> any advice you might have (with adj.to or any alternative...)
>>
>> Marc Carpentier
>>
>
>


-- 
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
  Department of Biostatistics   Vanderbilt University





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailm

[R] Indexing with sparse matrices (SparseM)

2010-05-19 Thread Robin Jeffries
Hello,

I'm working with a very large, very sparse X matrix. Let csr.X <- *
as.matrix.csr*(X) as described by the SparseM package.


The documentation says that "Indexing  work just like they do on dense
matrices". To me this says that I should be able to perform operations on
the rows of csr.X in the same way I would on X itself. E.g.
f <- function(x){
  for (i in 1:n){
u[i] <- log(1+exp(t(X[i,])%*%beta))
  }
  sm <- sum(u)
  return(sm)
}

However, csr.X[i,] doesn't exist.
Now I get how *as.matrix.csr* coerces X into an object with three arrays,
two indexes and a list of the non-zero data. What I can't quite wrap my
brain around is how I would go about using those indices to perform
iterative operations on the rows of X, for example in my toy function
above.

I'm hoping that someone with more experience working with sparse matrices
can provide a few suggestions or pointers? I'm not hooked on this package
either, it was just the first one I came across via Rseek.


Many thanks,
-Robin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] se from lme

2010-05-19 Thread Meissner, Tony (DWLBC)
Is there a R package that calculates the standard errors of prediction/fit from 
lme/nlme models?

Tschüß
Tony Meissner
Principal Scientist (Monitoring)
Resources Monitoring Group
Science, Monitoring and Information Division
Department of Water, Land & Biodiversity Conservation
"Imagine" ©
*(ph) (08) 8595 2209
*(mob) 0401 124 971
*(fax) (08) 8595 2232
* 28 Vaughan Terrace, Berri SA 5343
PO Box 240, Berri SA 5343
DX 51103
***The information in this e-mail may be confidential and/or legally 
privileged.  Use or disclosure of the information by anyone other than the 
intended recipient is prohibited and may be unlawful.  If you have received 
this e-mail in error, please advise by return e-mail or by telephoning +61 8 
8595 2209




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Adding column sum to new row in data frame

2010-05-19 Thread Mohan L
On Thu, May 20, 2010 at 10:31 AM, Joshua Wiley wrote:

> Dear Mohan,
>
> First, I would like to modify my code slightly to:
>
> data <- rbind(data,data.frame(State="Total",t(apply(data[,-1], 2, sum,
> na.rm=TRUE
>
> This actually will add a 7th level to your factor automatically.  The
> reason I wanted to change from using c() to data.frame() is that if
> one uses c(), all the columns are converted to character (this has to
> do with different methods for rbind, see ?rbind particularly the
> Details and Value section which describe the different methods for
> rbind and what its behavior will be if it is using the default
> method).  This may not be an issue, but it would hamper any subsequent
> calculations you may wish to perform on your data.
>

Thanks for your great help.


Mohan L

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixed Effects Model on Within-Subjects Design

2010-05-19 Thread David Atkins


Dave--

Given that you want all comparisons among all means in your design, you 
won't get that directly in a call to lme (or lmer in lme4 package). 
Take a look at multcomp package and its vignettes, where I think you'll 
find what you're looking for.


cheers, Dave

--
Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datk...@u.washington.edu

Center for the Study of Health and Risk Behaviors (CSHRB)   
1100 NE 45th Street, Suite 300  
Seattle, WA  98105  
206-616-3879
http://depts.washington.edu/cshrb/
(Mon-Wed)   

Center for Healthcare Improvement, for Addictions, Mental Illness,
  Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104?
206-897-4210
http://www.chammp.org
(Thurs)

Dear R Experts,

I am attempting to run a mixed effects model on a within-subjects repeated
measures design, but I am unsure if I am doing it properly. I was hoping
that someone would be able to offer some guidance.

There are 5 independent variables (subject, condition, difficulty,
repetition) and 1 dependent measure (value). Condition and difficulty are
fixed effects and have 3 levels each (1,2,3 and 25,50,75 respectively),
while subject and repetition are random effects. Three repeated measurements
(repetitions) were taken for each condition x difficulty pair for each
subject, making this an entirely within-subject design.



I would like an output that compares the significance of the 3 levels of
difficulty for each condition, as well as the overall interaction of
condition*difficulty. The ideal output would look like this:

condition1:diff25 vs. condition1:diff50  p_value = 
condition1:diff25 vs. condition1:diff75  p_value = 
condition1:diff50 vs. condition1:diff75  p_value = 

condition2:diff25 vs. condition1:diff50  p_value = 
condition2:diff25 vs. condition1:diff75  p_value = 
condition2:diff50 vs. condition1:diff75  p_value = 

condition3:diff25 vs. condition1:diff50  p_value = 
condition3:diff25 vs. condition1:diff75  p_value = 
condition3:diff50 vs. condition1:diff75  p_value = 

condition*diff  p_value = 



Here is my code:

#get the data
study.data =read.csv("http://files.davidderiso.com/example_data.csv";,
header=T)
attach(study.data)
subject = factor(subject)
condition = factor(condition)
diff = factor(diff)
rep = factor(rep)

#visualize whats happening
interaction.plot(diff, condition, value, ylim=c(24,
45),ylab="value", xlab="difficulty", trace.label="condition")

#compute the significance
library(nlme)
study.lme = lme(value~condition*diff,random=~1|subject/rep)
summary(study.lme)



Thank you so much for your generous help!!!

Best,
Dave Deriso
UCSD Psychology

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] x y plot with z coordinate scaling to a color value

2010-05-19 Thread Daisy Englert Duursma
Sorry, that should be
qplot(x,y,data=yourdataset, colour= z)

On Thu, May 20, 2010 at 3:07 PM, Daisy Englert Duursma
 wrote:
> the package ggplot2 makes it really really easy!
>
> qplot(x,y,data=yourdataset, colour= color)
>



-- 
Daisy Englert Duursma

Room E8C156
Dept. Biological Sciences
Macquarie University  NSW  2109
Australia

Tel +61 2 9850 9256



10A Carrington Rd
Hornsby, NSW 2077

Mobile: 0421858456

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] x y plot with z coordinate scaling to a color value

2010-05-19 Thread Daisy Englert Duursma
the package ggplot2 makes it really really easy!

qplot(x,y,data=yourdataset, colour= color)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Adding column sum to new row in data frame

2010-05-19 Thread Joshua Wiley
Dear Mohan,

First, I would like to modify my code slightly to:

data <- rbind(data,data.frame(State="Total",t(apply(data[,-1], 2, sum,
na.rm=TRUE

This actually will add a 7th level to your factor automatically.  The
reason I wanted to change from using c() to data.frame() is that if
one uses c(), all the columns are converted to character (this has to
do with different methods for rbind, see ?rbind particularly the
Details and Value section which describe the different methods for
rbind and what its behavior will be if it is using the default
method).  This may not be an issue, but it would hamper any subsequent
calculations you may wish to perform on your data.

Aside from the way I just showed, you have many options.

(1) Read the data in and do not convert the state names to factor in
the first place (David's suggestion). This is most likely the easiest.
 The only reason I can see not wanting to use this is if there were
several other columns you wanted treated as factors, in which case
look at option #3.

See ?read.csv for details.  Note that: "'read.csv' and 'read.csv2' are
identical to 'read.table' except for the defaults." so it is perfectly
legal to use the stringsAsFactors argument shown for read.table.

data <- read.csv(file='ipsample.csv', header=TRUE,
stringsAsFactors=FALSE) #what your read in code might look like

Even if you do this, I would still recommend against my original code
because it converts everything else to character too.

(2) Add a level to the factor for total (as you showed).  I do not see
any real problem with this, but it is a bit of a complicated solution.
 My updated code actually leads to similar results.

(3) Supposing the data is already read in as a factor and you would
like to change it, you can convert just that column to character.
Once the column is character, you can name that cell Total, or
whatever else you like.

data$State <- as.character(data$State)

(4) Wu Gong suggested this option.  The difficulty with this option is
that because cbind is like rbind, the default without any dataframe
arguments will wind up with everything being converted to character,
just like my original code.

data <- rbind(data,cbind(State="Total", t(apply(data[,-1],2,sum

Depending exactly what you are working with, different solutions may
be better or wrose and these are certainly not the only ways you could
handle it.

Best regards,

Josh

On Wed, May 19, 2010 at 8:19 PM, Mohan L  wrote:
>
>>
>>
>> How to safely avoid this warning massage?
>> Now I have  instead of "Total" in last row State column. How to I
>> replace it as "Total"?
>>
> Dear All,
>
> The below link provides a very good explanation of  "Creating factor
> variables" and way to avoid the warning message
> http://www.ats.ucla.edu/stat/R/modules/factor_variables.htm
>
> Now it works for me without warning message ( invalid factor level, NAs
> generated ) according to the document above , I did  below to avoid the
> warning:
>
>> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
>> data
>   State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC 0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
>
>
>> a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
> Warning message:
> In `[<-.factor`(`*tmp*`, ri, value = "Total") :
>   invalid factor level, NAs generated
>
>> a
>   State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC 0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
> 7   1390 1387 1440 1420 1306 957
>
> We can see that instead of "Total", the label was . To do this
> correctly, I have added the new level, "Total", to the factor column
> data$State using the factor function with the levels argument. Then I can
> finally add an element to the factor variable from the new level. here is
> the steps
>
>> levels(data$State)
> [1] "AAA"  "BBB"  "CCC " "DDD"  "EEE"  "FFF"
>
>> data$State <- factor(data$State,levels=c(levels(data$State),"Total"))
>
>> data$State
> [1] AAA  BBB  CCC  DDD  EEE  FFF
> Levels: AAA BBB CCC  DDD EEE FFF Total
>
>> levels(data$State)
> [1] "AAA"   "BBB"   "CCC "  "DDD"   "EEE"   "FFF"   "Total"
>
>> x <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
>
> Now the above works without warning.
>
>> x
>   State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC 0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
> 7 Total 1390 1387 1440 1420 1306 957
>
> I think I am doing right. If  I miss understood anything. Please guide me  I
> am beginer to R.
>
> Thanks & Rg
> Mohan L

Re: [R] multiple 2 by 2 crosstabulations?

2010-05-19 Thread David Winsemius


On May 19, 2010, at 9:10 PM, Biau David wrote:


Hello,

I have a dataframe (var_1, var_2, ..., var_n) and I would like to  
export summary statistics to Latex in the form of a table. I want  
specific summary statistics by crossing numerous variables 2x2 AT  
ONCE. In each cell I would like sometimes to have the median (Q1 -  
Q3), or frequency and proportion, etc. CrossTable, xtab, etc... do  
not allow for multiple 2 by 2 crosstabulation. The table would look  
like this:


 var_1  var_2 var3, ...
var_1   abc
var_2   def
var_3   .. ... ...

with a, b, c, ... the results of each crosstabulation. I have  
continuous and categorical variables.


Any idea?


The problem I face is you have used too many "..."'s, and "etc"'s and  
not enough R code. You probably have a clear idea what "summary  
statistics by crossing numerous variables 2x2 AT ONCE" means but I do  
not. It is possible that the describe.formula in Hmisc or the  
summaryBy function in the doBy package may be helpful, but that is  
merely a guess.




Thank you very much,

David.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] moving the x-axis in ggplot

2010-05-19 Thread Liam Blanckenberg
Dear all,

I was wondering if there is an easy way to set the position of the x-axis in
ggplot with respect to where it intercepts the y-axis? The default is for
the x-axis to be positioned at the bottom of the chart, but I would like to
force it to intercept at y = 0 (I have both positive and negative y values
across a range of x values). Clearly this can lead to issues with the x-axis
labels/ticks obstructing the plot, but for my present purpose this is fine.

Many thanks,

Liam

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Adding column sum to new row in data frame

2010-05-19 Thread David Winsemius


On May 19, 2010, at 11:19 PM, Mohan L wrote:




How to safely avoid this warning massage?
Now I have  instead of "Total" in last row State column. How to I
replace it as "Total"?

Dear All,


The below link provides a very good explanation of  "Creating factor
variables" and way to avoid the warning message
http://www.ats.ucla.edu/stat/R/modules/factor_variables.htm

Now it works for me without warning message ( invalid factor level,  
NAs
generated ) according to the document above , I did  below to avoid  
the

warning:


data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
data

 State  Jan  Feb  Mar  Apr  May Jun
1   AAA11022   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC 00012   1
4   DDD5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86



a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))

Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
 invalid factor level, NAs generated


a

 State  Jan  Feb  Mar  Apr  May Jun
1   AAA11022   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC 00012   1
4   DDD5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86
7   1390 1387 1440 1420 1306 957

We can see that instead of "Total", the label was . To do this
correctly, I have added the new level, "Total", to the factor column
data$State using the factor function with the levels argument. Then  
I can
finally add an element to the factor variable from the new level.  
here is

the steps


levels(data$State)

[1] "AAA"  "BBB"  "CCC " "DDD"  "EEE"  "FFF"


data$State <- factor(data$State,levels=c(levels(data$State),"Total"))



data$State

[1] AAA  BBB  CCC  DDD  EEE  FFF
Levels: AAA BBB CCC  DDD EEE FFF Total


levels(data$State)

[1] "AAA"   "BBB"   "CCC "  "DDD"   "EEE"   "FFF"   "Total"


x <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))


Now the above works without warning.


x

 State  Jan  Feb  Mar  Apr  May Jun
1   AAA11022   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC 00012   1
4   DDD5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86
7 Total 1390 1387 1440 1420 1306 957

I think I am doing right. If  I miss understood anything. Please  
guide me  I

am beginer to R.


If you had instead used stringsAsFactors=FALSE with the read.table  
function, the "State" column would have been character rather than  
factor, and you would have avoided all those difficulties.


--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample and rearrange

2010-05-19 Thread Wu Gong

Thank David and Jim.
I got it.

-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sample-and-rearrange-tp747p2223872.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] colored venn diagram

2010-05-19 Thread Peter Ehlers

Tao,

You might also have a look at the venneuler package.

 -Peter Ehlers

On 2010-05-19 21:37, David Winsemius wrote:


On May 19, 2010, at 7:15 PM, Shi, Tao wrote:


Hi list,

This is probably too much to ask, but I'm wondering if there is a
ready-to-use function somewhere that allows me to color one area of a
venn diagram (e.g. the intersection of two sets)?



There is an intersectDiagrapm in plotrix that accepts color arguments,
although it is not really a Venn diagram.

The venn function in gplots does not accept color. It is a wrapper to
getVennCounts and drawVennDiagram, and if you hack the second function
you can get colors. For the example in the help page for Venn, you need
to find the section that handles the correct number of sets and add a
color argument to the polygon call. If you want intersections you will
need transparency. I do not know how to address specific subsets:

require(gplots)
require(seqinr) # for col2alpha for transparency

getAnywhere(drawVennDiagram) # add an assignment operator and trim out
this stuff
A single object matching ‘drawVennDiagram’ was found
It was found in the following places
namespace:gplots
with value
--end trimming--
function (data, small = 0.7, showSetLogicLabel = FALSE, simplify = FALSE)
{
numCircles <- NA
.
.
# make mod to n==4 section
+ polygon(relocate_elp(elps, 45, 130, 170), col=col2alpha("red", 0.5) )
+ polygon(relocate_elp(elps, 45, 200, 200), col=col2alpha("blue", 0.5) )
+ polygon(relocate_elp(elps, 135, 200, 200))
+ polygon(relocate_elp(elps, 135, 270, 170))
.
.
... And then pass the result from venn to drawVennDiagram:

gv <- venn(input)
drawVennDiagram(data = gv, small = 0.7, showSetLogicLabel = FALSE,
simplify = FALSE)

Set #1 in pink set2 in blue and the intersection is purple.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] About the breakpoint when making heatmap with lots of variables

2010-05-19 Thread Meng Wu
HI,All

I am trying to create a heatmap with 24 samples with 15672 varibles, I read
in the table in R, and then made it as a matrix, then try to create the
heatmap using heatmap(x,...)

However, I received the error message as:
> heatmap(t(x))
Error: cannot allocate vector of size 936.8 Mb
R(2925,0xa0b16500) malloc: *** mmap(size=982261760) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2925,0xa0b16500) malloc: *** mmap(size=982261760) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2925,0xa0b16500) malloc: *** mmap(size=982261760) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2925,0xa0b16500) malloc: *** mmap(size=982261760) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2925,0xa0b16500) malloc: *** mmap(size=982261760) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2925,0xa0b16500) malloc: *** mmap(size=982261760) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2925,0xa0b16500) malloc: *** mmap(size=982261760) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug


It seems like I do not have enough memory, how can I set a breakpoint as
suggested?

Thanks,

Meng

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Adding column sum to new row in data frame

2010-05-19 Thread Wu Gong

To remove NA instead of Total:

rbind(data,cbind(State="Total", t(apply(data[,-1],2,sum




-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Re-Adding-column-sum-to-new-row-in-data-frame-tp2223277p2223855.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple 2 by 2 crosstabulations?

2010-05-19 Thread Biau David
Hello,

I have a dataframe (var_1, var_2, ..., var_n) and I would like to export 
summary statistics to Latex in the form of a table. I want specific summary 
statistics by crossing numerous variables 2x2 AT ONCE. In each cell I would 
like sometimes to have the median (Q1 - Q3), or frequency and proportion, etc. 
CrossTable, xtab, etc... do not allow for multiple 2 by 2 crosstabulation. The 
table would look like this:

     var_1  var_2 var3, ...
var_1   a        b            c 
var_2   d        e        f
var_3   .. ...     ...

with a, b, c, ... the results of each crosstabulation. I have continuous and 
categorical variables.

Any idea?

Thank you very much,

David.


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Res: Using the zero-inflated binomial in experimental designs

2010-05-19 Thread Ivan Allaman

Hi Ben!

Following his recommendations I did the following:
1st step:
I compared the best model for binomial and binomial inflates.
1.1 Best model for Binomial.

dg$resp.mumi <- cbind(dg$MUMI,dg$NT - dg$MUMI)
dg
names(dg)
mod.mumi.binomial  <- glm(resp.mumi ~ factor(PARTO)*REG, family=binomial,
data = dg)
summary(mod.mumi.binomial)
mod.mumi.binomial1 <- glm(resp.mumi ~ factor(PARTO) + REG, family=binomial,
data = dg)
summary(mod.mumi.binomial1)
mod.mumi.binomial2 <- glm(resp.mumi ~ REG, family=binomial, data = dg)
summary(mod.mumi.binomial2)
pchisq(-2*(logLik(mod.mumi.binomial1)-logLik(mod.mumi.binomial)),lower.tail=FALSE,
df= 15) 
[1] 0.1354171
pchisq(-2*(logLik(mod.mumi.binomial2)-logLik(mod.mumi.binomial1)),lower.tail=FALSE,
df= 5) 
[1] 0.06030012

The 5% significance level, we can choose the most parsimonious model, ie the
model "mod.mumi.binomial2".

1.2 Best model for binomial inflates
library(VGAM)
mod.mumi.binomialinflacionada <- vglm(resp.mumi ~
factor(PARTO)*REG,zibinomial, data = dg)
summary(mod.mumi.binomialinflacionada)
mod.mumi.binomialinflacionada1 <- vglm(resp.mumi ~
factor(PARTO)+REG,zibinomial, data = dg)
summary(mod.mumi.binomialinflacionada1)
mod.mumi.binomialinflacionada2 <- vglm(resp.mumi ~ REG,zibinomial, data =
dg)
summary(mod.mumi.binomialinflacionada2)
pchisq(-2*logLik(mod.mumi.binomialinflacionada1)-logLik(mod.mumi.binomialinflacionada)),lower.tail=FALSE,
 
df= 15) 
[1] 0.1477837
pchisq(-2*logLik(mod.mumi.binomialinflacionada2)-logLik(mod.mumi.binomialinflacionada1)),lower.tail=FALSE,
 
df= 5) 
[1] 0.0989934

The 5% significance level, we can choose the most parsimonious model, ie the
model "mod.mumi.binomialinflacionada2".

2st step:
Compare the best model of the binomial model with the best of inflated
binomial.
pchisq(-2*(logLik(mod.mumi.binomial2)-logLik(mod.mumi.binomialinflacionada2)),lower.tail=FALSE,
df= 1) 
[1] 0.1929690

There wasn't difference between the models. Must i choose the most
parsimonious model?

Thanks for your attention and sorry for the inconvenience.





-- 
View this message in context: 
http://r.789695.n4.nabble.com/Using-the-zero-inflated-binomial-in-experimental-designs-tp2221254p2223819.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] colored venn diagram

2010-05-19 Thread David Winsemius


On May 19, 2010, at 7:15 PM, Shi, Tao wrote:


Hi list,

This is probably too much to ask, but I'm wondering if there is a  
ready-to-use function somewhere that allows me to color one area of  
a venn diagram (e.g. the intersection of two sets)?




There is an intersectDiagrapm in plotrix that accepts color arguments,  
although it is not really a Venn diagram.


The venn function in gplots does not accept color. It is a wrapper to  
getVennCounts and drawVennDiagram, and if you hack the second function  
you can get colors. For the example in the help page for Venn, you  
need to find the section that handles the correct number of sets and  
add a color argument to the polygon call. If you want intersections  
you will need transparency. I do not know how to address specific  
subsets:


require(gplots)
require(seqinr)  # for col2alpha for transparency

getAnywhere(drawVennDiagram)   # add an assignment operator and trim  
out this stuff

A single object matching ‘drawVennDiagram’ was found
It was found in the following places
  namespace:gplots
with value
--end trimming--
function (data, small = 0.7, showSetLogicLabel = FALSE, simplify =  
FALSE)

{
numCircles <- NA
.
.
# make mod to n==4 section
 + polygon(relocate_elp(elps, 45, 130, 170),  
col=col2alpha("red", 0.5) )
+ polygon(relocate_elp(elps, 45, 200, 200),  
col=col2alpha("blue", 0.5) )

+ polygon(relocate_elp(elps, 135, 200, 200))
+ polygon(relocate_elp(elps, 135, 270, 170))
.
.
... And then pass the result from venn to drawVennDiagram:

gv <- venn(input)
 drawVennDiagram(data = gv, small = 0.7, showSetLogicLabel = FALSE,
simplify = FALSE)

Set #1 in pink set2 in blue and the intersection is purple.

--
David


Thanks!

...Tao

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Adding column sum to new row in data frame

2010-05-19 Thread Mohan L
>
>
> How to safely avoid this warning massage?
> Now I have  instead of "Total" in last row State column. How to I
> replace it as "Total"?
>
> Dear All,

The below link provides a very good explanation of  "Creating factor
variables" and way to avoid the warning message
http://www.ats.ucla.edu/stat/R/modules/factor_variables.htm

Now it works for me without warning message ( invalid factor level, NAs
generated ) according to the document above , I did  below to avoid the
warning:

> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
> data
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA11022   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC 00012   1
4   DDD5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86


> a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
  invalid factor level, NAs generated

> a
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA11022   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC 00012   1
4   DDD5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86
7   1390 1387 1440 1420 1306 957

We can see that instead of "Total", the label was . To do this
correctly, I have added the new level, "Total", to the factor column
data$State using the factor function with the levels argument. Then I can
finally add an element to the factor variable from the new level. here is
the steps

> levels(data$State)
[1] "AAA"  "BBB"  "CCC " "DDD"  "EEE"  "FFF"

> data$State <- factor(data$State,levels=c(levels(data$State),"Total"))

> data$State
[1] AAA  BBB  CCC  DDD  EEE  FFF
Levels: AAA BBB CCC  DDD EEE FFF Total

> levels(data$State)
[1] "AAA"   "BBB"   "CCC "  "DDD"   "EEE"   "FFF"   "Total"

> x <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))

Now the above works without warning.

> x
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA11022   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC 00012   1
4   DDD5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86
7 Total 1390 1387 1440 1420 1306 957

I think I am doing right. If  I miss understood anything. Please guide me  I
am beginer to R.

Thanks & Rg
Mohan L

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mixed Effects Model on Within-Subjects Design

2010-05-19 Thread Dave Deriso
Dear R Experts,

I am attempting to run a mixed effects model on a within-subjects repeated
measures design, but I am unsure if I am doing it properly. I was hoping
that someone would be able to offer some guidance.

There are 5 independent variables (subject, condition, difficulty,
repetition) and 1 dependent measure (value). Condition and difficulty are
fixed effects and have 3 levels each (1,2,3 and 25,50,75 respectively),
while subject and repetition are random effects. Three repeated measurements
(repetitions) were taken for each condition x difficulty pair for each
subject, making this an entirely within-subject design.



I would like an output that compares the significance of the 3 levels of
difficulty for each condition, as well as the overall interaction of
condition*difficulty. The ideal output would look like this:

condition1:diff25 vs. condition1:diff50  p_value = 
condition1:diff25 vs. condition1:diff75  p_value = 
condition1:diff50 vs. condition1:diff75  p_value = 

condition2:diff25 vs. condition1:diff50  p_value = 
condition2:diff25 vs. condition1:diff75  p_value = 
condition2:diff50 vs. condition1:diff75  p_value = 

condition3:diff25 vs. condition1:diff50  p_value = 
condition3:diff25 vs. condition1:diff75  p_value = 
condition3:diff50 vs. condition1:diff75  p_value = 

condition*diff  p_value = 



Here is my code:

#get the data
study.data =read.csv("http://files.davidderiso.com/example_data.csv";,
header=T)
attach(study.data)
subject = factor(subject)
condition = factor(condition)
diff = factor(diff)
rep = factor(rep)

#visualize whats happening
interaction.plot(diff, condition, value, ylim=c(24,
45),ylab="value", xlab="difficulty", trace.label="condition")

#compute the significance
library(nlme)
study.lme = lme(value~condition*diff,random=~1|subject/rep)
summary(study.lme)



Thank you so much for your generous help!!!

Best,
Dave Deriso
UCSD Psychology

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] x y plot with z coordinate scaling to a color value

2010-05-19 Thread Felix Andrews
And made slightly easier with panel.levelplot.points from latticeExtra:
http://latticeextra.r-forge.r-project.org/#panel.levelplot.points


On 19 May 2010 22:24, baptiste auguie  wrote:
> Hi,
>
> See also ?lattice::xyplot and ?ggplot2::geom_point , either one can do
> it automatically.
>
> HTH,
>
> baptiste
> On 19 May 2010 12:24, Jannis  wrote:
>> Dears,
>>
>> before I start programming my own function I would like to ask you whether 
>> there is any function already available that lets me plot a x/y plot with 
>> the colors of the points determined by scaling a third variable z into a 
>> defined colorramp?
>>
>> Until now I am using a wild combination of functions to create a colorvector 
>> by hand, then calling
>>
>> plot(x,y,col=colors)
>>
>> and then using color.legend() from plotrix to add a colorbar. Is there 
>> anything where I can just call (for example)
>>
>> plot(x,y,z)
>>
>>
>> and get a plot and a colorbar next to it?
>>
>> If not I have to program that myself because my current way involves a lot 
>> of steps and readjusting plotting regions.
>>
>>
>> Thanks for your help!
>>
>> Jannis
>>
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Felix Andrews / 安福立
Postdoctoral Fellow
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
-- 
http://www.neurofractal.org/felix/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Adding column sum to new row in data frame

2010-05-19 Thread Mohan L
Hi Joshua,
>
>
>
> rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
>

Yes. This what exactly I want. Thanks for your time.


>
> If your State column is a factor, it will return a warning that NAs
> were introduced (but the totals will still be at the bottom).
>

Yes.

> is.factor(data$State)
[1] TRUE

 State column is a factor, it will return warning message like this .
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
  invalid factor level, NAs generated

I know that this is common factor vs character problem in R. I think there
may be a safe way to handle this:

> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)

> a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
  invalid factor level, NAs generated

> a
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA11022   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC 00012   1
4   DDD5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86
7   1390 1387 1440 1420 1306 957

How to safely avoid this warning massage?
Now I have  instead of "Total" in last row State column. How to I
replace it as "Total"?

Thanks & Rg
Mohan L

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex help: splitting strings with no separator

2010-05-19 Thread Gabor Grothendieck
One way is to use strapply in the gsubfn package.  It is like apply in
that the first argument is the object (in both cases), the second is
the modifier (the margin in the case of apply and the regular
expression in the case of strapply) and a function (in both cases).
The parenthesized expressions in the regular expression are captured
and passed to the function.  Here \\D+ is a string of non-digits and
\\d+ is a string of digits.  See http://gsubfn.googlecode.com home
page, the vignette and the help for more info.

> library(gsubfn)
> strapply(x, "(\\D+)(\\d+)", c, simplify = rbind)
 [,1][,2]
[1,] "Apple" "12"
[2,] "HP""42"
[3,] "Dell"  "91"


On Wed, May 19, 2010 at 8:15 PM, Krishna Tateneni  wrote:
> Greetings,
>
> I have a vector of values that are a word followed by a number, e.g., x =
> c("Apple12","HP42","Dell91").  The goal is to split this vector into two
> vectors such that the first vector contains just the words and the second
> contains just the numbers.  I cannot use strsplit (or at least I do not know
> how) as there is no obvious separator.
>
> I can use sub to create a separator, e.g., y = sub("([[:digit:]])","
> \\1",x), and then use strsplit, but I thought more experienced R users may
> have a better solution.  I've spent some time with Google, but not turned up
> anything so far.
>
> Many thanks,
> --Krishna
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample and rearrange

2010-05-19 Thread jim holtman
you just need the function name; the parameter is being supplied by the lapply:

 t(apply(x, 1, rearrange))

On Wed, May 19, 2010 at 7:47 PM, Wu Gong  wrote:
>
> I tried to use a separate function to make the code more understandable. But
> I failed. I don't know what's wrong with the code.
>
> x <- as.matrix(x)
>
> rearrange <- function(.row){
>        z <- do.call(rbind, strsplit(.row[-1], ''))
>        z.col <- t(apply(z, 2, paste, collapse=''))
>        cbind(.row[1], z.col)
>        }
>
> t(apply(x, 1, rearrange(.row)))
>
> Error in strsplit(.row[-1], "") : object '.row' not found
>
> I don't know how to pass the value to the function.
>
> -
> A R learner.
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/sample-and-rearrange-tp747p2223767.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample and rearrange

2010-05-19 Thread David Winsemius


On May 19, 2010, at 7:47 PM, Wu Gong wrote:



I tried to use a separate function to make the code more  
understandable. But

I failed. I don't know what's wrong with the code.

x <- as.matrix(x)

rearrange <- function(.row){
z <- do.call(rbind, strsplit(.row[-1], ''))
z.col <- t(apply(z, 2, paste, collapse=''))
cbind(.row[1], z.col)
}

t(apply(x, 1, rearrange(.row)))

Error in strsplit(.row[-1], "") : object '.row' not found


The error occurs because apply is sending a single row at a time, but  
it is not named .row. Your code _does_ work, but only if you use it  
thusly:


t(apply(x, 1, rearrange))




I don't know how to pass the value to the function.


You may not, ... but R knows how.


-
A R learner.
--

--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample and rearrange

2010-05-19 Thread Wu Gong

I tried to use a separate function to make the code more understandable. But
I failed. I don't know what's wrong with the code.

x <- as.matrix(x)

rearrange <- function(.row){
z <- do.call(rbind, strsplit(.row[-1], ''))
z.col <- t(apply(z, 2, paste, collapse=''))
cbind(.row[1], z.col)
}

t(apply(x, 1, rearrange(.row)))

Error in strsplit(.row[-1], "") : object '.row' not found

I don't know how to pass the value to the function.

-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sample-and-rearrange-tp747p2223767.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex help: splitting strings with no separator

2010-05-19 Thread Henrique Dallazuanna
Try this:

read.table(textConnection(gsub("([[:alpha:]])(\\d.*)", "\\1;\\2", x)), sep =
";")

or

do.call(rbind, strsplit(gsub("([[:alpha:]])(\\d.*)", "\\1;\\2", x), ";"))

On Wed, May 19, 2010 at 9:15 PM, Krishna Tateneni wrote:

> Greetings,
>
> I have a vector of values that are a word followed by a number, e.g., x =
> c("Apple12","HP42","Dell91").  The goal is to split this vector into two
> vectors such that the first vector contains just the words and the second
> contains just the numbers.  I cannot use strsplit (or at least I do not
> know
> how) as there is no obvious separator.
>
> I can use sub to create a separator, e.g., y = sub("([[:digit:]])","
> \\1",x), and then use strsplit, but I thought more experienced R users may
> have a better solution.  I've spent some time with Google, but not turned
> up
> anything so far.
>
> Many thanks,
> --Krishna
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex help: splitting strings with no separator

2010-05-19 Thread Jorge Ivan Velez
Hi Krishna,

Here is a suggestion:

> x <- c("Apple12","HP42","Dell91")
> foo <- function(x) data.frame(brand = gsub("[0-9]", "", x), number =
gsub("[^0-9]", "", x))
> foo(x)
# brand number
# 1 Apple 12
# 2HP 42
# 3  Dell 91

HTH,
Jorge


On Wed, May 19, 2010 at 8:15 PM, Krishna Tateneni <> wrote:

> Greetings,
>
> I have a vector of values that are a word followed by a number, e.g., x =
> c("Apple12","HP42","Dell91").  The goal is to split this vector into two
> vectors such that the first vector contains just the words and the second
> contains just the numbers.  I cannot use strsplit (or at least I do not
> know
> how) as there is no obvious separator.
>
> I can use sub to create a separator, e.g., y = sub("([[:digit:]])","
> \\1",x), and then use strsplit, but I thought more experienced R users may
> have a better solution.  I've spent some time with Google, but not turned
> up
> anything so far.
>
> Many thanks,
> --Krishna
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] regex help: splitting strings with no separator

2010-05-19 Thread Krishna Tateneni
Greetings,

I have a vector of values that are a word followed by a number, e.g., x =
c("Apple12","HP42","Dell91").  The goal is to split this vector into two
vectors such that the first vector contains just the words and the second
contains just the numbers.  I cannot use strsplit (or at least I do not know
how) as there is no obvious separator.

I can use sub to create a separator, e.g., y = sub("([[:digit:]])","
\\1",x), and then use strsplit, but I thought more experienced R users may
have a better solution.  I've spent some time with Google, but not turned up
anything so far.

Many thanks,
--Krishna

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiclass SVM

2010-05-19 Thread Xiyan Lon
I am learning classification using SVM for research (survey).
The data that I have had some class. I know that the library (e1071)
can only handle
multiclass with one-against-one-method. I want to know are there any
other functions
or library which can handle multiclass methods like:
- multiclass SVM in a one-vs-all
- multiclass SVM in Error-correcting output coding (ECOC)

Thanks
Xiyan Lon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in untar2(tarfile, files, list , exdir) : unsupported entry type ‘x’

2010-05-19 Thread Christopher Bare
Hi,

I'm starting to think this is a Mac OS X Snow Leopard specific issue.
The GNU docs on the tar format say the type flag 'x' means this:

#define XHDTYPE  'x'/* Extended header referring to the
   next file in the archive */

Maybe, OSX is putting some bad juju in my files causing them to have
these extended headers?? Aaarg!

-chris

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] colored venn diagram

2010-05-19 Thread Shi, Tao
Hi list,

This is probably too much to ask, but I'm wondering if there is a ready-to-use 
function somewhere that allows me to color one area of a venn diagram (e.g. the 
intersection of two sets)?

Thanks!

...Tao

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Nomogram with multiple interactions (package rms)

2010-05-19 Thread Frank E Harrell Jr

On 05/19/2010 04:36 PM, Marc Carpentier wrote:

I'm sorry. I don't understand the "omit" solution, and maybe I mislead you with 
my explanation.

With the data from the "f" exemple of nomogram() :
Let's declare :
f2<- cph(Surv(d.time,death) ~ sex*(age+blood.pressure))
I guess the best (and maybe the only) way to represent it with a nomogram is to 
plot two nomograms (I couldn't find better).
Is there a way to have :

Nomogram1 : "male" :
- points 1-100 ---
- age ("men") ---
- blood.pressure ("men") ---
- linear predictor ---

And nomogram2 : "female" :
- points 1-100 ---
- age ("female") ---
- blood.pressure ("female") ---
- linear predictor ---

As I said I tried and failed (nomogram() still wants me to define
interact=list(...)) with :
plot(nomorgam(f2, adj.to=list(sex="male")) #and "female" for the other one

Marc


I think the documentation tells you how to do this.  But you failed to 
look at the output from example(nomogram).  In one of the examples two 
continuous predictors have two axes each, with male and female in close 
proximity.  Or maybe I'm just missing your point.


Frank





- Message d'origine 
De : Frank E Harrell Jr
À : Marc Carpentier; r-help-request Mailing 
List
Envoyé le : Mer 19 mai 2010, 22h 28min 51s
Objet : Re: [R] Nomogram with multiple interactions (package rms)

On 05/19/2010 03:17 PM, Marc Carpentier wrote:

Dear list, I'm facing the following problem : A cox model with my sex
variable interacting with several continuous variables :
cph(S~sex*(x1+x2+x3)) And I'd like to make a nomogram. I know it's a
bit tricky and one mights argue that nomogram is not a good a
choice... I could use the parameter
interact=list(sex=("male","female"),x1=c(a,b,c))... but with rcs or
pol transformations of x1, x2 and x3, the choice of the
categorization (a,b,c,...) is arbitrary and the nomogram not so
useful... Considering that sex is the problem, I thought I could draw
two nomograms, one for each sex... based on one model. These would be
great. Do you think it's possible ?


Yes, you can specify constant predictors not to draw with the omit=
argument.  But try first to draw everything.  Shouldn't you just get 2
axes each for x1 x2 x3?



Taking the exemple of the help of nomogram() (package "rms") : f<-
psm(Surv(d.time,death) ~ sex*age, dist=if(.R.)'lognormal' else
'gaussian')


Drop the if(.R.) which was just corrected in the documentation.  Use
dist='lognormal'

Frank



Let's add the previously defined blood.pressure effect with an
interaction with sex too (with cph) : f2<- cph(Surv(d.time,death) ~
sex*(age+blood.pressure))

I thought of the parameter adt.to : plot(nomorgam(f2,
adj.to=list(sex="male")) #and "female" for the other one

But nomogram() still wants me to define interact=list(...) Thanks for
any advice you might have (with adj.to or any alternative...)

Marc Carpentier







--
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading in and writing out one line at a time

2010-05-19 Thread sedm1000

I hope that somebody can help me with this - I think very simple - issue...?

I am running a package that only accepts one line at a time, but I would
like to run this package on a dataframe of >500 lines. 

Dataframe "d" is a single column:

APPLES
PEARS
AUBERGINES
KUMQUATS

I would like to read one line of my dataframe "d", individually into a new
frame "f", then execute the program on "f" and it provides an output:

>mash(f)

[[1]]
[1]  FREDBLOGGS

[2]  250

I would like to record this output to a two column dataframe, "r", such as:

FREDBLOGGS 250

and then repeat the process on the next line of dataframe "d", and so on to
the end of dataframe "d",writing each line into "r" so that the dataframe
"r" eventually reads:

FREDBLOGGS 250
JAMESJONES  175
TERRYTAITE   892
HARRYSMITH 320


I'm afraid that I'm new to this, but think that this first step in will be
very useful in general. Thank you kindly for any help.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Reading-in-and-writing-out-one-line-at-a-time-tp2223726p2223726.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] p-values < 2.2e-16 not reported

2010-05-19 Thread Will Eagle

Dear all,

thanks for your feedback so far. With the help of a colleague I think I 
found the solution to my problem:


> pt(10,100,lower=FALSE)
[1] 4.950844e-17

IS *NOT* EQUAL TO

> 1-pt(10,100,lower=TRUE)
[1] 0

This means that R is capable of providing p-values < 2.2e-16, however, 
if the value is used in a substraction or addition then the default 
value of the machine epsilon .Machine$double.eps =  2.220446e-16 is 
applied. This causes that all p-values smaller than this threshold are 
set to zero. This problem applies also to other distribution functions 
like pnorm() and others. For your information I would also like to quote 
the relevant part of the R manual on .Machine$double.eps:
"the smallest positive floating-point number x such that 1 + x != 1. It 
equals base^ulp.digits if either base is 2 or rounding is 0; otherwise, 
it is (base^ulp.digits)/ 2.  Normally 2.220446e-16."


Although different opinions were expressed on whether it makes sense to 
differentiate p-values below the machine epsilon, in my opinion 
different effect sizes should correspond with different p-values when 
reporting statistical results. Additionally, in certain scientific 
fields, eg genetics, where usually many tests are performed and simple 
methods, eg Bonferroni method, are used to adjust for multiple testing, 
it is important to know the exact size of the p-value.


Therefore, I would like to suggest that operations of the 2nd variant 
(ie 1-pt(10,100,lower=TRUE)) should be deprecated to calculate p-values 
and operations of the 1st variant (ie pt(10,100,lower=FALSE)) should be 
used  instead. Since I have seen the 2nd variant being frequently used 
(also by very experienced R users) and I assume that it is hidden in 
many statistical test functions, eg cor.test(), this issue seems to me 
quite important.


Best regards,

Will

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] printing a dataframe by categories

2010-05-19 Thread Henrique Dallazuanna
Try this:

by(x, x$Form, function(.x)rbind(.x[-3], sum(.x[2])))

On Wed, May 19, 2010 at 6:49 PM, Gerrit Draisma wrote:

> I am looking for the following simple question.
> I have a data frame with names and numbers, divided in categories.
> I would like to produce a text file with page breaks,
> listing the names and numbers by category,
> and totalling the numbers.
>
> Example:
>
>  Name<-LETTERS[1:6]
>  Score<-rep(5:8,length.out=6)
>  Form<-rep(1:2,each=3)
>  x<-data.frame(Name,Score,Form)
>
> This gives approximately what I need
>  by(x,x$Form,function(x){x[,1:2]})
> but I would like a total added to the list.
>
> Gerrit.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample and rearrange

2010-05-19 Thread David Winsemius


On May 19, 2010, at 5:01 PM, Wu Gong wrote:



It took me a day to make the sense of Jim's code :(

Hope my comments will help.

## Transform data to matrix
x <- as.matrix(x)

## Apply function to each row
## Create a function to rearrange bases
result <- apply(x, 1, function(eachrow){

## Split each gene to bases
## Exclude the fist column which is id
bases <- strsplit(eachrow[-1], '')

## Transform list to matrix
## Because the result of function strsplit is a list
bases <- do.call(rbind,bases)

## Recombine bases by connecting all bases in each column
recombine <- apply(bases, 2, paste, collapse="")

## Add id
## Transpos recombine
cbind(eachrow[1], t(recombine))
})

## Transpose the result matrix  
result <- t(result)


It will come more quickly as you learn more. I also looked at Jimm's  
solution by pulling it apart, although I did not spend a whole day at  
it, maybe ten minutes. I thought a three line version was more  
informative, because it did not make everything scroll of the console:


> x <- read.table(textConnection("SampleIDA1  A2   
A3  A4

+  GM920222GATTGCC GATTGCC GATAGAC GATAGAC
+  GM930040GTCATCA GAGTGCA ACTATAA GATTGCC
+  GM930040GTCATCA GAGTGCA ACTATAA GATTGCC"), header=TRUE,  
as.is=TRUE)

> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+  # separate characters
+  z <- do.call(rbind, strsplit(.row[-1], ''))
+  # combine each column
+  z.col <- t(apply(z, 2, paste, collapse=''))
+  # add the ID
+  cbind(.row[1], z.col)
+  }))
 [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] "GM920222" "" "" "" "TTAA" "" "CCAA" ""
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

# I usually see if I can get the inner-most function to work:

> z <- do.call(rbind, strsplit(x[1,], ''))
Warning message:
In function (..., deparse.level = 1)  :
  number of columns of result is not a multiple of vector length (arg  
2)

> z
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
SampleID "G"  "M"  "9"  "2"  "0"  "2"  "2"  "2"

#So I guess I didn't get an exact replica since Jim had excluded the  
first element in the row


A1   "G"  "A"  "T"  "T"  "G"  "C"  "C"  "G"
A2   "G"  "A"  "T"  "T"  "G"  "C"  "C"  "G"
A3   "G"  "A"  "T"  "A"  "G"  "A"  "C"  "G"
A4   "G"  "A"  "T"  "A"  "G"  "A"  "C"  "G"
> z <- do.call(rbind, strsplit(x[1,-1], ''))  # there ... cleaner
> z
   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
A1 "G"  "A"  "T"  "T"  "G"  "C"  "C"
A2 "G"  "A"  "T"  "T"  "G"  "C"  "C"
A3 "G"  "A"  "T"  "A"  "G"  "A"  "C"
A4 "G"  "A"  "T"  "A"  "G"  "A"  "C"

That seemed to help understand what was going on in the middle of the  
functions. Now I wondered if the transpose could be avoided. So I  
tried cbind instead of rbind:


> z <- do.call(cbind, strsplit(x[1,-1], ''))
> z
 A1  A2  A3  A4
[1,] "G" "G" "G" "G"
[2,] "A" "A" "A" "A"
[3,] "T" "T" "T" "T"
[4,] "T" "T" "A" "A"
[5,] "G" "G" "G" "G"
[6,] "C" "C" "A" "A"
[7,] "C" "C" "C" "C"
> z.col <- apply(z, 2, paste, collapse='')
> z.col
   A1A2A3A4
"GATTGCC" "GATTGCC" "GATAGAC" "GATAGAC"

## Nope that does not work:
## So try apply on the columns ...
> z.col <- apply(z, 1, paste, collapse='')
> z.col
[1] "" "" "" "TTAA" "" "CCAA" ""

## OK that worked. Now see if it works inside the whole sequence:

> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+  # separate characters
+  z <- do.call(cbind, strsplit(.row[-1], ''))
+  # combine each column
+  z.col <- apply(z, 1, paste, collapse='')
+  # add the ID
+  cbind(.row[1], z.col)
+  }))
 [,1]   [,2]   [,3]   [,4]   [,5]   [, 
6]   [,7]
[1,] "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222"  
"GM920222"
[2,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"  
"GM930040"
[3,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"  
"GM930040"


Well not exactly.
 [,8]   [,9]   [,10]  [,11]  [,12]  [,13]  [,14]
[1,] "" "" "" "TTAA" "" "CCAA" ""
[2,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+  # separate characters
+  z <- do.call(cbind, strsplit(.row[-1], ''))
+  # combine each column
+  z.col <- apply(z, 1, paste, collapse='')
+  # add the ID
## and add the transpose columns:
+  cbind(.row[1], t(z.col))
+  }))
 [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] "GM920222" "" "" "" "TTAA" "" "CCAA" ""
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

So I got to the same place but didn't really achieve any savings.



-
A R learner.



David "als

[R] Using svychisq inside user-defined function

2010-05-19 Thread Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
Hi R-help,

Yes, this is my second request for assistance in a single day

I am attempting to use svychisq() inside a function  I made.  The goal
of this function is to produce a table of summary statistics that I can
later output to EXCEL (simple frequencies and sample sizes from regular
crosstabulation on dataset "data" but the chi-square using survey
methods on "audit").

Here's my code (I can't supply data for you as I am not that
sophisticated and the real data is not cleared for public consumption -
I really apologize):


# create my svydesign object

audit <- svydesign(id~id, strata=~field, weights=~wt, data=data,
fpc=~AllocProportion)

# my function to create my table

mkMyCrossTable <- function(X, svyX, T) {

tbl <- crosstab(X, data$SEX, prop.c=TRUE)
tbl <- data.frame(cbind(tbl$t, tbl$prop.col))
tbl$var <- rownames(tbl)

chisq <- svychisq(~svyX + SEX, design=audit, statistic="adjWald",
round=4)
chisq <- data.frame(do.call("cbind", chisq)
chisq <- data.frame(chisq[,3])

Table <- data.frame(tbl$var,
paste(formatC(tbl$X0.1*100, format="f", digits=1),
"%", sep=""), 
  tbl$X0,
  paste(formatC(tbl$X1.1*100, format="f",
digits=1), "%", sep=""),
  tbl$X1,
  chisq[1])
Table[2: length(Table[,1]), 6] <- NA
Table <- NAToUnkown(Table, unknown = " ")
Colnames(Table) <- c(T, "Male (%)", "Male (n)", "Female (%)", "Female
(n)", "p-value")
Table

}

con3 <- mkMyCrossTable(data$con, con, "Constituency")





The error occurs with the "chisq <- svychisq(~X+SEX, design=audit,
statistic="adjWald", round=4)" part of my function.  I did debug() to
double check.

I get the error:  "Error in '[.data.frame'(design$variables, ,
as.character(rows)) :
 Undefined columns selected"

My suspicion is that it doesn't like me referencing the variables in
"audit", but I don't know how to fix it.


Thanks,

Jen

PS.  I know my table-making function is terribly inelegant...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] printing a dataframe by categories

2010-05-19 Thread Gerrit Draisma

I am looking for the following simple question.
I have a data frame with names and numbers, divided in categories.
I would like to produce a text file with page breaks,
listing the names and numbers by category,
and totalling the numbers.

Example:

  Name<-LETTERS[1:6]
  Score<-rep(5:8,length.out=6)
  Form<-rep(1:2,each=3)
  x<-data.frame(Name,Score,Form)

This gives approximately what I need
  by(x,x$Form,function(x){x[,1:2]})
but I would like a total added to the list.

Gerrit.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating all possible models from full model

2010-05-19 Thread Tim Clark
Thanks to everyone for the replies and functions.  For some reason my emails 
are not going to the thread.  I just switched to the newer mail version for my 
yahoo account, so hopefully this one will get posted.

Aloha,

Tim

 Tim Clark
Department of Zoology 
University of Hawaii 



- Original Message 
From: Xiaogang Su 
To: Tim Clark 
Cc: r-help@r-project.org
Sent: Wed, May 19, 2010 9:42:40 AM
Subject: Re: [R] Generating all possible models from full model

A couple of years ago, I wrote a function for doing this. It can be
found in the following file:
http://pegasus.cc.ucf.edu/~xsu/CLASS/STA4164/mt4-2009.doc
I also pasted a copy below. Hope you find it useful.  -XG

# 
# ALL POSSIBLE REGRESSIONS
# 
# The parameter k is the total number of predictors. To apply the function,
# you need to prepare the data so that all predictors for selection are named
# as x1, x2, ..., and the response is called y.

all.possible.regressions <- function(dat, k){
    n <- nrow(dat)
    regressors <- paste("x", 1:k, sep="")
    lst <- rep(list(c(T, F)), k)
    regMat <- expand.grid(lst);
    names(regMat) <- regressors
    formular <- apply(regMat, 1, function(x)
            as.character(paste(c("y ~ 1", regressors[x]), collapse="+")))
    allModelsList <- apply(regMat, 1, function(x)
            as.formula(paste(c("y ~ 1", regressors[x]),collapse=" + ")) )
    allModelsResults <- lapply(allModelsList,
            function(x, data) lm(x, data=data), data=dat)
    n.models <- length(allModelsResults)
    extract <- function(fit) {
        df.sse <- fit$df.residual
        p <- n - df.sse -1
        sigma <- summary(fit)$sigma
        MSE <- sigma^2
        R2 <- summary(fit)$r.squared
        R2.adj <- summary(fit)$adj.r.squared
        sse <- MSE*df.sse
        aic <- n*log(sse) + 2*(p+2)
        bic <- n*log(sse) + log(n)*(p+2)
        out <- data.frame(df.sse=df.sse, p=p, SSE=sse, MSE=MSE,
            R2=R2, R2.adj=R2.adj, AIC=aic, BIC=bic)
        return(out)
    }
    result <- lapply(allModelsResults, extract)
    result <- as.data.frame(matrix(unlist(result), nrow=n.models, byrow=T))
    result <- cbind(formular, result)
rownames(result) <- NULL
colnames(result) <- c("model", "df.sse", "p", "SSE", "MSE", "R2",
"R2.adj", "AIC", "BIC")
    return(result)
}
all.possible.regressions(dat=quasar, k=5)


On Tue, May 18, 2010 at 11:38 PM, Tim Clark  wrote:
> Is there a function that will allow me to run all model iterations if I 
> specify a full model?  I am using information criteria to choose between 
> possible candidate models.  I have been writing out all possible model 
> combinations by hand, and I am always worried that I am missing models or 
> have made a mistake somewhere.  It is also difficult to alter models if I 
> want to change a term.  For example, below are the set of models I would like 
> to run.  Is there a way to specify the full model and have R generate the 
> rest?  I.e. specify
>
>  m1234567<-glm.convert(glm.nb(mantas~site*year+cosmonth+sinmonth+coslunar+sinlunar+plankton,
>  data=mydata))
>
> and have R run all the other models.
>
>
> library(MASS)
>
> #Intercept only
>  m0<-glm.convert(glm.nb(mantas~1,data=mydata))
>
> #One term - 7 models
>  #Manta abundance is greater at one of the two sites
>  m1<-glm.convert(glm.nb(mantas~site,data=mydata))
>  #Manta abundance increases each year as the population increases in size due 
> to births or immigration being greater than deaths and emmigration
>  m2<-glm.convert(glm.nb(mantas~year,data=mydata))
>  #Manta abundances increases during part of the year due to seasonal cycles 
> in resources (mates, food)
>  m3<-glm.convert(glm.nb(mantas~cosmonth,data=mydata))
>  m4<-glm.convert(glm.nb(mantas~sinmonth,data=mydata))
>  #Manta abundance decreases with increased lunar phase
>  m5<-glm.convert(glm.nb(mantas~coslunar, data=mydata))
>  m6<-glm.convert(glm.nb(mantas~sinlunar, data=mydata))
>  #Manta abundance increases with increased levels of plankton
>  m7<-glm.convert(glm.nb(mantas~plankton,data=mydata))
>
> #Two terms - 21 models
>  m12<-glm.convert(glm.nb(mantas~site*year, data=mydata))   #Interaction term 
> to account for hotel being closed at Keauhou for some years
>  m13<-glm.convert(glm.nb(mantas~site+cosmonth,data=mydata))
>  m14<-glm.convert(glm.nb(mantas~site+sinmonth,data=mydata))
>  m15<-glm.convert(glm.nb(mantas~site+coslunar,data=mydata))
>  m16<-glm.convert(glm.nb(mantas~site+sinlunar,data=mydata))
>  m17<-glm.convert(glm.nb(mantas~site+plankton,data=mydata)) #Should this have 
> an interaction term?  Plankton may varry by site
>
>  m23<-glm.convert(glm.nb(mantas~year+cosmonth,data=mydata))
>  m24<-glm.convert(glm.nb(mantas~year+sinmonth,data=mydata))
>  m25<-glm.convert(glm.nb(mantas~year+coslunar,data=mydata))
>  m26<-glm.convert(glm.nb(mantas~year+sinlunar,data=mydata))
>  m27<-glm.convert(glm.nb(mantas~year+plankton,data=mydata))
>
>  m34<-glm.convert(glm.nb(mantas~cosmonth+sinmonth,da

Re: [R] sample and rearrange

2010-05-19 Thread Wu Gong

It took me a day to make the sense of Jim's code :(

Hope my comments will help.

## Transform data to matrix
x <- as.matrix(x)

## Apply function to each row
## Create a function to rearrange bases
result <- apply(x, 1, function(eachrow){

## Split each gene to bases
## Exclude the fist column which is id
bases <- strsplit(eachrow[-1], '')

## Transform list to matrix
## Because the result of function strsplit is a list
bases <- do.call(rbind,bases)

## Recombine bases by connecting all bases in each column
recombine <- apply(bases, 2, paste, collapse="")

## Add id
## Transpos recombine
cbind(eachrow[1], t(recombine))
 })

## Transpose the result matrix  
result <- t(result)

-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sample-and-rearrange-tp747p2223594.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange case of partial matching in .[ - possible bug / wrong documentation?

2010-05-19 Thread Hilmar Berger



Duncan Murdoch schrieb:

On 19/05/2010 12:14 PM, Hilmar Berger wrote:

Hi all,

This occurred in R-2.11.0 (WinXP).

The R-help page of .[ says that:

"Character indices can in some circumstances be partially matched (see
pmatch) to the names or dimnames of the object being subsetted (but 
never

for subassignment). Unlike S (Becker et al p. 358)), R has never used
partial matching when extracting by [, and as from R 2.7.0 partial 
matching

is not by default used by [[ (see argument exact)."

My understanding is therefore that .[ should never try partial matching.
  


See near the top of the page:  "The descriptions here apply only to 
the default methods."  Since indexing is generic, an extraction method 
can do whatever it wants, and you need to read the particular page to 
find the behaviour.  The page for Extract.data.frame says:


"Both |[| and |[[| extraction methods partially match row names. By 
default neither partially match column names, but |[[| will unless 
|exact=TRUE|. If you want to do exact matching on row names use |match 
| as in the examples."


Duncan Murdoch 

Sorry, I should have read the complete help page.

Thanks a lot !

Best regards,
Hilmar

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Looping input and multiple outputs

2010-05-19 Thread danob321

I'm quite new to this - and would appreciate some help getting started...

I am looking to feed data from dataframe A, one line at a time into an input
frame to the program, and then save the output into a different dataframe.

input frame - one column:

ABCDEFTATF
AHFUTJRAUUFS
TITIAJFSUDFJA
TIDUSHANM

program accepts input into frame "s" one line at a time:
ABCDEFTATF

program outputs:
[[1]]
[1] "..()((..))"

[[2]]
[1] -15.3


Ideally, I would like to collect the output as two columns into a separate
frame, e.g. "r":

..()((..))   -15.3
..()((..))   -14.4
..()((..))   -11.6
..()((..))   -17.7


Any help to get going on this would be very welcome...

Thanks.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Looping-input-and-multiple-outputs-tp2223539p2223539.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : Nomogram with multiple interactions (package rms)

2010-05-19 Thread Marc Carpentier
I'm sorry. I don't understand the "omit" solution, and maybe I mislead you with 
my explanation.

With the data from the "f" exemple of nomogram() :
Let's declare :
f2 <- cph(Surv(d.time,death) ~ sex*(age+blood.pressure))
I guess the best (and maybe the only) way to represent it with a nomogram is to 
plot two nomograms (I couldn't find better).
Is there a way to have :

Nomogram1 : "male" :
- points 1-100 ---
- age ("men") ---
- blood.pressure ("men") ---
- linear predictor ---

And nomogram2 : "female" :
- points 1-100 ---
- age ("female") ---
- blood.pressure ("female") ---
- linear predictor ---

As I said I tried and failed (nomogram() still wants me to define 
interact=list(...)) with :
plot(nomorgam(f2, adj.to=list(sex="male")) #and "female" for the other one

Marc



- Message d'origine 
De : Frank E Harrell Jr 
À : Marc Carpentier ; r-help-request Mailing List 

Envoyé le : Mer 19 mai 2010, 22h 28min 51s
Objet : Re: [R] Nomogram with multiple interactions (package rms)

On 05/19/2010 03:17 PM, Marc Carpentier wrote:
> Dear list, I'm facing the following problem : A cox model with my sex
> variable interacting with several continuous variables :
> cph(S~sex*(x1+x2+x3)) And I'd like to make a nomogram. I know it's a
> bit tricky and one mights argue that nomogram is not a good a
> choice... I could use the parameter
> interact=list(sex=("male","female"),x1=c(a,b,c))... but with rcs or
> pol transformations of x1, x2 and x3, the choice of the
> categorization (a,b,c,...) is arbitrary and the nomogram not so
> useful... Considering that sex is the problem, I thought I could draw
> two nomograms, one for each sex... based on one model. These would be
> great. Do you think it's possible ?

Yes, you can specify constant predictors not to draw with the omit= 
argument.  But try first to draw everything.  Shouldn't you just get 2 
axes each for x1 x2 x3?

>
> Taking the exemple of the help of nomogram() (package "rms") : f<-
> psm(Surv(d.time,death) ~ sex*age, dist=if(.R.)'lognormal' else
> 'gaussian')

Drop the if(.R.) which was just corrected in the documentation.  Use 
dist='lognormal'

Frank

>
> Let's add the previously defined blood.pressure effect with an
> interaction with sex too (with cph) : f2<- cph(Surv(d.time,death) ~
> sex*(age+blood.pressure))
>
> I thought of the parameter adt.to : plot(nomorgam(f2,
> adj.to=list(sex="male")) #and "female" for the other one
>
> But nomogram() still wants me to define interact=list(...) Thanks for
> any advice you might have (with adj.to or any alternative...)
>
> Marc Carpentier
>


-- 
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
  Department of Biostatistics   Vanderbilt University





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple language output - Correct in RGui, wrong in .txt after sink()

2010-05-19 Thread Prof Brian Ripley
You haven't given us the 'at a minimum' information asked for in the 
the posting guide (but we can guess you are using Windows), nor do we 
know the intended encoding of this email (I see no encoding in the 
header as it reached me, but it seems sensible viewed as UTF-8). And 
the absence of basic information does make it *really* hard to help 
here -- this reply is my third guess at what might be happening.


We also do not know the font you are using in RGui, but I am 
not aware of any Windows font which covers correctly Russian and CJK.
However, it is not just a question of knowing the font name: different 
versions of Windows, including different language-specific versions, 
have different fonts with the same name.


RGui (since about R 2.7.0) works in UCS-2 encoding.  Sink files work 
in the locale's encoding (another of the pieces of information you did 
not tell us, but on Windows it is 8-bit or specific to one of 
Simplified Chinese, Traditional Chinese, Japanese or Korean -- I'd 
guess from your address it was CP1252, but it *is* part of the 'at a 
minimum').  So whereas R can store non-native strings in UTF-8 
(provided you get them in as such), it can only output them if told 
how to: the designer of RGui did so but you in using 
sink('output.txt') did not.


cat+sink is an inefficient way to write to a file: try using the file= 
argument on an opened connection.  And you can set the encoding on 
that connection.  I really don't know what you meant by 'the 
characters as I expect': in a file they have to be in *some* encoding 
and you are not looking at bits but as a representation in some 
unspecified file viewer.  One possibility is that you meant UCS-2 
(what Windows tends incorrectly to call 'Unicode' files), in which 
case you can use something like


con <- file("foo", encoding="UCS-2LE")
cat(..., file=con)
...
close(con)

You can use a connection with sink() too.

Think of it more as a miracle (and much unappreciated hard work and 
inspired design) that any of this works on Windows, and if you want it 
to work transparently, change to an OS with UTF-8 locales (these days, 
just about anything else).



On Wed, 19 May 2010, mark.reds...@evonik.com wrote:


I have the following problem with outputting multilingual data to a file.
I get (except for Korean) what I expect as result in the RGui, but when I
use sink() to output to a text file loose the characters in the foreign
languages.
I post a small example below. Since I am not sure how well my email system
as the list copes with all the different characters I have additionally
created a pdf version of this example.
The first part of the example behaves as I expect for all languages except
Korean. I believe that the Korean language may be a problem with the font,
it would be great if someone could confirm this?
In the second part with output to the txt file I get the  type
unicode as output not the expected characters. My main problem is how can
I output the characters as I expect?


RM_EN <- c("Alfalfa hay","Alfalfa meal","Alfalfa silage")
RM_DE <- c("Luzerneheu","Lurzernegrünmehl","Luzernesilage")
RM_RU <- c("Люцерновое сено","Люцерновая травяная мука","Люцерновый

сенаж")

RM_CN <- c("苜蓿干草","苜蓿草粉","苜蓿青贮")
RM_JP <- c("アルファルファ乾草","アルファルファ ミール","アルファルファ

サイレージ")

RM_KR <- c("알팔파 건초","알팔파 박","알팔파 사일리지")

RMLANG <- data.frame(RM_EN,RM_DE,RM_RU,RM_CN,RM_JP,RM_KR)
nrm <- NROW(RMLANG)

for(i in 1:nrm)

+ {
+ cat(format("English",width = 12, justify = c("left")),
as.character(RMLANG$RM_EN[i]),"\n",sep="")
+ cat(format("Deutsch",width = 12, justify = c("left")),
as.character(RMLANG$RM_DE[i]),"\n",sep="")
+ cat(format("Russian",width = 12, justify = c("left")),
as.character(RMLANG$RM_RU[i]),"\n",sep="")
+ cat(format("Japanese",   width = 12, justify = c("left")),
as.character(RMLANG$RM_JP[i]),"\n",sep="")
+ cat(format("Chinese",width = 12, justify = c("left")),
as.character(RMLANG$RM_CN[i]),"\n",sep="")
+ cat(format("Korean",width = 12, justify = c("left")),
as.character(RMLANG$RM_KR[i]),"\n","\n","\n",sep="")
+ }
English Alfalfa hay
Deutsch Luzerneheu
Russian Люцерновое сено
Japaneseアルファルファ乾草
Chinese 苜蓿干草
Korean  알팔파 건초

English Alfalfa meal
Deutsch Lurzernegrünmehl
Russian Люцерновая травяная мука
Japaneseアルファルファ ミール
Chinese 苜蓿草粉
Korean  알팔파 박

English Alfalfa silage
Deutsch Luzernesilage
Russian Люцерновый сенаж
Japaneseアルファルファ サイレージ
Chinese 苜蓿青贮
Korean  알팔파 사일리지


for(i in 1:nrm)

+ {
+ sink("output.txt")
+ cat(format("English",width = 12, justify = c("left")),
as.character(RMLANG$RM_EN[i]),"\n",sep="")
+ cat(format("Deutsch",width = 12, justify = c("left")),
as.character(RMLANG$RM_DE[i]),"\n",sep="")
+ cat(format("Japanese",   width = 12, justify = c("left")),
as.character(RMLANG$RM_JP[i]),"\n",sep="")
+ cat(format("Chinese",width = 12, justify = c("left")),
as.character(RMLANG$RM_CN[i]),"\n",sep="")
+ cat(format("Korean", width = 12,

Re: [R] export dataframe's column classes to a list

2010-05-19 Thread Alexander Shenkin
It does indeed!  thank you very much, and thanks to mark leeds as well,
who emailed me off list with a similar solution.

On 5/19/2010 3:49 PM, Peter Alspach wrote:
> Tena koe Allie
>
> Does
>
> sapply(myframe, class)
>
> do what you want?
>
> Peter Alspach
>
>   
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>> project.org] On Behalf Of Alexander Shenkin
>> Sent: Thursday, 20 May 2010 8:19 a.m.
>> To: r-help@r-project.org
>> Subject: [R] export dataframe's column classes to a list
>>
>> Hi Folks,
>>
>> I want to export a dataframe's column classes to a list so that I can
>> reinstantiate the dataframe from a CSV file in the future.  (I know
>> about save(), which I'm using in addition to this).
>>
>> what I want to do is the following:
>> write.csv(myframe);
>> col_classes = get_col_classes(myframe);
>> write.csv(col_classes, "column_classes")
>>
>> ... time passes, R gets closed, etc ...
>>
>> col_classes = read.csv("column_classes")
>> new_myframe = read.csv("myfile.csv", colClasses = col_classes)
>>
>>
>> My question is how to construct the "get_col_classes()" function
>> 
> above.
>   
>>  Any thoughts are greatly appreciated!
>>
>> Thanks,
>> Allie
>>
>> --
>> Alexander Shenkin
>> PhD Candidate
>> School of Natural Resources and Environment
>> University of Florida
>>
>> http://snre.ufl.edu/people/students.asp
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] export dataframe's column classes to a list

2010-05-19 Thread Peter Alspach
Tena koe Allie

Does

sapply(myframe, class)

do what you want?

Peter Alspach

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Alexander Shenkin
> Sent: Thursday, 20 May 2010 8:19 a.m.
> To: r-help@r-project.org
> Subject: [R] export dataframe's column classes to a list
> 
> Hi Folks,
> 
> I want to export a dataframe's column classes to a list so that I can
> reinstantiate the dataframe from a CSV file in the future.  (I know
> about save(), which I'm using in addition to this).
> 
> what I want to do is the following:
> write.csv(myframe);
> col_classes = get_col_classes(myframe);
> write.csv(col_classes, "column_classes")
> 
> ... time passes, R gets closed, etc ...
> 
> col_classes = read.csv("column_classes")
> new_myframe = read.csv("myfile.csv", colClasses = col_classes)
> 
> 
> My question is how to construct the "get_col_classes()" function
above.
>  Any thoughts are greatly appreciated!
> 
> Thanks,
> Allie
> 
> --
> Alexander Shenkin
> PhD Candidate
> School of Natural Resources and Environment
> University of Florida
> 
> http://snre.ufl.edu/people/students.asp
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] offset in gam and spatial scale of variables

2010-05-19 Thread Joris Meys
On Wed, May 19, 2010 at 4:51 PM, Lucia Rueda  wrote:

>
> Hi Joris,
>
> We're using mgcv.
>
> We have data on abundance of groupers on line transects that have the same
> length.

I only now realized groupers are actually fish :-). Should work on my
english skills...


> My coworker has selected a bunch of variables and he has calculated
> them in terms of total area in different sizes of buffers around the
> centroid of the transect. He has run gam models (quasipoisson, mgcv) for
> each explanatory variable at each size of buffer.


Here you lost me a bit. How should I imagine those buffers? Is it, as Simon
said, some area? Then that would mean you measure eg salinity along the
transect, and average the numbers using a window of a specific size? Or am I
seeing it wrong?

Then he has selected the
> signifficant variables. Some variables explain a higher percentage of
> deviance at different sizes of buffers. And now he wants to build a gam
> model trying the different explanatory variables but using the values that
> correspond to the size of the buffer where they explain a higher deviance,
> so one variable might have the values of a smaller scale whereas other
> might
> correspond to a higher buffer size (I don't know if I made myself clear). I
> am wondering if this is correct.
>

It seems not correct to me. Model building in these frameworks, especially
when using inference, should be driven by hypothesis, not by any correlation
in the data. Especially with smooths one has to be very careful.

Another issue is the correlation between environmental variables, They often
covary along transects, meaning that you can have confounding and even
aliasing in your dataset. This has to be checked and taken into account
_before_ building the models. I have the impression that his approach does
not take care of this.

Next, I believe that data should be used as raw as possible, to not
jeopardize the interpretation. If you use different buffer sizes, you can't
just say that variable X and Y contribute significantly to the explanation
of the variation, but that variable X and Y contributes significantly,
depending on the scale it is measured.

It also depends on whether your goal is purely predictive, or if you want to
do inference. In case you want to conclude something about the significance
of the parameters, his approach seems unvalid to me. How to explain that the
significance of a variable depends on the scale of measurement? One assumes
a continuous relation -unless working with factors- so the scale shouldn't
make much of a difference anyway. If you can predict the number of groupers
by the amount of bald men in Hong-Kong, by all means, do so. But I wouldn't
formulate a scientific conclusion based on the significance of that model,
if you get my drift.

Also I don't know if he should include an offset in spite all the transects
> have the same length.
>
Do you mean an intercept? In that case I'd always include one, except in
very specific cases.

>
> I'm in charge of looking at the spatial correlation once he builds the
> model. I don't know much about it but I was thinking of doing a Moran test,
> correlogram and variogram and then if there's spatial autocorrelation doing
> gamm, sar or gee.
>
Gamm is a very powerful tool, but -if I understood Simon's book correctly-
you cannot trust the anova's on the gam-component of the gamm-object when
using link functions. LR tests can give some information, but there is not a
solid statistical framework yet for formal hypothesis testing of those
models.

I also wonder why building a model without, and then doing the same with the
correct variance-covariance structure. Personally, I'd do it the other way
around. Not that it will change much about the predictions, but it
definitely will change the inference.

In any case, all of these are my personal opinions on a problem I do not
understand fully. It's some general considerations, feel free to think
different.

>
> Thanks,
>
> Lucia
> --
> View this message in context:
> http://r.789695.n4.nabble.com/offset-in-gam-and-spatial-scale-of-variables-tp483p976.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting

Re: [R] Generating all possible models from full model

2010-05-19 Thread Daniel Malter

Hi, one approach is document below. The function should work with any
regression function that follows the syntax of lm (others will need
adjustments). Note that you would have to create the interactions terms by
hand (which is no big deal if there are just few). Note also that this
approach can be highly problematic if you are scrounging for significant
relationships (this depends on the field and the specific intention with
which these analyses are performed).

#simulate data
  #predictor variables
data=data.frame(d=rnorm(100),e=rnorm(100),f=rnorm(100))
  #error term
u=rnorm(100)
  #dependent variable
y=data$d-data$e+2*data$f+u

#create a present/absent list for the regressors
grits=list()
for(i in 1:length(data)){
  grits[[i]]=c(0,1)
  }

#expand the above list to a grid that contains all combinations of
regressors
selection=expand.grid(grits)

#given the above grid, which regressor should I pick (get the indices for
which variable(s) should be included)
one=function(x){which(x==1)}
selection.id=apply(selection,1,one)

#what are the names of the included variables
vnames=function(x){names(data)[x]}
var.names=lapply(selection.id,vnames)

#Dependent variable (unnecessary step if y is a vector or matrix anyway)
y=as.matrix(y)

#Select the data for each regression and store them in a list
select.data=function(x){as.matrix(data[,x],row.names=T)}
Xs=lapply(selection.id,select.data)

#get the column names for each element of Xs right (workaround)
#this is necessary because R does not get the column names right if there is
only one column in the list element
for(i in 1:length(Xs)){dimnames(Xs[[i]])=list(NULL,var.names[[i]])}

#remove the first element because it's empty (otherwise the regression
function returns an error
#when it tries to run the first regression)
Xs[[1]]=NULL


#Define a function that regresses y on x and shows us the summary
regress=function(x){summary(lm(y~x))}

#Apply regress over all elements of Xs, i.e.,
#regress y on all possible subsets of regressors
lapply(Xs,regress)


HTH,
Daniel




-- 
View this message in context: 
http://r.789695.n4.nabble.com/Generating-all-possible-models-from-full-model-tp377p2223550.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nomogram with multiple interactions (package rms)

2010-05-19 Thread Frank E Harrell Jr

On 05/19/2010 03:17 PM, Marc Carpentier wrote:

Dear list, I'm facing the following problem : A cox model with my sex
variable interacting with several continuous variables :
cph(S~sex*(x1+x2+x3)) And I'd like to make a nomogram. I know it's a
bit tricky and one mights argue that nomogram is not a good a
choice... I could use the parameter
interact=list(sex=("male","female"),x1=c(a,b,c))... but with rcs or
pol transformations of x1, x2 and x3, the choice of the
categorization (a,b,c,...) is arbitrary and the nomogram not so
useful... Considering that sex is the problem, I thought I could draw
two nomograms, one for each sex... based on one model. These would be
great. Do you think it's possible ?


Yes, you can specify constant predictors not to draw with the omit= 
argument.  But try first to draw everything.  Shouldn't you just get 2 
axes each for x1 x2 x3?




Taking the exemple of the help of nomogram() (package "rms") : f<-
psm(Surv(d.time,death) ~ sex*age, dist=if(.R.)'lognormal' else
'gaussian')


Drop the if(.R.) which was just corrected in the documentation.  Use 
dist='lognormal'


Frank



Let's add the previously defined blood.pressure effect with an
interaction with sex too (with cph) : f2<- cph(Surv(d.time,death) ~
sex*(age+blood.pressure))

I thought of the parameter adt.to : plot(nomorgam(f2,
adj.to=list(sex="male")) #and "female" for the other one

But nomogram() still wants me to define interact=list(...) Thanks for
any advice you might have (with adj.to or any alternative...)

Marc Carpentier




--
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] col allocation is not right

2010-05-19 Thread Changbin Du
Thanks, Phil!

Does that mean only eight colors can be used in R plot?
THe following codes works for me, but, if I use the number, it does not
work.

plot(svm.auc, col=2, main="ROC curves comparing classification performance\n
of six machine learning models")
legend(0.5, 0.6, c(ns, nb, nr, nt, nl,ne), c(2,3,4,5,6,"black")) # Draw a
legend.

plot(bo.auc, col=3, add=T) # add=TRUE draws on the existing chart
plot(rf.auc, col=4, add=T)
plot(tree.auc, col=5, add=T)
plot(nn.auc, col=6, add=T)
plot(en.auc, col="black",lty=4,lwd=3, add=T)



On Wed, May 19, 2010 at 11:30 AM, Phil Spector wrote:

> Changbin -
>   Please take a look at the help file for the function
> "palette", which is how R maps col= numbers to colors.
> Also look at the default output of that function:
>
>  palette()
>>
> [1] "black"   "red" "green3"  "blue""cyan""magenta" "yellow"
> [8] "gray"
>
> You might get better results by specifying the colors that you
> want directly.
>
>- Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spec...@stat.berkeley.edu
>
>
>
> On Wed, 19 May 2010, Changbin Du wrote:
>
>  plot(svm.auc, col=2, main="ROC curves comparing classification
>> performance\n
>> of six machine learning models")
>> legend(0.5, 0.6, c(ns, nb, nr, nt, nl,ne), 2:6, 9) # Draw a legend.
>>
>> plot(bo.auc, col=3, add=T) # add=TRUE draws on the existing chart
>> plot(rf.auc, col=4, add=T)
>> plot(tree.auc, col=5, add=T)
>> plot(nn.auc, col=6, add=T)
>> plot(en.auc, col=9,lty="dotted",lwd=3, add=T)
>>
>> Hi, Dear community,
>>
>> I am use the above codes to draw plot, but find that col =9 is not used by
>> the R.  Instead, it use the col=2 when plot en.auc. WHy this happens and
>> how
>> to check the col allocation in R?
>>
>> Thanks!
>>
>>
>> --
>> Sincerely,
>> Changbin
>> --
>>
>>[[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>


-- 
Sincerely,
Changbin
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] save in for loop

2010-05-19 Thread Peter Ehlers

On 2010-05-19 12:05, Shi, Tao wrote:

Ivan,

Try this:

eval(parse(text=paste("save(file", i, ", file=\"file", i, ".RData\")", sep="")))

...Tao



Or just use 'list=' like this:

for (i in 1:4) {
  temp <- data.frame(a=(i+1):(i+10), b=LETTERS[(i+1):(i+10)])
  filename <- paste("file", i, sep="")
  assign(filename, temp)
  save(list=c(filename), file=paste(filename, ".rda", sep=""))
}

 -Peter Ehlers




- Original Message 

From: Ivan Calandra
To: r-help@r-project.org
Sent: Wed, May 19, 2010 7:56:44 AM
Subject: [R] save in for loop

Dear users,


My problem concerns save() within a for loop.
Here is my

code:


for (i in 1:4) {
temp<- data.frame(a=(i+1):(i+10),

b=LETTERS[(i+1):(i+10)])

filename<- paste("file", i, sep="")


assign(filename, temp)

save(filename, file=paste(filename, ".rda",

sep=""))

}

As you can see, save() doesn't work as I would like: (1)

the object saved is called "filename" (instead of "file1", "file2", etc), and
(2) it of course contains only the name (as character) instead of the
data.frame


How can I fix it?


[snip]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] export dataframe's column classes to a list

2010-05-19 Thread Alexander Shenkin
Hi Folks,

I want to export a dataframe's column classes to a list so that I can
reinstantiate the dataframe from a CSV file in the future.  (I know
about save(), which I'm using in addition to this).

what I want to do is the following:
write.csv(myframe);
col_classes = get_col_classes(myframe);
write.csv(col_classes, "column_classes")

... time passes, R gets closed, etc ...

col_classes = read.csv("column_classes")
new_myframe = read.csv("myfile.csv", colClasses = col_classes)


My question is how to construct the "get_col_classes()" function above.
 Any thoughts are greatly appreciated!

Thanks,
Allie

-- 
Alexander Shenkin
PhD Candidate
School of Natural Resources and Environment
University of Florida

http://snre.ufl.edu/people/students.asp

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Nomogram with multiple interactions (package rms)

2010-05-19 Thread Marc Carpentier
Dear list,
I'm facing the following problem :
A cox model with my sex variable interacting with several continuous variables 
: cph(S~sex*(x1+x2+x3))
And I'd like to make a nomogram. I know it's a bit tricky and one mights argue 
that nomogram is not a good a choice...
I could use the parameter interact=list(sex=("male","female"),x1=c(a,b,c))... 
but with rcs or pol transformations of x1, x2 and x3, the choice of the 
categorization (a,b,c,...) is arbitrary and the nomogram not so useful...
Considering that sex is the problem, I thought I could draw two nomograms, one 
for each sex... based on one model. These would be great.
Do you think it's possible ?

Taking the exemple of the help of nomogram() (package "rms") :
f <- psm(Surv(d.time,death) ~ sex*age, dist=if(.R.)'lognormal' else 'gaussian')

Let's add the previously defined blood.pressure effect with an interaction with 
sex too (with cph) :
f2 <- cph(Surv(d.time,death) ~ sex*(age+blood.pressure))

I thought of the parameter adt.to :
plot(nomorgam(f2, adj.to=list(sex="male")) #and "female" for the other one

But nomogram() still wants me to define interact=list(...)
Thanks for any advice you might have (with adj.to or any alternative...)

Marc Carpentier




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in untar2(tarfile, files, list , exdir) : unsupported entry type ‘x’

2010-05-19 Thread Christopher Bare
Hi R gurus,

I'm getting the following error when trying to build and install an R package:

Error in untar2(tarfile, files, list, exdir) : unsupported entry type ‘x’

I build the package like so:
R --no-init-file CMD build mypackage

Then try to install it:
sudo R --no-init-file CMD INSTALL mypackage.tar.gz

...which dies with the above error. I can extract the archive fine
with tar -zxf, so I don't think the file has problems.

R version 2.11.0 (2010-04-22)
OSX 10.6.3

Any hints would be greatly appreciated! Thanks,

-- chris

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange case of partial matching in .[ - possible bug / wrong documentation?

2010-05-19 Thread David Winsemius


On May 19, 2010, at 12:14 PM, Hilmar Berger wrote:


Hi all,

This occurred in R-2.11.0 (WinXP).

The R-help page of .[ says that:

"Character indices can in some circumstances be partially matched (see
pmatch) to the names or dimnames of the object being subsetted (but  
never

for subassignment). Unlike S (Becker et al p. 358)), R has never used
partial matching when extracting by [, and as from R 2.7.0 partial  
matching

is not by default used by [[ (see argument exact)."

My understanding is therefore that .[ should never try partial  
matching.


That was for [.matrix or similar  (as noted at top of the page). You  
created a different class of variable.


?"[.data.frame

"Both [ and [[ extraction methods partially match row names. By  
default neither partially match column names, but [[ will unless  
exact=TRUE. If you want to do exact matching on row names use match as  
in the examples."


--

David.




However:


df = data.frame(a=c(1,2,3,9), b=c(4,5,6,10))
rownames(df) = c("ef","gg","hh","fe")
df

  a  b
ef 1  4
gg 2  5
hh 3  6
fe 9 10



df["e",]

 a b
ef 1 4


rownames(df) = c("ef","gg","hh","efg")
df["e",]

   a  b
NA NA NA

So, it looks like partial matching is done using  
pmatch("e",rownames(df))

for "[". If this is true, the help page is not correct.

Thanks !
Regards,
Hilmar

---
Hilmar Berger
Integromics S.L. / CNB-CSIC
Madrid, Spain

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange case of partial matching in .[ - possible bug / wrong documentation?

2010-05-19 Thread Duncan Murdoch

On 19/05/2010 12:14 PM, Hilmar Berger wrote:

Hi all,

This occurred in R-2.11.0 (WinXP).

The R-help page of .[ says that:

"Character indices can in some circumstances be partially matched (see
pmatch) to the names or dimnames of the object being subsetted (but never
for subassignment). Unlike S (Becker et al p. 358)), R has never used
partial matching when extracting by [, and as from R 2.7.0 partial matching
is not by default used by [[ (see argument exact)."

My understanding is therefore that .[ should never try partial matching.
  


See near the top of the page:  "The descriptions here apply only to the 
default methods."  Since indexing is generic, an extraction method can 
do whatever it wants, and you need to read the particular page to find 
the behaviour.  The page for Extract.data.frame says:


"Both |[| and |[[| extraction methods partially match row names. By 
default neither partially match column names, but |[[| will unless 
|exact=TRUE|. If you want to do exact matching on row names use |match 
| as in the examples."


Duncan Murdoch

However:

> df = data.frame(a=c(1,2,3,9), b=c(4,5,6,10))
> rownames(df) = c("ef","gg","hh","fe")
> df
   a  b
ef 1  4
gg 2  5
hh 3  6
fe 9 10


> df["e",]
  a b
ef 1 4

> rownames(df) = c("ef","gg","hh","efg")
> df["e",]
a  b
NA NA NA

So, it looks like partial matching is done using pmatch("e",rownames(df))
for "[". If this is true, the help page is not correct.

Thanks !
Regards,
Hilmar

---
Hilmar Berger
Integromics S.L. / CNB-CSIC
Madrid, Spain

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regarding the 'R' Load Command

2010-05-19 Thread Gavin Simpson
On Wed, 2010-05-19 at 14:59 -0400, Godavarthi, Murali wrote:
> Hi Gavin, Steve
> 
> Sorry, please use the below dput for mytestdata. Thanks!!

No need. The issue I think issue is due to the number of levels in the
factor. IIRC correctly, I've been bitten by this before where the
newdata object contained factors with different numbers of levels and/or
a different subset of levels.

Try setting the levels on mytestdata explicitly from the levels of
testdata, e.g. something like:

mytestdata <- within(mytestdata, {
 sex <- factor(sex, levels = levels(testdata$sex))
 race <- factor(race, levels = levels(testdata$race))
 marstat <- factor(marstat, levels = 
levels(testdata$marstat))
 empac <- factor(empac, levels = levels(testdata$empac))
 })

Then check with

str(mytestdata)

that it is consistent with

str(testdata)

If it is, then try to call predict on your RF model and newdata = testdata)

HTH

G

> 
> structure(list(imurder = 0, itheft = 0, irobbery = 0, iassault = 1,
> idrug = 0, iburglary = 0, igun = 0, psych = 0, Freq = 0, priors =
> 58, firstage = 19, intage = 19, sex = structure(1, .Label = "1", class =
> "factor"), race = structure(1, .Label = "BLACK", class = "factor"),
> marstat = structure(1, .Label = "SINGLE", class = "factor"), empac =
> structure(1, .Label = "UNEMPLD", class = "factor"), educ = 0,
> zipcode = 21215, suspendmn = 0, drugs = 0, alco = 0, probation = 1,
> parole = 0), .Names = c("imurder", "itheft", "irobbery", "iassault",
> "idrug", "iburglary", "igun", "psych", "Freq", "priors", "firstage",
> "intage", "sex", "race", "marstat", "empac", "educ", "zipcode",
> "suspendmn", "drugs", "alco", "probation", "parole"), class =
> "data.frame", row.names = "10")
> 
> 
> Best Regards,
> 
> Murali Godavarthi
> 
> 410-585-3746 (w)
> 
> ITCD - DPSCS Data Mining
> 
> 
> -Original Message-
> From: Godavarthi, Murali 
> Sent: Wednesday, May 19, 2010 2:43 PM
> To: 'gavin.simp...@ucl.ac.uk'; Steve Lianoglou
> Cc: r-help@r-project.org
> Subject: RE: [R] Regarding the 'R' Load Command
> 
> Hi Steve, Gavin
> 
> This is being really helpful. I've pasted the working data, and my test
> data below after running the str command on both of those variables. The
> working sample actually contains about 300 records, hence I am not able
> to paste the whole data here. However my sample test data which I am
> trying to get working, is only 1 record, and I've pasted the dput result
> below. Datatypes  seem to match in both variables for me in terms of
> being num/factor. Please suggest where it could be wrong. Thank You!
> 
> 
> 
> mytestdata
> 
> structure(list(imurder = 0, itheft = 0, irobbery = 0, iassault = 1L,
> idrug = 0L, iburglary = 0L, igun = 0L, psych = 0L, Freq = 0L, priors
> = 58L, firstage = 19L, intage = 19L, sex = structure(1L, .Label = "1",
> class = "factor"), race = structure(1L, .Label = "BLACK", class =
> "factor"), marstat = structure(1L, .Label = "SINGLE", class =
> "factor"), empac = structure(1L, .Label = "UNEMPLD", class =
> "factor"), educ = 0L, zipcode = 21215L, suspendmn = 0L, drugs = 0L,
> alco = 0L, probation = 1L, parole = 0L), .Names = c("imurder", "itheft",
> "irobbery", "iassault", "idrug", "iburglary", "igun", "psych", "Freq",
> "priors", "firstage", "intage", "sex", "race", "marstat", "empac",
> "educ", "zipcode", "suspendmn", "drugs", "alco", "probation", "parole"),
> class = "data.frame", row.names = "10")
> 
> 
> > str(testdata)
> 'data.frame':   291 obs. of  23 variables:
>  $ imurder  : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ itheft   : num  0 0 0 0 0 1 0 0 0 0 ...
>  $ irobbery : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ iassault : num  1 0 1 0 0 0 0 0 0 0 ...
>  $ idrug: num  0 1 0 1 1 0 0 1 1 1 ...
>  $ iburglary: num  0 0 0 0 0 0 0 0 0 0 ...
>  $ igun : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ psych: num  0 0 0 0 0 0 0 0 0 0 ...
>  $ Freq : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ priors   : num  58 4 2 0 6 22 0 36 0 0 ...
>  $ firstage : num  19 39 28 0 49 32 0 24 0 55 ...
>  $ intage   : num  19 39 28 25 49 32 32 24 30 55 ...
>  $ sex  : Factor w/ 2 levels "1","2": 1 2 1 2 2 1 1 1 1 1 ...
>  $ race : Factor w/ 5 levels "WHITE","BLACK",..: 2 2 1 1 2 1 1 2 2 2
> ...
>  $ marstat  : Factor w/ 7 levels "SINGLE","MARRIED",..: 1 2 2 1 2 4 7 1
> 7 3 ...
>  $ empac: Factor w/ 6 levels "EMPLD FT","EMPLD PT",..: 3 4 3 3 3 3 6
> 3 6 3 ...
>  $ educ : num  0 0 0 1 0 0 0 0 0 1 ...
>  $ zipcode  : num  21215 21217 21223 21223 21217 ...
>  $ suspendmn: num  0 600 0 0 60 3 2 479 0 3 ...
>  $ drugs: num  0 1 0 0 0 1 0 0 0 1 ...
>  $ alco : num  0 0 0 0 0 1 0 0 0 1 ...
>  $ probation: num  1 1 0 0 1 1 1 1 0 1 ...
>  $ parole   : num  0 0 0 0 0 0 0 0 0 0 ...
> > 
> > 
> > 
> > 
> > str(mytestdata)
> 'data.frame':   1 obs. of  23 variables:
>  $ imurder  : num 0
>  $ itheft   : num 0
>  $ irobbery : num 0
>  $ iassault : num 1
>  $ idru

Re: [R] Generating all possible models from full model

2010-05-19 Thread Frank E Harrell Jr

On 05/19/2010 01:39 PM, Ben Bolker wrote:

Frank E Harrell Jr  Vanderbilt.Edu>  writes:



Please read the large number of notes in the e-mail archive about the
invalidity of such modeling procedures.

Frank



   I'm curious: do you have an objection to multi-model averaging
a la Burnham, Anderson, and White (as implemented in the MuMIn
package)?  i.e., *not* just picking the
best model, and *not* trying to interpret statistical significance
of particular coefficients, but trying to maximize predictive
capability by computing the AIC values of many candidate models
and weighting predictions accordingly (and incorporating among-model variation
when computing prediction uncertainty)?  (I would look for the
answer in your book, but I have lost my copy by loaning it out
&  haven't got a new one yet ...)


Hi Ben,

I think that model averaging (e.g., Bayesian model averaging) works 
extremely well.  But if you are staying within one model family, it is a 
lot more work than the equally excellent penalized maximum likelihood 
estimation of a single (big) model.  The latter uses more standard tools 
and can isolate the effect of one variable and result in ordinary model 
graphics.


I haven't seen a variable selection method that works well without 
penalization (shrinkage).


Frank

--
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Object of type 'closure' not subsettable in GROFIT

2010-05-19 Thread Duncan Murdoch

On 19/05/2010 10:34 AM, trubilar wrote:

Hello Everyone, I am quite new in R software.
I am doing a grofit to study growth.
I have a matrix of time with 10 weeks and 26 individuals ie 10 columns x 26
rows
and I have a data.frame with 3 columns: 1 for Individual Id (which repeats
it self, since I have 10 values for each individual correponding to each
week) the size of the individual (1 or 2) and the Length value.

I run the grofit and get the following error
Data error [-1:-3]: object type "closure" not subsettable

I have no idea what it means or if I am doing something wrong!
  


You are trying to index a function.  Possibly you're using brackets 
instead of parentheses when calling it?  E.g.


> mean(1:2)   # correct
[1] 1.5

> mean[1:2]   # incorrect
Error in mean[1:2] : object of type 'closure' is not subsettable
  
If that doesn't help enough, please simplify your problem to a 
self-contained one of a few lines and post it here.  Someone will work 
it out.


Duncan Murdoch

Could anyone help me?
Thanks


Tamara Rubilar
Lab. Bentos
Centro Nacional Patagonico
Puerto Madryn
Argentina



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] offset in gam and spatial scale of variables

2010-05-19 Thread Joris Meys
Thank you for the correction. I was thinking about the difference of using a
variable with a smoother, and comparing that to a model with that variable
without smoother. I should specify that I mostly use thin plate regression
splines.

If the spline itself is not deviating from linearity, you get a nice
straight line going through the point (mean(Data),0) if you look at the
marginal plots. Use the same variable unchanged but as a simple linear
effect in the model, and the same line will run through (0,0). At least,
that's what I noticed and also the reason why I center my variables first.
The models are essentially the same, the shift is mainly in the intercept.
But the centering got a bit a reflex.

Cheers
Joris

On Wed, May 19, 2010 at 8:20 PM, Simon Wood  wrote:

> On Wednesday 19 May 2010 15:29, Joris Meys wrote:
> > Could you specify the package you use? If it is mgcv, this one centers
> your
> > variables before applying the smooths. That's something to take into
> > account when comparing different models.
> --- er, actually it only centres variables in this way for some smoothing
> bases, for numerical stability purposes: but this is done in a way that is
> user transparent and makes absolutely no difference to model interpretation
> or comparison. Of course the smooths themselves are subject to `centering
> constraints'  (but that's very different to centering the variables) ---
> these are just identifiability constraints --- all gam fitting packages
> have
> to put some identifiability constraints on the smooths, and the centering
> constraints used by `mgcv' and `gam' have the benefit of minimizing the
> standard errors on the constrained smooths.
>
> best,
> Simon
>
> --
> > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> > +44 1225 386603  
> > www.maths.bath.ac.uk/~sw283
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating all possible models from full model

2010-05-19 Thread Xiaogang Su
A couple of years ago, I wrote a function for doing this. It can be
found in the following file:
http://pegasus.cc.ucf.edu/~xsu/CLASS/STA4164/mt4-2009.doc
I also pasted a copy below. Hope you find it useful.  -XG

# 
# ALL POSSIBLE REGRESSIONS
# 
# The parameter k is the total number of predictors. To apply the function,
# you need to prepare the data so that all predictors for selection are named
# as x1, x2, ..., and the response is called y.

all.possible.regressions <- function(dat, k){
n <- nrow(dat)
regressors <- paste("x", 1:k, sep="")
lst <- rep(list(c(T, F)), k)
regMat <- expand.grid(lst);
names(regMat) <- regressors
formular <- apply(regMat, 1, function(x)
as.character(paste(c("y ~ 1", regressors[x]), collapse="+")))
allModelsList <- apply(regMat, 1, function(x)
as.formula(paste(c("y ~ 1", regressors[x]),collapse=" + ")) )
allModelsResults <- lapply(allModelsList,
function(x, data) lm(x, data=data), data=dat)
n.models <- length(allModelsResults)
extract <- function(fit) {
df.sse <- fit$df.residual
p <- n - df.sse -1
sigma <- summary(fit)$sigma
MSE <- sigma^2
R2 <- summary(fit)$r.squared
R2.adj <- summary(fit)$adj.r.squared
sse <- MSE*df.sse
aic <- n*log(sse) + 2*(p+2)
bic <- n*log(sse) + log(n)*(p+2)
out <- data.frame(df.sse=df.sse, p=p, SSE=sse, MSE=MSE,
R2=R2, R2.adj=R2.adj, AIC=aic, BIC=bic)
return(out)
}
result <- lapply(allModelsResults, extract)
result <- as.data.frame(matrix(unlist(result), nrow=n.models, byrow=T))
result <- cbind(formular, result)
rownames(result) <- NULL
colnames(result) <- c("model", "df.sse", "p", "SSE", "MSE", "R2",
"R2.adj", "AIC", "BIC")
return(result)
}
all.possible.regressions(dat=quasar, k=5)


On Tue, May 18, 2010 at 11:38 PM, Tim Clark  wrote:
> Is there a function that will allow me to run all model iterations if I 
> specify a full model?  I am using information criteria to choose between 
> possible candidate models.  I have been writing out all possible model 
> combinations by hand, and I am always worried that I am missing models or 
> have made a mistake somewhere.  It is also difficult to alter models if I 
> want to change a term.  For example, below are the set of models I would like 
> to run.  Is there a way to specify the full model and have R generate the 
> rest?  I.e. specify
>
>  m1234567<-glm.convert(glm.nb(mantas~site*year+cosmonth+sinmonth+coslunar+sinlunar+plankton,
>  data=mydata))
>
> and have R run all the other models.
>
>
> library(MASS)
>
> #Intercept only
>  m0<-glm.convert(glm.nb(mantas~1,data=mydata))
>
> #One term - 7 models
>  #Manta abundance is greater at one of the two sites
>  m1<-glm.convert(glm.nb(mantas~site,data=mydata))
>  #Manta abundance increases each year as the population increases in size due 
> to births or immigration being greater than deaths and emmigration
>  m2<-glm.convert(glm.nb(mantas~year,data=mydata))
>  #Manta abundances increases during part of the year due to seasonal cycles 
> in resources (mates, food)
>  m3<-glm.convert(glm.nb(mantas~cosmonth,data=mydata))
>  m4<-glm.convert(glm.nb(mantas~sinmonth,data=mydata))
>  #Manta abundance decreases with increased lunar phase
>  m5<-glm.convert(glm.nb(mantas~coslunar, data=mydata))
>  m6<-glm.convert(glm.nb(mantas~sinlunar, data=mydata))
>  #Manta abundance increases with increased levels of plankton
>  m7<-glm.convert(glm.nb(mantas~plankton,data=mydata))
>
> #Two terms - 21 models
>  m12<-glm.convert(glm.nb(mantas~site*year, data=mydata))   #Interaction term 
> to account for hotel being closed at Keauhou for some years
>  m13<-glm.convert(glm.nb(mantas~site+cosmonth,data=mydata))
>  m14<-glm.convert(glm.nb(mantas~site+sinmonth,data=mydata))
>  m15<-glm.convert(glm.nb(mantas~site+coslunar,data=mydata))
>  m16<-glm.convert(glm.nb(mantas~site+sinlunar,data=mydata))
>  m17<-glm.convert(glm.nb(mantas~site+plankton,data=mydata)) #Should this have 
> an interaction term?  Plankton may varry by site
>
>  m23<-glm.convert(glm.nb(mantas~year+cosmonth,data=mydata))
>  m24<-glm.convert(glm.nb(mantas~year+sinmonth,data=mydata))
>  m25<-glm.convert(glm.nb(mantas~year+coslunar,data=mydata))
>  m26<-glm.convert(glm.nb(mantas~year+sinlunar,data=mydata))
>  m27<-glm.convert(glm.nb(mantas~year+plankton,data=mydata))
>
>  m34<-glm.convert(glm.nb(mantas~cosmonth+sinmonth,data=mydata))
>  m35<-glm.convert(glm.nb(mantas~cosmonth+coslunar,data=mydata))
>  m36<-glm.convert(glm.nb(mantas~cosmonth+sinlunar,data=mydata))
>  m37<-glm.convert(glm.nb(mantas~cosmonth+plankton,data=mydata))  #Interaction 
> term?  Plankton may vary by season
>
>  m45<-glm.convert(glm.nb(mantas~sinmonth+coslunar, data=mydata))
>  m46<-glm.convert(glm.nb(mantas~sinmonth+sinlunar, data=mydata))
>  m47<-glm.convert(glm.nb(mantas~sinmonth+plankton, data=mydata)) #Intera

Re: [R] Displaying smooth bases - mgcv package

2010-05-19 Thread Joris Meys
Dear Simon,

Thank you very much! A colleague of mine directed me towards the
extract.lmeDesign function of the RLRsim package, but that only gets me the
basis functions in function of the model I specified. Your solution is what
I was looking for.

Kind regards
Joris

Just for completeness : This does something different, it takes the design
out of the lme part of a gamm object. This gives a mixed model
representation of the spline using the basis function in one way or another
(scaling is different, I'm still trying to grasp how). It's not really what
I looked for, but it might be useful for somebody else.

library(RLRsim)
X <- extract.lmeDesign(test$lme)$X # fixed effects I believe
Z <- extract.lmeDesign(test$lme)$Z # random effects I believe

op <- par(mfrow=c(2,5),mar=c(4,4,1,1))
plot(x1,X[,1],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,X[,2],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,Z[,8],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,Z[,7],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,Z[,6],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,Z[,5],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,Z[,4],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,Z[,3],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,Z[,2],ylab="Basis function",xlab="X",type="l",lwd=2)
plot(x1,Z[,1],ylab="Basis function",xlab="X",type="l",lwd=2)
par(op)




On Wed, May 19, 2010 at 8:38 PM, Simon Wood  wrote:

>
>
> On Wednesday 19 May 2010 15:08, Joris Meys wrote:
> > Dear all,
> >
> > for demonstration purposes I want to display the basis functions used by
> a
> > thin plate regression spline in a gamm model. I've been searching the
> help
> > files, but I can't really figure out how to get the plots of the basis
> > functions. Anybody an idea?
> >
> > Some toy code :
> >
> > require(mgcv)
> > require(nlme)
> >
> > x1 <- 1:1000
> > x2 <- runif(1000,10,500)
> >
> > fx1 <- -4*sin(x1/50)
> > fx2 <- -10*(x2)^(1/4)
> > y <- 60+ fx1 + fx2 + rnorm(1000,0,5)
> >
> > test <- gamm(y~s(x1)+s(x2))
> >
> > plot(test$gam)
> >
>
> ## plots basis functions in space of identifiability constraints, set
> ## absorb.cons=FALSE for original basis functions
> um <- smoothCon(s(x1),data=data.frame(x1=sort(x1)),
>  knots=NULL,absorb.cons=TRUE)
> X <- um[[1]]$X
> plot(sort(x1),X[,1],ylim=range(X),type="l")
> for (i in 2:ncol(X)) lines(sort(x1),X[,i],col=i)
>
> best,
> Simon
>
>
>
>
>
>
> > This gives me the conditional estimates, but not the basis functions for
> > the different splines. I'd like to have the basis functions for s(x1) and
> > s(x2).
> >
> > Cheers
> > Joris
>
> --
> > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> > +44 1225 386603  
> > www.maths.bath.ac.uk/~sw283
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regarding the 'R' Load Command

2010-05-19 Thread Godavarthi, Murali
Hi Gavin, Steve

Sorry, please use the below dput for mytestdata. Thanks!!

structure(list(imurder = 0, itheft = 0, irobbery = 0, iassault = 1,
idrug = 0, iburglary = 0, igun = 0, psych = 0, Freq = 0, priors =
58, firstage = 19, intage = 19, sex = structure(1, .Label = "1", class =
"factor"), race = structure(1, .Label = "BLACK", class = "factor"),
marstat = structure(1, .Label = "SINGLE", class = "factor"), empac =
structure(1, .Label = "UNEMPLD", class = "factor"), educ = 0,
zipcode = 21215, suspendmn = 0, drugs = 0, alco = 0, probation = 1,
parole = 0), .Names = c("imurder", "itheft", "irobbery", "iassault",
"idrug", "iburglary", "igun", "psych", "Freq", "priors", "firstage",
"intage", "sex", "race", "marstat", "empac", "educ", "zipcode",
"suspendmn", "drugs", "alco", "probation", "parole"), class =
"data.frame", row.names = "10")


Best Regards,

Murali Godavarthi

410-585-3746 (w)

ITCD - DPSCS Data Mining


-Original Message-
From: Godavarthi, Murali 
Sent: Wednesday, May 19, 2010 2:43 PM
To: 'gavin.simp...@ucl.ac.uk'; Steve Lianoglou
Cc: r-help@r-project.org
Subject: RE: [R] Regarding the 'R' Load Command

Hi Steve, Gavin

This is being really helpful. I've pasted the working data, and my test
data below after running the str command on both of those variables. The
working sample actually contains about 300 records, hence I am not able
to paste the whole data here. However my sample test data which I am
trying to get working, is only 1 record, and I've pasted the dput result
below. Datatypes  seem to match in both variables for me in terms of
being num/factor. Please suggest where it could be wrong. Thank You!



mytestdata

structure(list(imurder = 0, itheft = 0, irobbery = 0, iassault = 1L,
idrug = 0L, iburglary = 0L, igun = 0L, psych = 0L, Freq = 0L, priors
= 58L, firstage = 19L, intage = 19L, sex = structure(1L, .Label = "1",
class = "factor"), race = structure(1L, .Label = "BLACK", class =
"factor"), marstat = structure(1L, .Label = "SINGLE", class =
"factor"), empac = structure(1L, .Label = "UNEMPLD", class =
"factor"), educ = 0L, zipcode = 21215L, suspendmn = 0L, drugs = 0L,
alco = 0L, probation = 1L, parole = 0L), .Names = c("imurder", "itheft",
"irobbery", "iassault", "idrug", "iburglary", "igun", "psych", "Freq",
"priors", "firstage", "intage", "sex", "race", "marstat", "empac",
"educ", "zipcode", "suspendmn", "drugs", "alco", "probation", "parole"),
class = "data.frame", row.names = "10")


> str(testdata)
'data.frame':   291 obs. of  23 variables:
 $ imurder  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ itheft   : num  0 0 0 0 0 1 0 0 0 0 ...
 $ irobbery : num  0 0 0 0 0 0 0 0 0 0 ...
 $ iassault : num  1 0 1 0 0 0 0 0 0 0 ...
 $ idrug: num  0 1 0 1 1 0 0 1 1 1 ...
 $ iburglary: num  0 0 0 0 0 0 0 0 0 0 ...
 $ igun : num  0 0 0 0 0 0 0 0 0 0 ...
 $ psych: num  0 0 0 0 0 0 0 0 0 0 ...
 $ Freq : num  0 0 0 0 0 0 0 0 0 0 ...
 $ priors   : num  58 4 2 0 6 22 0 36 0 0 ...
 $ firstage : num  19 39 28 0 49 32 0 24 0 55 ...
 $ intage   : num  19 39 28 25 49 32 32 24 30 55 ...
 $ sex  : Factor w/ 2 levels "1","2": 1 2 1 2 2 1 1 1 1 1 ...
 $ race : Factor w/ 5 levels "WHITE","BLACK",..: 2 2 1 1 2 1 1 2 2 2
...
 $ marstat  : Factor w/ 7 levels "SINGLE","MARRIED",..: 1 2 2 1 2 4 7 1
7 3 ...
 $ empac: Factor w/ 6 levels "EMPLD FT","EMPLD PT",..: 3 4 3 3 3 3 6
3 6 3 ...
 $ educ : num  0 0 0 1 0 0 0 0 0 1 ...
 $ zipcode  : num  21215 21217 21223 21223 21217 ...
 $ suspendmn: num  0 600 0 0 60 3 2 479 0 3 ...
 $ drugs: num  0 1 0 0 0 1 0 0 0 1 ...
 $ alco : num  0 0 0 0 0 1 0 0 0 1 ...
 $ probation: num  1 1 0 0 1 1 1 1 0 1 ...
 $ parole   : num  0 0 0 0 0 0 0 0 0 0 ...
> 
> 
> 
> 
> str(mytestdata)
'data.frame':   1 obs. of  23 variables:
 $ imurder  : num 0
 $ itheft   : num 0
 $ irobbery : num 0
 $ iassault : num 1
 $ idrug: num 0
 $ iburglary: num 0
 $ igun : num 0
 $ psych: num 0
 $ Freq : num 0
 $ priors   : num 58
 $ firstage : num 19
 $ intage   : num 19
 $ sex  : Factor w/ 1 level "1": 1
 $ race : Factor w/ 1 level "BLACK": 1
 $ marstat  : Factor w/ 1 level "SINGLE": 1
 $ empac: Factor w/ 1 level "UNEMPLD": 1
 $ educ : num 0
 $ zipcode  : num 21215
 $ suspendmn: num 0
 $ drugs: num 0
 $ alco : num 0
 $ probation: num 1
 $ parole   : num 0
>



Best Regards,

Murali Godavarthi

410-585-3746 (w)

ITCD - DPSCS Data Mining


-Original Message-
From: Gavin Simpson [mailto:gavin.simp...@ucl.ac.uk] 
Sent: Wednesday, May 19, 2010 12:58 PM
To: Steve Lianoglou
Cc: Godavarthi, Murali; r-help@r-project.org
Subject: Re: [R] Regarding the 'R' Load Command

I think the answer is clear from the error: R thinks the type of data in
the components of 'testmurali' do not match those of the data used to
fit the original randomForest.

The OP should go back to his model fitting code and do

str(obj)

where 'obj' is the name of his original data object used to fit the
randomForest and compare it with


Re: [R] Generating all possible models from full model

2010-05-19 Thread Tal Galili
Hi Tim,

Here is a rather clumsy way of going about your task:

# -- example code -
func.getY.getX.return.lm <- function(Y,  X.matrix , lm.id.vec)
{
# gets a Y, a vec of T/F and a X.matrix
# performs lm
# and returns output
 potential.X.size <- length(lm.id.vec) + 1
 lm.data <- data.frame(Y, X.matrix[,lm.id.vec])
 lm1 <- lm(Y ~ . ,data = lm.data)
return(lm1)
}

data(mtcars)
X <- mtcars[, -1]
Y <- mtcars[, 1]
X.toUse <- sample(c(T,F), 10, T)

func.getY.getX.return.lm(Y, X , X.toUse )

# -- example code -


Now what you will need to do is:
1) Create an X matrix that includes the vectors of all the interactions you
will be interested in.
2) make a matrix of TRUE/FALSE id's for the variables you would like to use.
3) go through that matrix and build the models (using, for example, the
function I gave above)
4) insert the output to a big list(), and on that list perform the checks
that interest you for finding your model.

As Frank Harrell and the others have mentioned, you are walking on VERY
shaky grounds (in terms of the stability of your models
predictive/explanatory power).

In case you develop the code more, I'd be curious to see how you did it.

Good luck,
Tal






Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Wed, May 19, 2010 at 8:48 PM, Tim Clark  wrote:

> Tal,
>
> No, I am definitely not wanting to generate 7! models.  At least not with
> my current dataset.  That would only be if I was running all first order
> terms and all possible interactions.  In this example I am only wanting to
> run interactions for the variables year:site, which should give 2^7+32=160
> possible models.  What I am really asking is if everyone writes out all of
> their models of interest by hand, or if there is some function that has been
> written that makes it easier to generate all the models.  I keep making
> mistakes when writing out lots of models, and was hoping for something that
> would automate the process so that I was sure I didn't screw up in the
> formulas.  This may just require more practice on my part, but it seems like
> such a common problem that someone would have written a function to do it.
> It looks like a function could be written using combn() with different
> number of elements.  I will see if I can come up with something.
>
> I am using AIC values to pick the best models and the function modavg() in
> the package AICcmodavg to generate model averages.  I have read a lot about
> the problems of stepwise selection, so I am trying to find something besides
> regsubsets() from leaps package.
>
> Thanks,
>
> Tim
>
>
> Tim Clark
> Department of Zoology
> University of Hawaii
>
> --- On *Wed, 5/19/10, Tal Galili * wrote:
>
>
> From: Tal Galili 
> Subject: Re: [R] Generating all possible models from full model
> To: "Tim Clark" 
> Cc: r-help@r-project.org
> Date: Wednesday, May 19, 2010, 12:47 AM
>
>  Hi Tim,
> So if I understand you correctly, you are talking about 7! models, that's
> hell of alot, are you considering model-selection/multiple-comparisons
> issues when you are picking your models ?
> If you are hoping to do cross validation on such a variaty of models, you
> might find out it wouldn't scale for larger problems.
>
>  There is the
> regsubsets from the {leaps} package.
> Which can also work with biglm and bigglm objects.  I am not sure what
> alternative exists for other glm objects, but it's worth checking.
>
> I can imagine you can write a function that will create all the variable
> combinations using a combo of the functions combn (to create all the
> combinations) with eval+parse.
> But asI wrote, I think you're issue here is the model selection, not just
> the creation of all the models.
>
> Best,
> Tal
>
>
>
>
>
> Contact
> Details:---
> Contact me: 
> tal.gal...@gmail.com|
>   972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
>
> --
>
>
>
>
> On Wed, May 19, 2010 at 2:00 PM, Tim Clark 
> http://us.mc361.mail.yahoo.com/mc/compose?to=mudiver1...@yahoo.com>
> > wrote:
>
>>   Not necessarily.  In the example I included:
>>
>> manta~year*site
>>
>> in the model, which includes both first order terms and interactions:
>>
>> manta~year+site+year:site
>>
>> I am just wanting to know if there is an easier way than writing out all
>> the possible models long-hand given a full model with all desired terms,
>> where some terms may have interactions and others don't.
>>
>>
>

[R] Categorical variables in estimating propensity score

2010-05-19 Thread Qian Zhang
Hello,

I am looking to generate propensity score in R using ps() command. My
question comes from how to include categorical variables in the equation to
generate propensity score. How should I include reference category in the
estimation? Should I drop the reference group manually before running the
estimation or R would automatically omit it as if I include it in the
estimation? I look forward to your response!

Qian Zhang
Department of Sociology
University of South Carolina

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regarding the 'R' Load Command

2010-05-19 Thread Godavarthi, Murali
Hi Steve,

Thanks so much for your inputs! I was actually trying to implement your
suggestions, I get the below error (please see the results of predict
command below). 

What we are trying to do is to feed in values for about 23
characteristics of an individual, and use the randomForest() function to
determine if the individual is a violent offender. Expected output is 0
or 1, indicating yes/no. 

Am I going wrong again? Here is what I was doing:

1) Created a text file with following data:
"imurder" "itheft" "irobbery" "iassault" "idrug" "iburglary" "igun"
"psych" "Freq" "priors" "firstage" "intage" "sex" "race" "marstat"
"empac" "educ" "zipcode" "suspendmn" "drugs" "alco" "probation" "parole"
"10" 0 0 0 1 0 0 0 0 0 58 19 19 "1" "BLACK" "SINGLE" "UNEMPLD" 0 21215 0
0 0 1 0

The above format in which the text file was created is in the same
format as the one which is already working, but has characteristics of
about 290 individuals fed-in instead of just one individual as above.
Not sure why this doesn't work!


2) Executed the below command sequence:


> library(randomForest)
randomForest 4.5-34
Type rfNews() to see new features/changes/bug fixes.

> load("C://Program Files//R//R-2.10.1//bin//rfoutput")

> testmurali<-read.table("ex.data",T)

> load(testmurali)
Error in load(testmurali) : bad 'file' argument

> load("testmurali")

> names(testmurali)
 [1] "imurder"   "itheft""irobbery"  "iassault"  "idrug"
"iburglary" "igun"  "psych" "Freq"  "priors"   
[11] "firstage"  "intage""sex"   "race"  "marstat"   "empac"
"educ"  "zipcode"   "suspendmn" "drugs"
[21] "alco"  "probation" "parole"   

> predict(rfoutput,newdata=testmurali,type="response")
Error in predict.randomForest(rfoutput, newdata = testmurali, type =
"response") : 
  Type of predictors in new data do not match that of the training data.
> 

The model rfoutput used in the above predict command is also based on a
working example with similar data. 


Also, does load command accept a data string input directly (without
storing it into a file and then providing path of the file as a string)?

Please suggest. Thanks in advance!


Best Regards,
Murali Godavarthi
410-585-3746 (w)

ITCD - DPSCS Data Mining






-Original Message-
From: Steve Lianoglou [mailto:mailinglist.honey...@gmail.com] 
Sent: Tuesday, May 18, 2010 4:13 PM
To: Godavarthi, Murali
Cc: r-help@r-project.org
Subject: Re: [R] Regarding the 'R' Load Command

Hi,

On Tue, May 18, 2010 at 2:49 PM, Godavarthi, Murali
 wrote:
> Hi,
>
> I'm new to 'R' and need some help on the "Load" command. Any responses
> will be highly appreciated. Thanks in advance!
>
> As per manuals, the "Load" command expects a binary file input that is
> saved using a "save" command.

Or a path to the file ...

> However it is required that we need to
> call the 'R' program from
>
> Java web application using RJava, and pass a string to the 'R" program
> instead of a binary file. Is it possible?

Yes, pay closer attention to the description for the "file" argument
in the load function (see ?load):

"""a (readable binary) connection **or a character string** giving the
name of the file to load"""

(emphasis mine)

> I was exploring the options of using TextConnections, file connections
> and other types of connections in order to read a stream of input
> (either from a file, stdin etc). I am able to read the string, but the
> Save and Load commands are not accepting the string input. Here is the
> sequence of commands I tried running, and the error received. There is
> no clue on this error, especially when trying to use the eval function
> in randomForest package, even on the internet. Can anyone help please!
>
>> library(randomForest)
>
> randomForest 4.5-34
>
> Type rfNews() to see new features/changes/bug fixes.
>
>
>
>> load("C://Program Files//R//R-2.10.1//bin//rfoutput")
>
>
>
>> zz <- file("ex.data", "w")
>
>
>
>> cat("\"imurder\" \"itheft\" \"irobbery\" \"iassault\" \"idrug\"
> \"iburglary\" \"igun\" \"psych\" \"Freq\" \"priors\" \"firstage\"
> \"intage\" \"sex\" \"race\" \"marstat\" \"empac\"
>
> \"educ\" \"zipcode\" \"suspendmn\" \"drugs\" \"alco\" \"probation\"
> \"parole\"",file = zz, sep = "\n", fill = TRUE)
>
>
>
>> cat("\"10\" 0 0 0 1 0 0 0 0 0 58 19 19 \"1\" \"BLACK\" \"SINGLE\"
> \"UNEMPLD\" 0 21215 0 0 0 1 0",file = zz, sep = "\n", fill = TRUE)

What are you trying to do here?
It looks like you want to save a table of sorts. First create your
data into a data.frame, then save that data.frame to a file using
write.table (or write.csv, etc).

>> save(zz, file = "testmurali", version = 2)

You're saving a file "object" here, not the contents of the file.
Once you successfully serialize your data into a text file, just load
it from "like normal" using read.table (or similar).

Anyway, I'm not sure what we're talking about here, but in short:

1. You need to make sure that you are correctly saving what you think
you're saving.
2. You can pass a character s

Re: [R] Regarding the 'R' Load Command

2010-05-19 Thread Godavarthi, Murali
Hi Steve, Gavin

This is being really helpful. I've pasted the working data, and my test
data below after running the str command on both of those variables. The
working sample actually contains about 300 records, hence I am not able
to paste the whole data here. However my sample test data which I am
trying to get working, is only 1 record, and I've pasted the dput result
below. Datatypes  seem to match in both variables for me in terms of
being num/factor. Please suggest where it could be wrong. Thank You!



mytestdata

structure(list(imurder = 0, itheft = 0, irobbery = 0, iassault = 1L,
idrug = 0L, iburglary = 0L, igun = 0L, psych = 0L, Freq = 0L, priors
= 58L, firstage = 19L, intage = 19L, sex = structure(1L, .Label = "1",
class = "factor"), race = structure(1L, .Label = "BLACK", class =
"factor"), marstat = structure(1L, .Label = "SINGLE", class =
"factor"), empac = structure(1L, .Label = "UNEMPLD", class =
"factor"), educ = 0L, zipcode = 21215L, suspendmn = 0L, drugs = 0L,
alco = 0L, probation = 1L, parole = 0L), .Names = c("imurder", "itheft",
"irobbery", "iassault", "idrug", "iburglary", "igun", "psych", "Freq",
"priors", "firstage", "intage", "sex", "race", "marstat", "empac",
"educ", "zipcode", "suspendmn", "drugs", "alco", "probation", "parole"),
class = "data.frame", row.names = "10")


> str(testdata)
'data.frame':   291 obs. of  23 variables:
 $ imurder  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ itheft   : num  0 0 0 0 0 1 0 0 0 0 ...
 $ irobbery : num  0 0 0 0 0 0 0 0 0 0 ...
 $ iassault : num  1 0 1 0 0 0 0 0 0 0 ...
 $ idrug: num  0 1 0 1 1 0 0 1 1 1 ...
 $ iburglary: num  0 0 0 0 0 0 0 0 0 0 ...
 $ igun : num  0 0 0 0 0 0 0 0 0 0 ...
 $ psych: num  0 0 0 0 0 0 0 0 0 0 ...
 $ Freq : num  0 0 0 0 0 0 0 0 0 0 ...
 $ priors   : num  58 4 2 0 6 22 0 36 0 0 ...
 $ firstage : num  19 39 28 0 49 32 0 24 0 55 ...
 $ intage   : num  19 39 28 25 49 32 32 24 30 55 ...
 $ sex  : Factor w/ 2 levels "1","2": 1 2 1 2 2 1 1 1 1 1 ...
 $ race : Factor w/ 5 levels "WHITE","BLACK",..: 2 2 1 1 2 1 1 2 2 2
...
 $ marstat  : Factor w/ 7 levels "SINGLE","MARRIED",..: 1 2 2 1 2 4 7 1
7 3 ...
 $ empac: Factor w/ 6 levels "EMPLD FT","EMPLD PT",..: 3 4 3 3 3 3 6
3 6 3 ...
 $ educ : num  0 0 0 1 0 0 0 0 0 1 ...
 $ zipcode  : num  21215 21217 21223 21223 21217 ...
 $ suspendmn: num  0 600 0 0 60 3 2 479 0 3 ...
 $ drugs: num  0 1 0 0 0 1 0 0 0 1 ...
 $ alco : num  0 0 0 0 0 1 0 0 0 1 ...
 $ probation: num  1 1 0 0 1 1 1 1 0 1 ...
 $ parole   : num  0 0 0 0 0 0 0 0 0 0 ...
> 
> 
> 
> 
> str(mytestdata)
'data.frame':   1 obs. of  23 variables:
 $ imurder  : num 0
 $ itheft   : num 0
 $ irobbery : num 0
 $ iassault : num 1
 $ idrug: num 0
 $ iburglary: num 0
 $ igun : num 0
 $ psych: num 0
 $ Freq : num 0
 $ priors   : num 58
 $ firstage : num 19
 $ intage   : num 19
 $ sex  : Factor w/ 1 level "1": 1
 $ race : Factor w/ 1 level "BLACK": 1
 $ marstat  : Factor w/ 1 level "SINGLE": 1
 $ empac: Factor w/ 1 level "UNEMPLD": 1
 $ educ : num 0
 $ zipcode  : num 21215
 $ suspendmn: num 0
 $ drugs: num 0
 $ alco : num 0
 $ probation: num 1
 $ parole   : num 0
>



Best Regards,

Murali Godavarthi

410-585-3746 (w)

ITCD - DPSCS Data Mining


-Original Message-
From: Gavin Simpson [mailto:gavin.simp...@ucl.ac.uk] 
Sent: Wednesday, May 19, 2010 12:58 PM
To: Steve Lianoglou
Cc: Godavarthi, Murali; r-help@r-project.org
Subject: Re: [R] Regarding the 'R' Load Command

I think the answer is clear from the error: R thinks the type of data in
the components of 'testmurali' do not match those of the data used to
fit the original randomForest.

The OP should go back to his model fitting code and do

str(obj)

where 'obj' is the name of his original data object used to fit the
randomForest and compare it with

str(testmurali)

to see why the types of data are different. Look for variables that were
factors or characters in one data set and numeric/integer in the other.
This smells like a data import issue...

If revelation still doesn't occur Murali, *please* follow Steve's
suggestions and post and message that shows exactly (i.e. the R code
executed) along side a data set *we* can load into R without jumping
through hoops or having to divine what your data look like using a
crystal ball or ESP.

HTH

G

On Wed, 2010-05-19 at 12:24 -0400, Steve Lianoglou wrote:
> Hi Murali,
> 
> I'm sorry, but you're making this too difficult to provide any help.
> Describing what your data structures are and contain is too tedious to
> follow, and end up being rather ambiguous anyway.
> 
> My first guess: by your error message, perhaps the columns of the data
> to predict on are different than the training.
> 
> If you want to get help, please provide your data in a form that we
> can test against. Look at this post by Hadley Wickham to help you do
> that:
> http://gist.github.com/270442
> 
> In particular, not the use of "dput" that you should use so we can

Re: [R] contrasts for lmer model

2010-05-19 Thread Max Kuhn
Basically, I haven't gotten around to adding a contrast.lmer methods
yet. It's only my list, but in the short term, you are welcome to
contribute code to the package.

There have also been discussions on this list about the
appropriateness of such contrats with nonlinear models, but I need to
implement the contrast anyway (to handle the linear cases).

Max

On Wed, May 19, 2010 at 9:23 AM, Kay Cichini  wrote:
>
> plugged it together myself -
> obviously this should work:
>
> m0<-lm(rich ~ gap*stage,data=richness)
>
> cc<-contrast(m0,
> list(stage = levels(stage), gap = "1"),
> list(stage = levels(stage), gap = "0"))
>
> # plug in matrix from contrast(m0)
> summary(glht(m3, linfct = cc$X))
>
>
>         Simultaneous Tests for General Linear Hypotheses
>
> Fit: glmer(formula = rich ~ gap * stage + (1 | site), family = poisson,
>    REML = 1)
>
> Linear Hypotheses:
>       Estimate Std. Error z value Pr(>|z|)
> 1 == 0  0.04820    0.15594   0.309   0.9965
> 2 == 0 -0.44679    0.10204  -4.379 4.78e-05 ***
> 3 == 0 -0.52847    0.09133  -5.786 2.88e-08 ***
> 4 == 0 -0.44396    0.15549  -2.855   0.0171 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Adjusted p values reported -- single-step method)
>
> -
> 
> Kay Cichini
> Postgraduate student
> Institute of Botany
> Univ. of Innsbruck
> 
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/contrasts-for-lmer-model-tp682p820.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 

Max

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Strange case of partial matching in .[ - possible bug / wrong documentation?

2010-05-19 Thread Hilmar Berger
Hi all,

This occurred in R-2.11.0 (WinXP).

The R-help page of .[ says that:

"Character indices can in some circumstances be partially matched (see
pmatch) to the names or dimnames of the object being subsetted (but never
for subassignment). Unlike S (Becker et al p. 358)), R has never used
partial matching when extracting by [, and as from R 2.7.0 partial matching
is not by default used by [[ (see argument exact)."

My understanding is therefore that .[ should never try partial matching.

However:

> df = data.frame(a=c(1,2,3,9), b=c(4,5,6,10))
> rownames(df) = c("ef","gg","hh","fe")
> df
   a  b
ef 1  4
gg 2  5
hh 3  6
fe 9 10


> df["e",]
  a b
ef 1 4

> rownames(df) = c("ef","gg","hh","efg")
> df["e",]
a  b
NA NA NA

So, it looks like partial matching is done using pmatch("e",rownames(df))
for "[". If this is true, the help page is not correct.

Thanks !
Regards,
Hilmar

---
Hilmar Berger
Integromics S.L. / CNB-CSIC
Madrid, Spain

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Object of type 'closure' not subsettable in GROFIT

2010-05-19 Thread trubilar

Hello Everyone, I am quite new in R software.
I am doing a grofit to study growth.
I have a matrix of time with 10 weeks and 26 individuals ie 10 columns x 26
rows
and I have a data.frame with 3 columns: 1 for Individual Id (which repeats
it self, since I have 10 values for each individual correponding to each
week) the size of the individual (1 or 2) and the Length value.

I run the grofit and get the following error
Data error [-1:-3]: object type "closure" not subsettable

I have no idea what it means or if I am doing something wrong!
Could anyone help me?
Thanks


Tamara Rubilar
Lab. Bentos
Centro Nacional Patagonico
Puerto Madryn
Argentina
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Object-of-type-closure-not-subsettable-tp975861p942.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] offset in gam and spatial scale of variables

2010-05-19 Thread Lucia Rueda

Hi Joris,

We're using mgcv. 

We have data on abundance of groupers on line transects that have the same
legth. My coworker has selected a bunch of variables and he has calculated
them in terms of total area in different sizes of buffers around the
centroid of the transect. He has run gam models (quasipoisson, mgcv) for
each explanatory variable at each size of buffer. Then he has selected the
signifficant variables. Some variables explain a higher percentage of
deviance at different sizes of buffers. And now he wants to build a gam
model trying the different explanatory variables but using the values that
correspond to the size of the buffer where they explain a higher deviance,
so one variable might have the values of a smaller scale whereas other might
correspond to a higher buffer size (I don't know if I made myself clear). I
am wondering if this is correct. 

Also I don't know if he should include an offset in spite all the transects
have the same length. 

I'm in charge of looking at the spatial correlation once he builds the
model. I don't know much about it but I was thinking of doing a Moran test,
correlogram and variogram and then if there's spatial autocorrelation doing
gamm, sar or gee. 

Thanks,

Lucia 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/offset-in-gam-and-spatial-scale-of-variables-tp483p976.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help Preparing Data for Heatmap creation

2010-05-19 Thread Derek Dees
All -

Below are samples of the data, a description of my approach and the
work I've done so far. My goal is two-fold, first learn more about R
and the creation of heatmaps; second, to create a heat map where the
vertical access is a series of days and the horizontal is the hour of
a day with the color of the cells being determined by the number of
texts in that hour. While the project doesn't have a lot of general
applicability to anything really significant for my job, I'm curious
about the distribution of texts sent by myself, my wife and my
children. So, a project that can scratch two itches with one stick.

My general approach is to:

0: Read in the data from the file downloaded from my cell phone
provider into a 6 column variable.
1: Add a 7th column combining the Date and Time columns, formatted as
a POSIXct value.
2: Create a data frame with 5 columns - day, hour, number sent, number
received, total number
4: Create the heatmap using the first two columns plus whichever of
the remaining 3 I'm interested in at the moment.

The issues I'm currently having revolve around creating the data frame
to hold the information that I will use for the heatmap. I cannot seem
to get round or round.POSIXt to properly round to the date hour
format, and I'm not sure I'm understanding working with POSIXct data.
I also am floundering on how to sum sent/received counts by hour. Not
having a statistics background and being an R newbie, I suspect I'm
asking the wrong questions in the documentation.

Any suggestions regarding how to achieve my aims or points to useful
material beyond "Learning R" and the R man pages would be appreciated
greatly.

I'm using R 2.10.1 on Windows XP.

The code I'm using is:

#!/usr/bin/R
# rdf:
# dc:title textingHeatMap.R
# dc:date 2010.05.14
# dc:creator http://www.mm.com/user/djdees/knows/who#derek-dees
# dc:language R
# dc:rights Copyright ©
# dc:description Creates a heatmap showing # of texts per hour per day.
# doap:SVNRepository
# doap:browse
# doap:homepage
# doap:wiki
# doap:program-language
# doap:version 1.0
# cvs:date $Date$

# Libraries/Packages

# Functions

# Setup
file <- "\\tmp\\R\\UnbilledMessaging.action"

# Read Data
rawData <- read.csv(file, header=TRUE, sep="\t",quote="\"")

# Work Data
rawData$time.stamp <- paste(rawData$Date, rawData$Time)
rawData$time.stamp <- as.POSIXct(rawData$time.stamp, format="%m/%d/%Y %I:%M %p")
format(rawData$time.stamp, "%m/%d/%Y %I")

# Plot

=
Sanitized data looks like:
"Date"  "Time"  "To""From"  "Direction" "Message Type"
"05/03/2010""9:49 AM"   "0123456789""0123456789""Sent"  "--"
"05/03/2010""9:46 AM"   "0123456789""0123456789"
"Received"  "--"

-- 
Derek
=
djd...@gmail.com

The three-legged stool of understanding is held up by history,
languages, and mathematics. Equipped with these three you can learn
anything you want to learn. But if you lack any one of them you are
just another ignorant peasant with dung on your boots. — Robert A.
Heinlein

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Joining two (or more) frequency tables

2010-05-19 Thread Wu Gong

> ## Create a sample data.
> data <- data.frame(father.id = letters[1:5],
+ diagnosis = sample(c(100,200,300,340),5,replace=TRUE),
+ diagnosis1 = sample(c(100,200,300,340),5,replace=TRUE),
+ diagnosis2 = sample(c(100,200,300,340),5,replace=TRUE))
> data
  father.id diagnosis diagnosis1 diagnosis2
1 a   340100300
2 b   200300340
3 c   200300100
4 d   200100300
5 e   200100100
> 
> ## Create a matrix by replicating the father.id column 3 times
> ## Because we have 3 diagnosis columns to match when use table() function
> match <- as.matrix(data[,rep(1:2,c(3,0))])
> match
 father.id father.id.1 father.id.2
[1,] "a"   "a" "a"
[2,] "b"   "b" "b"
[3,] "c"   "c" "c"
[4,] "d"   "d" "d"
[5,] "e"   "e" "e"
> 
> ## Count
> table(match,as.matrix(data[,2:4]))
 
match 100 200 300 340
a   1   0   1   1
b   0   1   1   1
c   1   1   1   0
d   1   1   1   0
e   2   1   0   0
> 


Please refer to:
http://r.789695.n4.nabble.com/Counting-Frequencies-in-Data-Frame-tt2221342.html#a2221487
http://r.789695.n4.nabble.com/Counting-Frequencies-in-Data-Frame-tt2221342.html#a2221487
 

-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Joining-two-or-more-frequency-tables-tp576p946.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Displaying smooth bases - mgcv package

2010-05-19 Thread Simon Wood


On Wednesday 19 May 2010 15:08, Joris Meys wrote:
> Dear all,
>
> for demonstration purposes I want to display the basis functions used by a
> thin plate regression spline in a gamm model. I've been searching the help
> files, but I can't really figure out how to get the plots of the basis
> functions. Anybody an idea?
>
> Some toy code :
>
> require(mgcv)
> require(nlme)
>
> x1 <- 1:1000
> x2 <- runif(1000,10,500)
>
> fx1 <- -4*sin(x1/50)
> fx2 <- -10*(x2)^(1/4)
> y <- 60+ fx1 + fx2 + rnorm(1000,0,5)
>
> test <- gamm(y~s(x1)+s(x2))
>
> plot(test$gam)
>

## plots basis functions in space of identifiability constraints, set
## absorb.cons=FALSE for original basis functions
um <- smoothCon(s(x1),data=data.frame(x1=sort(x1)),
  knots=NULL,absorb.cons=TRUE)
X <- um[[1]]$X
plot(sort(x1),X[,1],ylim=range(X),type="l")
for (i in 2:ncol(X)) lines(sort(x1),X[,i],col=i)

best,
Simon






> This gives me the conditional estimates, but not the basis functions for
> the different splines. I'd like to have the basis functions for s(x1) and
> s(x2).
>
> Cheers
> Joris

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating all possible models from full model

2010-05-19 Thread Ben Bolker
Frank E Harrell Jr  Vanderbilt.Edu> writes:

> 
> Please read the large number of notes in the e-mail archive about the 
> invalidity of such modeling procedures.
> 
> Frank
> 

  I'm curious: do you have an objection to multi-model averaging
a la Burnham, Anderson, and White (as implemented in the MuMIn
package)?  i.e., *not* just picking the
best model, and *not* trying to interpret statistical significance
of particular coefficients, but trying to maximize predictive
capability by computing the AIC values of many candidate models
and weighting predictions accordingly (and incorporating among-model variation
when computing prediction uncertainty)?  (I would look for the
answer in your book, but I have lost my copy by loaning it out 
& haven't got a new one yet ...)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] save in for loop

2010-05-19 Thread Jorge Ivan Velez
Hi Ivan,

How about this?

i <- 1:4
sapply(i,
function(i){
x <- data.frame(a=(i+1):(i+10), b=LETTERS[(i+1):(i+10)])
 save(x, file = paste("file", i, ".rda", sep=""))
}
 )

HTH,
Jorge


On Wed, May 19, 2010 at 10:56 AM, Ivan Calandra <> wrote:

> Dear users,
>
> My problem concerns save() within a for loop.
> Here is my code:
>
> for (i in 1:4) {
>  temp <- data.frame(a=(i+1):(i+10), b=LETTERS[(i+1):(i+10)])
>  filename <- paste("file", i, sep="")
>  assign(filename, temp)
>  save(filename, file=paste(filename, ".rda", sep=""))
> }
>
> As you can see, save() doesn't work as I would like: (1) the object saved
> is called "filename" (instead of "file1", "file2", etc), and (2) it of
> course contains only the name (as character) instead of the data.frame
>
> How can I fix it?
>
> I usually use lists for such cases, but (1) in the real thing, it gets
> complicated with the names and structure (because I want to save lists with
> 3 dimensions instead of simple data.frames, as in this example) and (2) I
> prefer saving each list separately (and I cannot save only one element of an
> object either).
>
> I'm not sure I'm really clear because it's difficult for me to explain it,
> but I hope you'll understand (and let me know what you would help you to
> understand)
>
> Thank you in advance
> Ivan
>
> --
> Ivan CALANDRA
> PhD Student
> University of Hamburg
> Biozentrum Grindel und Zoologisches Museum
> Abt. Säugetiere
> Martin-Luther-King-Platz 3
> D-20146 Hamburg, GERMANY
> +49(0)40 42838 6231
> ivan.calan...@uni-hamburg.de
>
> **
> http://www.for771.uni-bonn.de
> http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] offset in gam and spatial scale of variables

2010-05-19 Thread Simon Wood

> We are analizing the relationship between the abundance of groupers in line
> transects and some variables. We are using the quasipoisson distribution.
> Do we need to include the length of the transects as an offset if they all
> have the same length??
--- not just for fitting, I suppose: although I guess you may need some care 
in interpreting the units of the fitted model predictions, if you leave it 
out. 

> Also, can we include in the gam models variables that are measured at
> different spatial scales? We have done an analysis to see what variables
> are better for different sizes of buffers around the transect lines and
> some variables are better at different scales. Can we run the gam model
> with several explanatory variables if they are measured at different
> spatial scales?
--- Do you mean, for example, that that sea surface temperature was measured 
every in 10km grid squares by satellite, whereas salinity was measured every 
quarter nautical mile directly? 

--- If so, I think that you can use such data, but you  need a clear method 
for converting what is measured about the covariate to  a covariate value 
associated with each response measurement. As an example you might have 
salinity measures that are widely scattered, and do not coincide with the 
locations of response measurements. One option is to smooth or interpolate 
the salinity values, and use the resulting predicted salinities at each 
response datum location as covariates. Of course if you do this sort of thing 
it's important that only such predicted salinities are used for predicting 
from the model (i.e. not to switch to direct measurements of salinity for 
prediction)

best,
Simon

>
> Thanks,
>
> Lucia

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] col allocation is not right

2010-05-19 Thread Phil Spector

Changbin -
   Please take a look at the help file for the function
"palette", which is how R maps col= numbers to colors.
Also look at the default output of that function:


palette()
[1] "black"   "red" "green3"  "blue""cyan""magenta" "yellow" 
[8] "gray"


You might get better results by specifying the colors that you
want directly.

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Wed, 19 May 2010, Changbin Du wrote:


plot(svm.auc, col=2, main="ROC curves comparing classification performance\n
of six machine learning models")
legend(0.5, 0.6, c(ns, nb, nr, nt, nl,ne), 2:6, 9) # Draw a legend.

plot(bo.auc, col=3, add=T) # add=TRUE draws on the existing chart
plot(rf.auc, col=4, add=T)
plot(tree.auc, col=5, add=T)
plot(nn.auc, col=6, add=T)
plot(en.auc, col=9,lty="dotted",lwd=3, add=T)

Hi, Dear community,

I am use the above codes to draw plot, but find that col =9 is not used by
the R.  Instead, it use the col=2 when plot en.auc. WHy this happens and how
to check the col allocation in R?

Thanks!


--
Sincerely,
Changbin
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] offset in gam and spatial scale of variables

2010-05-19 Thread Simon Wood
On Wednesday 19 May 2010 15:29, Joris Meys wrote:
> Could you specify the package you use? If it is mgcv, this one centers your
> variables before applying the smooths. That's something to take into
> account when comparing different models.
--- er, actually it only centres variables in this way for some smoothing 
bases, for numerical stability purposes: but this is done in a way that is 
user transparent and makes absolutely no difference to model interpretation 
or comparison. Of course the smooths themselves are subject to `centering 
constraints'  (but that's very different to centering the variables) ---  
these are just identifiability constraints --- all gam fitting packages have 
to put some identifiability constraints on the smooths, and the centering 
constraints used by `mgcv' and `gam' have the benefit of minimizing the 
standard errors on the constrained smooths. 

best,
Simon

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] col allocation is not right

2010-05-19 Thread Changbin Du
plot(svm.auc, col=2, main="ROC curves comparing classification performance\n
of six machine learning models")
legend(0.5, 0.6, c(ns, nb, nr, nt, nl,ne), 2:6, 9) # Draw a legend.

plot(bo.auc, col=3, add=T) # add=TRUE draws on the existing chart
plot(rf.auc, col=4, add=T)
plot(tree.auc, col=5, add=T)
plot(nn.auc, col=6, add=T)
plot(en.auc, col=9,lty="dotted",lwd=3, add=T)

Hi, Dear community,

I am use the above codes to draw plot, but find that col =9 is not used by
the R.  Instead, it use the col=2 when plot en.auc. WHy this happens and how
to check the col allocation in R?

Thanks!


-- 
Sincerely,
Changbin
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Adding column sum to new row in data frame

2010-05-19 Thread Joshua Wiley
Dear Mohan,

Is this what you want?

rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))

If your State column is a factor, it will return a warning that NAs
were introduced (but the totals will still be at the bottom).  If
State is class character, then that row will have a name "Total" and
then the numbers.

Best regards,

Josh

On Wed, May 19, 2010 at 10:52 AM, Mohan L  wrote:
> Dear All,
>
> I have data some thing like this:
>
>    State Jan Feb Mar Apr May Jun  AAA 1 1 0 2 2 0  BBB 1298 1195 1212 1244
> 1158 845  CCC 0 0 0 1 2 1  DDD 5 11 17 15 10 9  EEE 18 28 27 23 23 16  FFF
> 68 152 184 135 111 86
>
>
> I want to sum all the column(Jan, Feb, Mar ...) and have to merge the total
> at last row. like this:
>
> StateJanFebMarAprMayJunAAA110220BBB12981195121212441158845CCC 000121DDD51117
> 15109EEE182827232316FFF6815218413511186Total    1390 1387 1440 1420 1306
> 957
>
>
> I am doing some thing like this, but I don't know how to merge "Total" to
> "data"  Or I don't know there may be a alternative way.
>
>> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
>> data
>  State  Jan  Feb  Mar  Apr  May Jun
> 1   AAA    1    1    0    2    2   0
> 2   BBB 1298 1195 1212 1244 1158 845
> 3  CCC     0    0    0    1    2   1
> 4   DDD    5   11   17   15   10   9
> 5   EEE   18   28   27   23   23  16
> 6   FFF   68  152  184  135  111  86
>
>> attributes(data)
> $names
> [1] "State" "Jan"   "Feb"   "Mar"   "Apr"   "May"   "Jun"
>
> $class
> [1] "data.frame"
>
> $row.names
> [1] 1 2 3 4 5 6
>
>
>> x <- data[,2:ncol(data)]
>
>> x
>
>   Jan  Feb  Mar  Apr  May Jun
> 1    1    1    0    2    2   0
> 2 1298 1195 1212 1244 1158 845
> 3    0    0    0    1    2   1
> 4    5   11   17   15   10   9
> 5   18   28   27   23   23  16
> 6   68  152  184  135  111  86
>
>> Total <- sapply(x,sum,na.rm=T)
>
>> Total
>  Jan  Feb  Mar  Apr  May  Jun
> 1390 1387 1440 1420 1306  957
>
> I hope there may be alternative way.  Any help will be appreciated.
>
> Thanks & Rg
> Mohan L
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Blanking out specific cells in a data frame

2010-05-19 Thread Stefan Grosse

Am 19.05.2010 20:08, schrieb Sabatier, Jennifer F. (CDC/OID/NCHHSTP):

I do know that you can't actually have unequal column lengths.  The
reality is I am creating a pretty table to export to EXCEL and it

Now that I have Stefan's solution, which turns all the un-needed info
into NAs I can use NAToUnkown() to blank them out completely.
   
when I import data to Excel NA's are imported as empty cell's. So you 
should not even need this function (which I do not know).


Stefan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Blanking out specific cells in a data frame

2010-05-19 Thread Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
Hi Ista,

Thanks a lot for your response.  It looks like the solution Stefan
suggested is the same as you are and it works great.  

I do know that you can't actually have unequal column lengths.  The
reality is I am creating a pretty table to export to EXCEL and it
contains some summary statistics as well as the information from a
chi-square test (test stat, df, pvalue).  When I added the info from the
chi-square test to the data frame it populated all the rows and I just
wanted to know how to blank them out.

Now that I have Stefan's solution, which turns all the un-needed info
into NAs I can use NAToUnkown() to blank them out completely.

Thanks, again, for getting back to me!

Jen



-Original Message-
From: Ista Zahn [mailto:istaz...@gmail.com] 
Sent: Wednesday, May 19, 2010 1:59 PM
To: r-help@r-project.org
Cc: Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
Subject: Re: [R] Blanking out specific cells in a data frame

Hi Jen,
You cannot have a dataframe with unequal column lengths, so you have two

options (well maybe more, but two that come to mind): set the values to 
missing instead of deleting them, or store your data in a list instead
of a 
data frame.

For option 1 (recommended) all you need is
mydf[-1, 6] <- NA

Best,
Ista
On Wednesday 19 May 2010 1:36:50 pm Sabatier, Jennifer F.
(CDC/OID/NCHHSTP) 
wrote:
> Hi R-Help,
> 
> I am a new R user.  I have used SAS for many years (just FYI on what I
> am used to and possible obstacles it presents).
> 
> I have a data frame:
> 
> mydf <-data.frame(matrix(rnorm(102), ncol=6)
> 
> I would like to be able to delete ALL the information in column 6 for
> rows 2 through r, where r=#rows.
> 
> How can I do that?
> 
> I want the resulting data frame to look like this:
> 
> X1X2  X3  X4  X5  X6
> Data  datadatadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> 
> 
> (Sorry for using the word "data" and not having a cut and paste of my
> actual data frame...the computer I am writing this email on does not
> have R so I have to improvise.)
> 
> Thanks,
> 
> Jen
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] save in for loop

2010-05-19 Thread Shi, Tao
Ivan,

Try this:

eval(parse(text=paste("save(file", i, ", file=\"file", i, ".RData\")", sep="")))

...Tao




- Original Message 
> From: Ivan Calandra 
> To: r-help@r-project.org
> Sent: Wed, May 19, 2010 7:56:44 AM
> Subject: [R] save in for loop
> 
> Dear users,

My problem concerns save() within a for loop.
Here is my 
> code:

for (i in 1:4) {
temp <- data.frame(a=(i+1):(i+10), 
> b=LETTERS[(i+1):(i+10)])
filename <- paste("file", i, sep="")

> assign(filename, temp)
save(filename, file=paste(filename, ".rda", 
> sep=""))
}

As you can see, save() doesn't work as I would like: (1) 
> the object saved is called "filename" (instead of "file1", "file2", etc), and 
> (2) it of course contains only the name (as character) instead of the 
> data.frame

How can I fix it?

I usually use lists for such cases, 
> but (1) in the real thing, it gets complicated with the names and structure 
> (because I want to save lists with 3 dimensions instead of simple 
> data.frames, 
> as in this example) and (2) I prefer saving each list separately (and I 
> cannot 
> save only one element of an object either).

I'm not sure I'm really clear 
> because it's difficult for me to explain it, but I hope you'll understand 
> (and 
> let me know what you would help you to understand)

Thank you in 
> advance
Ivan

-- Ivan CALANDRA
PhD Student
University of 
> Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. 
> Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 
> 42838 6231

> href="mailto:ivan.calan...@uni-hamburg.de";>ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php

__

> ymailto="mailto:R-help@r-project.org"; 
> href="mailto:R-help@r-project.org";>R-help@r-project.org mailing list

> href="https://stat.ethz.ch/mailman/listinfo/r-help"; target=_blank 
> >https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting 
> guide http://www.R-project.org/posting-guide.html
and provide commented, 
> minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Blanking out specific cells in a data frame

2010-05-19 Thread Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
SWEET!

That's EXACTLY what I need.  Thanks!

I can now just use NAToUnknown to turn the NA into blanks.

Thanks a lot!

Jen


-Original Message-
From: Stefan Grosse [mailto:singularit...@gmx.net] 
Sent: Wednesday, May 19, 2010 1:53 PM
To: r-help@r-project.org; Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
Subject: Re: [R] Blanking out specific cells in a data frame

Am 19.05.2010 19:36, schrieb Sabatier, Jennifer F. (CDC/OID/NCHHSTP):
> mydf<-data.frame(matrix(rnorm(102), ncol=6)
>
you mean something like:
mydf[2:length(mydf[,1]),6]<-NA

hth
Stefan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Blanking out specific cells in a data frame

2010-05-19 Thread Ista Zahn
Hi Jen,
You cannot have a dataframe with unequal column lengths, so you have two 
options (well maybe more, but two that come to mind): set the values to 
missing instead of deleting them, or store your data in a list instead of a 
data frame.

For option 1 (recommended) all you need is
mydf[-1, 6] <- NA

Best,
Ista
On Wednesday 19 May 2010 1:36:50 pm Sabatier, Jennifer F. (CDC/OID/NCHHSTP) 
wrote:
> Hi R-Help,
> 
> I am a new R user.  I have used SAS for many years (just FYI on what I
> am used to and possible obstacles it presents).
> 
> I have a data frame:
> 
> mydf <-data.frame(matrix(rnorm(102), ncol=6)
> 
> I would like to be able to delete ALL the information in column 6 for
> rows 2 through r, where r=#rows.
> 
> How can I do that?
> 
> I want the resulting data frame to look like this:
> 
> X1X2  X3  X4  X5  X6
> Data  datadatadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> Data  datadatadatadata
> 
> 
> (Sorry for using the word "data" and not having a cut and paste of my
> actual data frame...the computer I am writing this email on does not
> have R so I have to improvise.)
> 
> Thanks,
> 
> Jen
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Blanking out specific cells in a data frame

2010-05-19 Thread Stefan Grosse

Am 19.05.2010 19:36, schrieb Sabatier, Jennifer F. (CDC/OID/NCHHSTP):

mydf<-data.frame(matrix(rnorm(102), ncol=6)
   

you mean something like:
mydf[2:length(mydf[,1]),6]<-NA

hth
Stefan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : Adding column sum to new row in data frame

2010-05-19 Thread Mohan L
Dear All,

I have data some thing like this:

State Jan Feb Mar Apr May Jun  AAA 1 1 0 2 2 0  BBB 1298 1195 1212 1244
1158 845  CCC 0 0 0 1 2 1  DDD 5 11 17 15 10 9  EEE 18 28 27 23 23 16  FFF
68 152 184 135 111 86


I want to sum all the column(Jan, Feb, Mar ...) and have to merge the total
at last row. like this:

StateJanFebMarAprMayJunAAA110220BBB12981195121212441158845CCC 000121DDD51117
15109EEE182827232316FFF6815218413511186Total1390 1387 1440 1420 1306
957


I am doing some thing like this, but I don't know how to merge "Total" to
"data"  Or I don't know there may be a alternative way.

> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
> data
  State  Jan  Feb  Mar  Apr  May Jun
1   AAA11022   0
2   BBB 1298 1195 1212 1244 1158 845
3  CCC 00012   1
4   DDD5   11   17   15   10   9
5   EEE   18   28   27   23   23  16
6   FFF   68  152  184  135  111  86

> attributes(data)
$names
[1] "State" "Jan"   "Feb"   "Mar"   "Apr"   "May"   "Jun"

$class
[1] "data.frame"

$row.names
[1] 1 2 3 4 5 6


> x <- data[,2:ncol(data)]

> x

   Jan  Feb  Mar  Apr  May Jun
111022   0
2 1298 1195 1212 1244 1158 845
300012   1
45   11   17   15   10   9
5   18   28   27   23   23  16
6   68  152  184  135  111  86

> Total <- sapply(x,sum,na.rm=T)

> Total
 Jan  Feb  Mar  Apr  May  Jun
1390 1387 1440 1420 1306  957

I hope there may be alternative way.  Any help will be appreciated.

Thanks & Rg
Mohan L

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Blanking out specific cells in a data frame

2010-05-19 Thread Sabatier, Jennifer F. (CDC/OID/NCHHSTP)
Hi R-Help,

I am a new R user.  I have used SAS for many years (just FYI on what I
am used to and possible obstacles it presents).

I have a data frame:

mydf <-data.frame(matrix(rnorm(102), ncol=6)

I would like to be able to delete ALL the information in column 6 for
rows 2 through r, where r=#rows.

How can I do that?

I want the resulting data frame to look like this:

X1  X2  X3  X4  X5  X6
Datadatadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata
Datadatadatadatadata


(Sorry for using the word "data" and not having a cut and paste of my
actual data frame...the computer I am writing this email on does not
have R so I have to improvise.)

Thanks,

Jen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lattice: How to specify strip text in an xyplot?

2010-05-19 Thread Peter Ehlers

On 2010-05-19 9:02, Marius Hofert wrote:

Hi,

How can I specify the strip text in the xyplot below?
I also tried to work with strip.names=c("TRUE","TRUE"), but that did not work.

Cheers,

Marius

library(lattice)
x<- 1:10
y<- cbind(1:10,-(1:10))

xyplot(y[,1]+y[,2]~x,outer=TRUE,strip=function(var.name,...)
strip.default(var.name=c("plot 1","plot 2"),...))


xyplot(y[,1]+y[,2]~x,outer=TRUE,
   strip = strip.custom(factor.levels = c("plot 1","plot 2")))

 -Peter Ehlers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regarding the 'R' Load Command

2010-05-19 Thread Gavin Simpson
I think the answer is clear from the error: R thinks the type of data in
the components of 'testmurali' do not match those of the data used to
fit the original randomForest.

The OP should go back to his model fitting code and do

str(obj)

where 'obj' is the name of his original data object used to fit the
randomForest and compare it with

str(testmurali)

to see why the types of data are different. Look for variables that were
factors or characters in one data set and numeric/integer in the other.
This smells like a data import issue...

If revelation still doesn't occur Murali, *please* follow Steve's
suggestions and post and message that shows exactly (i.e. the R code
executed) along side a data set *we* can load into R without jumping
through hoops or having to divine what your data look like using a
crystal ball or ESP.

HTH

G

On Wed, 2010-05-19 at 12:24 -0400, Steve Lianoglou wrote:
> Hi Murali,
> 
> I'm sorry, but you're making this too difficult to provide any help.
> Describing what your data structures are and contain is too tedious to
> follow, and end up being rather ambiguous anyway.
> 
> My first guess: by your error message, perhaps the columns of the data
> to predict on are different than the training.
> 
> If you want to get help, please provide your data in a form that we
> can test against. Look at this post by Hadley Wickham to help you do
> that:
> http://gist.github.com/270442
> 
> In particular, not the use of "dput" that you should use so we can
> paste the result in our workspace and get the data objects you try to
> describe. So, to be clear: send us a chunk of text we can paste into R
> that will recreate a workspace that can reproduce your problem. Please
> trim your data files to be only as large as necessary (eg. provide 3
> observations instead of 300)
> 
> Thanks,
> -steve
> 
> On Wed, May 19, 2010 at 11:40 AM, Godavarthi, Murali
>  wrote:
> > Hi Steve,
> >
> > Thanks so much for your inputs! I was actually trying to implement your
> > suggestions, I get the below error (please see the results of predict
> > command below).
> >
> > What we are trying to do is to feed in values for about 23
> > characteristics of an individual, and use the randomForest() function to
> > determine if the individual is a violent offender. Expected output is 0
> > or 1, indicating yes/no.
> >
> > Am I going wrong again? Here is what I was doing:
> >
> > 1) Created a text file with following data:
> > "imurder" "itheft" "irobbery" "iassault" "idrug" "iburglary" "igun"
> > "psych" "Freq" "priors" "firstage" "intage" "sex" "race" "marstat"
> > "empac" "educ" "zipcode" "suspendmn" "drugs" "alco" "probation" "parole"
> > "10" 0 0 0 1 0 0 0 0 0 58 19 19 "1" "BLACK" "SINGLE" "UNEMPLD" 0 21215 0
> > 0 0 1 0
> >
> > The above format in which the text file was created is in the same
> > format as the one which is already working, but has characteristics of
> > about 290 individuals fed-in instead of just one individual as above.
> > Not sure why this doesn't work!
> >
> >
> > 2) Executed the below command sequence:
> >
> >
> >> library(randomForest)
> > randomForest 4.5-34
> > Type rfNews() to see new features/changes/bug fixes.
> >
> >> load("C://Program Files//R//R-2.10.1//bin//rfoutput")
> >
> >> testmurali<-read.table("ex.data",T)
> >
> >> load(testmurali)
> > Error in load(testmurali) : bad 'file' argument
> >
> >> load("testmurali")
> >
> >> names(testmurali)
> >  [1] "imurder"   "itheft""irobbery"  "iassault"  "idrug"
> > "iburglary" "igun"  "psych" "Freq"  "priors"
> > [11] "firstage"  "intage""sex"   "race"  "marstat"   "empac"
> > "educ"  "zipcode"   "suspendmn" "drugs"
> > [21] "alco"  "probation" "parole"
> >
> >> predict(rfoutput,newdata=testmurali,type="response")
> > Error in predict.randomForest(rfoutput, newdata = testmurali, type =
> > "response") :
> >  Type of predictors in new data do not match that of the training data.
> >>
> >
> > The model rfoutput used in the above predict command is also based on a
> > working example with similar data.
> >
> >
> > Also, does load command accept a data string input directly (without
> > storing it into a file and then providing path of the file as a string)?
> >
> > Please suggest. Thanks in advance!
> >
> >
> > Best Regards,
> > Murali Godavarthi
> > 410-585-3746 (w)
> >
> > ITCD - DPSCS Data Mining
> >
> >
> >
> >
> >
> >
> > -Original Message-
> > From: Steve Lianoglou [mailto:mailinglist.honey...@gmail.com]
> > Sent: Tuesday, May 18, 2010 4:13 PM
> > To: Godavarthi, Murali
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Regarding the 'R' Load Command
> >
> > Hi,
> >
> > On Tue, May 18, 2010 at 2:49 PM, Godavarthi, Murali
> >  wrote:
> >> Hi,
> >>
> >> I'm new to 'R' and need some help on the "Load" command. Any responses
> >> will be highly appreciated. Thanks in advance!
> >>
> >> As per manuals, the "Load" command expects a binary file input that is
> >> saved using a "save" command.
> >
> 

Re: [R] automate curve drawing on nls() object

2010-05-19 Thread array chip
I do know it, but wanted to spend a little more effort to make generalized 
function for this type of plot.

Thanks

--- On Tue, 5/18/10, Shi, Tao  wrote:

> From: Shi, Tao 
> Subject: Re: [R] automate curve drawing on nls() object
> To: "array chip" , r-help@r-project.org
> Date: Tuesday, May 18, 2010, 9:25 PM
> In this case, Ben's approach is the
> way to go.  
> 
> I'm curious how you fit nls without knowing the model
> formula beforehand?
> 
> ...Tao
> 
> 
> 
> 
> 
> - Original Message 
> > From: array chip 
> > To: r-help@r-project.org;
> TaoShi 
> > Sent: Tue, May 18, 2010 5:22:47 PM
> > Subject: Re: [R] automate curve drawing on nls()
> object
> > 
> > well, this is not going automate enough because you
> have to know how the model 
> > (formula) looks like, and how many parameters there
> are in the model beforehand 
> > to do what you are suggesting.
> 
> Thanks
> 
> 
> --- On Tue, 5/18/10, 
> > Shi, Tao <
> > href="mailto:shida...@yahoo.com";>shida...@yahoo.com>
> wrote:
> 
> > 
> > From: Shi, Tao <
> > href="mailto:shida...@yahoo.com";>shida...@yahoo.com>
> > Subject: Re: 
> > [R] automate curve drawing on nls() object
> > To: "array chip" <
> > ymailto="mailto:arrayprof...@yahoo.com";
> 
> > href="mailto:arrayprof...@yahoo.com";>arrayprof...@yahoo.com>,
> 
> > ymailto="mailto:r-help@r-project.org";
> 
> > href="mailto:r-help@r-project.org";>r-help@r-project.org
> > Date: 
> > Tuesday, May 18, 2010, 7:42 PM
> > I can't directly answer your 
> > question
> > regarding 'expression', but can you just replace b,
> c,d, 
> > and
> > e with coef(obj)[1], coef(obj)[2], ...
> > etc.   You still can 
> > automate the whole
> > process this way, right?
> > 
> > 
> > 
> > 
> > 
> > 
> > - Original Message 
> > > From: array 
> > chip <
> > href="mailto:arrayprof...@yahoo.com";>arrayprof...@yahoo.com>
> > > 
> > To: 
> > href="mailto:r-help@r-project.org";>r-help@r-project.org
> > > Sent: 
> > Tue, May 18, 2010 4:13:33 PM
> > > Subject: [R] automate curve drawing on 
> > nls() object
> > > 
> > > Hi, I would like to use the curve() 
> > function to draw
> > the predicted curve from an 
> > > nls() object. 
> > for 
> > > example:
> > 
> > 
> >
> dd<-read.table("dd.txt",sep='\t',header=T,row.names=1)
> > 
> >
> obj<-nls(y~c+(d-c)/(1+(x/e)^b),data=dd,start=list(b=-1,
> > 
> > > 
> > c=0, d=100, e=150))
> > coef(obj)
> >           b  
> > >        
> >   c   
> >        d    
> > >        e 
> > 
> > -1.1416422   0.6987028 102.8613176 
> > > 135.9373131
> > 
> >
> curve(0.699+(102.86-0.699)/(1+(x/135.94)^(-1.1416)),1,2)
> > 
> > 
> > Now 
> > > I am going to have a lot of datasets to do this,
> so
> > 
> > certainly I would like to 
> > > automate this. Suppose that I can create 
> > a character
> > string for the formula, but 
> > > I am not sure how 
> > to pass that character string into
> > the curve() to make it 
> > > 
> > work. The help page of curve() says the first
> argument
> > is "an expression 
> > written 
> > > as a function of x, or alternatively the name of
> a
> > 
> > function which will be 
> > > plotted". I tried the following, for 
> > example:
> > 
> > substitute(expression(c + 
> > > (d - c)/(1 + 
> > (x/e)^b)),as.list(coef(obj)))
> > will 
> > > return:
> > 
> > "expression(0.698704171233635 + (102.861317499063 - 
> > > 
> > 0.698704171233635)/(1 +
> > 
> > (x/135.937317917920)^-1.14164217993857))"
> > 
> > so I 
> > > 
> > tried:
> > curve(substitute(expression(c + (d - c)/(1 +
> (x/e)^b)), 
> > 
> > > as.list(coef(obj))), 1,2)
> > 
> > but it returns an 
> > error:
> > "Error in 
> > > xy.coords(x, y, xlabel, ylabel, log) : 
> > 
> >   'x' and 'y' lengths 
> > > differ"
> > 
> > Any 
> > suggestions?
> > 
> > 
> > A related question:
> > 
> > If I 
> > do 
> > > this:
> > substitute(expression(c + (d - c)/(1 + 
> > 
> > > (x/e)^b)),as.list(coef(obj)))
> > 
> > I will get: 
> > > 
> > 
> > "expression(0.698704171233635 + (102.861317499063 -
> > 
> > 0.698704171233635)/(1 + 
> > > 
> > (x/135.937317917920)^-1.14164217993857))"
> > 
> > But if I 
> > 
> > > do:
> > 
> >
> substitute(parse(text=as.character(obj$call$formula[3]),srcfile=NULL),as.list(coef(obj)))
> > 
> > 
> > I 
> > > only get:
> > "parse(text = 
> > as.character(obj$call$formula[3]), srcfile =
> > NULL)" 
> > > as a 
> > result, not have b,c,d,e replaced by coefficient
> > 
> > > 
> > values.
> > 
> > where
> >     
> > > 
> > 
> >
> parse(text=as.character(obj$call$formula[3]),srcfile=NULL)
> > returns the 
> > 
> > > wanted expression:
> > "expression(c + (d - c)/(1 + 
> > (x/e)^b))"
> > 
> > Why is 
> > > that?
> > 
> > 
> > Thanks
> > 
> > John
> > 
> > 
> > __
> > 
> > > 
> > ymailto="mailto:
> > href="mailto:R-help@r-project.org";>R-help@r-project.org"
> > 
> > 
> > > href="mailto:
> > href="mailto:R-help@r-project.org";>R-help@r-project.org">
> > ymailto="mailto:R-help@r-project.org";
> 
> > href="mailto:R-help@r-project.org";>R-help@r-project.org

[R] useR!, regular registration deadline today

2010-05-19 Thread Katharine Mullen
This is a reminder that the regular registration deadline for the useR!
2010 conference, July 21-23 (http://www.R-project.org/useR-2010) is today.
After today, late registration will be possible until June 20th.

The link to submit registration information is:
http://www.R-project.org/useR-2010/registration/registration.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calling R-tists

2010-05-19 Thread Tal Galili
Hi Katharine,
Very cool idea!
I republished your call on my blog:
http://www.r-statistics.com/2010/05/user-2010-is-looking-for-a-t-shirt-design/

And encourage people who will create such a design to link to it (for
example by posting about it here on the list), and also to tag it (if they
put it on flickr or picasa and so on) with the tag:
useR2010Tshirt
So it would be easy later to find online.

I hope this call will be heard by creative designers,
Best,
Tal



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Wed, May 19, 2010 at 6:56 PM, Katharine Mullen  wrote:

> Participants in the R User Conference, useR! 2010, July 21-23,
> (http://R-project.org/useR-2010) will each receive a t-shirt, thanks to
> the sponsorship of Mango Solutions (http://www.mango-solutions.com/).
>
> This email is a call for designs for the front of the t-shirt.  The design
> should be made using a single color of your choice.  The design should be
> in the form of a high-resolution (at least 300 dpi) jpeg, png or gif file,
> and will be printed on a 13'' x 13'' area on the shirt.  There are no
> rules for the content.
>
> The shirts will be white.  The back of the t-shirt will include the useR!
> logo (in blue and black), and the logo of Mango Solutions (in orange and
> black).
>
> The staff of Mango Solutions will decide which design to use.  The creator
> of the chosen design will receive 5 free t-shirts plus a book token (i.e.,
> a gift certificate for a book).  You don't have to be registered for the
> conference to enter a design.
>
> Please email your design proposal by Sunday, June 6, to
> usert-sh...@mango-solutions.com.  The designer that submits the chosen
> design will be notified by June 15th.
>
> Thanks in advance for enriching the conference and the wardrobes of useR!
> participants!
>
> Kate Mullen, for the useR! 2010 Organizing Committee
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple language output - Correct in RGui, wrong in .txt after sink()

2010-05-19 Thread mark . redshaw
I have the following problem with outputting multilingual data to a file. 
I get (except for Korean) what I expect as result in the RGui, but when I 
use sink() to output to a text file loose the characters in the foreign 
languages.
I post a small example below. Since I am not sure how well my email system 
as the list copes with all the different characters I have additionally 
created a pdf version of this example.
The first part of the example behaves as I expect for all languages except 
Korean. I believe that the Korean language may be a problem with the font, 
it would be great if someone could confirm this?
In the second part with output to the txt file I get the  type 
unicode as output not the expected characters. My main problem is how can 
I output the characters as I expect?

> RM_EN <- c("Alfalfa hay","Alfalfa meal","Alfalfa silage")
> RM_DE <- c("Luzerneheu","Lurzernegrünmehl","Luzernesilage")
> RM_RU <- c("Люцерновое сено","Люцерновая травяная мука","Люцерновый 
сенаж")
> RM_CN <- c("苜蓿干草","苜蓿草粉","苜蓿青贮")
> RM_JP <- c("アルファルファ乾草","アルファルファ ミール","アルファルファ 
サイレージ")
> RM_KR <- c("알팔파 건초","알팔파 박","알팔파 사일리지")
> 
> RMLANG <- data.frame(RM_EN,RM_DE,RM_RU,RM_CN,RM_JP,RM_KR)
> nrm <- NROW(RMLANG)
> 
> for(i in 1:nrm)
+ {
+ cat(format("English",width = 12, justify = c("left")), 
as.character(RMLANG$RM_EN[i]),"\n",sep="")
+ cat(format("Deutsch",width = 12, justify = c("left")), 
as.character(RMLANG$RM_DE[i]),"\n",sep="")
+ cat(format("Russian",width = 12, justify = c("left")), 
as.character(RMLANG$RM_RU[i]),"\n",sep="")
+ cat(format("Japanese",   width = 12, justify = c("left")), 
as.character(RMLANG$RM_JP[i]),"\n",sep="")
+ cat(format("Chinese",width = 12, justify = c("left")), 
as.character(RMLANG$RM_CN[i]),"\n",sep="")
+ cat(format("Korean",width = 12, justify = c("left")), 
as.character(RMLANG$RM_KR[i]),"\n","\n","\n",sep="")
+ }
English Alfalfa hay
Deutsch Luzerneheu
Russian Люцерновое сено
Japaneseアルファルファ乾草
Chinese 苜蓿干草
Korean  알팔파 건초

English Alfalfa meal
Deutsch Lurzernegrünmehl
Russian Люцерновая травяная мука
Japaneseアルファルファ ミール
Chinese 苜蓿草粉
Korean  알팔파 박

English Alfalfa silage
Deutsch Luzernesilage
Russian Люцерновый сенаж
Japaneseアルファルファ サイレージ
Chinese 苜蓿青贮
Korean  알팔파 사일리지

> for(i in 1:nrm)
+ {
+ sink("output.txt")
+ cat(format("English",width = 12, justify = c("left")), 
as.character(RMLANG$RM_EN[i]),"\n",sep="")
+ cat(format("Deutsch",width = 12, justify = c("left")), 
as.character(RMLANG$RM_DE[i]),"\n",sep="")
+ cat(format("Japanese",   width = 12, justify = c("left")), 
as.character(RMLANG$RM_JP[i]),"\n",sep="")
+ cat(format("Chinese",width = 12, justify = c("left")), 
as.character(RMLANG$RM_CN[i]),"\n",sep="")
+ cat(format("Korean", width = 12, justify = c("left")), 
as.character(RMLANG$RM_KR[i]),"\n","\n","\n",sep="")
+ sink()
+ }
> 
Output.txt contains:
""
English Alfalfa hay
Deutsch Luzerneheu
Japanese
Korean   

English Alfalfa meal
Deutsch Lurzernegrünmehl
Japanese 
Korean   

English Alfalfa silage
Deutsch Luzernesilage
Japanese 
Korean   
""



many thanks
Mark Redshaw
 Mark Redshaw 
Animal Nutrition Services 
Evonik Degussa GmbH, HN-M-AN, Rodenbacher Chaussee 4, 63457 Hanau, Germany 

Tel: +49 61 81 59 6788 
www.aminoacidsandmore.com 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to plot the sample size on top of the bars of a barchart plot?

2010-05-19 Thread Greg Snow
Yes, and many of us also know why doing so is a bad idea.  

This has been discussed before, if you search the archives you can find the 
full discussion showing how, why not, and better approaches.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Javier
> Sent: Wednesday, May 19, 2010 9:46 AM
> To: r-help@r-project.org
> Subject: [R] How to plot the sample size on top of the bars of a
> barchart plot?
> 
> 
> Dear users,
> 
> Is there anyone how knows how to plot the sample size on top of the
> bars of
> a barchart plot?
> 
> Many thanks in advance
> 
> Javier
> 
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/How-to-
> plot-the-sample-size-on-top-of-the-bars-of-a-barchart-plot-
> tp2223062p2223062.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regarding the 'R' Load Command

2010-05-19 Thread Steve Lianoglou
Hi Murali,

I'm sorry, but you're making this too difficult to provide any help.
Describing what your data structures are and contain is too tedious to
follow, and end up being rather ambiguous anyway.

My first guess: by your error message, perhaps the columns of the data
to predict on are different than the training.

If you want to get help, please provide your data in a form that we
can test against. Look at this post by Hadley Wickham to help you do
that:
http://gist.github.com/270442

In particular, not the use of "dput" that you should use so we can
paste the result in our workspace and get the data objects you try to
describe. So, to be clear: send us a chunk of text we can paste into R
that will recreate a workspace that can reproduce your problem. Please
trim your data files to be only as large as necessary (eg. provide 3
observations instead of 300)

Thanks,
-steve

On Wed, May 19, 2010 at 11:40 AM, Godavarthi, Murali
 wrote:
> Hi Steve,
>
> Thanks so much for your inputs! I was actually trying to implement your
> suggestions, I get the below error (please see the results of predict
> command below).
>
> What we are trying to do is to feed in values for about 23
> characteristics of an individual, and use the randomForest() function to
> determine if the individual is a violent offender. Expected output is 0
> or 1, indicating yes/no.
>
> Am I going wrong again? Here is what I was doing:
>
> 1) Created a text file with following data:
> "imurder" "itheft" "irobbery" "iassault" "idrug" "iburglary" "igun"
> "psych" "Freq" "priors" "firstage" "intage" "sex" "race" "marstat"
> "empac" "educ" "zipcode" "suspendmn" "drugs" "alco" "probation" "parole"
> "10" 0 0 0 1 0 0 0 0 0 58 19 19 "1" "BLACK" "SINGLE" "UNEMPLD" 0 21215 0
> 0 0 1 0
>
> The above format in which the text file was created is in the same
> format as the one which is already working, but has characteristics of
> about 290 individuals fed-in instead of just one individual as above.
> Not sure why this doesn't work!
>
>
> 2) Executed the below command sequence:
>
>
>> library(randomForest)
> randomForest 4.5-34
> Type rfNews() to see new features/changes/bug fixes.
>
>> load("C://Program Files//R//R-2.10.1//bin//rfoutput")
>
>> testmurali<-read.table("ex.data",T)
>
>> load(testmurali)
> Error in load(testmurali) : bad 'file' argument
>
>> load("testmurali")
>
>> names(testmurali)
>  [1] "imurder"   "itheft"    "irobbery"  "iassault"  "idrug"
> "iburglary" "igun"      "psych"     "Freq"      "priors"
> [11] "firstage"  "intage"    "sex"       "race"      "marstat"   "empac"
> "educ"      "zipcode"   "suspendmn" "drugs"
> [21] "alco"      "probation" "parole"
>
>> predict(rfoutput,newdata=testmurali,type="response")
> Error in predict.randomForest(rfoutput, newdata = testmurali, type =
> "response") :
>  Type of predictors in new data do not match that of the training data.
>>
>
> The model rfoutput used in the above predict command is also based on a
> working example with similar data.
>
>
> Also, does load command accept a data string input directly (without
> storing it into a file and then providing path of the file as a string)?
>
> Please suggest. Thanks in advance!
>
>
> Best Regards,
> Murali Godavarthi
> 410-585-3746 (w)
>
> ITCD - DPSCS Data Mining
>
>
>
>
>
>
> -Original Message-
> From: Steve Lianoglou [mailto:mailinglist.honey...@gmail.com]
> Sent: Tuesday, May 18, 2010 4:13 PM
> To: Godavarthi, Murali
> Cc: r-help@r-project.org
> Subject: Re: [R] Regarding the 'R' Load Command
>
> Hi,
>
> On Tue, May 18, 2010 at 2:49 PM, Godavarthi, Murali
>  wrote:
>> Hi,
>>
>> I'm new to 'R' and need some help on the "Load" command. Any responses
>> will be highly appreciated. Thanks in advance!
>>
>> As per manuals, the "Load" command expects a binary file input that is
>> saved using a "save" command.
>
> Or a path to the file ...
>
>> However it is required that we need to
>> call the 'R' program from
>>
>> Java web application using RJava, and pass a string to the 'R" program
>> instead of a binary file. Is it possible?
>
> Yes, pay closer attention to the description for the "file" argument
> in the load function (see ?load):
>
> """a (readable binary) connection **or a character string** giving the
> name of the file to load"""
>
> (emphasis mine)
>
>> I was exploring the options of using TextConnections, file connections
>> and other types of connections in order to read a stream of input
>> (either from a file, stdin etc). I am able to read the string, but the
>> Save and Load commands are not accepting the string input. Here is the
>> sequence of commands I tried running, and the error received. There is
>> no clue on this error, especially when trying to use the eval function
>> in randomForest package, even on the internet. Can anyone help please!
>>
>>> library(randomForest)
>>
>> randomForest 4.5-34
>>
>> Type rfNews() to see new features/changes/bug fixes.
>>
>>
>>
>>> load("C://Program Files//R//R-2.10.1//bin//rfoutpu

Re: [R] Where is the construction of a dist object from raw data described?

2010-05-19 Thread Gavin Simpson
On Wed, 2010-05-19 at 10:10 -0400, Steven Lembark wrote:
> Any reference to the appropriate documentation would
> be most appreciated.
> 
> I am using the TSP module for clustering of HIV
> genetic sequences. The distances have already been
> computed and available as either upper-triangular
> or square, i.e.:
> 
> a  1 2 3
> b  4 5
> c  6
> d
> 
> or 
> 
> a 0 1 2 3
> b 1 0 4 5
> c 2 4 0 6
> d 3 5 6 0
> 
> The TSP modules takes in a "dist" object.
> 
> Catch: The only way I can see to get a dist
> object is with dist(), which computes the 
> distances for itself rather than taking them
> as-is.
> 
> Q: How does one convert either of the strucutres
>above into a "dist" object without having to 
>first feed them through dist()?

as.dist() will convert the square matrix into a dist object

I'm not sure of a convenient way of importing data in the "upper" form
you show without having to go via a matrix, or build the dist object by
hand, so if this really is an either/or situation and the square form is
always available and it is not a problem loading the dissimilarity
matrix into RAM, I'd stick with as.dist.

HTH

G

> 
> I can easily split the labels into a seprate
> output file, leaving me with the rownames 
> and colnames values for the result in a separate
> place if that makes explaining how to get the 
> numeric values into a dist any easier.
> 
> Google, searching r-project.org, and the R
> Nutshell book all lead me back to dist() or 
> daisy().
> 
> thanks
> 
> --
> Steven Lembark  85-09 90th St.
> Workhorse Computing   Woodhaven, NY, 11421
> lemb...@wrkhors.com+1 888 359 3508
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   >