Re: [R] Want help on data resampling!

2009-11-18 Thread Tal Galili
Use
?sample
with replace = TRUE on the number of rows, and then use the sampled values as
indices for picking rows from the data set.

For example:

sample(10, replace = TRUE)


Read more here:
http://www.ats.ucla.edu/stat/R/library/bootstrap.htm
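
A minimal sketch of the idea above, assuming the data sit in a numeric matrix. The object names (dat, boot_sample, boot_list) are made up for illustration, and only the first three rows from the question are used:

```r
# Hypothetical data: the first three rows shown in the question
dat <- matrix(c(10, 168, 133, 22.5, 1, 0, 3.45, 4.890349,
                11, 672,  15, 25.5, 1, 0, 3.90, 2.708050,
                12, 168, 201, 25.7, 1, 0, 3.90, 5.303305),
              nrow = 3, byrow = TRUE)

set.seed(1)  # only to make the sketch reproducible

# One bootstrap sample: draw row indices with replacement, then pick those rows
idx <- sample(nrow(dat), replace = TRUE)
boot_sample <- dat[idx, , drop = FALSE]

# Several bootstrap samples, kept as a list of matrices
boot_list <- replicate(5,
                       dat[sample(nrow(dat), replace = TRUE), , drop = FALSE],
                       simplify = FALSE)
```

Each element of boot_list has the same dimensions as dat, with some rows repeated and others left out.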



--


My contact information:
Tal Galili
E-mail: tal.gal...@gmail.com
Phone number: 972-52-7275845
FaceBook: Tal Galili
My Blogs:
http://www.talgalili.com (Web and general, Hebrew)
http://www.biostatistics.co.il (Statistics, Hebrew)
http://www.r-statistics.com/ (Statistics,R, English)




On Thu, Nov 19, 2009 at 9:40 AM, ke fang  wrote:

> Dear all,
>  I have a data matrix in which each row contains one individual's
> information, including observations and properties. I'm trying to
> use R to create some bootstrap samples from this data matrix. I have tried
> the boot() function in the boot package, but it seems that this function needs
> one or more statistics to summarize. I can't just get my data resampled.
> I also tried the resample() function but got nothing. Can somebody give me
> a hint on solving this problem?
>  Thanks in advance!
>  Here is some of my data. Rows represent individuals. Columns represent
> individual information and observations.
>
> 10 168 133 22.5 1 0 3.45 4.890349
> 11 672 15 25.5 1 0 3.9 2.70805
> 12 168 201 25.7 1 0 3.9 5.303305
> 17 216 125 46.5 0 0 4.7 4.828314
> 18 216 103 95 0 0 9.5 4.634729
> 19 504 92 64 0 0 7 4.521789
> 20 504 52 81.5 0 0 8.2 3.951244
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



[R] Want help on data resampling!

2009-11-18 Thread ke fang
Dear all,
  I have a data matrix in which each row contains one individual's
information, including observations and properties. I'm trying to
use R to create some bootstrap samples from this data matrix. I have tried the
boot() function in the boot package, but it seems that this function needs one or
more statistics to summarize. I can't just get my data resampled. I also
tried the resample() function but got nothing. Can somebody give me a hint
on solving this problem?
  Thanks in advance!
  Here is some of my data. Rows represent individuals. Columns represent
individual information and observations.

10 168 133 22.5 1 0 3.45 4.890349 
11 672 15 25.5 1 0 3.9 2.70805 
12 168 201 25.7 1 0 3.9 5.303305 
17 216 125 46.5 0 0 4.7 4.828314 
18 216 103 95 0 0 9.5 4.634729 
19 504 92 64 0 0 7 4.521789 
20 504 52 81.5 0 0 8.2 3.951244 




Re: [R] Cochran's Theorem

2009-11-18 Thread Charles C. Berry

On Wed, 18 Nov 2009, Peng Yu wrote:

> I want to understand ANOVA better. But the few textbooks that I have do
> not describe Cochran's Theorem in detail. Could somebody recommend a
> book for me?

The theorem is ever so briefly described in Yates's obituary of Cochran:

http://www.jstor.org/stable/2982120

If you wish to understand quadratic forms in statistics and are willing to 
invest some time, you cannot do better than to study Rao's book, Linear 
Statistical Inference and Its Applications.
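
For orientation, here is one common textbook form of the theorem and how it is used in one-way ANOVA. This is a sketch in standard notation, not a quotation from either reference above; the symbols (Z, A_i, r_i, Y_ij) are chosen here for illustration.

```latex
% Cochran's theorem (one common form)
Let $Z = (Z_1, \dots, Z_n)^\top$ with $Z_j$ independent $N(0, 1)$, and suppose
\[
  Z^\top Z = \sum_{i=1}^{k} Q_i, \qquad Q_i = Z^\top A_i Z,
\]
where each $A_i$ is a symmetric matrix of rank $r_i$.
Then any one of the following conditions implies the other two:
\begin{enumerate}
  \item $r_1 + \dots + r_k = n$;
  \item each $Q_i \sim \chi^2_{r_i}$;
  \item $Q_1, \dots, Q_k$ are mutually independent.
\end{enumerate}

% One-way ANOVA: k groups, group sizes n_i, N observations in total
\[
  \underbrace{\sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2}_{\text{rank } N-1}
  = \underbrace{\sum_{i} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}_{\text{rank } k-1}
  + \underbrace{\sum_{i,j} (Y_{ij} - \bar{Y}_{i\cdot})^2}_{\text{rank } N-k}
\]
```

Together with the grand-mean term (rank 1) the ranks add up to N, so under equal group means and a common variance the between-group and within-group sums of squares, each divided by that variance, are independent chi-square variables with k-1 and N-k degrees of freedom. That is what justifies the F test.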


HTH,

Chuck






Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] How do I change the colour and format for the trelli plot ?

2009-11-18 Thread Deepayan Sarkar
On Thu, Nov 19, 2009 at 6:03 AM, ychu066  wrote:
>
> http://old.nabble.com/file/p26418382/hist1.png hist1.png  i want three plots
> along on the side , how to i do that ?
>
>  and I also want to change the colour of the bars for each plot, how do i do
> that ?
>
> i got the code here to draw that ..

This code did not produce the plot you have linked to. The answer to
your question depends on how you created the plot, so you have to tell
us that. Changing the color in all panels is easy:

library(lattice)

histogram(rnorm(100), col = "goldenrod")

Different colors in different panels is a little more work:

histogram(~rnorm(100) | gl(3, 1, 100),
          mycolors = sample(colors(), 3),
          panel = function(..., col, mycolors) {
              # use the colour that belongs to the panel being drawn
              panel.histogram(..., col = mycolors[panel.number()])
          })
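
A variation on the same idea, sketched with made-up data: fixed colours per panel instead of random ones, combined with the layout argument so that the three panels appear together. The data frame scores and the vector bar_cols are hypothetical names, not part of the original question.

```r
library(lattice)

set.seed(1)
# Hypothetical scores on a 1-5 scale for three items
scores <- data.frame(
  score = sample(1:5, 300, replace = TRUE),
  item  = gl(3, 100, labels = c("Q1", "Q2", "Q3"))
)

bar_cols <- c("steelblue", "tomato", "goldenrod")  # one colour per panel

p <- histogram(~ score | item, data = scores,
               layout = c(1, 3),                   # 1 column, 3 rows
               xlab = "Score",
               panel = function(..., col) {
                 # ignore any col passed in and pick one by panel number
                 panel.histogram(..., col = bar_cols[panel.number()])
               })
print(p)
```

Using layout = c(3, 1) instead would place the three panels in a single row.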

-Deepayan

> columns <- 8:153
> plots <- vector("list", length(columns))
> j <- 0
> for (i in columns)
> {
>  plots[[ j <- j+1 ]] <-
>    histogram( ~ data[,i],
>      ylab = "Frequency", xlab = "Score",
>      xlim = c(1,5), ylim = c(0,100),
>      main = colnames(data)[i]
>    )
> }
>
> print(plots[[1]])
>
> # or export
>
> for (i in seq_along(plots))
> {
>  png(paste("hist", i, ".png", sep = ""))
>  print(plots[[i]])
>  dev.off()
> }
> --
> View this message in context: 
> http://old.nabble.com/How-do-I-change-the-colour-and-format-for-the-trelli-plot---tp26418382p26418382.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



Re: [R] Presentation of data in Graphical format

2009-11-18 Thread Petr PIKAL
Well, from what you say it seems to me that you could also use Pareto 
charts together with some aggregation of data. But it depends on what you 
want to show to your audience. Below is some code which I slightly adapted 
from the original author's version.

Regards
Petr

#--
# pareto. Produces a Pareto plot of effects.
#
# Parameters:
# effects - vector or matrix of effects to plot.
# names - vector of names to label the effects.
# xlab - String to display as the x axis label.
# ylab - String to display as the y axis label.
# indicate.percent - if TRUE, draw the cumulative percentage curve and axis.
# perlab - Label for the cumulative percentage axis.
# heading - Vector of names for plot heading.
# trunc.perc - cumulative share beyond which effects are lumped into one bar.
# long.names - if TRUE, write the labels vertically inside the plot.
#
pareto <- function(effects, names=NULL, xlab=NULL,
                   ylab="Magnitude of Effect", indicate.percent=TRUE,
                   perlab="Cumulative Percentage", heading=NULL,
                   trunc.perc=.95, long.names=FALSE, ...)
{
    # set up graphics parameters, note: set las=2 for perpendicular axis.
    oldpar <- par( mar=c(6, 4, 2, 4) + 0.1 , las=3)
    on.exit(par(oldpar))

    if( ! is.matrix(effects)) effects<-as.matrix( effects )

    for( i in 1:ncol(effects) )
    {
        if( i==2 ) oldpar$ask<-par(ask=TRUE)$ask

        # order the effects by decreasing absolute size
        eff.ord <- rev(order(abs(effects[,i])))
        ef <- abs(effects[eff.ord,i])
        # keep the reordered labels in a local copy so that 'names' is not
        # overwritten for the next column
        nm <- as.character(names)[eff.ord]

        # get cumulative sum of effects as a share of the total
        sumeff <- cumsum(ef)
        m<-max(ef)
        sm<-sum(ef)
        sumeff <- sumeff/sm

        # lump everything beyond trunc.perc into one final bar
        vyber<-sumeff>trunc.perc
        suma.ef<-sum(ef[vyber])
        sumeff<-c(sumeff[!vyber],1)*m
        ef<-c(ef[!vyber],suma.ef)
        nm<-c(nm[!vyber],"Other")
        ylimit<-max(ef) + max(ef)*0.19
        ylimit<-c(0,ylimit)
        par( mar=c(6, 4, 2, 4) + 0.1 , las=3)

        if (long.names) {
            # first call only computes the bar positions
            x<- barplot(ef, names.arg=nm, ylim=ylimit, xlab=xlab,
                        ylab=ylab, main=heading[i], plot=FALSE, ...)
            x<- barplot(ef, ylim=ylimit, xlab=xlab, ylab=ylab,
                        main=heading[i], ...)
            text(x,ylimit[2]/10, nm, srt=90, adj=0, cex=.7)
        } else {
            x<-barplot(ef, names.arg=nm, ylim=ylimit, xlab=xlab,
                       ylab=ylab, main=heading[i], ...)
        }

        if( indicate.percent == TRUE ){
            # draws curve.
            lines(x, sumeff, lty="solid", lwd=2, col="purple")

            # draw 80% line
            lines( c(0,max(x)), rep(0.8*m,2) )
            # draw axis labeling percentage.
            at <- c(0:5)* m/5
            axis(4, at=at,
                 labels=c("0","20","40","60","80","100"), pos=max(x)+.6)
            # add axis labels
            par(las=0)
            mtext(perlab, 4, line=2)
        }

    } # end for each col
}


#Don Wingate 
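
The "aggregation of data" step mentioned at the top of this message could look roughly like the sketch below. The data frame duties is invented to match the four columns described in the quoted message (post, department, duty, time). The plot uses plain barplot() rather than the pareto() function above so that the snippet runs on its own.

```r
# Hypothetical data in the layout described in the thread
duties <- data.frame(
  post = c("Secretary", "Secretary", "Clerk", "Clerk", "Manager", "Manager"),
  dept = c("HR", "HR", "Finance", "HR", "Finance", "HR"),
  duty = c("Filing", "Scheduling", "Invoicing", "Filing", "Budgeting", "Reviews"),
  time = c(2, 5, 4, 1, 6, 3)
)

# Total time per post/department combination, largest first
total <- tapply(duties$time,
                paste(duties$post, duties$dept, sep = " / "),
                sum)
total <- sort(total, decreasing = TRUE)

# Cumulative share of the total time, in percent
cum_pct <- 100 * cumsum(total) / sum(total)

# Bars for the totals, with the cumulative share rescaled to the bar heights
x <- barplot(total, las = 2, ylab = "Total time")
lines(x, cum_pct / 100 * max(total), type = "b")
```

The vector total is the kind of input that could be passed as effects, with names(total) as names, to a function such as pareto().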


r-help-boun...@r-project.org napsal dne 18.11.2009 16:17:32:

> yes in my data the 1st column is the main category say suppose "Secretary"
> the second column is the sub category "HR Dept" the 3rd column is the list
> of duties performed by the Secretary from HR dept and 4th column is time
> required to perform the duty
> 
> so there are many such posts and dept with varied duties and times resp.
> 
> Regards
> 
> Our Thoughts have the Power to Change our Destiny.
> Sunita
> 
> 
> On Wed, Nov 18, 2009 at 8:42 PM, Petr PIKAL wrote:
> 
> > Hi
> >
> > r-help-boun...@r-project.org napsal dne 18.11.2009 16:01:27:
> >
> > > Yes I tried all the basic ones like box plot, pie chart, etc but the data
> > > representation isnt that clear.
> > >
> >
> > I agree with Tal. But it partly depends on your data. If you have many
> > levels and only few time values in each boxplot would not look well. Maybe
> > you could check also ?xtabs or ?table and/or R graph gallery
> > http://addictedtor.free.fr/graphiques/ if you find suitable graph.
> >
> > Regards
> > Petr
> >
> >
> >
> > >
> > > Regards
> > >
> > > Our Thoughts have the Power to Change our Destiny.
> > > Sunita
> > >
> > >
> > > On Wed, Nov 18, 2009 at 7:20 PM, Tal Galili wrote:
> > >
> > > > I would start with
> > > > ?boxplot
> > > >
> > > >
> > > > --
> > > >
> > > >
> > > > My contact information:
> > > > Tal Galili
> > > > E-mail: tal.gal...@gmail.com
> > > > Phone number: 972-52-7275845
> > > > FaceBook: Tal Galili
> > > > My Blogs:
> > > > http://www.talgalili.com (Web and general, Hebrew)
> > > > http://www.biostatistics.co.il (Statistics, Hebrew)
> > > > http://www.r-statistics.com/ (Statistics,R, English)
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Nov 18, 2009 at 2:47 PM, Sunita P

Re: [R] Labels in horizontal dendrogram not placed correctly?

2009-11-18 Thread Chris Campbell
On Mon, Nov 16, 2009 at 07:28, joris meys  wrote:
> Hi all,
>
> I tried plotting a horizontal dendrogram, but it seems as if the
> labels are not taken into account in the function plot.dendrogram().
>
> A minimal example :
> Test <- data.frame(
>    x1x = c(1:10),
>    x2x = c(2:11),
>    x3x = c(11:2)
> )
>
> TestDist <- daisy(data.frame(t(Test)))
> TestAgnes <- agnes(TestDist)
> plot(as.dendrogram(TestAgnes),horiz=T)
>
> If I run this in R 2.10.0, I get a horizontal dendrogram with the
> labels to the far right, and partly outside the plot area. This is
> highly inconvenient. Am I doing something wrong or is this a bug?
>
> Kind regards
> Joris
>
>

Extend your right margin before plotting and you will be able to see the labels:

par(mar=c(5.1, 4.1, 4.1, 5.1))
plot(as.dendrogram(TestAgnes), horiz=TRUE)
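
Put together with the example from the question, a self-contained sketch might look like this. It assumes the cluster package (which provides daisy() and agnes()) is installed; the right margin of 6.1 lines is an arbitrary choice, and old_par is just a name used here to restore the previous settings afterwards.

```r
library(cluster)  # daisy() and agnes()

Test <- data.frame(
  x1x = 1:10,
  x2x = 2:11,
  x3x = 11:2
)

TestDist  <- daisy(data.frame(t(Test)))
TestAgnes <- agnes(TestDist)
dend      <- as.dendrogram(TestAgnes)

# Widen the right margin (4th element) so the leaf labels fit,
# and restore the previous settings when done
old_par <- par(mar = c(5.1, 4.1, 4.1, 6.1))
plot(dend, horiz = TRUE)
par(old_par)
```

Longer labels need a larger fourth value in mar.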



Re: [R] problem post request with RCurl

2009-11-18 Thread Duncan Temple Lang
Use

 curlPerform(url = 'http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi', postfields = q)
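
To keep the response in an R variable rather than having it printed, the call can be wrapped roughly as follows. This is only a sketch: post_xml is a made-up helper name, and it relies on RCurl's basicTextGatherer() to collect the body of the reply.

```r
# Hypothetical helper: POST a raw string and return the response body as text
post_xml <- function(url, body) {
  if (!requireNamespace("RCurl", quietly = TRUE)) {
    stop("the RCurl package is required")
  }
  reader <- RCurl::basicTextGatherer()
  RCurl::curlPerform(url = url,
                     postfields = body,            # sent as the raw POST body
                     writefunction = reader$update)
  reader$value()
}

# Usage (needs network access; q is the XML string from the question):
# response <- post_xml("http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi", q)
```

The difference from postForm() is that postfields takes the whole request body as one string, whereas postForm() builds the body from name = value pairs.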


That gives me:

[XML response; the markup was stripped in the archive and only the value
31406321645402938 survives]


Rajarshi Guha wrote:
> Hi, I am trying to use a CGI service (Pubchem PUG) via RCurl and am
> running into a problem where the data must be supplied via POST - but I
> don't know the keyword for the argument.
> 
> The data to be sent is an XML fragment. I can do this via the command
> line using curl: I save the XML string to a file called query.xml and
> then do
> 
> curl -d @query.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
> 
> I get the expected response. More importantly, the verbose option shows:
> 
>> Accept: */*
>> Content-Length: 1227
>> Content-Type: application/x-www-form-urlencoded
> 
> However, when I try to do this via RCurl, the data doesn't seem to get
> sent:
> 
> q <- "   
>  
>  
>
>  
>  value=\"summary-table\">0 
>  value=\"assay-central\">0 
>
>  
>
>  
> pccompound 
>
> 3243128 
>
>  
>
>  
>
>  
>
>   "
> 
>> postForm(url, q, style="post", .opts = list(verbose=TRUE))
> * About to connect() to pubchem.ncbi.nlm.nih.gov port 80 (#0)
> *   Trying 130.14.29.110... * connected
> * Connected to pubchem.ncbi.nlm.nih.gov (130.14.29.110) port 80 (#0)
>> POST /pug/pug.cgi HTTP/1.1
> Host: pubchem.ncbi.nlm.nih.gov
> Accept: */*
> Content-Length: 0
> Content-Type: application/x-www-form-urlencoded
> 
> As you can see, the data in q doesn't seem to get sent (content-length =
> 0).
> 
> Does anybody have any suggestions as to why the call to postForm doesn't
> work, but the command line call does?
> 
> Thanks,
> 
> 
> Rajarshi Guha| NIH Chemical Genomics Center
> http://www.rguha.net | http://ncgc.nih.gov
> 
> Q:  Why did the mathematician name his dog "Cauchy"?
> A:  Because he left a residue at every pole.
> 



Re: [R] How do I change the colour and format for the trelli plot ?

2009-11-18 Thread ychu066

I tried reading help(histogram) but didn't find it helpful.



ychu066 wrote:
> 
> I have solved the first problem by using layout=c(1,3) 
> 
> but still can't find the solution for the next problem of getting
> different colours for the bars in the 3 different histogram plots
> 
> 
> ychu066 wrote:
>> 
>>  http://old.nabble.com/file/p26418382/hist1.png hist1.png  i want three
>> plots along on the side , how to i do that ?
>> 
>>  and I also want to change the colour of the bars for each plot, how do i
>> do that ?
>> 
>> i got the code here to draw that ..
>> columns <- 8:153 
>> plots <- vector("list", length(columns)) 
>> j <- 0 
>> for (i in columns) 
>> {   
>>   plots[[ j <- j+1 ]] <- 
>> histogram( ~ data[,i], 
>>   ylab = "Frequency", xlab = "Score", 
>>   xlim = c(1,5), ylim = c(0,100), 
>>   main = colnames(data)[i]
>> ) 
>> } 
>> 
>> print(plots[[1]]) 
>> 
>> # or export 
>> 
>> for (i in seq_along(plots)) 
>> { 
>>   png(paste("hist", i, ".png", sep = "")) 
>>   print(plots[[i]]) 
>>   dev.off() 
>> } 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/How-do-I-change-the-colour-and-format-for-the-trelli-plot---tp26418382p26419023.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] re placing the dates format in R for exporting the data set...

2009-11-18 Thread ychu066

hey Jim,

I have solved the column name problem now, but I am still unable to read
the dates in R ...

toms_dat<- replace(toms_dat, toms_dat ==2009-08-24, 6)

toms_dat is a data frame, and I want to replace each date with a
single number, e.g. 1, 2, 3, ...

regards,
Tom.
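
For what it is worth, an unquoted 2009-08-24 is evaluated as a subtraction, which is the point made in the reply quoted below. A sketch of one way to turn dates into running numbers, using an invented data frame with a date column stored as text (the column names date, score and day_index are assumptions):

```r
# Unquoted, the "date" is plain arithmetic: 2009 minus 8 minus 24
2009-08-24   # 1977

# Hypothetical data with the dates stored as character strings
toms_dat <- data.frame(
  date  = c("2009-08-04", "2009-08-06", "2009-08-24", "2009-08-06"),
  score = c(3, 5, 4, 2),
  stringsAsFactors = FALSE
)

# Convert to Date, then number the distinct dates in chronological order
d <- as.Date(toms_dat$date)
toms_dat$day_index <- match(d, sort(unique(d)))
```

Here the earliest date gets 1, the next 2, and so on, and repeated dates share the same number.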





jholtman wrote:
> 
> First of all '2009-08-06' is 1995; this is probably not what you were
> expecting.  What do you want your expression to do?  Is 'toms_dat' a
> dataframe?  If so, your expression 'toms_dat ==2009-08-06' seems
> strange.  So tell us what you want to do, not how you want to do it.
> 
> On Tue, Nov 17, 2009 at 4:54 PM, ychu066 
> wrote:
>>
>> hi everyone, i am having difficulties with replacing the dates format in
>> R
>> for exporting the data set...
>>
>> eg: the code that i used was
>> toms_dat<- replace(toms_dat, toms_dat ==2009-08-06, 2)
>> toms_dat<- replace(toms_dat, toms_dat ==2009-08-04, 1)
>>
>> but when i export the data as into txt file or excel file the dates come
>> up
>> with very large numbers .:drunk:
>>
>> please help me ...=)
>> --
>> View this message in context:
>> http://old.nabble.com/replacing-the-dates-format-in-R-for-exporting-the-data-set...-tp26396492p26396492.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/replacing-the-dates-format-in-R-for-exporting-the-data-set...-tp26396492p26420068.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How do I change the colour and format for the trelli plot ?

2009-11-18 Thread ychu066

I have solved the first problem by using layout=c(1,3) 

but still can't find the solution for the next problem of getting different
colours for the bars in the 3 different histogram plots


ychu066 wrote:
> 
>  http://old.nabble.com/file/p26418382/hist1.png hist1.png  i want three
> plots along on the side , how to i do that ?
> 
>  and I also want to change the colour of the bars for each plot, how do i
> do that ?
> 
> i got the code here to draw that ..
> columns <- 8:153 
> plots <- vector("list", length(columns)) 
> j <- 0 
> for (i in columns) 
> {   
>   plots[[ j <- j+1 ]] <- 
> histogram( ~ data[,i], 
>   ylab = "Frequency", xlab = "Score", 
>   xlim = c(1,5), ylim = c(0,100), 
>   main = colnames(data)[i]
> ) 
> } 
> 
> print(plots[[1]]) 
> 
> # or export 
> 
> for (i in seq_along(plots)) 
> { 
>   png(paste("hist", i, ".png", sep = "")) 
>   print(plots[[i]]) 
>   dev.off() 
> } 
> 

-- 
View this message in context: 
http://old.nabble.com/How-do-I-change-the-colour-and-format-for-the-trelli-plot---tp26418382p26419007.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] HOw to delete a row in the data matrix and change the order of the row ???

2009-11-18 Thread ychu066

done thanks 
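
The suggestion quoted below (assigning new row names) can be sketched as follows, with a small stand-in data frame called dat in place of the real data:

```r
# Stand-in data frame with 80 rows
dat <- data.frame(x = 1:80)

# Drop rows 65, 70 and 75; the remaining row names keep their gaps
dat <- dat[-c(65, 70, 75), , drop = FALSE]

# Reset the row names so they run 1, 2, ..., nrow(dat) without gaps
rownames(dat) <- NULL
```

Setting the row names to NULL makes R fall back to the default consecutive numbering.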

Uwe Ligges-3 wrote:
> 
> Assign new row names?
> 
> Uwe Ligges
> 
> 
> ychu066 wrote:
>> hi, 
>> 
>> i delete row 65,70,75 in my data
>> data<- data[-c(65,70,75),]  
>> 
>> But i also want the order of the row to match up 
>> eg: 
>> 
>> 67  1111111111111   
>> 1   
>> 1
>> 68  1111111111111   
>> 1   
>> 1
>> 69  1111111111111   
>> 1   
>> 1
>> 71  1111111111111   
>> 1   
>> 1
>> 72  1111111111111   
>> 1   
>> 1
>> 73  1111111111111   
>> 1   
>> 1
>> 74  1111111111111   
>> 1   
>> 1
>> 76  1111111111111   
>> 1   
>> 1
>> 77  1111111111111   
>> 1   
>> 1
>> 
>> I dont want this , I don't want a gap between 69-71 , 73-74 and 74-76.
>> 
>> i want it like this 
>> 67  1111111111111   
>> 1   
>> 1
>> 68  1111111111111   
>> 1   
>> 1
>> 69  1111111111111   
>> 1   
>> 1
>> 70  1111111111111   
>> 1   
>> 1
>> 71  1111111111111   
>> 1   
>> 1
>> 72  1111111111111   
>> 1   
>> 1
>> 73  1111111111111   
>> 1   
>> 1
>> 74  1111111111111   
>> 1   
>> 1
>> 75  1111111111111   
>> 1   
>> 1
>> 
>> please help me ... 
>> 
>>
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/HOw-to-delete-a-row-in-the-data-matrix-and-change-the-order-of-the-row-tp26401860p26419345.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] GLM: Classification problem. Help!

2009-11-18 Thread David Winsemius


On Nov 18, 2009, at 5:12 PM, J_Laberga wrote:

> Hello,
> I need help with this. Let's say that I have n features that I want to use
> to predict which class an observation belongs to. Using training data I try
> to do the following:
>
> training$result <- as.factor(training$result)
> model <- glm(result ~., family=binomial("logit"), data = training)
>
> However, when I run the model on my test data I receive predictions that
> have continuous values. I.e. if I have the classes 0 and 1 in "results" I
> get predictions of 0.234235 and so on.
> How do I force the output to be just 0 or 1? What am I missing?

The fact that predict gives you probabilities? If you want a decision,  
then you need to specify a decision rule, i.e. a threshold.
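
A sketch of such a decision rule on simulated data. The variables x1 and x2, the simulated coefficients, the three test rows and the cut-off of 0.5 are all assumptions made for illustration; the threshold in particular should be chosen to suit the problem.

```r
set.seed(42)
n <- 200

# Simulated training data with a 0/1 outcome
training <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
training$result <- factor(rbinom(n, 1, plogis(1.5 * training$x1 - training$x2)))

model <- glm(result ~ ., family = binomial("logit"), data = training)

# Hypothetical new observations to classify
test <- data.frame(x1 = c(-2, 0, 2), x2 = c(1, 0, -1))

# Predicted probabilities of class "1"
prob <- predict(model, newdata = test, type = "response")

# Decision rule: class 1 when the probability exceeds the threshold
threshold  <- 0.5
pred_class <- ifelse(prob > threshold, 1, 0)
```

Lowering the threshold assigns more observations to class 1, raising it assigns fewer.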






David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] SVM Param Tuning with using SNOW package

2009-11-18 Thread David Winsemius


On Nov 18, 2009, at 12:35 PM, Max Kuhn wrote:

> On Tue, Nov 17, 2009 at 6:01 PM, raluca wrote:
>
>> Hello,
>>
>> Is the first time I am using SNOW package and I am trying to tune the cost
>> parameter for a linear SVM, where the cost (variable cost1) takes 10 values
>> between 0.5 and 30.
>>
>> I have a large dataset and a pc which is not very powerful, so I need to
>> tune the parameters using both CPUs of the pc.
>>
>> Somehow I cannot manage to do it. It seems that both CPUs are fitting the
>> model for the same values of cost1, I guess the first 5, but not for the
>> last 5.
>>
>> Please, can anyone help me! :-((
>
> This is pretty easy to do with the train() function in the caret
> package. From ?train, here is an example for a different data set
>
> library(caret)
> library(snow)
> library(mlbench)
>
> data(BostonHousing)
>
> mpiCalcs <- function(X, FUN, ...)
>   {
>     theDots <- list(...)
>     parLapply(theDots$cl, X, FUN)
>   }
>
> library(snow)
> cl <- makeCluster(5, "MPI")
>
> ## 50 bootstrap models distributed across 5 workers
> mpiControl <- trainControl(workers = 5,
>                            number = 50,
>                            computeFunction = mpiCalcs,
>                            computeArgs = list(cl = cl))
>
> set.seed(1)
> usingMPI <-  train(medv ~ .,
>                    data = BostonHousing,
>                    "svmLinear",
>                    tuneGrid = data.frame(.C = seq(.5, 30, length = 10)),
>                    trControl = mpiControl)
>
> stopCluster(cl)
> [1] 1
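
As a side note on the original question (splitting ten cost values over two CPUs), the splitting itself can be sketched without caret. The snippet below uses the parallel package that ships with current R in place of snow, and fit_one is a hypothetical stand-in: a real version would fit the SVM for the given cost and return its error estimate.

```r
library(parallel)  # used here as a stand-in for the snow package

# Ten cost values between 0.5 and 30, as in the question
cost1 <- seq(0.5, 30, length.out = 10)

# Hypothetical placeholder for fitting one linear SVM with the given cost;
# here it only records which worker process handled which value
fit_one <- function(cost) {
  data.frame(cost = cost, worker_pid = Sys.getpid())
}

cl  <- makeCluster(2)                       # one worker per CPU
res <- do.call(rbind, parLapply(cl, cost1, fit_one))
stopCluster(cl)

res
```

parLapply() splits cost1 into one chunk per worker, so each value is evaluated exactly once and the worker_pid column shows how the values were divided.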



Well, that _was_ interesting. I submitted this job, modified so that the  
number of clusters and workers was eight, on a Mac Pro (with 8 cores  
and 16 GB) and watched the cpu usage as reported by Activity  
Monitor.app. The cpu activity is divided into system and user, and over  
the course of that run (which took several minutes) the system  
proportion gradually rose to about 75% of total.


Was it your expectation that this task was comparable in complexity to  
that offered by the OP?


And should I be looking for a tangible result? Looking at usingMPI  
with str() I see a 50 x 506 matrix, no it's a list, usingMPI$control$index,  
of integers as well as quite a bit of other material that  
looks like input and side-effects of the multi-processor activity or  
setup.


--
David




David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] GLM: Classification problem. Help!

2009-11-18 Thread Max Kuhn
> What am I missing?

A trip to the help page. predict.glm has details on the "type"
argument specific to this situation:

"the type of prediction required. The default is on the scale of the
linear predictors; the alternative "response" is on the scale of the
response variable. Thus for a default binomial model the default
predictions are of log-odds (probabilities on logit scale) and type =
"response" gives the predicted probabilities. The "terms" option
returns a matrix giving the fitted values of each term in the model
formula on the linear predictor scale."

You will have to do the conversion to the class estimate.
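
The two scales described in the help text can be compared directly. A small sketch on simulated data (the variables x and y and the cut-off of 0.5 are assumptions for illustration):

```r
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(d$x))

fit <- glm(y ~ x, family = binomial, data = d)

log_odds <- predict(fit)                     # default: linear predictor scale
prob     <- predict(fit, type = "response")  # predicted probabilities

# The inverse logit of the log-odds reproduces the probabilities
all.equal(plogis(log_odds), prob)

# Conversion to a class estimate; prob > 0.5 is the same as log_odds > 0
classes <- as.integer(prob > 0.5)
```

The last line is the "conversion to the class estimate" mentioned above, with 0.5 as one possible cut-off.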

-- 

Max



[R] problem post request with RCurl

2009-11-18 Thread Rajarshi Guha
Hi, I am trying to use a CGI service (Pubchem PUG) via RCurl and am  
running into a problem where the data must be supplied via POST - but  
I don't know the keyword for the argument.


The data to be sent is an XML fragment. I can do this via the command  
line using curl: I save the XML string to a file called query.xml and  
then do


curl -d @query.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"

I get the expected response. More importantly, the verbose option shows:

> Accept: */*
> Content-Length: 1227
> Content-Type: application/x-www-form-urlencoded

However, when I try to do this via RCurl, the data doesn't seem to get  
sent:


q <- "InputData_query>  Query_type>  QueryType_qas>QueryActivitySummary>  QueryActivitySummary_output value=\"summary-table\">0QueryActivitySummary_output>  QueryActivitySummary_type value=\"assay-central\">0QueryActivitySummary_type>  QueryActivitySummary_scids>QueryUids>  QueryUids_ids>List>  pccompoundList_db>  List_uids>3243128ID-List_uids_E>  List_uids>   
QueryUids>  QueryActivitySummary_scids>QueryActivitySummary>  PCT-QueryType>PCT-InputData_query>  "


> postForm(url, q, style="post", .opts = list(verbose=TRUE))
* About to connect() to pubchem.ncbi.nlm.nih.gov port 80 (#0)
*   Trying 130.14.29.110... * connected
* Connected to pubchem.ncbi.nlm.nih.gov (130.14.29.110) port 80 (#0)
> POST /pug/pug.cgi HTTP/1.1
Host: pubchem.ncbi.nlm.nih.gov
Accept: */*
Content-Length: 0
Content-Type: application/x-www-form-urlencoded

As you can see, the data in q doesn't seem to get sent (content-length  
= 0).


Does anybody have any suggestions as to why the call to postForm  
doesn't work, but the command line call does?


Thanks,


Rajarshi Guha| NIH Chemical Genomics Center
http://www.rguha.net | http://ncgc.nih.gov

Q:  Why did the mathematician name his dog "Cauchy"?
A:  Because he left a residue at every pole.



Re: [R] Save and load workspace in R: strange error.

2009-11-18 Thread Juancarloshb

Hello,

I get the same message when I call the attach function. Where do you
change the user quota?
I use R on Mac OS.

Thanks.
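
One way to check whether a save actually reached the disk, along the lines of the advice quoted below, is to look at the size of the file before loading it again. A sketch using a temporary file; rdata_file and restored are names chosen here:

```r
rdata_file <- file.path(tempdir(), "junk4.RData")

x <- rnorm(100)
save(x, file = rdata_file)

# An empty or missing file points to a quota, disk-space or permission problem
size <- file.info(rdata_file)$size
if (is.na(size) || size == 0) {
  stop("nothing was written to ", rdata_file)
}

# Load into a separate environment so nothing in the workspace is overwritten
restored <- new.env()
load(rdata_file, envir = restored)
```

If size is 0 or NA, the "magic number" error on load is to be expected, since the file contains no data.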


Hongxiao Zhu wrote:
> 
> Hi,
> 
> I finally figured out where the problem is. It was because the 
> user account has a 10GB quota for the system that I used.
> Once the space limit is reached, R will have this error whenever you
> want to save or load.
> Gosh! It took me so long to realize this.
> 
> Hongxiao 
> **
>   *  Hongxiao Zhu  *
>   *  Department of Statistics, Rice Univeristy *
>   *  Office: DH 3136, Phone: 713-348-2839  *
>   *  http://www.stat.rice.edu/~hxzhu/  *
>   **
> 
> On Wed, 3 Oct 2007, Tony Plate wrote:
> 
>> Did you check whether 'junk4.RData' was created and what its length was - 
>> maybe an empty file is being created.  Is there some sort of quota or 
>> permissions problem?  My suggestion would be to look at the size and 
>> permissions on the directory and the file.  If you need more help, I
>> would 
>> suggest posting more details back to the list, e.g., what OS you are
>> using, 
>> and a directory listing that shows file sizes and permissions (i.e., as
>> you 
>> get with 'ls -l' on Unix systems.)
>>
>> -- Tony Plate
>>
>> Hongxiao Zhu wrote:
>>> Hi,
>>> 
>>> I tried to load a .RData object on unix system using R, it gives error:
>>> 
>>> Error: restore file may be empty -- no data loaded
>>> In addition: Warning message:
>>> file 'junk3.RData' has magic number ''
>>> Use of save versions prior to 2 is deprecated
>>> 
>>> This happens only for using MY user account for the Unix system. I tried 
>>> to use a friends's user account to load the same data object, it is
>>> fine. And it never happened to me before until sometime last week.
>>> And This error happens even when I generate a simple random number
>>> from my user account and save it, and load it again.(So obviously it is 
>>> not a R version mismatch problem). Does anybody know what happened?
>>> 
>>> Here is an example what happened:
>>> 
>>> > x=rnorm(100)
>>> > save.image('junk4.RData')
>>> > load('junk4.RData')
>>> Error: restore file may be empty -- no data loaded
>>> In addition: Warning message:
>>> file 'junk4.RData' has magic number ''
>>> Use of save versions prior to 2 is deprecated
>>> 
>>> Thanks for any suggestion.
>>> 
>>> Hongxiao
>>> 
>>> 
>>> **
>>>   *  Hongxiao Zhu  *
>>>   *  Department of Statistics, Rice Univeristy *
>>>   *  Office: DH 3136, Phone: 713-348-2839  *
>>>   *  http://www.stat.rice.edu/~hxzhu/  *
>>> 
>>> 
>>
>>
>>
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Save-and-load-workspace-in-R%3A-strange-error.-tp12968832p26418491.html
Sent from the R help mailing list archive at Nabble.com.



[R] How do I change the colour and format for the trelli plot ?

2009-11-18 Thread ychu066

http://old.nabble.com/file/p26418382/hist1.png hist1.png

I want three plots side by side. How do I do that?

I also want to change the colour of the bars in each plot. How do I do
that?

Here is the code I used to draw the plots:
library(lattice)  # histogram() is a lattice function

columns <- 8:153
plots <- vector("list", length(columns))
j <- 0
for (i in columns)
{
  # one histogram per score column, stored for later printing
  plots[[ j <- j+1 ]] <-
    histogram( ~ data[,i],
      ylab = "Frequency", xlab = "Score",
      xlim = c(1,5), ylim = c(0,100),
      main = colnames(data)[i]
    )
}

print(plots[[1]])

# or export

for (i in seq_along(plots))
{
  png(paste("hist", i, ".png", sep = ""))
  print(plots[[i]])
  dev.off()
}
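
One possible way to get both effects, sketched with made-up data: the `scores` data frame and the colour names below are assumptions for illustration, not part of the original data. It relies on the `col` argument of histogram() and on the `split` and `more` arguments of the print method for lattice plots.

```r
library(lattice)

# Hypothetical scores on a 1-5 scale, standing in for three columns of `data`
set.seed(1)
scores <- data.frame(q1 = sample(1:5, 100, replace = TRUE),
                     q2 = sample(1:5, 100, replace = TRUE),
                     q3 = sample(1:5, 100, replace = TRUE))
bar_cols <- c("steelblue", "tomato", "darkgreen")

# One histogram per column, each with its own bar colour
plots <- lapply(seq_along(scores), function(k) {
  histogram(~ scores[[k]], col = bar_cols[k],
            xlab = "Score", ylab = "Frequency", type = "count",
            main = names(scores)[k])
})

# split = c(column, row, ncol, nrow) places each plot in a 1 x 3 grid;
# more = TRUE keeps the page open until the last plot is drawn
for (k in seq_along(plots)) {
  print(plots[[k]], split = c(k, 1, length(plots), 1),
        more = k < length(plots))
}
```

Wrapping the final loop in png(...) and dev.off() would write the three side-by-side plots to a single file.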
-- 
View this message in context: 
http://old.nabble.com/How-do-I-change-the-colour-and-format-for-the-trelli-plot---tp26418382p26418382.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] re placing the dates format in R for exporting the data set...

2009-11-18 Thread ychu066


Ok thanks 



jholtman wrote:
> 
> ?write.table
> 
> If you read the help file, and do a little experimenting, you will see
> that there is a parameter 'row.names=FALSE' that may answer your
> question.
> 
> Also since you did not have column names on your input, you get V1,
> V2,...  You can put your own column names.  It helps again to read the
> help file on 'read.table' and look at the parameter 'col.names'.
> There is also the colnames function.  It also might help to (re)read
> the Intro to R.
> 
> On Tue, Nov 17, 2009 at 8:27 PM, ychu066 
> wrote:
>>
>> Moreover,  I want to rename the column name V1,V2,V3,V4.V146.  how do
>> i
>> write the code in R ???
>>
>> thanks everyone that look at the thread/
>>
>>
>>
>> ychu066 wrote:
>>>
>>> hi everyone, i am having difficulties with replacing the dates format in
>>> R
>>> for exporting the data set...
>>>
>>> eg: the code that i used was
>>> toms_dat<- replace(toms_dat, toms_dat ==2009-08-06, 2)
>>> toms_dat<- replace(toms_dat, toms_dat ==2009-08-04, 1)
>>>
>>> but when i export the data as into txt file or excel file the dates come
>>> up with very large numbers .:drunk:
>>>
>>> please help me ...=)
>>>
>> http://old.nabble.com/file/p26400792/what.csv what.csv
>> --
>> View this message in context:
>> http://old.nabble.com/replacing-the-dates-format-in-R-for-exporting-the-data-set...-tp26396492p26400792.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
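
The advice quoted above could look roughly like this in practice. This is only a sketch: the small data frame, the new column names and the recoding rule are assumptions standing in for the real toms_dat, which is not shown in the thread.

```r
# Hypothetical data frame standing in for toms_dat, read without a header
toms_dat <- data.frame(V1 = as.Date(c("2009-08-04", "2009-08-06")),
                       V2 = c(3, 5))

# Replace the default V1, V2, ... names
colnames(toms_dat) <- c("visit_date", "score")

# Recode dates by comparing against Date values rather than bare numbers
toms_dat$visit <- ifelse(toms_dat$visit_date == as.Date("2009-08-04"), 1, 2)

# row.names = FALSE drops the row-number column from the exported file
out_file <- tempfile(fileext = ".txt")
write.table(toms_dat, out_file, row.names = FALSE, quote = FALSE, sep = "\t")
exported <- readLines(out_file)
exported
```

Note that an unquoted 2009-08-06 is evaluated by R as the subtraction 2009 - 8 - 6, which is why the comparison here goes through as.Date().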

-- 
View this message in context: 
http://old.nabble.com/replacing-the-dates-format-in-R-for-exporting-the-data-set...-tp26396492p26418090.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Confidence intervals - a statistical question, nothing to do with R

2009-11-18 Thread Nordlund, Dan (DSHS/RDA)
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Moshe Olshansky
> Sent: Wednesday, November 18, 2009 3:21 PM
> To: R-help@r-project.org
> Subject: [R] Confidence intervals - a statistical question, nothing to do 
> with R
> 
> Dear list,
> 
> I have r towns, T1,...,Tr where town i has population Ni. For each town I 
> randomly
> sampled Mi individuals and found that Ki of them have a certain property. So 
> Pi =
> Ki/Mi is an unbiased estimate of the proportion of people in town i having 
> that
> property and the weighted average of Pi is an unbiased estimate of the 
> proportion of
> the entire population (all r towns) having this property.
> I can compute confidence intervals for the proportion of people having that 
> property
> for each city (in my case Mi << Ni and so binomial distribution is a good
> approximation to Ki).
> My question is: how can I compute confidence interval for the proportion of 
> people
> in the entire population (r towns) having that property? Either analytical or 
> numerical
> (simulation?) method will be all right.
> 
> Thank you in advance,
> 
> Moshe.
> 

You might want to look at the survey package for getting appropriate variance 
estimates.
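
As a rough base-R illustration of the kind of variance estimate involved, here is a sketch under stated assumptions: towns are sampled independently, Mi << Ni so the finite-population correction is ignored, and a normal approximation is used for the interval. The numbers in N, M and K are invented.

```r
# Hypothetical inputs: town populations N, sample sizes M, counts K with the property
N <- c(50000, 120000, 30000)
M <- c(400, 600, 250)
K <- c(60, 120, 25)

W <- N / sum(N)          # population weights
P <- K / M               # per-town sample proportions

p_hat <- sum(W * P)      # weighted estimate for the whole population

# Variance of a weighted sum of independent binomial proportions
# (finite-population correction ignored because M_i << N_i)
var_hat <- sum(W^2 * P * (1 - P) / M)

# 95% interval from the normal approximation
ci <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(var_hat)
round(c(estimate = p_hat, lower = ci[1], upper = ci[2]), 4)
```

With few towns or small counts the normal approximation may be poor, in which case simulating Ki from the fitted binomials is one alternative.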

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA  98504-5204

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error using 32-bit R and RODBC package on 64-bit Windows Server OS with R version 2.10

2009-11-18 Thread Marc Schwartz
Just to clarify on your first point, it is not that RODBC cannot work  
with 64 bit ODBC drivers. Given your particular configuration of 64  
bit Windows, 32 bit R and 64 bit ODBC drivers, you are likely running  
into compatibility issues.


From the error message below, it would seem that you are also either  
missing the requisite Oracle client software, or your system  
configuration variables are not set or are not set to the proper paths.


The 32 bit Windows Oracle downloads are available from:

  
http://www.oracle.com/technology/software/tech/oci/instantclient/htdocs/winsoft.html

Now, I don't run Windows (have not in a long time), so I am not clear  
as to the subtleties that may be in play here given that you may be  
installing 32 bit drivers over an existing 64 bit installation. You  
may need to remove the 64 bit install, in order to have a clean  
install of the 32 bit Oracle client apps. If they install into  
different locations, that might help to solve the problem, in which  
case, you need to be careful in configuring any system environment  
variables so that they point to the proper location.


If you have access to in-house tech support or an Oracle SysAdmin, I  
would highly recommend that you seek them out to aid in ensuring that  
you end up with a clean 32 bit Oracle client installation. As I noted  
previously, I would be sure that you can connect to the Oracle server  
using the 32 bit Oracle Instant Client application as a test to ensure  
that OS and Oracle related configuration issues have been resolved.  
Then test the RODBC connection within R. That two step process has  
helped me to debug local configuration issues on both Linux and OSX.


HTH,

Marc Schwartz

On Nov 18, 2009, at 3:12 PM, helpme wrote:

Now that I know RODBC only works with 32-bit ODBC drivers, this explains the
problem I was having.

The system definitely has a 64-bit ODBC driver installed. I can tell because
when you go to the system32 folder and click on odbcad32.exe it goes to the
Microsoft ODBC manager, where I can select the driver installed for the
64-bit Oracle system.

The system32 folder contains the 64-bit driver for ODBC. When I go to the
syswow64 directory and click on odbcad32.exe it does not take me to the
Microsoft ODBC manager. Instead I get this error:

1.) Navigate to C:\Windows\syswow64\odbcad32.exe
2.) Select System DSN
3.) Add "Microsoft ODBC for Oracle"

I receive this error: The Oracle(tm) client and networking components were
not found. These components are supplied by Oracle Corporation and are part
of the Oracle Version 7.3 (or greater) client software installation. You
will be unable to use this driver until these components have been
installed.

I don't believe the 32-bit ODBC driver is present. What is the best way to
tell whether the 32-bit Oracle client software is installed? I'm also
wondering if anyone has experience installing it on a 64-bit system and
calling it from RODBC.


On Mon, Nov 16, 2009 at 4:54 PM, Marc Schwartz  
 wrote:



On Nov 16, 2009, at 2:39 PM, helpme wrote:

I am receiving an error when trying to connect to the Oracle Database using
RODBC on a 64-bit Windows Server OS. The version of R is 2.10.0-win32.exe.

Is this the wrong version? Does RODBC only work with 32-bit ODBC drivers?

I've read over all the posts and documentation manuals.
The system is Windows Server 2003 with R 2.81. and the latest downloadable
RODBC package. The Oracle SID/DSN is mfopdw. I made sure to add it to
Control Panel->Administrative Privileges->Microsoft ODBC system/user DSN.

I've also tried the following in no particular order:

1.) Turn on all oracle services in control panel->administrative
privileges.
2.) Checked tnsnames.ora for SID.
3.) Add microsoft ODBC service to Control Panel services for SID
4.) Use SQL Developer to test connection another way besides R (It was
successful)
5.) channel<-odbcDriverConnect(
connection="Driver={Microsoft ODBC for Oracle};DSN=abc;UID=abc;PWD=abc;",
case="oracle")

received error drivers SQLAllocHandle on SQL_HANDLE_ENV failed one time;
another time I got the error that Oracle client and networking components
7.3 or greater is not found.

6.) tnsping mfopdw

lsnrctl start mfopdw

tried to add oracle/bin to path

Nothing is working.



Three quick comments:

1. A better place to post these types of queries would be on the R-SIG-DB
e-mail list, which is focused in this domain. More info here:

https://stat.ethz.ch/mailman/listinfo/r-sig-db

2. Prof. Ripley will be a more definitive resource, so I would wait until
he might respond.

3. If you have not yet, be sure to read the RODBC vignette, which is
available either via:

vignette("RODBC")

or online at:

http://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf


That all being said, since you have now posted what may be the root cause
of your problem, which is the 64/32 bit details, I will venture a guess to
say that this may be the pr

[R] Confidence intervals - a statistical question, nothing to do with R

2009-11-18 Thread Moshe Olshansky
Dear list,

I have r towns, T1,...,Tr where town i has population Ni. For each town I 
randomly sampled Mi individuals and found that Ki of them have a certain 
property. So Pi = Ki/Mi is an unbiased estimate of the proportion of people in 
town i having that property and the weighted average of Pi is an unbiased 
estimate of the proportion of the entire population (all r towns) having this 
property.
I can compute confidence intervals for the proportion of people having that 
property for each city (in my case Mi << Ni and so binomial distribution is a 
good approximation to Ki).
My question is: how can I compute confidence interval for the proportion of 
people in the entire population (r towns) having that property? Either 
analytical or numerical (simulation?) method will be all right.

Thank you in advance,

Moshe.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Preferred Method for Reading in and Processing Access Database

2009-11-18 Thread Michael Bibo
Jason Rupert  yahoo.com> writes:

> 
> By any chance is there a preferred way to allow R to read in data from an 
Access Database and then also process
> that data?   
> 
> Thanks for any hints and tips since I have traditionally been working with 
csv file. 
> 

I find using package RODBC quite straightforward.  I usually create a query in 
Access to assemble the data I want from various tables, and then just access 
the query from R via RODBC.
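
A minimal sketch of that workflow, assuming a Windows machine with the RODBC package and an Access ODBC driver available. The helper name read_access_query, the file path and the query name are all hypothetical.

```r
# Sketch only: needs the RODBC package and a Windows Access ODBC driver.
# The file path and query name passed in are hypothetical.
read_access_query <- function(db_path, query_name) {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("the RODBC package is not installed")
  }
  channel <- RODBC::odbcConnectAccess(db_path)
  on.exit(RODBC::odbcClose(channel))   # always release the connection
  # Select everything from the saved Access query, as if it were a table
  RODBC::sqlQuery(channel, paste0("SELECT * FROM [", query_name, "]"))
}

# Example call (not run here):
# dat <- read_access_query("C:/data/example.mdb", "qryAssembled")
# summary(dat)
```

The result is an ordinary data frame, so the processing afterwards is the same as for data read from a csv file.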

Michael Bibo
Queensland Health

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unnecesary code?

2009-11-18 Thread Duncan Murdoch

hunsynte...@hush.com wrote:

Dear R-ers,

While browsing the R sources, I found the following piece of code 
in src\main\memory.c:


static void reset_pp_stack(void *data)
{
    R_size_t *poldpps = data;
    R_PPStackSize =  *poldpps;
}

To me, it looks like the poldpps pointer is a nuisance; can't you 
just cast the data pointer and dereference it at once? Say,


static void reset_pp_stack(void *data)
{
    R_PPStackSize = * (R_size_t *) data;
}
  

What would you gain by this change?

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about function heatmap

2009-11-18 Thread Chris Campbell
On Tue, Nov 17, 2009 at 17:03, Waverley @ Palo Alto
 wrote:
> Hi,
>
> I am using the function heatmap(stats) to draw a microarray heatmap,
> columns are samples and rows are gene features.
>
> I did a 2D clustering during the heatmap drawing.  The features and
> samples indeed cluster into several blocks both vertically and
> horizontally.
>
> I can get the index of re-ordered rows and columns after the heatmap
> drawing by typing the return variable of the heatmap function.
> However, I cannot separate these indexes by the dendro tree. The
> indexes labeled at the bottom and right of the plot are all jammed
> together.  I cannot find where the borders are by looking at the
> plot.
>
> Can someone help?  Essentially I want the dendro tree of the genes
> which are grouped after the clustering so that, e.g., I want to check
> whether genes clustered together are in the same pathway etc.
>
> Thanks in advance.
>
> --
> Waverley @ Palo Alto
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

The heatmap function does not return the full clustering information.
If you look in the help, you'll see that the rows and columns are
reordered by:
dd <- as.dendrogram(hclustfun(distfun(X)))
so you need to run your own clustering separately:

x <- matrix(rnorm(100), ncol=10,
            dimnames=list(paste("gene",1:10), paste("sample",1:10)))
x.hclust <- hclust(dist(x))
plot(as.dendrogram(x.hclust))
x.ident <- rect.hclust(x.hclust,k=2)
x.ident

To get the sample clusters, transpose the matrix for the distance calculation.
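
If a plot is not needed, cutree() gives the same kind of grouping directly. The following is a sketch with random data; the choice of k = 3 and the object names are arbitrary assumptions.

```r
set.seed(42)
# Hypothetical expression matrix: 10 genes (rows) by 10 samples (columns)
x <- matrix(rnorm(100), ncol = 10,
            dimnames = list(paste("gene", 1:10), paste("sample", 1:10)))

gene.hclust <- hclust(dist(x))        # cluster rows (genes)
sample.hclust <- hclust(dist(t(x)))   # cluster columns (samples)

# cutree() assigns each gene to one of k groups without needing a plot
gene.groups <- cutree(gene.hclust, k = 3)
split(names(gene.groups), gene.groups)

# Gene labels in the left-to-right order used by the dendrogram
gene.hclust$labels[gene.hclust$order]
```

By default heatmap() also reorders each dendrogram by row or column means, so the leaf order on the plot can differ from the order shown here, while the group membership from cutree() stays the same.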

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error message; ylim + log="y"

2009-11-18 Thread Ravi Varadhan
It can plot log axis from 1 to 10, but that is not what you are plotting.
Your ylim includes 0, and you cannot do log(0).  

This will draw the plotting frame that you want:

plot(c(),c(), xlim=c(1,10), ylim=c(1,10), log="y")

Ravi.


---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvarad...@jhmi.edu

Webpage:
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

 





-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Martin Batholdy
Sent: Wednesday, November 18, 2009 5:32 PM
To: r help
Subject: Re: [R] error message; ylim + log="y"

> You have no data to plot.  What were you expecting it to do?


Well, I get the same error messages when I use real data.
So it has to do with the ylim-values specified.
When I get rid of the ylim argument definition it does work.


But why?
I don't understand why R can't plot a logarithmic y-axis from 1 to 10.000.
It doesn't need data for that, does it?




Am 18.11.2009 um 23:19 schrieb jim holtman:

> like this?
> 
>> plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")
> Error in axis(side = side, at = at, labels = labels, ...) :
>  CreateAtVector [log-axis()]: axp[0] = 0 < 0!
> In addition: Warning messages:
> 1: In is.na(y) : is.na() applied to non-(list or vector) of type 'NULL'
> 2: In plot.window(...) :
>  nonfinite axis limits [GScale(-inf,4,2, .); log=1]
> 3: In axis(side = side, at = at, labels = labels, ...) :
>  CreateAtVector "log"(from axis()): axp[0] = 0 !
> 
> 
> You have no data to plot.  What were you expecting it to do?  When you
> say "lot of error messages", please include them and also follow the
> posting guide.
> 
> On Wed, Nov 18, 2009 at 4:52 PM, Martin Batholdy
>  wrote:
>> Hi,
>> 
>> 
>> I get a lot of error messages with this command, but I don't understand
why;
>> 
>> plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")
>> 
>> 
>> thanks for any help!
>>[[alternative HTML version deleted]]
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] row-wise means

2009-11-18 Thread Peter Alspach
Tena koe Anjan

?rowMeans
?apply

You'll need to subset your data matrix to exclude the first column
(e.g., yourData[,-1]).
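
Using the four rows from the question, both suggestions can be sketched as follows (the object name yourData is just a placeholder):

```r
# The data frame from the question; col1 is the index column
yourData <- data.frame(col1 = 1:4,
                       col2 = c(23, 45, 23, 34),
                       col3 = c(34, 56, 56, 68))

row_means <- rowMeans(yourData[, -1])          # vectorised row means
row_means
# → 28.5 50.5 39.5 51.0

apply_means <- apply(yourData[, -1], 1, mean)  # same result via apply()
```

rowMeans() is usually the faster of the two on large data.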

HTH 

Peter Alspach

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of ANJAN PURKAYASTHA
> Sent: Thursday, 19 November 2009 11:28 a.m.
> To: r-help@r-project.org
> Subject: [R] row-wise means
> 
> I have a dataframe with 3 columns. The first column stores an 
> index. I would like to calculate the mean of the numbers 
> stored in each of the rest of the columns.
> So,
> here is my data matrix:
> col1 col2 col3
> 1 23 34
> 2 45 56
> 3 23 56
> 4 34 68
> 
> For each row I would like to calculate the means of the 
> numbers stored in
> col2 and col3.
> How can this be done in R?
> TIA,
> Anjan
> 
> --
> =
> anjan purkayastha, phd
> bioinformatics analyst
> whitehead institute for biomedical research nine cambridge 
> center cambridge, ma 02142
> 
> purkayas [at] wi [dot] mit [dot] edu
> 703.740.6939
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Linear Discriminant Analysis and Wilks Lambda

2009-11-18 Thread Julius Tesoro
Dear all,

I am trying to recreate a discriminant analysis in R based on the article from 
"Dong,J.-J.,etal.,Discriminant analysis of the geomorphic characteristics and 
stability of landslide dams, Geomorphology (2009)". 
I used lda (MASS) to determine the discriminant functions but I noticed that it 
is not the same as in the paper.

I have three questions (1) Why do the results from lda() not show a 
constant? Isn't the discriminant function supposed to be D = a + b1*x1 + b2*x2 + 
... + bm*xm? If there is one, where can I find it? (2) Why are the linear 
discriminant coefficients different from the paper? The discriminant function 
in the paper is:
D = − 2.62*log10(Peak.flow) − 4.67*log10(Dam.height) + 4.57*log10(Dam.width) 
+2.67*log10(Dam.Length) +8.26 (He used SPSS for the analysis)

(3) I used manova to perform the Wilks test. However, I am missing the 
significant values for the Wilks test. How come?

I know these are newbie questions but I hope someone out there may have the 
answer. Thanks all

Here is the code I used for the Linear Discriminant Analysis:
> criteria <- c("Catchment.area", "Stream.order", "Mean.flow", "Peak.flow", "UCG",
+   "DCG", "Landslide.volume", "Landslide.area", "HTD",
+   "Slope.height", "Dam.height", "Dam.width", "Dam.length", "Lake.depth",
+   "Lake.area", "Dam.volume", "SClass")
> tabcrit <- subset(tabata, rowSums(is.na(tabata[criteria])) == 0)
> tabcrit <- tabcrit[criteria]

> stabledams <- subset(tabcrit, SClass == "Stable")
> unstabledams <- subset(tabcrit, SClass == "Unstable")

> st <- sample(nrow(stabledams))
> ust <- sample(nrow(unstabledams))

> training <- rbind(stabledams[st[1:5], ], unstabledams[ust[1:17], ])
> tr.lda <- lda(SClass ~ log10(Catchment.area) + log10(Dam.height) +
+   log10(Dam.width) + log10(Dam.length),
+   data = training)
> tr.lda
Coefficients of linear discriminants:
                             LD1
log10(Catchment.area)  1.0967609
log10(Dam.height)      0.9818473
log10(Dam.width)      -1.9813511
log10(Dam.length)     -0.7131808

For the Wilks test:

> tr.matrix <- as.matrix(training[-17])
> tr.manova <- manova(tr.matrix ~ training$SClass)
> tr.wilks <- summary(tr.manova, test = "Wilks")
> tr.wilks
                Df   Wilks approx F num Df den Df Pr(>F)
training$SClass  1 0.29328  0.75303     16      5 0.6979
Residuals       20
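
On question (1), a possible explanation, offered as a sketch rather than a definitive answer: lda() in MASS prints only the coefficients, and predict() centres the predictors before applying them, so the constant is implicit. The example below uses two species from the built-in iris data as a stand-in for the dam data, and the two predictors are an arbitrary choice.

```r
library(MASS)  # lda() lives in MASS, a recommended package shipped with R

# Hypothetical two-class data standing in for the landslide-dam table
two <- droplevels(subset(iris, Species != "setosa"))
fit <- lda(Species ~ Sepal.Length + Petal.Width, data = two)

X <- as.matrix(two[, c("Sepal.Length", "Petal.Width")])

# predict() centres the predictors at the prior-weighted mean of the group means
centre <- colSums(fit$prior * fit$means)

# The implied constant a in D = a + b1*x1 + b2*x2
const <- -sum(centre * fit$scaling[, "LD1"])

# Scores rebuilt by hand, to compare with those returned by predict()
manual <- drop(X %*% fit$scaling[, "LD1"]) + const
scores <- predict(fit)$x[, "LD1"]
all.equal(unname(manual), unname(scores))
```

Discriminant coefficients are only determined up to sign and scale, so a different normalisation in other software may account for some of the difference in question (2).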

Cheers,

Julius Tesoro



  __
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Preferred Method for Reading in and Processing Access Database

2009-11-18 Thread Jason Rupert
By any chance is there a preferred way to allow R to read in data from an 
Access Database and then also process that data?   

Thanks for any hints and tips since I have traditionally been working with csv 
file. 

Thanks again.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error message; ylim + log="y"

2009-11-18 Thread Peter Alspach
Tena koe Martin

This is what I get (it is unclear to me why you don't tell us
specifically what you get):

plot(1,1, xlim=c(1,10), ylim=c(0,1), log="y", type='n')
Warning message:
In plot.window(...) : nonfinite axis limits [GScale(-inf,4,2, .); log=1]
plot(1,1, xlim=c(1,10), ylim=c(1,1), log="y", type='n')

Not unreasonably, R has difficulty determining the y axis limits when
you tell it the minimum is log(0).
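
A small sketch of the point, assuming the goal is an empty frame with a log y-axis running from 1 to 10000 (the pdf(NULL) call is only there so the example runs without opening a window):

```r
log10(0)   # -Inf, so 0 has no position on a log-scaled axis

pdf(NULL)  # draw to a null device so the sketch runs without opening a window

# An empty frame with a log y-axis only needs strictly positive limits
plot(1, 1, type = "n", xlim = c(1, 10), ylim = c(1, 10000), log = "y",
     xlab = "x", ylab = "y")
usr <- par("usr")   # on a log axis, usr[3:4] are stored as log10 of the limits
dev.off()
```

Any lower limit greater than zero works, for example ylim = c(0.01, 1) if values below 1 are needed.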

HTH 

Peter Alspach

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Martin Batholdy
> Sent: Thursday, 19 November 2009 11:32 a.m.
> To: r help
> Subject: Re: [R] error message; ylim + log="y"
> 
> > You have no data to plot.  What were you expecting it to do?
> 
> 
> Well, I get the same error messages when I use real data.
> So it has to do with the ylim-values specified.
> When I get rid of the ylim argument definition it does work.
> 
> 
> But why?
> I don't understand why R can't plot a logarithmic y-axis from 
> 1 to 10.000.
> It doesn't need data for that, does it?
> 
> 
> 
> 
> Am 18.11.2009 um 23:19 schrieb jim holtman:
> 
> > like this?
> > 
> >> plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")
> > Error in axis(side = side, at = at, labels = labels, ...) :
> >  CreateAtVector [log-axis()]: axp[0] = 0 < 0!
> > In addition: Warning messages:
> > 1: In is.na(y) : is.na() applied to non-(list or vector) of 
> type 'NULL'
> > 2: In plot.window(...) :
> >  nonfinite axis limits [GScale(-inf,4,2, .); log=1]
> > 3: In axis(side = side, at = at, labels = labels, ...) :
> >  CreateAtVector "log"(from axis()): axp[0] = 0 !
> > 
> > 
> > You have no data to plot.  What were you expecting it to 
> do?  When you 
> > say "lot of error messages", please include them and also 
> follow the 
> > posting guide.
> > 
> > On Wed, Nov 18, 2009 at 4:52 PM, Martin Batholdy 
> >  wrote:
> >> Hi,
> >> 
> >> 
> >> I get a lot of error messages with this command, but I don't 
> >> understand why;
> >> 
> >> plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")
> >> 
> >> 
> >> thanks for any help!
> >>[[alternative HTML version deleted]]
> >> 
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >> 
> > 
> > 
> > 
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> > 
> > What is the problem that you are trying to solve?
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data linkage functions for probabilistic linkage using person identifiers

2009-11-18 Thread Doran, Harold
Interestingly enough, I just posted a package to CRAN with a function that might 
be useful. It is called MiscPsycho and is for psychometric work. The updated 
version of the package should be available in a day or so. It has a function 
called stringMatch which just implements the Levenshtein distance or a 
normalized version of the distance (what I call the LND). Then, there is a 
function called stringProbs which gives the probability of observing a given 
LND.

In education, we merge data sets all the time using a unique ID. It turns out, 
however, that the unique ID is not so unique. It is often shared by many kids 
over time, duplicated within a year, etc. So, we need to first merge using the 
ID and then validate that we have merged properly using some other mechanism. I 
think the LND is very useful for this purpose.

So, here is an example of the function in this package:

### A perfect match gives an LND of 1
> stringMatch('William Clinton', 'William Clinton', normalize='YES')
[1] 1

### A close match gives an LND less than 1
> stringMatch('William Clinton', 'Bill Clinton', normalize='YES')
[1] 0.733

If your database is small, you can actually look at the records and see if 
values less than 1 are really the same name spelled differently, misspelled, 
etc.

But, if your data set has hundreds of thousands of records that becomes 
impossible. So, what I do is compute the probability that you would observe an 
LND of .7 or higher. This is implemented in the stringProbs function. Let's say 
the probability of observing an LND of .7 or higher is .05, and the probability 
for any lower cutoff is higher still. Assuming you are willing to live with this 
much risk, you might then subset your data and retain records as "valid merges" 
only if the LND value is greater than .7.
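
The thresholding idea can be sketched with base R alone. This is not the MiscPsycho implementation: the lnd() helper below is a hypothetical stand-in built on adist() from the utils package, and the records are invented.

```r
# Normalised Levenshtein similarity: 1 is a perfect match, and values
# near 0 mean very different strings.
# adist() is from base R's utils package; this mirrors the idea above,
# not the MiscPsycho implementation itself.
lnd <- function(a, b) {
  d <- drop(adist(a, b))                # raw edit distance
  1 - d / max(nchar(a), nchar(b))       # scale by the longer string
}

lnd("William Clinton", "William Clinton")   # 1
lnd("William Clinton", "Bill Clinton")      # 1 - 4/15, about 0.733

# Hypothetical merged records: keep only pairs above a chosen cutoff
merged <- data.frame(name_a = c("William Clinton", "William Clinton", "Al Gore"),
                     name_b = c("William Clinton", "Bill Clinton", "George Bush"),
                     stringsAsFactors = FALSE)
merged$similarity <- mapply(lnd, merged$name_a, merged$name_b)
valid <- subset(merged, similarity > 0.7)
```

The cutoff of 0.7 simply follows the example in the text; in practice it would come from the probability calculation described above.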

The record linking literature is very big, but it is extremely small in 
education. So, I have a paper in press demonstrating this application and 
comparing it to other linking methods, like use of Soundex codes. In the paper, 
I also discuss how you would combine other demographic information, such as 
birthdates, etc to further explore probabilities of a correct match.

Harold



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of David Winsemius
Sent: Wednesday, November 18, 2009 4:32 PM
To: Dagan A WRIGHT
Cc: r-help@r-project.org
Subject: Re: [R] Data linkage functions for probabilistic linkage using person 
identifiers


On Nov 18, 2009, at 1:21 PM, Dagan A WRIGHT wrote:

> I am somewhat new to R, although I am using and liking it already.  I am  
> curious if there are any probabilistic linkage packages similar in function  
> to others such as the Link King (http://www.the-link-king.com/).  I am  
> looking for functions that match on SSN, first/last name, date of birth, and a  
> couple of other indicators.
>

Cannot comment on similarities to Link King but have used the  
functions found with this search in similar applications:

RSiteSearch("Levenshtein")  #yes, that is spelled correctly


> Thanks
>
> Dagan Wright, Ph.D., M.S.P.H.
> Lead Addictions Research Analyst, Analysis & Evaluation Unit
> Addictions & Mental Health Division (AMH)
> 500 Summer St. NE E86
> Salem, Oregon 97301-1118
>
> Office number: 503-945-5726
> Fax number: 503-378-8467
> dagan.a.wri...@state.or.us
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] GLM: Classification problem. Help!

2009-11-18 Thread J_Laberga

Hello,
I need help with this. Let's say that I have n features that I want to use
to predict which class an observation belongs to. Using training data I try
to do the following:

> training$result <- as.factor(training$result)
> model <- glm(result ~., family=binomial("logit"), data = training)

However, when I run the model on my test data I receive predictions that
have continuous values. I.e. if I have the classes 0 and 1 in "results" I
get predictions of 0.234235 and so on.
How do I force the output to be just 0 or 1? What am I missing?
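
For reference, a logistic model predicts a probability (or, by default in predict(), a log-odds value), and class labels come from applying a cutoff afterwards. The sketch below uses simulated data; the variable names x1 and x2 and the 0.5 cutoff are assumptions.

```r
set.seed(1)
# Hypothetical training and test data with a 0/1 outcome
training <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
training$result <- as.factor(rbinom(200, 1, plogis(1.5 * training$x1)))
test <- data.frame(x1 = rnorm(50), x2 = rnorm(50))

model <- glm(result ~ ., family = binomial("logit"), data = training)

# type = "response" returns P(result == "1"); the default is the log-odds scale
prob <- predict(model, newdata = test, type = "response")

# Turn probabilities into class labels with a cutoff (0.5 is only a convention)
pred_class <- ifelse(prob > 0.5, 1, 0)
table(pred_class)
```

A different cutoff may be more appropriate when the two kinds of misclassification have unequal costs.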


Thanks!
John
-- 
View this message in context: 
http://old.nabble.com/GLM%3A-Classification-problem.-Help%21-tp26416707p26416707.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error message; ylim + log="y"

2009-11-18 Thread jim holtman
like this?

> plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")
Error in axis(side = side, at = at, labels = labels, ...) :
  CreateAtVector [log-axis()]: axp[0] = 0 < 0!
In addition: Warning messages:
1: In is.na(y) : is.na() applied to non-(list or vector) of type 'NULL'
2: In plot.window(...) :
  nonfinite axis limits [GScale(-inf,4,2, .); log=1]
3: In axis(side = side, at = at, labels = labels, ...) :
  CreateAtVector "log"(from axis()): axp[0] = 0 !


You have no data to plot.  What were you expecting it to do?  When you
say "lot of error messages", please include them and also follow the
posting guide.

On Wed, Nov 18, 2009 at 4:52 PM, Martin Batholdy
 wrote:
> Hi,
>
>
> I get a lot of error messages with this command, but I don't understand why;
>
> plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")
>
>
> thanks for any help!



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] GLM: Classification problem. Help!

2009-11-18 Thread John
Hello,
I need help with this. Let's say that I have n features that I want to
use to predict which class an observation belongs to. Using training
data I try to do the following:

> training$result <- as.factor(training$result)
> model <- glm(result ~., family=binomial("logit"), data = training)

However, when I run the model on my test data I receive predictions
that have continuous values. I.e., if I have the classes 0 and 1 in
"result", I get predictions such as 0.234235.
How do I force the output to be just 0 or 1? What am I missing?


Thanks!
John



Re: [R] border/box/frame around plot

2009-11-18 Thread Greg Snow
Run the following command after creating your plot (at the point where you 
would run box()):

> par(c('bty','xpd'))

Then look at the help for par, or post the results back here for us to look at
so that we have better information on what your problem may be.
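
As a rough illustration of what that query returns and how a closed frame can be requested explicitly, here is a sketch on an ordinary base-graphics plot. It assumes nothing about igraph, and whether box(bty = "o") fixes the original poster's plot is not confirmed in this thread.

```r
# Draw to a null device so the sketch runs anywhere; drop this line interactively
pdf(NULL)

# A plain plot without axes, standing in for the poster's graph
plot(1:10, axes = FALSE, xlab = "", ylab = "")

# Query the two settings: 'bty' is the box type, 'xpd' controls clipping
settings <- par(c("bty", "xpd"))
print(settings)

# Ask for a full (four-sided) box around the plot region
box(which = "plot", bty = "o")

dev.off()
```

If bty comes back as something other than "o" (for example "l" or "7"), that would explain a frame drawn on only two sides.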

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of brbell01
> Sent: Wednesday, November 18, 2009 10:16 AM
> To: r-help@r-project.org
> Subject: [R] border/box/frame around plot
> 
> 
> Hello, I need to know how to put a closed frame around my plot.  I am
> plotting using the igraph package, and I have been able to use box() with
> limited success. box() puts a border around only the upper and right edges
> of the plot area, but misses the axes. By default, setting axes=TRUE in
> igraph does not produce closed axes (i.e. axes that run through the origin
> and up to the limits of the plot window).  Any ideas?


Re: [R] getting the name of a single object in R for debugging output

2009-11-18 Thread Andrew


Henrique,
 
It works great.  Perfect!   Thank you.
 
Warm regards,
 
Andrew
--- On Wed, 11/18/09, Henrique Dallazuanna  wrote:


Try this:

debugPrint <- function(x, ...){
    print(sprintf("%s: %d", deparse(substitute(x)), x), ...)
}
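
A possible variant that also honours the debug flag asked about in the original question is sketched below. The name debug_print and its debugFlag argument are hypothetical, and "%s" with format() is used instead of "%d" so that non-integer values and vectors do not raise an error.

```r
# Hypothetical variant: prints "name : value" only when debugFlag is TRUE
debug_print <- function(x, debugFlag = TRUE) {
  if (!debugFlag) return(invisible(NULL))
  name <- deparse(substitute(x))  # the expression the caller passed in
  msg <- sprintf("%s : %s", name, paste(format(x), collapse = " "))
  cat(msg, "\n", sep = "")
  invisible(msg)
}

myVar <- 10
debug_print(myVar)                     # prints: myVar : 10
debug_print(myVar, debugFlag = FALSE)  # prints nothing
```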

On Wed, Nov 18, 2009 at 8:35 AM, Andrew  wrote:
> I often use a debug flag (set to TRUE) to turn on various debugging print
> statements in my R scripts.  I was thinking I should create a function
> debugPrint(object,debugFlag)
> to print out the object name and contents if the debugFlag is set to TRUE.
> Then I wouldn't have to make my script ugly(..er) than it already is by
> adding IF statements all over the place.  I've seen how ls() dumps object
> names, but how do I get access to the character representation of the name of
> an object?
>
> E.g.
>
> myVar<- 10
>
> print(myVar) produces "10"
>
> I'd like to print out something like " myVar : 10"
>
> I'd appreciate any suggestions.
>
> Regards,
>
> Andrew
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



  


Re: [R] Writing a data frame in an excel file

2009-11-18 Thread anna_l

OK, I've been trying to understand what is happening. The data.frame I am
sending to the xls file was constructed as follows: I used the RODBC package
to read the date and price columns into a data frame, so the first column in
Excel is of type "date". In the data.frame that column is not numeric, but it
is still stored as a double. When I change the xls files I read by turning
these dates into numbers, everything works well. So I don't know whether
something has to be changed within the data.frame containing the dates, or
whether I have to do everything in numbers and then use RDCOM afterwards to
change the format of the date column in the Excel file back to the date type.
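
The "not numeric but still double" observation can be reproduced in base R, and one possible workaround is to convert the date column to text before writing. This is only a sketch: the data frame below is made up, and whether a character column avoids the sqlSave error with the Excel ODBC driver is an assumption that has not been checked here.

```r
# Made-up data standing in for what RODBC read from the sheet
datas <- data.frame(
  date  = as.POSIXct(c("2009-11-16", "2009-11-17"), tz = "UTC"),
  price = c(10.5, 10.7)
)

typeof(datas$date)      # "double": date-times are stored as seconds
is.numeric(datas$date)  # FALSE: but they are not treated as plain numbers

# Possible workaround: turn the dates into ISO-formatted text before saving
datas$date <- format(datas$date, "%Y-%m-%d")
str(datas)
```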


anna_l wrote:
> 
> Thanks Karl, well I am getting an error now after the following sqlSave
> command:
> sqlSave( xlsFile, datas, tablename = 'Datas_and_coefficients', rownames =
> FALSE )
> 
> -->  [RODBC] Failed exec in Update
> 22018 39 [Microsoft][Driver ODBC for Excel]invalid character value for the
> diffusion specification (null) (null)
> 
> 
> More specifically, take a look at the 'append' and 'safer' arguments.
> 
> -- 
> Karl Ove Hufthammer
> 




-
Anna Lippel
new in R so be careful I should be asking a lot of questions!


Re: [R] SVM Param Tuning with using SNOW package

2009-11-18 Thread David Winsemius


On Nov 18, 2009, at 4:21 PM, raluca wrote:



> Hi David,
>
> I have no idea what "magic" you did, but running exactly the same code as
> you, I have the same problem as before, meaning that I get results that are
> identical in pairs, while I should get different results for each value of
> cost1 (which is a vector with 10 values running between 0.5 and 30)


Maybe you should post more details about the hardware? Magic?  I am
not particularly experienced with parallel processing. All I did was read
the help pages and make a couple of changes that appeared to better
match what the functions specified and the samples illustrated.
This is actually the first parallel code that I have gotten to run.
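
One possible cause of results that repeat in pairs, offered as an assumption rather than a diagnosis: if both workers start from the same random-number state (for example because each restores the same saved workspace), sample() draws the same train/test split on each of them. The sketch below uses the parallel package that ships with current R, not snow as in the thread, to give each worker its own random stream; the seed 123 is arbitrary.

```r
library(parallel)

cl <- makeCluster(2)

# Give every worker an independent L'Ecuyer random-number stream
clusterSetRNGStream(cl, iseed = 123)

# Each worker draws its own sample; with separate streams the draws
# are no longer forced to coincide
draws <- clusterEvalQ(cl, sample(1:60, 5))

stopCluster(cl)

print(draws)
```

With snow itself, clusterSetupRNG() plays a similar role, but it depends on an additional RNG package being installed.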




> This is the result I get.
>
> 0.2197162, 0.2197162, 0.1467448, 0.1467448, 0.2247955, 0.2247955,
> 0.1073280, 0.1073280, 0.2332475, 0.2332475
>
> Anyway, thanks a lot for trying.
>
> PS. Probably I should switch to Mac :)


I just ran it again (took a couple of seconds on a 2009 unibody  
MacBook Pro (Core 2 Duo) w/ 8 GB):

> RMSEP
[[1]]
[1] 0.1720245

[[2]]
[1] 0.3396405

[[3]]
[1] 0.2359737

[[4]]
[1] 0.203541

[[5]]
[1] 0.1965804

[[6]]
[1] 0.1662158

[[7]]
[1] 0.1705594

[[8]]
[1] 0.2553175

[[9]]
[1] 0.1748892

[[10]]
[1] 0.09500263




David Winsemius wrote:


I cannot really be sure what you are trying to do, but doing a bit of
"surgery" on your code lets it run on a multicore Mac:

library(e1071)
library(snow)
library(pls)

data(gasoline)

X=gasoline$NIR
Y=gasoline$octane

NR=10
cost1=seq(0.5,30, length=NR)

sv.lin<- function(c) {

for (i in 1:NR) {

ind=sample(1:60,50)
gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))

svm.lin   <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],
cross=5)
results.lin   <- predict(svm.lin, gTest$X)

e.test.lin <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))

return(e.test.lin)
}
}

cl<- makeCluster(2, type="SOCK" )

clusterEvalQ(cl, library(e1071))
cost1=seq(0.5,30, length=NR)

clusterExport(cl,c("NR","Y","X",  "cost1"))
# Pretty sure you need a copy of cost1 on each node.


RMSEP<-clusterApply(cl, cost1, sv.lin)
# I thought the second argument was the matrix or vector over which to
# iterate.

stopCluster(cl)

# Since I don't know what the model meant, I cannot determine whether
# this result is interpretable.

RMSEP

[[1]]
[1] 0.1921887

[[2]]
[1] 0.1924917

[[3]]
[1] 0.1885066

[[4]]
[1] 0.1871466

[[5]]
[1] 0.3550932

[[6]]
[1] 0.1226460

[[7]]
[1] 0.2426345

[[8]]
[1] 0.2126299

[[9]]
[1] 0.2276286

[[10]]
[1] 0.2064534

--
David Winsemius, MD

On Nov 18, 2009, at 7:09 AM, raluca wrote:



Hi Charlie,


Yes, you are perfectly right, when I make the clusters I should put 2, not
10 (it remained 10 from previous trials with 10 slaves).

cl<- makeCluster(2, type="SOCK" )

To tell the truth I do not understand very well what the 2nd parameter for
clusterApplyLB() has to be.

If the function sv.lin has just 1 parameter, sv.lin(c), where c is the cost,
how should I call clusterApplyLB?


? clusterApplyLB(cl, ?, sv.lin, c=cost1)  ?
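
For reference, the second argument of clusterApplyLB() is the vector or list to iterate over, and the function is called once per element with that element as its first argument. The sketch below uses the parallel package bundled with current R (its clusterApplyLB mirrors the snow function of the same name) and a hypothetical fit_one() that stands in for the SVM fit, so it shows the calling pattern only, not the model.

```r
library(parallel)

cost1 <- seq(0.5, 30, length.out = 10)

# Hypothetical stand-in for the model fit: receives ONE cost value per call
# and returns one number. A real version would call svm(..., cost = cost).
fit_one <- function(cost) {
  sqrt(cost)
}

cl <- makeCluster(2)

# Second argument: the values to iterate over; third: the function to apply
res <- clusterApplyLB(cl, cost1, fit_one)

stopCluster(cl)

length(res)  # one result per element of cost1
```

Written this way the function needs no loop over 1:NR and no indexing of the cost vector, since each call already receives a single value.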



Below, I am providing a working example, using the gasoline data that comes
in the pls package.

Thank you for your time!


library(e1071)
library(snow)
library(pls)

data(gasoline)

X=gasoline$NIR
Y=gasoline$octane

NR=10
cost1=seq(0.5,30, length=NR)


sv.lin<- function(c) {

for (i in 1:NR) {

ind=sample(1:60,50)
gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))

svm.lin   <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],
cross=5)
results.lin   <- predict(svm.lin, gTest$X)

e.test.lin <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))

return(e.test.lin)
}
}


cl<- makeCluster(2, type="SOCK" )


clusterEvalQ(cl,library(e1071))


clusterExport(cl,c("NR","Y","X"))


RMSEP<-clusterApplyLB(cl,?,sv.lin,c=cost1)

stopCluster(cl)





cls59 wrote:



raluca wrote:


Hello,

This is the first time I am using the SNOW package, and I am trying to tune
the cost parameter for a linear SVM, where the cost (variable cost1) takes 10
values between 0.5 and 30.

I have a large dataset and a PC which is not very powerful, so I need to
tune the parameters using both CPUs of the PC.

Somehow I cannot manage to do it. It seems that both CPUs are fitting the
model for the same values of cost1, I guess the first 5, but not for the
last 5.

Please, can anyone help me!

Here is the code:

data <- data.frame(Y=I(Y),X=I(X))
data.X<-data$X
data.Y<-data$Y





Helping you will be difficult as we're only three lines into your example
and already I have no idea what the data you are using looks like.
Example code needs to be fully reproducible -- that means a small slice of
representative data needs to be provided or faked using an appropriate
random number generator.

Some things did jump out at me about your approach and I've made some
notes below.



raluca wrote:


NR=10
cost1=seq(0.5,30, length=NR)

sv.lin<- function(cl,c) {

for (i in 1:NR) {

ind

[R] row-wise means

2009-11-18 Thread ANJAN PURKAYASTHA
I have a dataframe with 3 columns. The first column stores an index. I would
like to calculate the mean of the numbers stored in each of the rest of the
columns.
So,
here is my data matrix:
col1 col2 col3
1 23 34
2 45 56
3 23 56
4 34 68

For each row I would like to calculate the means of the numbers stored in
col2 and col3.
How can this be done in R?
TIA,
Anjan
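
For readers of the archive, one way to do this in base R is rowMeans() on the two columns of interest. The data frame below simply retypes the four rows shown above; the name row_mean for the new column is an arbitrary choice.

```r
# The example data from the post
df <- data.frame(
  col1 = 1:4,
  col2 = c(23, 45, 23, 34),
  col3 = c(34, 56, 56, 68)
)

# Mean of col2 and col3 for each row, ignoring the index column
df$row_mean <- rowMeans(df[, c("col2", "col3")])

print(df)  # row_mean: 28.5 50.5 39.5 51.0
```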

-- 
=
anjan purkayastha, phd
bioinformatics analyst
whitehead institute for biomedical research
nine cambridge center
cambridge, ma 02142

purkayas [at] wi [dot] mit [dot] edu
703.740.6939



Re: [R] error message; ylim + log="y"

2009-11-18 Thread Martin Batholdy
> You have no data to plot.  What were you expecting it to do?


Well, I get the same error messages when I use real data.
So it has to do with the ylim-values specified.
When I get rid of the ylim argument definition it does work.


But why?
I don't understand why R can't plot a logarithmic y-axis from 1 to 10.000.
It doesn't need data for that, does it?
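
For what it is worth, an empty log-scaled plot over that range does seem to be possible as long as both limits are positive. The sketch below reads "10.000" as ten thousand, which is an assumption, and uses type = "n" with a single dummy point so that nothing is drawn.

```r
# Draw to a null device so the sketch runs anywhere; drop this line interactively
pdf(NULL)

# Empty plot: log y-axis from 1 to 10000, no points shown
plot(1, 1, type = "n", xlim = c(1, 10), ylim = c(1, 10000), log = "y",
     xlab = "x", ylab = "y")

# On a log axis, par("usr") holds the limits in log10 units
ylog <- par("ylog")
usr  <- par("usr")

dev.off()

print(ylog)
print(usr[3:4])
```

The same call with ylim = c(0, 1) fails, which suggests the zero lower limit rather than the missing data is what the log axis cannot handle.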




On 18.11.2009 at 23:19, jim holtman wrote:

> like this?
> 
>> plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")
> Error in axis(side = side, at = at, labels = labels, ...) :
>  CreateAtVector [log-axis()]: axp[0] = 0 < 0!
> In addition: Warning messages:
> 1: In is.na(y) : is.na() applied to non-(list or vector) of type 'NULL'
> 2: In plot.window(...) :
>  nonfinite axis limits [GScale(-inf,4,2, .); log=1]
> 3: In axis(side = side, at = at, labels = labels, ...) :
>  CreateAtVector "log"(from axis()): axp[0] = 0 !
> 
> 
> You have no data to plot.  What were you expecting it to do?  When you
> say "lot of error messages", please include them and also follow the
> posting guide.
> 
> On Wed, Nov 18, 2009 at 4:52 PM, Martin Batholdy
>  wrote:
>> Hi,
>> 
>> 
>> I get a lot of error messages with this command, but I don't understand why;
>> 
>> plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")
>> 
>> 
>> thanks for any help!
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?



Re: [R] Median on Aggregated data

2009-11-18 Thread William Dunlap
You could use S+.  Its median function has
a weights argument.  E.g.,
   > median(c(1,2,3,4e4), weights=c(1e8,1e8,1,2e8))
   [1] 3
   > median(c(1,2,3,4e4),  weights=c(1e8,1e8,1,2e8+10))
   [1] 4
   > median(c(1,2,3,4e4),  weights=c(1e8,1e8,1,2e8+1))
   [1] 20001.5
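
For those without S+, a frequency-weighted median can be sketched in base R without expanding the data with rep(). The function name weighted_median is hypothetical, and the sketch assumes the weights are non-negative counts; it walks the cumulative weights and averages the two middle values when the halfway point falls exactly between them.

```r
# Hypothetical helper: median of x where w[i] is the count of x[i]
weighted_median <- function(x, w) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w)          # running total of counts
  half <- sum(w) / 2
  i <- which(cw >= half)[1]
  # If the halfway point falls exactly on a boundary, average the neighbours
  if (cw[i] == half && i < length(x)) (x[i] + x[i + 1]) / 2 else x[i]
}

x <- c(1, 2, 3, 4e4)
weighted_median(x, c(1e8, 1e8, 1, 2e8))      # 3
weighted_median(x, c(1e8, 1e8, 1, 2e8 + 1))  # 20001.5

# Small check against the rep() approach
median(rep(c(5, 7, 9), c(2, 1, 4)))
weighted_median(c(5, 7, 9), c(2, 1, 4))
```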

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Satsangi, Vivek (GE Capital)
> Sent: Wednesday, November 18, 2009 1:55 PM
> To: r-help@r-project.org
> Subject: [R] Median on Aggregated data
> 
> Folks,
>  
> I have the following code, that works fine on smaller data sets. For
> larger datasets, it runs out of memory and runs way too slow 
> because we
> are essentially creating large vectors with rep() and then calling
> median() on it. (I learned this approach from a post on the web). 
>  
> Below that, I have written the corresponding SAS code. The SAS code
> works fast because I can just tell the proc summary (by the weights
> option) that the Counts variable is a frequency.
>  
> So, the question is, is there a simple way to do the same 
> thing in R? I
> have to run this on a large dataset -- for a small set it is not a
> problem.
>  
>  
> -- Begin R code 
> 
> N <- 1005 * 14; 
> myNorm <- data.frame(PaydexNormingCategory = numeric(N),
> SIC = numeric(N), CatMedian = numeric(N));
>  
> k=1;
> #j = 7941;  ## For testing only
> for (j in levels(SIC)){
>  for (i in levels(PaydexNormingCategory)){
>  myData <- dfpaydex[(Paydex==i) & (SIC==j),];
>  myMedian <- with(myData, 
> levels(Paydex)[median(rep(as.numeric(Paydex),
> Counts))]);
>  myNorm[k] <-c( as.numeric(i), as.numeric(j), as.numeric(myMedian) );
>  k <- k+1;
>  }
> }
>  
> -- Begin SAS code
> 
> 
> proc summary data=SASUser.PaydexNormfull nway; 
> 
>class PaydexNormingCategory SIC ;
>weight Counts;
>   var Paydex;
> 
>  output out=outstat (drop=_type_ _freq_)
> median= / autoname;   
>  run;
> 
> -- End SAS code 
> 
> 
> Thanks for your guidance!
> 
> 
> Vivek Satsangi
> GE Capital
> Americas
> 
> GE imagination at work
> 
> 


Re: [R] Median on Aggregated data

2009-11-18 Thread David Winsemius


On Nov 18, 2009, at 4:55 PM, Satsangi, Vivek (GE Capital) wrote:


> Folks,
>
> I have the following code, that works fine on smaller data sets. For
> larger datasets, it runs out of memory and runs way too slow because we
> are essentially creating large vectors with rep() and then calling
> median() on it. (I learned this approach from a post on the web).
>
> Below that, I have written the corresponding SAS code. The SAS code
> works fast because I can just tell the proc summary (by the weights
> option) that the Counts variable is a frequency.
>
> So, the question is, is there a simple way to do the same thing in R? I
> have to run this on a large dataset -- for a small set it is not a
> problem.



I am not sure, and I see no reproducible dataset (that I recognize), but
Harrell's Hmisc::wtd.quantile might be an alternative approach.
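
To make the idea concrete without depending on Hmisc, here is a base-R sketch of a per-group median that treats Counts as frequencies. The toy data, the helper name wmed, and the use of split() are all assumptions for illustration; the column names follow the poster's code.

```r
# Toy data standing in for dfpaydex (made up for this sketch)
dfpaydex <- data.frame(
  PaydexNormingCategory = c(1, 1, 1, 2, 2),
  SIC    = c(7941, 7941, 7941, 7941, 7941),
  Paydex = c(50, 70, 80, 60, 90),
  Counts = c(3, 1, 1, 2, 2)
)

# Hypothetical helper: median of x with frequency weights w, no rep() needed
wmed <- function(x, w) {
  o <- order(x); x <- x[o]; cw <- cumsum(w[o])
  half <- sum(w) / 2
  i <- which(cw >= half)[1]
  if (cw[i] == half && i < length(x)) (x[i] + x[i + 1]) / 2 else x[i]
}

# One data frame per (category, SIC) combination that actually occurs
groups <- split(dfpaydex,
                list(dfpaydex$PaydexNormingCategory, dfpaydex$SIC),
                drop = TRUE)

myNorm <- do.call(rbind, lapply(groups, function(g) {
  data.frame(PaydexNormingCategory = g$PaydexNormingCategory[1],
             SIC = g$SIC[1],
             CatMedian = wmed(g$Paydex, g$Counts))
}))

print(myNorm)
```

Memory use stays proportional to the number of distinct rows rather than to the sum of the counts, which is the point of avoiding rep().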





-- Begin R code  


N <- 1005 * 14;
myNorm <- data.frame(PaydexNormingCategory = numeric(N),
   SIC = numeric(N), CatMedian = numeric(N));

k=1;
#j = 7941;  ## For testing only
for (j in levels(SIC)){
for (i in levels(PaydexNormingCategory)){
myData <- dfpaydex[(Paydex==i) & (SIC==j),];
myMedian <- with(myData, levels(Paydex)[median(rep(as.numeric(Paydex),
Counts))]);
myNorm[k] <-c( as.numeric(i), as.numeric(j), as.numeric(myMedian) );
k <- k+1;
}
}

-- Begin SAS code


proc summary data=SASUser.PaydexNormfull nway;

  class PaydexNormingCategory SIC ;
  weight Counts;
 var Paydex;

output out=outstat (drop=_type_ _freq_)
   median= / autoname;
run;

-- End SAS code  



Thanks for your guidance!


Vivek Satsangi
GE Capital
Americas

GE imagination at work




[R] Median on Aggregated data

2009-11-18 Thread Satsangi, Vivek (GE Capital)
Folks,
 
I have the following code, that works fine on smaller data sets. For
larger datasets, it runs out of memory and runs way too slow because we
are essentially creating large vectors with rep() and then calling
median() on it. (I learned this approach from a post on the web). 
 
Below that, I have written the corresponding SAS code. The SAS code
works fast because I can just tell the proc summary (by the weights
option) that the Counts variable is a frequency.
 
So, the question is, is there a simple way to do the same thing in R? I
have to run this on a large dataset -- for a small set it is not a
problem.
 
 
-- Begin R code 
N <- 1005 * 14; 
myNorm <- data.frame(PaydexNormingCategory = numeric(N),
SIC = numeric(N), CatMedian = numeric(N));
 
k=1;
#j = 7941;  ## For testing only
for (j in levels(SIC)){
 for (i in levels(PaydexNormingCategory)){
 myData <- dfpaydex[(Paydex==i) & (SIC==j),];
 myMedian <- with(myData, levels(Paydex)[median(rep(as.numeric(Paydex),
Counts))]);
 myNorm[k, ] <- c( as.numeric(i), as.numeric(j), as.numeric(myMedian) );
 k <- k+1;
 }
}
 
-- Begin SAS code


proc summary data=SASUser.PaydexNormfull nway; 

   class PaydexNormingCategory SIC ;
   weight Counts;
  var Paydex;

 output out=outstat (drop=_type_ _freq_)
median= / autoname;   
 run;

-- End SAS code 

Thanks for your guidance!


Vivek Satsangi
GE Capital
Americas

GE imagination at work




[R] Cochran's Theorem

2009-11-18 Thread Peng Yu
I want to understand ANOVA better, but the few textbooks that I have do
not describe Cochran's theorem in detail. Could somebody recommend a
book for me?



[R] error message; ylim + log="y"

2009-11-18 Thread Martin Batholdy
Hi,


I get a lot of error messages with this command, but I don't understand why;

plot(c(),c(), xlim=c(1,10), ylim=c(0,1), log="y")


thanks for any help!


Re: [R] How to choose appropriate linear model? (ANOVA)

2009-11-18 Thread David Winsemius


On Nov 18, 2009, at 4:06 PM, Steve Lianoglou wrote:


Hi,

On Nov 18, 2009, at 3:33 PM, Rolf Turner wrote:



On 19/11/2009, at 9:10 AM, Tal Galili wrote:


Hello Peng,
What you are talking about is the "model selection" process.
Although it also sounds like you are referring to the more general subject of
regression modeling strategies, consider finding this book:
http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322

Frank Harrell is a very insightful lecturer; I hear his writing is also
good.

I would love to read recommendations from other R members regarding your
question.


Alan Miller's book ``Subset Selection in Regression'' (Chapman and Hall,
1990) has some relevance.


You can also look into the "more recent" approaches, like penalized
regression. Specifically I'm talking about the lasso and the elastic net.
Look for the relevant papers by Trevor Hastie and Tibshirani
(you'll get them from their websites).




Just for the record, Harrell's text cites, discusses and endorses  
penalized approaches. You can also read his more recent presentation  
at his website.


--
David

Lucky for you, the "glmnet" package is available for R, implements  
both the lasso and the elasticnet, and was written by these same  
people.
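
A minimal sketch of what a lasso fit with glmnet can look like, on simulated data. The data, the variable names, and the choice of lambda.min are assumptions for illustration; the fit only runs if the glmnet package is installed.

```r
set.seed(42)

# Simulated data: 10 predictors, only the first two matter
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

if (requireNamespace("glmnet", quietly = TRUE)) {
  # alpha = 1 is the lasso; values between 0 and 1 give the elastic net
  cvfit <- glmnet::cv.glmnet(x, y, alpha = 1)

  # Coefficients at the cross-validated lambda; many should be exactly zero
  print(coef(cvfit, s = "lambda.min"))
}
```

The penalty does the variable selection as part of the fit, which is why it is often suggested as an alternative to stepwise procedures.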


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 |  Memorial Sloan-Kettering Cancer Center
 |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] SVM Param Tuning with using SNOW package

2009-11-18 Thread raluca

Hi David,

I have no idea what "magic" you did, but running exactly the same code as
you, I have the same problem as before, meaning that I get results that are
identical in pairs, while I should get different results for each value of
cost1 (which is a vector with 10 values running between 0.5 and 30).
This is the result I get.

0.2197162, 0.2197162, 0.1467448, 0.1467448, 0.2247955, 0.2247955,
0.1073280, 0.1073280, 0.2332475, 0.2332475

Anyway, thanks a lot for trying. 

PS. Probably I should switch to Mac :)


David Winsemius wrote:
> 
> I cannot really be sure what you are trying to do,  but doing a bit of  
> "surgery" on your code lets it run on a multicore Mac:
> 
> library(e1071)
> library(snow)
> library(pls)
> 
> data(gasoline)
> 
> X=gasoline$NIR
> Y=gasoline$octane
> 
> NR=10
> cost1=seq(0.5,30, length=NR)
> 
> sv.lin<- function(c) {
> 
> for (i in 1:NR) {
> 
> ind=sample(1:60,50)
> gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
> 
> svm.lin <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],  
> cross=5)
> results.lin   <- predict(svm.lin, gTest$X)
> 
> e.test.lin <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
> 
> return(e.test.lin)
> }
> }
> 
> cl<- makeCluster(2, type="SOCK" )
> 
> clusterEvalQ(cl, library(e1071))
> cost1=seq(0.5,30, length=NR)
> 
> clusterExport(cl,c("NR","Y","X",  "cost1"))
> # Pretty sure you need a copy of cost1 on each node.
> 
> 
> RMSEP<-clusterApply(cl, cost1, sv.lin)
> # I thought the second argument was the matrix or vector over which to
> # iterate.
> 
> stopCluster(cl)
> 
> # Since I don't know what the model meant, I cannot determine whether
> # this result is interpretable.
>  > RMSEP
> [[1]]
> [1] 0.1921887
> 
> [[2]]
> [1] 0.1924917
> 
> [[3]]
> [1] 0.1885066
> 
> [[4]]
> [1] 0.1871466
> 
> [[5]]
> [1] 0.3550932
> 
> [[6]]
> [1] 0.1226460
> 
> [[7]]
> [1] 0.2426345
> 
> [[8]]
> [1] 0.2126299
> 
> [[9]]
> [1] 0.2276286
> 
> [[10]]
> [1] 0.2064534
> 
> -- 
> David Winsemius, MD
> 
> On Nov 18, 2009, at 7:09 AM, raluca wrote:
> 
>>
>> Hi Charlie,
>>
>>
>> Yes, you are perfectly right, when I make the clusters I should put  
>> 2, not
>> 10 (it remained 10 from previous trials with 10 slaves).
>>
>> cl<- makeCluster(2, type="SOCK" )
>>
>> To tell the truth I do not understand very well what the 2nd  
>> parameter for
>> clusterApplyLB() has to be.
>>
>> If the function sv.lin has just 1 parameter, sv.lin(c), where c is  
>> the cost,
>> how should I call clusterApplyLB?
>>
>>
>> ? clusterApplyLB(cl, ?, sv.lin, c=cost1)  ?
>>
>>
>>
>> Below, I am providing a working example, using the gasoline data  
>> that comes
>> in the pls package.
>>
>> Thank you for your time!
>>
>>
>> library(e1071)
>> library(snow)
>> library(pls)
>>
>> data(gasoline)
>>
>> X=gasoline$NIR
>> Y=gasoline$octane
>>
>> NR=10
>> cost1=seq(0.5,30, length=NR)
>>
>>
>> sv.lin<- function(c) {
>>
>> for (i in 1:NR) {
>>
>> ind=sample(1:60,50)
>> gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
>> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
>>
>> svm.lin<- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],  
>> cross=5)
>> results.lin   <- predict(svm.lin, gTest$X)
>>
>> e.test.lin <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
>>
>> return(e.test.lin)
>> }
>> }
>>
>>
>> cl<- makeCluster(2, type="SOCK" )
>>
>>
>> clusterEvalQ(cl,library(e1071))
>>
>>
>> clusterExport(cl,c("NR","Y","X"))
>>
>>
>> RMSEP<-clusterApplyLB(cl,?,sv.lin,c=cost1)
>>
>> stopCluster(cl)
>>
>>
>>
>>
>>
>> cls59 wrote:
>>>
>>>
>>> raluca wrote:

>>>> Hello,
>>>>
>>>> This is the first time I am using the SNOW package, and I am trying to tune
>>>> the cost parameter for a linear SVM, where the cost (variable cost1) takes 10
>>>> values between 0.5 and 30.
>>>>
>>>> I have a large dataset and a PC which is not very powerful, so I need to
>>>> tune the parameters using both CPUs of the PC.
>>>>
>>>> Somehow I cannot manage to do it. It seems that both CPUs are fitting the
>>>> model for the same values of cost1, I guess the first 5, but not for the
>>>> last 5.
>>>>
>>>> Please, can anyone help me!
>>>>
>>>> Here is the code:
>>>>
>>>> data <- data.frame(Y=I(Y),X=I(X))
>>>> data.X<-data$X
>>>> data.Y<-data$Y


>>>
>>>
>>> Helping you will be difficult as we're only three lines into your  
>>> example
>>> and already I have no idea what the data you are using looks like.
>>> Example code needs to be fully reproducible-- that means a small  
>>> slice of
>>> representative data needs to be provided or faked using an  
>>> appropriate
>>> random number generator.
>>>
>>> Some things did jump out at me about your approach and I've made some
>>> notes below.
>>>
>>>
>>>
>>> raluca wrote:

>>>> NR=10
>>>> cost1=seq(0.5,30, length=NR)
>>>>
>>>> sv.lin<- function(cl,c) {
>>>>
>>>> for (i in 1:NR) {
>>>>
>>>> ind=sample(1:414,276)
>>>>
>>>> hogTest<-  data.frame(Y=I(data.Y[-ind]),X=I(data.X[-ind,]))
>>>> ho

Re: [R] Reading multiple Excel 2007 files with a loop

2009-11-18 Thread neuro
A small example.
regards Christian


> library(gdata)
> fname <- list.files("C:/dm/test",pattern=".xls", full.names = TRUE, recursive 
> =TRUE, ignore.case = TRUE)
> 
> for (sp in 1:length(fname)) {
+ print(fname[sp])
+ data <- read.xls(fname[sp], sheet=1, verbose=FALSE,perl="perl")
+ print(data)
+ }
[1] "C:/dm/test/xls1/file1.xls"
Converting xls file to csv file... Done.
Reading csv file... Done.
    A   B
1 100 100
2 200 200
[1] "C:/dm/test/xls1/file2.xls"
Converting xls file to csv file... Done.
Reading csv file... Done.
    A   B
1 100 100
2 200 300
[1] "C:/dm/test/xls2/file5.xls"
Converting xls file to csv file... Done.
Reading csv file... Done.
    A   B
1 100 100
2 200 300
3 200 100

regards Christian
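
A side note on the RODBC loop in the original question quoted below: inside a quoted string, fname[sp] is literal text, so every iteration asks the driver for a file actually named "fname[sp]". If the RODBC route is preferred, the connection string has to be built from the file name, for example with sprintf(). The sketch below only builds the strings; the folder and file names are placeholders taken from the question, and the RODBC calls are left commented out because they need the Excel ODBC driver.

```r
# Placeholder folder and file names standing in for list.files() output
folder <- "U:\\test folder"
fname  <- c("testA.xlsx", "testB.xlsx", "testC.xlsx")

# Build one connection string per file; %s is replaced by the actual values
conn_strings <- sprintf(
  "DRIVER=Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb);DBQ=%s\\%s;ReadOnly=False",
  folder, fname
)

print(conn_strings[1])

# With RODBC and the Excel driver available, the loop could then look like:
# library(RODBC)
# all.data <- lapply(conn_strings, function(cs) {
#   channel <- odbcDriverConnect(cs)
#   on.exit(odbcClose(channel))
#   sqlFetch(channel, "Sheet1")
# })
```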


> -----Original Message-----
> From: "Rolf Turner" 
> Sent: 18.11.09 21:38:12
> To: "Mark W.Miller" 
> CC: "r-help@r-project.org" 
> Subject: Re: [R] Reading multiple Excel 2007 files with a loop


> 
> Have you looked at the read.xls() function from the gdata package?
> It automates the conversion to *.csv for you.  It has worked seamlessly
> for me on the occasions on which I've needed to use it.
> 
>   cheers,
> 
>   Rolf Turner
> 
> On 19/11/2009, at 9:09 AM, Mark W. Miller wrote:
> 
> >
> >
> > I have several hundred Excel 2007 data files in a folder.  I would  
> > like to
> > read every file in a single given folder using a loop.
> >
> > I have searched the FAQ, the forum archives here, other or older R  
> > boards
> > and the R Import / Export documentation, and have asked some very
> > knowledgeable R users without learning of a solution.  I hope  
> > someone here
> > can help.
> >
> > I understand that the most common suggestion is to convert the  
> > files to csv
> > format.  However, there are so many files in my case (ultimately >  
> > 1000) I
> > would rather avoid doing that.
> >
> > I have also found many solutions to this problem for txt files and  
> > files in
> > additional formats other than Excel 2007.
> >
> > I can read three Excel 2007 files one at a time with the following  
> > example
> > code using R 2.10.0 on a computer running Windows (XP, I think):
> >
> >
> >
> >
> > library(RODBC)
> >
> >
> > channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls,  
> > *.xlsx,
> > *.xlsm, *.xlsb);
> > DBQ=U:\\test folder\\testA.xlsx; ReadOnly=False")
> >
> > sqlTables(channel)
> >
> > my.data.A <- sqlFetch(channel, "Sheet1")
> >
> > odbcClose(channel)
> >
> >
> >
> > channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls,  
> > *.xlsx,
> > *.xlsm, *.xlsb);
> > DBQ=U:\\test folder\\testB.xlsx; ReadOnly=False")
> >
> > sqlTables(channel)
> >
> > my.data.B <- sqlFetch(channel, "Sheet1")
> >
> > odbcClose(channel)
> >
> >
> >
> > channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls,  
> > *.xlsx,
> > *.xlsm, *.xlsb);
> > DBQ=U:\\test folder\\testC.xlsx; ReadOnly=False")
> >
> > sqlTables(channel)
> >
> > my.data.C <- sqlFetch(channel, "Sheet1")
> >
> > odbcClose(channel)
> >
> >
> >
> >
> >
> > # However, when I attempt to read the same three files with the  
> > loop below I
> > receive an error:
> >
> >
> >
> >
> > library(RODBC)
> >
> >
> > setwd("U:/test folder")
> >
> >
> > fname <- list.files(pattern=".\\.xlsx", full.names = FALSE,  
> > recursive =
> > TRUE, ignore.case = TRUE)
> >
> > z <- length(fname)
> >
> > print(z)
> >
> >
> > for (sp in 1:z) {
> >
> > channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls,  
> > *.xlsx,
> > *.xlsm, *.xlsb);
> >
> > DBQ=U:\\test folder\\fname[sp]; ReadOnly=False")
> >
> > sqlTables(channel)
> >
> > my.data <- sqlFetch(channel, "Sheet1")
> >
> > print(my.data)
> >
> > odbcClose(channel)
> > }
> >
> >
> >
> >
> > # The error I receive states:
> >
> > Error in odbcTableExists(channel, sqtable) :
> >   ‘Sheet1’: table not found on channel
> >
> >
> > # Thank you sincerely in advance for any help with this problem.
> >
> > Mark Miller
> >
> > Gainesville, Florida


Re: [R] Data linkage functions for probabilistic linkage using person identifiers

2009-11-18 Thread David Winsemius


On Nov 18, 2009, at 1:21 PM, Dagan A WRIGHT wrote:

I am somewhat new to R, although I am using and liking it already.  I am
curious whether there are any probabilistic linkage packages similar in
function to others such as Link King (http://www.the-link-king.com/).  I am
looking for functions that match on SSN, first/last name, date of birth, and
a couple of other indicators.




Cannot comment on similarities to Link King but have used the  
functions found with this search in similar applications:


RSiteSearch("Levenshtein")  #yes, that is spelled correctly
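As a rough illustration of what Levenshtein-based matching looks like, here is a minimal sketch using adist() from the utils package in current versions of R. The names in both vectors are made up for the example, and picking the closest candidate is only one simple rule; it is not a full probabilistic linkage and is not how Link King works.

```r
# Hypothetical records; none of these values come from the thread.
reference <- c("Jonathan Smith", "Maria Garcia", "Wei Chen")
incoming  <- c("Jonathon Smith", "Maria Garcia", "Wei Chan")

# adist() returns a matrix of generalized Levenshtein distances:
# one row per incoming name, one column per reference name.
d <- adist(incoming, reference, ignore.case = TRUE)

# For each incoming name, take the reference name with the smallest distance.
best <- apply(d, 1, which.min)
matches <- data.frame(incoming = incoming,
                      matched  = reference[best],
                      distance = d[cbind(seq_along(incoming), best)],
                      stringsAsFactors = FALSE)
print(matches)
```

In practice one would probably combine distances on several fields (name, date of birth, SSN) and set a threshold below which a pair is accepted.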



Thanks

Dagan Wright, Ph.D., M.S.P.H.
Lead Addictions Research Analyst, Analysis & Evaluation Unit
Addictions & Mental Health Division (AMH)
500 Summer St. NE E86
Salem, Oregon 97301-1118

Office number: 503-945-5726
Fax number: 503-378-8467
dagan.a.wri...@state.or.us



[R] Unnecessary code?

2009-11-18 Thread hunsyntesat
Dear R-ers,

While browsing the R sources, I found the following piece of code 
in src\main\memory.c:

static void reset_pp_stack(void *data)
{
    R_size_t *poldpps = data;
    R_PPStackSize =  *poldpps;
}

To me, it looks like the poldpps pointer is a nuisance; can't you 
just cast the data pointer and dereference it at once? Say,

static void reset_pp_stack(void *data)
{
    R_PPStackSize = * (R_size_t *) data;
}

-- Hun



Re: [R] Reading multiple Excel 2007 files with a loop

2009-11-18 Thread Mark W. Miller

Thank you for all of the responses.  They were all very helpful.  The best
response came from a gentleman at Berkeley who suggested I change the
channel statement to that used below:
Mark Miller

Gainesville, Florida




library(RODBC)


setwd("U:/test folder")


fname <- list.files(pattern=".\\.xlsx", full.names = FALSE, recursive =
TRUE, ignore.case = TRUE)

z <- length(fname)

print(z)


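for (sp in seq_along(fname)) {

  # Build the connection string with paste() so that fname[sp] is evaluated
  # instead of being passed to the driver as literal text.
  channel <- odbcDriverConnect(paste(
    "DRIVER=Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb); ",
    "DBQ=U:\\test folder\\", fname[sp], "; ReadOnly=False", sep = ""))

  sqlTables(channel)

  my.data <- sqlFetch(channel, "Sheet1")

  print(my.data)

  odbcClose(channel)
}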
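
To see why the change matters without needing RODBC or any Excel files, the following sketch compares the two ways of writing the DBQ part of the connection string. The file names are hypothetical stand-ins for what list.files() would return.

```r
fname <- c("testA.xlsx", "testB.xlsx")   # hypothetical result of list.files()
sp <- 1

# Inside quotes, fname[sp] is literal text, so the driver is asked to open
# a file that is actually called "fname[sp]".
literal <- "DBQ=U:\\test folder\\fname[sp]; ReadOnly=False"

# paste() evaluates fname[sp] and splices the real file name into the string.
built <- paste("DBQ=U:\\test folder\\", fname[sp], "; ReadOnly=False", sep = "")

cat(literal, "\n")
cat(built, "\n")
```

The first string never mentions testA.xlsx, which is consistent with the "table not found on channel" error reported for the original loop.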




Re: [R] Error using 32-bit R and RODBC package on 64-bit Windows Server OS with R version 2.10

2009-11-18 Thread helpme
Now that I know RODBC only works with 32-bit ODBC drivers, this explains the
problem I was having.

A 64-bit ODBC driver is definitely installed. I can tell because when I go to
the system32 folder and click on odbcad32.exe, it opens the Microsoft ODBC
manager, where I can select the driver installed for the 64-bit Oracle system.

The system32 folder contains the 64-bit driver for ODBC. When I go to the
syswow64 directory and click on odbcad32.exe, it does not take me to the
Microsoft ODBC manager. Instead, these steps end in an error:

1.) Navigate to C:\Windows\syswow64\odbcad32.exe
2.) Select System DSN
3.) Add "Microsoft ODBC for Oracle"

I receive this error: The Oracle(tm) client and networking components were
not found. These components are supplied by Oracle Corporation and are part
of the Oracle Version 7.3 (or greater) client software installation. You
will be unable to use this driver until these components have been
installed.



I don't believe the 32-bit ODBC driver is present. What is the best way to
tell whether the 32-bit Oracle client software is installed? I'm also
wondering whether anyone has experience installing it on a 64-bit system and
calling it from RODBC.
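
On the R side of the 32/64-bit question, a small base R sketch like the one below reports which architecture the running session was built for. It says nothing about which Oracle client or ODBC driver is installed; it only confirms the bitness that those components would need to match.

```r
# Pointer size in bytes: 4 on a 32-bit build of R, 8 on a 64-bit build.
pointer_bytes <- .Machine$sizeof.pointer
r_bits <- 8 * pointer_bytes

cat("R architecture string:", R.version$arch, "\n")
cat("This R session is a", r_bits, "bit build\n")
```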


On Mon, Nov 16, 2009 at 4:54 PM, Marc Schwartz  wrote:

> On Nov 16, 2009, at 2:39 PM, helpme wrote:
>
>> I am receiving an error when trying to connect to the Oracle Database using
>> RODBC on a 64-bit Windows Server OS. The version of R is 2.10.0-win32.exe
>>
>> Is this the wrong version? Does RODBC only work with 32-bit ODBC drivers?
>>
>> I've read over all the posts and documentation manuals.
>> The system is Windows Server 2003 with R 2.81. and the latest downloadable
>> RODBC package. The Oracle SID/DSN is mfopdw. I made sure to add it to
>> Control Panel->Administrative Privileges->Microsoft ODBC system/user DSN.
>>
>> I've also tried the following in no particular order:
>>
>> 1.) Turn on all Oracle services in Control Panel->Administrative
>> Privileges.
>> 2.) Checked tnsnames.ora for SID.
>> 3.) Add Microsoft ODBC service to Control Panel services for SID
>> 4.) Use SQL Developer to test the connection another way besides R (It was
>> successful)
>> 5.) channel<-odbcDriverConnect(
>> connection="Driver={Microsoft ODBC for Oracle};
>> DSN=abc;UID=abc;PWD=abc;", case="oracle")
>>
>> received error drivers SQLAllocHandle on SQL_HANDLE_ENV failed one time;
>> another time I got the error that Oracle client and networking components
>> 7.3 or greater is not found.
>>
>> 6.) tnsping mfopdw
>>
>> lsnrctl start mfopdw
>>
>> tried to add oracle/bin to path
>>
>> Nothing is working.
>>
>
> Three quick comments:
>
> 1. A better place to post these types of queries would be on the R-SIG-DB
> e-mail list, which is focused in this domain. More info here:
>
>  https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
> 2. Prof. Ripley will be a more definitive resource, so I would wait until
> he might respond.
>
> 3. If you have not yet, be sure to read the RODBC vignette, which is
> available either via:
>
>  vignette("RODBC")
>
> or online at:
>
>  http://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf
>
>
> That all being said, since you have now posted what may be the root cause
> of your problem, which is the 64/32 bit details, I will venture a guess to
> say that this may be the problem. Since there is not a 64 bit version of R
> for Windows (save I believe the Revolution commercial release), if you are
> using 64 bit Oracle client binaries and ODBC drivers (if they exist), they
> will not be compatible with 32 bit R/RODBC.
>
> I know that on OSX, with 64 bit R/RODBC and 32 bit ODBC drivers for Oracle,
> the connectivity would not work, so it seems reasonable that the reverse
> configuration would not be compatible either.
>
> So, first, I would be sure that you are using 32 bit ODBC drivers for
> Oracle on Windows and not 64 bit. If you installed any other Oracle client
> related software, that likely also needs to be 32 bit as well.
>
> Then I would review the above vignette document and be sure that any
> general installation references and those specifically pertaining to Windows
> have been followed consistently, especially configuring $PATH and other
> environmental configuration items required for Oracle itself, which on some
> platforms usually include things like $ORACLE_HOME, $TNS_ADMIN and so forth.
>  You indicate above:
>
>
>  "tried to add oracle/bin to path"
>
> which does not definitively indicate that you actually did so. Did you?
>  Also, check the capitalization, as the path is normally something like
> c:\Oracle\bin.
>
> If you can connect to the Oracle server using Oracle's own clients such as
> the InstantClient, that typically means that most of the system
> configuration issues are correctly set up. If that connection is successful,
> then it may bring us back to the 32/64 bit conflict.
>
> HTH,
>
> Marc Schwartz
>
>


Re: [R] SOM library - where do I find it

2009-11-18 Thread Julian Burgos
You have to download it from CRAN and install it.  From the GUI, do
Packages->Install package(s).

Pretty basic stuff...you should check the documentation before posting.

Julian


>
>
> R version 2.9.2 (2009-08-24) - for windows
>
>> library(SOM)
> Error in library(SOM) : there is no package called 'SOM'
>
> Where can I get the SOM library from?
>
> Thanks in advance


Re: [R] How to choose appropriate linear model? (ANOVA)

2009-11-18 Thread Steve Lianoglou

Hi,

On Nov 18, 2009, at 3:33 PM, Rolf Turner wrote:



On 19/11/2009, at 9:10 AM, Tal Galili wrote:


Hello Peng,
What you are describing is the "model selection" process.
It also sounds like you are referring to the more general subject of
regression modeling strategies, so consider finding this book:
http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322

Frank Harrell is a very insightful lecturer, and I have heard his writing is
also good.

I would love to read recommendations from other R members regarding your
question.


Alan Miller's book ``Subset Selection in Regression'' (Chapman and Hall,
1990) has some relevance.


You can also look into the "more recent" approaches, like penalized
regression. Specifically, I'm talking about the lasso or the elastic net.
Look for the relevant papers by Trevor Hastie and Tibshirani (you'll
get them from their websites).


Lucky for you, the "glmnet" package is available for R, implements
both the lasso and the elastic net, and was written by these same people.


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] SOM library - where do I find it

2009-11-18 Thread Tobias Verbeke

Hi tdm,

tdm wrote:


R version 2.9.2 (2009-08-24) - for windows


library(SOM)

Error in library(SOM) : there is no package called 'SOM'

Where can I get the SOM library from?

Thanks in advance


R is case-sensitive, so

install.packages("som")
library("som")
?som

http://cran.r-project.org/web/packages/som/index.html
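
When a library() call fails only because of letter case, a small helper can find the intended spelling. The sketch below is a hypothetical helper (find_pkg is not part of R); by default it searches the installed packages, and the candidates vector is an illustrative assumption rather than a CRAN query.

```r
# Return the entries of `known` that match `name` when case is ignored.
find_pkg <- function(name, known = rownames(installed.packages())) {
  known[tolower(known) == tolower(name)]
}

# Illustrative vector of package names (an assumption, not a CRAN listing).
candidates <- c("som", "kohonen", "MASS")
print(find_pkg("SOM", candidates))
```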

HTH,
Tobias



Re: [R] Perform operations on dataframes called with paste in loops

2009-11-18 Thread Greg Snow
There are a few options:

You can read the help page for the function that you used to assign names to 
the data frames when you read them in (the 'see also' section is there for a 
reason).

You can read the FAQ (7.21 to be specific, but the others could save you 
re-asking FAQs in the future)

You can take a better approach by reading all your data sets into a list (use 
lapply on the vector of names), then use lapply on the list of datasets and 
avoid all the future headache/heartache that will come from the approach you 
are trying.
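
A minimal sketch of the two routes mentioned above, using small made-up data frames in place of the imported csv files: get() is the function FAQ 7.21 points to for turning a pasted name into the object, and the list version is the approach recommended in the last paragraph. The object names and values are assumptions for illustration.

```r
# Hypothetical stand-ins for the data frames read from dataset1.csv, dataset2.csv.
dataset1 <- data.frame(x = 1:3, y = c(2, 4, 6))
dataset2 <- data.frame(x = 1:3, y = c(1, 0, 1))

# FAQ 7.21 route: get() turns the pasted name into the object itself.
for (i in 1:2) {
  d <- get(paste("dataset", i, sep = ""))
  cat("dataset", i, ": mean of column 2 =", mean(d[, 2]), "\n")
}

# List route: keep the data frames together and loop with sapply/lapply.
datasets <- list(dataset1 = dataset1, dataset2 = dataset2)
y_means <- sapply(datasets, function(d) mean(d[, 2]))
print(y_means)
```

When the data come from files, the list can be built directly with something like lapply(file_names, read.csv), so the individually named objects are never needed.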

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of separent
> Sent: Tuesday, November 17, 2009 4:15 PM
> To: r-help@r-project.org
> Subject: [R] Perform operations on dataframes called with paste in
> loops
> 
> 
> In a loop, I compose the name of a csv file using paste, then read it
> (e.g., dataset1.csv, dataset2.csv, etc). The name of the dataframe assigned
> to the imported csv is also composed with paste (e.g., dataset1, dataset2,
> etc.). Now I want to perform operations on the dataframes dataset1,
> dataset2, etc. However, the paste function only renders a string on which I
> can not, for example, do operations like
> plot(paste("dataset",i,"[,1]",sep=""),paste("dataset",i,"[,2]",sep="")).
> How could I call the dataframe instead of the string representing its name?


Re: [R] SOM library - where do I find it

2009-11-18 Thread Steve Lianoglou

Hi,

On Nov 18, 2009, at 3:58 PM, tdm wrote:


R version 2.9.2 (2009-08-24) - for windows


library(SOM)

Error in library(SOM) : there is no package called 'SOM'


I think you want "som", right?


Where can I get the SOM library from?


Where you get just about every R package from, CRAN:
http://cran.r-project.org/web/packages/som/index.html

Or, from within R:

R> install.packages('som')

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



[R] SOM library - where do I find it

2009-11-18 Thread tdm


R version 2.9.2 (2009-08-24) - for windows

> library(SOM)
Error in library(SOM) : there is no package called 'SOM'

Where can I get the SOM library from?

Thanks in advance


Re: [R] Reading multiple Excel 2007 files with a loop

2009-11-18 Thread Rolf Turner


Have you looked at the read.xls() function from the gdata package?
It automates the conversion to *.csv for you.  It has worked seamlessly
for me on the occasions on which I've needed to use it.
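
A sketch of how such a reader could be applied to a whole folder: the helper below (read_all is a hypothetical name) takes any one-file reader function and applies it to every matching file. It is demonstrated with read.csv on throwaway files so that it runs without gdata or Perl; passing gdata::read.xls instead is an untested assumption here.

```r
# Read every file in `dir` whose name matches `pattern`, using `reader`.
read_all <- function(dir, pattern, reader) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE,
                      ignore.case = TRUE)
  out <- lapply(files, reader)
  names(out) <- basename(files)
  out
}

# Demonstration with throwaway CSV files so the sketch runs anywhere.
demo_dir <- tempfile("xls_demo")
dir.create(demo_dir)
write.csv(data.frame(A = c(100, 200), B = c(100, 200)),
          file.path(demo_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(A = c(100, 200), B = c(100, 300)),
          file.path(demo_dir, "file2.csv"), row.names = FALSE)

all_data <- read_all(demo_dir, "\\.csv$", read.csv)
print(names(all_data))

# With gdata installed, the same helper could presumably be called as
# read_all("U:/test folder", "\\.xls$", gdata::read.xls)
```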

cheers,

Rolf Turner

On 19/11/2009, at 9:09 AM, Mark W. Miller wrote:




I have several hundred Excel 2007 data files in a folder.  I would like to
read every file in a single given folder using a loop.

I have searched the FAQ, the forum archives here, other or older R boards
and the R Import / Export documentation, and have asked some very
knowledgeable R users without learning of a solution.  I hope someone here
can help.

I understand that the most common suggestion is to convert the files to csv
format.  However, there are so many files in my case (ultimately > 1000) I
would rather avoid doing that.

I have also found many solutions to this problem for txt files and files in
additional formats other than Excel 2007.

I can read three Excel 2007 files one at a time with the following example
code using R 2.10.0 on a computer running Windows (XP, I think):




library(RODBC)


channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx,
*.xlsm, *.xlsb);
DBQ=U:\\test folder\\testA.xlsx; ReadOnly=False")

sqlTables(channel)

my.data.A <- sqlFetch(channel, "Sheet1")

odbcClose(channel)



channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx,
*.xlsm, *.xlsb);
DBQ=U:\\test folder\\testB.xlsx; ReadOnly=False")

sqlTables(channel)

my.data.B <- sqlFetch(channel, "Sheet1")

odbcClose(channel)



channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx,
*.xlsm, *.xlsb);
DBQ=U:\\test folder\\testC.xlsx; ReadOnly=False")

sqlTables(channel)

my.data.C <- sqlFetch(channel, "Sheet1")

odbcClose(channel)



# However, when I attempt to read the same three files with the loop below I
receive an error:



library(RODBC)


setwd("U:/test folder")


fname <- list.files(pattern=".\\.xlsx", full.names = FALSE, recursive =
TRUE, ignore.case = TRUE)

z <- length(fname)

print(z)


for (sp in 1:z) {

channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx,
*.xlsm, *.xlsb);

DBQ=U:\\test folder\\fname[sp]; ReadOnly=False")

sqlTables(channel)

my.data <- sqlFetch(channel, "Sheet1")

print(my.data)

odbcClose(channel)
}




# The error I receive states:

Error in odbcTableExists(channel, sqtable) :
  ‘Sheet1’: table not found on channel


# Thank you sincerely in advance for any help with this problem.

Mark Miller

Gainesville, Florida



Re: [R] How to choose appropriate linear model? (ANOVA)

2009-11-18 Thread Rolf Turner


On 19/11/2009, at 9:10 AM, Tal Galili wrote:


Hello Peng,
What you are describing is the "model selection" process.
It also sounds like you are referring to the more general subject of
regression modeling strategies, so consider finding this book:
http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322

Frank Harrell is a very insightful lecturer, and I have heard his writing is
also good.

I would love to read recommendations from other R members regarding your
question.


Alan Miller's book ``Subset Selection in Regression'' (Chapman and Hall,
1990) has some relevance.

cheers,

Rolf Turner



[R] Package for Miscellaneous Psychometrics

2009-11-18 Thread Doran, Harold
Version 1.5 of the MiscPsycho package has been uploaded to CRAN (should hit 
mirrors in a day or so). This package has a set of functions that may be useful 
for psychometric applications.

The package has been updated to include the following:

1) All functions (where appropriate) now use standard formula arguments
2) All functions now use S3 print and summary methods
3) Help files have been substantially improved
4) Where appropriate, functions now use the standard extractor functions, coef()
5) The vignette 'MP' now provides very comprehensive descriptions of the 
statistical methods and substantive examples for all functions
6) alpha.Summary has been deprecated. The alpha function and its summary method 
now include the "conditional alpha" that was previously provided by 
alpha.Summary.
7) The following new methods have been added and tested with operational data:

* The Cheat function now provides the user with a choice of Newton-Raphson 
iterations, the bisection method, or R's internal uniroot function
* A new function called "SSI" implements a K nearest neighbor algorithm to 
derive conditional norms for student achievement data. This was designed 
primarily to construct conditional norms for student achievement growth models, 
but is flexible such that conditional norms for any score can be constructed
* Objects of class jml can now be plotted to visually examine IRT data fit by 
the jml function
* The function "classical" now returns standard errors of the item p-values as 
well as design-consistent standard errors to reflect clustered samples
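
For readers unfamiliar with the three root-finding options named for the Cheat function, here is a generic base R sketch that applies Newton-Raphson, bisection, and uniroot() to a toy function. It is not MiscPsycho's code; the function f, the helper names, and the tolerances are assumptions chosen only to show how the methods compare.

```r
f  <- function(x) x^2 - 2      # toy function with a root at sqrt(2)
df <- function(x) 2 * x        # its derivative, needed by Newton-Raphson

newton <- function(f, df, x0, tol = 1e-10, max_iter = 100) {
  x <- x0
  for (i in seq_len(max_iter)) {
    step <- f(x) / df(x)
    x <- x - step
    if (abs(step) < tol) break
  }
  x
}

bisection <- function(f, lower, upper, tol = 1e-10, max_iter = 200) {
  stopifnot(f(lower) * f(upper) < 0)   # the root must be bracketed
  for (i in seq_len(max_iter)) {
    mid <- (lower + upper) / 2
    if (f(lower) * f(mid) <= 0) upper <- mid else lower <- mid
    if (upper - lower < tol) break
  }
  (lower + upper) / 2
}

roots <- c(newton    = newton(f, df, x0 = 1),
           bisection = bisection(f, 0, 2),
           uniroot   = uniroot(f, c(0, 2), tol = 1e-10)$root)
print(roots)
```

Newton-Raphson needs a derivative and a reasonable starting value but converges in few steps; bisection and uniroot() only need an interval that brackets the root.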

All functions have been rigorously unit tested by comparing output to known 
answers, results from other software programs, and through Monte Carlo 
simulations. However, should an error or bug arise, please let me know. 
Comments regarding R structure and general implementation are always 
appreciated.

Sincerely,
Harold Doran



Re: [R] How to install older version of R?

2009-11-18 Thread p_connolly
On Thu, 19-Nov-2009 at 01:12AM +0800, Pan, Jia-chiun wrote:

|> Dear list
|>
|> This is more of a Linux problem, but I can't find any
|> reference for it. My OS is Ubuntu 9.04 and version 2.9.2 of R is
|> already installed. Now I need to install version 2.7.1.
|> I have searched a lot of websites, and there seems to be no painless way
|> to do it.

Go to your CRAN mirror.  Click on the 'R Sources' link where you will
find a link to older versions.  You can download the R-2.7.1.tar.gz
file from there.  Unpack the file and read the file named INSTALL.
It's short and very clearly explains how to install from source.

If it doesn't work immediately as described there, it probably means
you need to install some extra debs that have something like 'devel'
in its/their name/s.
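
If you would rather fetch the tarball from within R, a sketch along these lines builds the download URL. It assumes the mirror follows the usual CRAN layout, with sources for the 2.x series under src/base/R-2/; r_source_url is a hypothetical helper name, and the download lines are left commented out because they need network access.

```r
# Assumption: the mirror uses the standard CRAN directory layout.
r_source_url <- function(version, mirror = "https://cran.r-project.org") {
  major <- strsplit(version, ".", fixed = TRUE)[[1]][1]
  sprintf("%s/src/base/R-%s/R-%s.tar.gz", mirror, major, version)
}

url <- r_source_url("2.7.1")
print(url)

# Uncomment to actually download and unpack (needs network access):
# download.file(url, destfile = basename(url))
# untar(basename(url))
```

After unpacking, the INSTALL file in the top-level directory describes the build steps.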


|> If anyone can offer me some suggestions or references, I would
|> appreciate it.

It really is very easy to install from source.  Even I can do it.

HTH

--
Patrick Connolly
Plant and Food Research
Mt Albert
Auckland
New Zealand
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
I have the world`s largest collection of seashells. I keep it on all
the beaches of the world ... Perhaps you`ve seen it.  ---Steven Wright
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~





Re: [R] parsing numeric values

2009-11-18 Thread baptiste auguie
another useful trick that could come in handy, thanks!

baptiste

2009/11/18 Gabor Grothendieck :
> Here is a slight variation:
>
>> read.table(textConnection(grep("", input, value = TRUE)),
> +    colClasses = c("NULL", "NULL", "numeric"))
>          V3         V6
> 1 0.00137700 3.4644e-07
> 2 0.00019412 4.8840e-08
> 3 0.00137700 3.4644e-07
> 4 0.00019412 4.8840e-08
>
>
>
> On Wed, Nov 18, 2009 at 1:54 PM, baptiste auguie
>  wrote:
>> Hi,
>>
>> Thanks for the alternative approach. However, I should have made my
>> example more complete in that other lines may also have numeric
>> values, which I'm not interested in. Below is an updated problem, with
>> my current solution,
>>
>> tc <- textConnection(
>> "some text
>>   =    1.3770E-03      =    3.4644E-07
>>   =    1.9412E-04      =    4.8840E-08
>>
>> other text
>>    =    1.3770E-03      =    3.4644E-07
>>    =    1.9412E-04      =    4.8840E-08
>>
>> lots of other material,  including numeric values
>>  1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
>>  12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
>> etc...")
>>
>> input <-
>> readLines(tc)
>> close(tc)
>>
>> ## I want to retrieve the values for
>> ## , ,  and  only
>>
>> results <- c(
>> strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind))
>>
>> results
>>
>> Using the suggested base R solution, I've come up with this variation,
>>
>> z <- `, grep("|||", input,
>> value=TRUE))
>>
>> test <- scan(textConnection(z),what=0)
>> test[seq(1, length(test), by=2)]
>>
>>
>> Thanks again,
>>
>> baptiste
>>
>> 2009/11/18 Bert Gunter :
>>> The previous elegant solutions required the use of the gsubfn package.
>>> Nothing wrong with that, of course, but I'm always curious whether still
>>> relatively simple base R solutions can be found, as they are often (but not
>>> always!) much faster. And anyway, it seems to be in the spirit of your query
>>> to try such a solution. So here is one base R approach that I believe works.
>>> I'll break it up into 2 lines so you can see what's going on.
>>>
>>> ## Using your example...
>>> ## First replace everything but the number with spaces
>>>
>>> > z <- gsub("[^[:digit:]E.+-]"," ",input)
>>> > z
>>> [1] "         "
>>> [2] "            1.3770E-03               3.4644E-07"
>>> [3] "            1.9412E-04               4.8840E-08"
>>> [4] ""
>>> [5] "          "
>>> [6] "              1.3770E-03                3.4644E-07"
>>> [7] "              1.9412E-04                4.8840E-08"
>>>
>>> ## Now it can be scanned to a numeric via
>>>
>>> > z<-scan(textConnection(z),what=0)
>>> Read 8 items
>>> > z
>>> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
>>> 1.9412e-04 4.8840e-08
>>>
>>> 
>>> I believe this strategy is reasonably general, but I haven't checked it
>>> carefully and would appreciate folks pointing out where it trips up (e.g.
>>> perhaps with NA's).
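
A related base R sketch, for the case where only the values after particular labels are wanted: it uses regexpr() and regmatches() rather than the gsub()/scan() strategy above. The labels sigma_a and sigma_b are hypothetical placeholders, since the labels in the original log lines did not survive in this archive, and extract is an illustrative helper name.

```r
input <- c("some text",
           "  sigma_a =    1.3770E-03      sigma_b =    3.4644E-07",
           "other text",
           "  1.23E-4 123E5 12.3E-4")

# A number in E-notation: digits, optional decimal part, E, optional sign, digits.
num <- "[0-9]+\\.?[0-9]*E[-+]?[0-9]+"

# Return the numeric value that follows `label =` on every line that has it.
extract <- function(label, lines) {
  pattern <- paste(label, " += +(", num, ")", sep = "")
  hits <- regmatches(lines, regexpr(pattern, lines))
  as.numeric(sub(".*= +", "", hits))
}

print(extract("sigma_a", input))
print(extract("sigma_b", input))
```

Lines without the label, including those holding other numeric values, are skipped because regmatches() only returns the matched pieces.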
>>>
>>> Best,
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>>  -Original Message-
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>>> Behalf Of baptiste auguie
>>> Sent: Wednesday, November 18, 2009 3:57 AM
>>> To: r-help
>>> Subject: [R] parsing numeric values
>>>
>>> Dear list,
>>>
>>> I'm seeking advice to extract some numeric values from a log file
>>> created by an external program. Consider the following example,
>>>
>>> input <-
>>> readLines(textConnection(
>>> "some text
>>>   =    1.3770E-03      =    3.4644E-07
>>>   =    1.9412E-04      =    4.8840E-08
>>>
>>> other text
>>>    =    1.3770E-03      =    3.4644E-07
>>>    =    1.9412E-04      =    4.8840E-08"))
>>>
>>> ## this is what I want
>>> results <- c(as.numeric(strsplit(grep("", input,val=T), " ")[[1]][8]),
>>>             as.numeric(strsplit(grep("", input,val=T), " ")[[1]][8]),
>>>             as.numeric(strsplit(grep("", input,val=T), " ")[[1]][9]),
>>>             as.numeric(strsplit(grep("", input,val=T), " ")[[1]][9])
>>>             )
>>>
>>> ## [1] 0.00137700 0.00019412 0.00137700 0.00019412
>>>
>>> The use of strsplit is not ideal here as there is a different number
>>> of space characters in the lines containing  and  for
>>> instance (hence the indices 8 and 9 respectively).
>>>
>>> I tried to use gsubfn for a cleaner construct,
>>>
>>> strapply(input, " += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>>>
>>> but I can't seem to find the correct regular expression to deal with
>>> the exponent.
>>>
>>>
>>> Any tips are welcome!
>>>
>>>
>>> Best regards,
>>>
>>> baptiste
>>>

Re: [R] How to choose appropriate linear model? (ANOVA)

2009-11-18 Thread Tal Galili
Hello Peng,
What you are describing is the "model selection" process.
It also sounds like you are referring to the more general subject of
regression modeling strategies, so consider finding this book:
http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322

Frank Harrell is a very insightful lecturer, and I have heard his writing is
also good.

I would love to read recommendations from other R members regarding your
question.
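
For the specific step asked about, testing whether interaction terms can be dropped from a multi-factor ANOVA, a minimal base R sketch is shown below. It uses the warpbreaks data set that ships with R purely as an illustration; whether dropping terms on the basis of such a test is a good modeling strategy is exactly what the recommended reading discusses.

```r
# warpbreaks ships with R: breaks by wool (2 levels) and tension (3 levels).
full    <- lm(breaks ~ wool * tension, data = warpbreaks)  # with interaction
reduced <- lm(breaks ~ wool + tension, data = warpbreaks)  # main effects only

# F-test of the interaction: does the full model fit significantly better?
cmp <- anova(reduced, full)
print(cmp)
```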

Best,
Tal




--


My contact information:
Tal Galili
E-mail: tal.gal...@gmail.com
Phone number: 972-52-7275845
FaceBook: Tal Galili
My Blogs:
http://www.talgalili.com (Web and general, Hebrew)
http://www.biostatistics.co.il (Statistics, Hebrew)
http://www.r-statistics.com/ (Statistics,R, English)




On Wed, Nov 18, 2009 at 9:48 PM, Peng Yu  wrote:

> I'm wondering how to choose an appropriate linear model for a given
> problem. I have been reading Applied Linear Regression Models by John
> Neter, Michael H Kutner, William Wasserman and Christopher J.
> Nachtsheim. I'm still not clear how to choose an appropriate linear
> model.
>
> For multi-factor ANOVA, should I start with all the interaction terms
> and do an F-test to see which interaction terms are not significant,
> then fit a linear regression on a model without the non-significant
> interaction terms?
>
> Could somebody point me to some good books or chapters on this topic?
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading multiple Excel 2007 files with a loop

2009-11-18 Thread Mark W. Miller


I have several hundred Excel 2007 data files in a folder.  I would like to
read every file in a single given folder using a loop.

I have searched the FAQ, the forum archives here, other or older R boards
and the R Import / Export documentation, and have asked some very
knowledgeable R users without learning of a solution.  I hope someone here
can help.

I understand that the most common suggestion is to convert the files to csv
format.  However, there are so many files in my case (ultimately > 1000) I
would rather avoid doing that.

I have also found many solutions to this problem for txt files and files in
additional formats other than Excel 2007.

I can read three Excel 2007 files one at a time with the following example
code using R 2.10.0 on a computer running Windows (XP, I think):




library(RODBC)


channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx,
*.xlsm, *.xlsb); 
DBQ=U:\\test folder\\testA.xlsx; ReadOnly=False")
 
sqlTables(channel)

my.data.A <- sqlFetch(channel, "Sheet1")

odbcClose(channel)



channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx,
*.xlsm, *.xlsb); 
DBQ=U:\\test folder\\testB.xlsx; ReadOnly=False")
 
sqlTables(channel)

my.data.B <- sqlFetch(channel, "Sheet1")

odbcClose(channel)



channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx,
*.xlsm, *.xlsb); 
DBQ=U:\\test folder\\testC.xlsx; ReadOnly=False")
 
sqlTables(channel)

my.data.C <- sqlFetch(channel, "Sheet1")

odbcClose(channel)





# However, when I attempt to read the same three files with the loop below I
receive an error:




library(RODBC)


setwd("U:/test folder")


fname <- list.files(pattern=".\\.xlsx", full.names = FALSE, recursive =
TRUE, ignore.case = TRUE)

z <- length(fname)

print(z)


for (sp in 1:z) {

channel <- odbcDriverConnect("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx,
*.xlsm, *.xlsb); 

DBQ=U:\\test folder\\fname[sp]; ReadOnly=False")
 
sqlTables(channel)

my.data <- sqlFetch(channel, "Sheet1")

print(my.data)

odbcClose(channel)
}




# The error I receive states:

Error in odbcTableExists(channel, sqtable) : 
  ‘Sheet1’: table not found on channel
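
A likely cause, offered as a sketch rather than a tested fix: inside the quoted
connection string, fname[sp] is literal text, so the driver is asked for a file
that is actually named "fname[sp]". Building the string with paste0() inserts
the real file name. The helper name excel_conn_string is hypothetical, and the
folder is the one from the example above. Note also that the pattern
".\\.xlsx" is not anchored; "\\.xlsx$" would match only names ending in .xlsx.

```r
# Hypothetical helper: build the ODBC connection string for one workbook.
excel_conn_string <- function(folder, file) {
  paste0("DRIVER=Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb);",
         "DBQ=", folder, "\\", file, ";ReadOnly=False")
}

# Stand-in for list.files(pattern = "\\.xlsx$", ignore.case = TRUE)
fname <- c("testA.xlsx", "testB.xlsx", "testC.xlsx")

# One connection string per file; the file name is pasted in, not quoted
conn_strings <- vapply(fname, excel_conn_string, character(1),
                       folder = "U:\\test folder", USE.NAMES = FALSE)
print(conn_strings)

# With RODBC available, each string could then be used inside the loop:
#   channel <- odbcDriverConnect(conn_strings[sp])
#   my.data <- sqlFetch(channel, "Sheet1")
#   odbcClose(channel)
```

Collecting the results in a list (one element per file) rather than
overwriting my.data on each pass would keep all the data sets available
after the loop.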


# Thank you sincerely in advance for any help with this problem.

Mark Miller

Gainesville, Florida


-- 
View this message in context: 
http://old.nabble.com/Reading-multiple-Excel-2007-files-with-a-loop-tp26414828p26414828.html
Sent from the R help mailing list archive at Nabble.com.



[R] standard error for the estimated value (lmer fitted model)

2009-11-18 Thread willow1980

Dear R users,
I want to draw standard error lines for the predicted regression line
estimated by logistic regression using lmer. I have two predictors: cafr and
its quadratic form I(cafr^2), where cafr is the original variable centered
around its mean. The estimated value from the fitted model is
(mo...@x)%*%fixef(model)
On the logit scale, the square root of the residual mean square from the
fitted model is
sesample=sqrt(sum(resid(model)^2)/(n-p-1)), where p is the degrees of
freedom used for fitting.
Could someone judge whether it is reasonable to calculate the standard
error of the estimated value by
sesample*sqrt(vector%*%ginv(t(mo...@x)%*%mo...@x)%*%t(vector))
, where vector is (1,cafr,I(cafr^2)), the data vector at the point
considered?
If this is correct, I think I can use this method to draw the standard error
lines. Otherwise, would you please suggest a reasonable one?
Thank you very much for your attention!
Yours sincerely, Jianghua
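
For comparison, here is a minimal sketch of a commonly used alternative: take
the standard error of the linear predictor from the estimated covariance
matrix V of the coefficients, as sqrt(x' V x), instead of from a residual mean
square. The sketch uses an ordinary glm() on simulated data (an assumption,
not the lmer model from the question) so that it runs with base R only. For a
mixed model the analogous calculation would use the fixed-effects covariance
matrix and would not account for uncertainty in the random effects.

```r
set.seed(42)
# Simulated stand-in data; cafr is already centered around zero
cafr <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 0.8 * cafr - 0.3 * cafr^2))
fit <- glm(y ~ cafr + I(cafr^2), family = binomial)

# Points at which the fitted curve and its standard error are wanted
newx <- seq(-2, 2, length.out = 5)
X <- cbind(1, newx, newx^2)        # rows are the vectors (1, cafr, cafr^2)

eta <- drop(X %*% coef(fit))       # fitted values on the logit scale
# diag(X V X') computed row by row, then square-rooted
se_manual <- sqrt(rowSums((X %*% vcov(fit)) * X))

# Cross-check against predict() on the link scale
se_pred <- predict(fit, newdata = data.frame(cafr = newx),
                   type = "link", se.fit = TRUE)$se.fit

# Approximate pointwise bands, back-transformed to the probability scale
lower <- plogis(eta - 2 * se_manual)
upper <- plogis(eta + 2 * se_manual)
```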
-- 
View this message in context: 
http://old.nabble.com/standard-error-for-the-estimated-value-%28lmer-fitted-model%29-tp26414507p26414507.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] (no subject)

2009-11-18 Thread Moritz Fromwald

http://www.lmgtfy.com/?q=multicollinearity+test+using+R
http://www.lmgtfy.com/?q=selection+stepwise+multiple+regression+analysis+using+R

Moritz

Karen Federico wrote:

How do you perform a multicollinearity test using R. Also how do you perform
a selection stepwise to carry out a multiple regression analysis?




[R] How to choose appropriate linear model? (ANOVA)

2009-11-18 Thread Peng Yu
I'm wondering how to choose an appropriate linear model for a given
problem. I have been reading Applied Linear Regression Models by John
Neter, Michael H Kutner, William Wasserman and Christopher J.
Nachtsheim. I'm still not clear how to choose an appropriate linear
model.

For multi-factor ANOVA, should I start with all the interaction terms
and do an F-test to see which interaction terms are not significant,
then fit a linear regression on a model without the non-significant
interaction terms?

Could somebody point me to some good books or chapters on this topic?
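
As a concrete illustration of the comparison described above, this sketch
fits a two-factor model with and without the interaction and compares the two
with an F-test via anova(). The built-in warpbreaks data set is an assumption
chosen only to make the example runnable. Dropping terms purely on the basis
of significance tests has known pitfalls, so treat this as the mechanics, not
as a recommended strategy.

```r
data(warpbreaks)

# Full model: main effects plus the wool:tension interaction
full <- lm(breaks ~ wool * tension, data = warpbreaks)

# Reduced model: main effects only
additive <- lm(breaks ~ wool + tension, data = warpbreaks)

# F-test for the interaction: does the full model fit significantly better?
cmp <- anova(additive, full)
print(cmp)
```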



[R] (no subject)

2009-11-18 Thread Karen Federico
How do you perform a multicollinearity test using R. Also how do you perform
a selection stepwise to carry out a multiple regression analysis?
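
A minimal base-R sketch of both tasks, under the assumption that "a
multicollinearity test" means variance inflation factors (VIF) and that
stepwise selection by AIC is acceptable. The mtcars data set and the choice of
predictors are illustrative only. Each VIF is 1/(1 - R^2) from regressing one
predictor on the others; step() is the stepwise routine in the stats package.

```r
data(mtcars)
predictors <- c("wt", "hp", "disp")

# VIF for each predictor: regress it on the remaining predictors
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
print(vif)

# Stepwise selection (both directions) by AIC, starting from the full model
full <- lm(mpg ~ wt + hp + disp, data = mtcars)
selected <- step(full, direction = "both", trace = 0)
print(formula(selected))
```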




Re: [R] parsing numeric values

2009-11-18 Thread Gabor Grothendieck
Here is a slight variation:

> read.table(textConnection(grep("", input, value = TRUE)),
+colClasses = c("NULL", "NULL", "numeric"))
  V3 V6
1 0.00137700 3.4644e-07
2 0.00019412 4.8840e-08
3 0.00137700 3.4644e-07
4 0.00019412 4.8840e-08



On Wed, Nov 18, 2009 at 1:54 PM, baptiste auguie
 wrote:
> Hi,
>
> Thanks for the alternative approach. However, I should have made my
> example more complete in that other lines may also have numeric
> values, which I'm not interested in. Below is an updated problem, with
> my current solution,
>
> tc <- textConnection(
> "some text
>   =    1.3770E-03      =    3.4644E-07
>   =    1.9412E-04      =    4.8840E-08
>
> other text
>    =    1.3770E-03      =    3.4644E-07
>    =    1.9412E-04      =    4.8840E-08
>
> lots of other material,  including numeric values
>  1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
>  12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
> etc...")
>
> input <-
> readLines(tc)
> close(tc)
>
> ## I want to retrieve the values for
> ## , ,  and  only
>
> results <- c(
> strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind))
>
> results
>
> Using the suggested base R solution, I've come up with this variation,
>
> z <- gsub("[^[:digit:]E.+-]"," ", grep("|||", input,
> value=TRUE))
>
> test <- scan(textConnection(z),what=0)
> test[seq(1, length(test), by=2)]
>
>
> Thanks again,
>
> baptiste
>

Re: [R] parsing numeric values

2009-11-18 Thread Gabor Grothendieck
It only works if "some text" at the beginning has no digits, dots, E
characters or sign characters.
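
A small sketch of that failure mode, using made-up input lines (the labels
"alpha" and "Extinction" are hypothetical): a label containing a capital E
survives the gsub() step, and scan() then stops with an error because "E" on
its own is not a number.

```r
# The cleaning step from the quoted approach
clean <- function(x) gsub("[^[:digit:]E.+-]", " ", x)

ok  <- clean("  alpha =    1.3770E-03")        # only the number survives
bad <- clean("Extinction =    1.3770E-03")     # the leading "E" survives too

ok_value <- scan(text = ok, what = 0, quiet = TRUE)

# scan() signals an error on the stray "E"; record NA instead of stopping
bad_value <- tryCatch(scan(text = bad, what = 0, quiet = TRUE),
                      error = function(e) NA_real_)
```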

On Wed, Nov 18, 2009 at 12:44 PM, Bert Gunter  wrote:
> The previous elegant solutions required the use of the gsubfn package.
> Nothing wrong with that, of course, but I'm always curious whether still
> relatively simple base R solutions can be found, as they are often (but not
> always!) much faster. And anyway, it seems to be in the spirit of your query
> to try such a solution. So here is one base R approach that I believe works.
> I'll break it up into 2 lines so you can see what's going on.
>
> ## Using your example...
> ## First replace everything but the number with spaces
>
>> z <- gsub("[^[:digit:]E.+-]"," ",input)
>> z
> [1] "         "
> [2] "            1.3770E-03               3.4644E-07"
> [3] "            1.9412E-04               4.8840E-08"
> [4] ""
> [5] "          "
> [6] "              1.3770E-03                3.4644E-07"
> [7] "              1.9412E-04                4.8840E-08"
>
> ## Now it can be scanned to a numeric via
>
>> z<-scan(textConnection(z),what=0)
> Read 8 items
>> z
> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
> 1.9412e-04 4.8840e-08
>
> 
> I believe this strategy is reasonably general, but I haven't checked it
> carefully and would appreciate folks pointing out where it trips up (e.g.
> perhaps with NA's).
>
> Best,
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>


Re: [R] Writing a data frame in an excel file

2009-11-18 Thread Orvalho Augusto
Why not try the fabulous WriteXLS package?

Caveman


On Wed, Nov 18, 2009 at 7:45 PM, anna_l  wrote:
>
> Thanks Karl, well I am getting an error now after the following sqlSave
> command:
> sqlSave( xlsFile, datas, tablename = 'Datas_and_coefficients', rownames =
> FALSE )
>
> -->  [RODBC] Failed exec in Update
> 22018 39 [Microsoft][Driver ODBC for Excel]invalid character value for the
> diffusion specification (null) (null)
>
>
> More specifically, take a look at the 'append' and 'safer' arguments.
>
> --
> Karl Ove Hufthammer
>
>
>
>
>
> -
> Anna Lippel
> new in R so be careful I should be asking a lt of questions!:teeth:


Re: [R] parsing numeric values

2009-11-18 Thread baptiste auguie
Hi,

Thanks for the alternative approach. However, I should have made my
example more complete in that other lines may also have numeric
values, which I'm not interested in. Below is an updated problem, with
my current solution,

tc <- textConnection(
"some text
  =    1.3770E-03      =    3.4644E-07
  =    1.9412E-04      =    4.8840E-08

other text
   =    1.3770E-03      =    3.4644E-07
   =    1.9412E-04      =    4.8840E-08

lots of other material,  including numeric values
 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
etc...")

input <-
readLines(tc)
close(tc)

## I want to retrieve the values for
## , ,  and  only

results <- c(
strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
simplify = rbind),
strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
simplify = rbind),
strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
simplify = rbind),
strapply(input, " += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
simplify = rbind))

results

Using the suggested base R solution, I've come up with this variation,

z <- gsub("[^[:digit:]E.+-]"," ", grep("|||", input,
value=TRUE))

test <- scan(textConnection(z),what=0)
test[seq(1, length(test), by=2)]


Thanks again,

baptiste

2009/11/18 Bert Gunter :
> The previous elegant solutions required the use of the gsubfn package.
> Nothing wrong with that, of course, but I'm always curious whether still
> relatively simple base R solutions can be found, as they are often (but not
> always!) much faster. And anyway, it seems to be in the spirit of your query
> to try such a solution. So here is one base R approach that I believe works.
> I'll break it up into 2 lines so you can see what's going on.
>
> ## Using your example...
> ## First replace everything but the number with spaces
>
>> z <- gsub("[^[:digit:]E.+-]"," ",input)
>> z
> [1] "         "
> [2] "            1.3770E-03               3.4644E-07"
> [3] "            1.9412E-04               4.8840E-08"
> [4] ""
> [5] "          "
> [6] "              1.3770E-03                3.4644E-07"
> [7] "              1.9412E-04                4.8840E-08"
>
> ## Now it can be scanned to a numeric via
>
>> z<-scan(textConnection(z),what=0)
> Read 8 items
>> z
> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
> 1.9412e-04 4.8840e-08
>
> 
> I believe this strategy is reasonably general, but I haven't checked it
> carefully and would appreciate folks pointing out where it trips up (e.g.
> perhaps with NA's).
>
> Best,
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>


[R] border/box/frame around plot

2009-11-18 Thread brbell01

Hello, I need to know how to put a closed frame around my plot.  I am plotting
using the igraph package, and I have been able to use box() with limited
success. box() puts a border around only the upper and right edges of the
plot area, but misses the axes. By default, setting axes=TRUE in igraph
does not produce closed axes (i.e. axes that run through the origin and up to
the limits of the plot window).  Any ideas?
-- 
View this message in context: 
http://old.nabble.com/border-box-frame-around-plot-tp26410451p26410451.html
Sent from the R help mailing list archive at Nabble.com.



[R] Y axis of 1-D Linear Discriminant Histograms

2009-11-18 Thread Bob Farmer
Hi all.
I would like to understand what units are used on the y-axis
when you plot the one-dimensional predictions (histograms) from lda()
(MASS) discriminant function objects.

While the helpfile suggests that a histogram is returned by default,
the presumably proportion-like values for each group seem to add up to
more than 1, and I'm not sure how to interpret the code from
ldahist(), which, I believe, defines the heights of each bin as

est1/(diff(breaks) * length(data[g == grp]))

where est1 is (as far as I can tell) the frequency within the bin, and
the denominator is apparently the bin width multiplied by the total
sample size for that panel.   It seems to me that a far more logical
result would be returned for each bin if the diff(breaks) component
was removed entirely.

While I don't think my concern affects the shape of each group's
histogram, I'd much prefer to display a more intuitive y-axis.
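
The scaling in that formula is the usual density scale: count / (n * bin
width), so the bar areas (not the bar heights) sum to 1. A minimal sketch with
hist() on one group of the iris data shows the difference; the breaks are an
arbitrary assumption with a constant width of 0.25, which is why the heights
sum to 1/0.25 = 4 rather than 1.

```r
x <- iris$Sepal.Length[iris$Species == "setosa"]
h <- hist(x, breaks = seq(4, 6, by = 0.25), plot = FALSE)

# Same scaling as est1 / (diff(breaks) * n): heights are densities
density_height <- h$counts / (diff(h$breaks) * length(x))

area_total   <- sum(density_height * diff(h$breaks))  # bar areas
height_total <- sum(density_height)                   # bar heights

# Dropping diff(breaks) gives per-bin proportions instead
proportion <- h$counts / length(x)

print(c(area = area_total, heights = height_total,
        proportions = sum(proportion)))
```

With narrow bins the density heights can therefore exceed 1 individually and
in total, while the proportions always sum to 1.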

Example:
library(MASS)
ld1<-lda(Species ~ Sepal.Length + Sepal.Width, iris)
plot(ld1, type = "histogram", dimen = 1)
#(eyeballing it suggests that the sum of the "frequencies" reported on
the y-axis for each group exceeds 1)

Thanks very much.
--Bob Farmer
Dalhousie University



[R] Data linkage functions for probabilistic linkage using person identifiers

2009-11-18 Thread Dagan A WRIGHT
I am somewhat new to R, although I am using and liking it already.  I am
curious whether there are any probabilistic linkage packages similar in
function to others such as Link King (http://www.the-link-king.com/).  I am
looking for functions that use SSN, first/last name, date of birth, and a
couple of other indicators for matching.

Thanks

Dagan Wright, Ph.D., M.S.P.H.
Lead Addictions Research Analyst, Analysis & Evaluation Unit
Addictions & Mental Health Division (AMH)
500 Summer St. NE E86
Salem, Oregon 97301-1118

Office number: 503-945-5726
Fax number: 503-378-8467
dagan.a.wri...@state.or.us



Re: [R] mann-whitney test with more groups

2009-11-18 Thread Peter Dalgaard

Peter Ehlers wrote:

Sorry, correction below.

Peter Ehlers wrote:


Kim Vanselow wrote:

Dear r-helpers,
I want to test groups of samples for significant differences.
Question: Does Group1 differ significantly from group2.
This is a question to be answered by mann-whitney-u-test.

I know that I can use wilcox.test with 2 samples.

My problem: How can r perform the test automatically if there are 
more than 2 groups in my data frame.

Test group1 vs. 2, 1 vs. 3, 1 vs. 4, etc.


This is my script:
Deckung <- read.table("Gesamtdeckung.csv", sep=";", header=TRUE, 
dec=",", row.names=1)


x <- Deckung$Gesamtdeckung
y <- Deckung$Klasse

#U-Test
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("1", "2"))
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("1", "3"))
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("2", "3"))

Any help would be greatly appreciated!

Thanks
Kim 


This sounds like serious data dredging, but if you're
sure that it's what you want, try the combn() function:

 y <- gl(4, 5)
 x <- rnorm(20)
 m <- cbind(t(combn(4, 2)), NA)
 for(i in 1:nrow(idx))
m[i, 3] <-
  wilcox.test(x ~ y, subset = y %in% idx[i,])$p.value
 m

 -Peter Ehlers


  y <- gl(4, 5)
  x <- rnorm(20)
  m <- cbind(t(combn(4, 2)), NA)
  for(i in 1:nrow(m))  # change 'idx' to 'm'
 m[i, 3] <-
   wilcox.test(x ~ y, subset = y %in% m[i,])$p.value  # ditto


There's also pairwise.wilcox.test, with multiple testing correction and 
all. (But someone called Lumley may chime in and remind you of the lack 
of guaranteed transitivity of rank tests.)
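
A minimal sketch of the pairwise.wilcox.test() route, using the same kind of
simulated data as the combn() example; the Holm adjustment is an assumption
(it is one of several methods accepted by p.adjust.method).

```r
set.seed(1)
y <- gl(4, 5)      # four groups of five observations
x <- rnorm(20)

# All pairwise Mann-Whitney tests, with p-values adjusted for multiplicity
res <- pairwise.wilcox.test(x, y, p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values (rows 2-4, columns 1-3)
print(res$p.value)
```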



--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [R] Writing a data frame in an excel file

2009-11-18 Thread anna_l

Thanks Karl, well I am getting an error now after the following sqlSave
command:
sqlSave( xlsFile, datas, tablename = 'Datas_and_coefficients', rownames =
FALSE )

-->  [RODBC] Failed exec in Update
22018 39 [Microsoft][Driver ODBC for Excel]invalid character value for the
diffusion specification (null) (null)


More specifically, take a look at the 'append' and 'safer' arguments.

-- 
Karl Ove Hufthammer





-
Anna Lippel
new in R so be careful I should be asking a lt of questions!:teeth:
-- 
View this message in context: 
http://old.nabble.com/Writing-a-data-frame-in-an-excel-file-tp26378240p26412410.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] parsing numeric values

2009-11-18 Thread Bert Gunter
The previous elegant solutions required the use of the gsubfn package.
Nothing wrong with that, of course, but I'm always curious whether still
relatively simple base R solutions can be found, as they are often (but not
always!) much faster. And anyway, it seems to be in the spirit of your query
to try such a solution. So here is one base R approach that I believe works.
I'll break it up into 2 lines so you can see what's going on.

## Using your example...
## First replace everything but the number with spaces

> z <- gsub("[^[:digit:]E.+-]"," ",input)
> z
[1] "         "
[2] "            1.3770E-03               3.4644E-07"
[3] "            1.9412E-04               4.8840E-08"
[4] ""
[5] "          "
[6] "              1.3770E-03                3.4644E-07"
[7] "              1.9412E-04                4.8840E-08"

## Now it can be scanned to a numeric via

> z<-scan(textConnection(z),what=0)
Read 8 items
> z
[1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
1.9412e-04 4.8840e-08


I believe this strategy is reasonably general, but I haven't checked it
carefully and would appreciate folks pointing out where it trips up (e.g.
perhaps with NA's).

Best,

Bert Gunter
Genentech Nonclinical Biostatistics
 
 -Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of baptiste auguie
Sent: Wednesday, November 18, 2009 3:57 AM
To: r-help
Subject: [R] parsing numeric values

Dear list,

I'm seeking advice to extract some numeric values from a log file
created by an external program. Consider the following example,

input <-
readLines(textConnection(
"some text
  =    1.3770E-03      =    3.4644E-07
  =    1.9412E-04      =    4.8840E-08

other text
   =    1.3770E-03      =    3.4644E-07
   =    1.9412E-04      =    4.8840E-08"))

## this is what I want
results <- c(as.numeric(strsplit(grep("", input,val=T), " ")[[1]][8]),
 as.numeric(strsplit(grep("", input,val=T), " ")[[1]][8]),
 as.numeric(strsplit(grep("", input,val=T), " ")[[1]][9]),
 as.numeric(strsplit(grep("", input,val=T), " ")[[1]][9])
 )

## [1] 0.00137700 0.00019412 0.00137700 0.00019412

The use of strsplit is not ideal here as there is a different number
of space characters in the lines containing  and  for
instance (hence the indices 8 and 9 respectively).

I tried to use gsubfn for a cleaner construct,

strapply(input, " += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)

but I can't seem to find the correct regular expression to deal with
the exponent.
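
One possible pattern for the exponent, as a base-R sketch: an optional sign,
digits with an optional decimal point, then an optional E part with its own
optional sign. The labels alpha, beta, gamma and delta and the helper
extract_value are hypothetical stand-ins, since the real labels in the log
file are not shown here.

```r
lines <- c("some text",
           "  alpha =    1.3770E-03   beta =    3.4644E-07",
           "  gamma =    1.9412E-04   delta =    4.8840E-08",
           "  1.23E-4 123E5 12.3E-4")

# Number in plain or scientific notation, e.g. 12.3, 1.3770E-03, 123E5
num <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"

# Return the numeric value(s) that follow "label =" on any line
extract_value <- function(label, x) {
  pat <- paste0(label, " *= *(", num, ")")
  m <- regmatches(x, regexec(pat, x))
  hits <- Filter(length, m)           # drop lines without a match
  # element 1 is the whole match, element 2 the captured number
  as.numeric(vapply(hits, `[`, character(1), 2))
}

results <- c(extract_value("alpha", lines),
             extract_value("gamma", lines))
print(results)
```

Because the label is part of the pattern, unlabelled numbers on other lines
are ignored.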


Any tips are welcome!


Best regards,

baptiste



[R] Method dispatch for function

2009-11-18 Thread Stavros Macrakis
How can I determine what S3 method will be called for a particular
first-argument class?

I was imagining something like functionDispatch('str','numeric') =>
utils:::str.default , but I can't find anything like this.

For that matter, I was wondering if anyone had written a version of
`methods` which gave their fully qualified names if they were not visible,
e.g.

methods('str') =>
utils:::str.data.frame    utils:::str.default
stats:::str.dendrogram    stats:::str.logLik    utils:::str.POSIXt

or

methods('str') =>
 $utils
   "str.data.frame" "str.default"    "str.POSIXt"
 $stats
   "str.dendrogram" "str.logLik"

Thank you,

 -s
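
One way to approximate the lookup asked for above is to walk the class vector with getS3method(), which is part of base R. This is only a sketch: the helper name which_s3_method() is made up for illustration, and the caller has to supply the full class vector, including implicit classes (a double vector, for instance, dispatches on "double" and then "numeric").

```r
## Walk a class vector and report the first S3 method found for a generic,
## qualified with the name of the namespace that defines it.
## 'which_s3_method' is a hypothetical helper, not an existing R function.
which_s3_method <- function(generic, classes) {
  for (cl in c(classes, "default")) {
    m <- getS3method(generic, cl, optional = TRUE)
    if (!is.null(m)) {
      ns <- environmentName(environment(m))
      return(paste0(ns, ":::", generic, ".", cl))
    }
  }
  NULL  # no method found, not even a default
}

which_s3_method("print", "data.frame")
which_s3_method("summary", c("no_such_class", "data.frame"))
which_s3_method("str", "numeric")
```

getAnywhere("str.default") gives similar information for a single method name, including methods that are registered but not exported.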



Re: [R] SVM Param Tuning with using SNOW package

2009-11-18 Thread Max Kuhn
On Tue, Nov 17, 2009 at 6:01 PM, raluca  wrote:
>
> Hello,
>
> Is the first time I am using SNOW package and I am trying to tune the cost
> parameter for a linear SVM, where the cost (variable cost1) takes 10 values
> between 0.5 and 30.
>
> I have a large dataset and a pc which is not very powerful, so I need to
> tune the parameters using both CPUs of the pc.
>
> Somehow I cannot manage to do it. It seems that both CPUs are fitting the
> model for the same values of cost1, I guess the first 5, but not for the
> last 5.
>
> Please, can anyone help me! :-((

This is pretty easy to do with the train() function in the caret
package. From ?train, here is an example for a different data set:

> library(caret)
> library(snow)
> library(mlbench)
>
> data(BostonHousing)
>
> mpiCalcs <- function(X, FUN, ...)
+ {
+   theDots <- list(...)
+   parLapply(theDots$cl, X, FUN)
+ }
>
> library(snow)
> cl <- makeCluster(5, "MPI")
>
> ## 50 bootstrap models distributed across 5 workers
> mpiControl <- trainControl(workers = 5,
+                            number = 50,
+                            computeFunction = mpiCalcs,
+                            computeArgs = list(cl = cl))
> set.seed(1)
> usingMPI <- train(medv ~ .,
+                   data = BostonHousing,
+                   "svmLinear",
+                   tuneGrid = data.frame(.C = seq(.5, 30, length = 10)),
+                   trControl = mpiControl)
>
> stopCluster(cl)
[1] 1


-- 

Max



Re: [R] Writing a data frame in an excel file

2009-11-18 Thread Karl Ove Hufthammer
On Wed, 18 Nov 2009 08:02:47 -0800 (PST) anna_l  wrote:
> Sorry Charlie, I didn't understand that tablename=R Results was creating a
> worksheet. But the thing now is that it works very well when I write for the
> first time on the excel file but when I want to rewrite on it it gives the
> error i wrote before saying that Results already exists, is there a way to
> avoid that?

See the help page for 'sqlSave':
?sqlSave

More specifically, take a look at the 'append' and 'safer' arguments.

-- 
Karl Ove Hufthammer



[R] How to install older version of R?

2009-11-18 Thread Pan,
Dear list

This is more of a Linux question, but I can't find any reference
for it. My OS is Ubuntu 9.04, and R version 2.9.2 is already
installed. Now I need to install version 2.7.1.
I have searched many websites, and there does not seem to be a
painless way to do it.
If anyone can offer suggestions or references, I would
appreciate it.

 Jia-Chiun Pan



Re: [R] mann-whitney test with more groups

2009-11-18 Thread Peter Ehlers

Sorry, correction below.

Peter Ehlers wrote:


Kim Vanselow wrote:

Dear r-helpers,
I want to test groups of samples for significant differences.
Question: Does Group1 differ significantly from group2.
This is a question to be answered by mann-whitney-u-test.

I know that I can use wilcox.test with 2 samples.

My problem: How can r perform the test automatically if there are more 
than 2 groups in my data frame.

Test group1 vs. 2, 1 vs. 3, 1 vs. 4, etc.


This is my skript:
Deckung <- read.table("Gesamtdeckung.csv", sep=";", header=TRUE, 
dec=",", row.names=1)


x <- Deckung$Gesamtdeckung
y <- Deckung$Klasse

#U-Test
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("1", "2"))
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("1", "3"))
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("2", "3"))

Any help would be greatly appreciated!

Thanks
Kim 


This sounds like serious data dredging, but if you're
sure that it's what you want, try the combn() function:

 y <- gl(4, 5)
 x <- rnorm(20)
 m <- cbind(t(combn(4, 2)), NA)
 for(i in 1:nrow(idx))
m[i, 3] <-
  wilcox.test(x ~ y, subset = y %in% idx[i,])$p.value
 m

 -Peter Ehlers


  y <- gl(4, 5)
  x <- rnorm(20)
  m <- cbind(t(combn(4, 2)), NA)
  for(i in 1:nrow(m))  # change 'idx' to 'm'
 m[i, 3] <-
   wilcox.test(x ~ y, subset = y %in% m[i,])$p.value  # ditto


 -Peter Ehlers
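
Given the data-dredging concern, one option is to adjust the pairwise p-values for multiple testing. The sketch below uses the same kind of simulated data as the example above; the Holm adjustment is an illustrative choice, not something prescribed in this thread. pairwise.wilcox.test() from the stats package does the same in a single call.

```r
set.seed(1)                      # simulated data, as in the example above
y <- gl(4, 5)                    # 4 groups, 5 observations each
x <- rnorm(20)

pairs <- t(combn(levels(y), 2))  # one row per pair of groups

## raw two-sample Wilcoxon (Mann-Whitney) p-value for each pair
p_raw <- apply(pairs, 1, function(g)
  wilcox.test(x[y == g[1]], x[y == g[2]])$p.value)

res <- data.frame(group1 = pairs[, 1],
                  group2 = pairs[, 2],
                  p_raw  = p_raw,
                  p_holm = p.adjust(p_raw, method = "holm"))
res

## the same adjusted p-values, arranged as a lower-triangular matrix
pw <- pairwise.wilcox.test(x, y, p.adjust.method = "holm")
pw$p.value
```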



Re: [R] mann-whitney test with more groups

2009-11-18 Thread Peter Ehlers


Kim Vanselow wrote:

Dear r-helpers,
I want to test groups of samples for significant differences.
Question: Does Group1 differ significantly from group2.
This is a question to be answered by mann-whitney-u-test.

I know that I can use wilcox.test with 2 samples.

My problem: How can r perform the test automatically if there are more than 2 
groups in my data frame.
Test group1 vs. 2, 1 vs. 3, 1 vs. 4, etc.


This is my skript:
Deckung <- read.table("Gesamtdeckung.csv", sep=";", header=TRUE, dec=",", 
row.names=1)

x <- Deckung$Gesamtdeckung
y <- Deckung$Klasse

#U-Test
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("1", "2"))
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("1", "3"))
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("2", "3"))

Any help would be greatly appreciated!

Thanks
Kim 


This sounds like serious data dredging, but if you're
sure that it's what you want, try the combn() function:

 y <- gl(4, 5)
 x <- rnorm(20)
 m <- cbind(t(combn(4, 2)), NA)
 for(i in 1:nrow(idx))
m[i, 3] <-
  wilcox.test(x ~ y, subset = y %in% idx[i,])$p.value
 m

 -Peter Ehlers



Re: [R] Presentation of data in Graphical format

2009-11-18 Thread Sunita Patil
Hello Sir

I have 150 observations covering 10 posts and 6 departments; the number
of tasks per post varies from 5 to 10.

A few of the variables are crossed, especially in the case of the Office
Boy, where the tasks are things like opening the door and switching on
the lights.

Yes, for the time variable I have used the chron package, and it works well.

My aim for this study is to check the "amount of time and its variability
for groups of tasks".

It's my project work, so I need to work this out myself; if that doesn't
work, I will have to consult a statistician.

Thanks for guiding me to state the question more clearly. I will take
care next time.

Regards

Our Thoughts have the Power to Change our Destiny.
Sunita


On Wed, Nov 18, 2009 at 9:29 PM, hadley wickham  wrote:

> That is not enough information for anyone to suggest a useful plot.
> For a start:
>
>  * How many observations do you have?
>  * How many difference posts/departments/tasks?
>  * Are the variables nested or crossed?
>  * Have you successfully parsed the time representation into something
> R can work with?  Is the representation inconsistent as in your
> example?
>  * What is the purpose of the study?  What do you want to find out?
>
> Maybe you should meet with a local statistical consultant to discuss
> these issues in person. WARNING: you might have to pay - good advice
> is not always/seldom/ever free.
>
> Hadley
>
> On Wed, Nov 18, 2009 at 9:53 AM, Sunita Patil  wrote:
> > Hello Sir
> >
> > I had given a sample of my data, As I cannot disclose whole of my data
> this
> > is just a sample given
> >
> > 1st column: Posts (GM, Secretary, AM, Office Boy)
> > 2nd Column: Dept (Finance, HR, ...)
> > 3rd column: Tasks (Open the door, Fix an appointment, Fill the register,
> > etc.) depending on the post
> > 4th column: Average Time required to do the task
> >
> > So the sample data would look like
> > Posts       Dept     Task                  Average time
> > Office Boy  HR       Open the door         00:00:09
> > Office Boy  HR       Switch on the lights  00:00:10
> > Secretary   Finance  Fix an appointment    00.00.30
> > ...
> >
> > in my data the 1st column is the main category say suppose "Secretary"
> the
> > second column is the sub category "HR Dept" the 3rd column is the list of
> > duties performed by the Secretary from HR dept and 4th column is time
> > required to perform the duty
> >
> > so there are many such posts and dept with varied duties and times resp
> >
> >
> > Regards
> >
> > Our Thoughts have the Power to Change our Destiny.
> > Sunita
> >
> >
> > On Wed, Nov 18, 2009 at 9:15 PM, hadley wickham 
> wrote:
> >>
> >> > Yes I tried all the basic ones like box plot, pie chart, etc but the
> >> > data
> >> > representation isnt that clear.
> >>
> >> Given that you have neither provided your data, nor explained what you
> >> are trying to uncover from it, what sort of advice do you expect to
> >> get?
> >>
> >> Hadley
> >>
> >> --
> >> http://had.co.nz/
> >
> >
>
>
>
> --
> http://had.co.nz/
>



Re: [R] Presentation of data in Graphical format

2009-11-18 Thread Steve Lianoglou

Hi (again),

On Nov 18, 2009, at 10:53 AM, Sunita Patil wrote:


Hello Sir

I had given a sample of my data, As I cannot disclose whole of my data this
is just a sample given

1st column: Posts (GM, Secretary, AM, Office Boy)
2nd Column: Dept (Finance, HR, ...)
3rd column: Tasks (Open the door, Fix an appointment, Fill the register,
etc.) depending on the post
4th column: Average Time required to do the task

So the sample data would look like
Posts       Dept     Task                  Average time
Office Boy  HR       Open the door         00:00:09
Office Boy  HR       Switch on the lights  00:00:10
Secretary   Finance  Fix an appointment    00.00.30
...

in my data the 1st column is the main category say suppose "Secretary" the
second column is the sub category "HR Dept" the 3rd column is the list of
duties performed by the Secretary from HR dept and 4th column is time
required to perform the duty

so there are many such posts and dept with varied duties and times resp


Fine, we see what your data looks like, but what are you trying to  
plot?! What do you want to show people about this data?


-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Writing a data frame in an excel file

2009-11-18 Thread anna_l

Sorry Charlie, I didn't understand that tablename = 'R Results' was creating
a worksheet. It now works very well the first time I write to the Excel file,
but when I want to write to it again it gives the error I mentioned before,
saying that Results already exists. Is there a way to avoid that?


-
Anna Lippel
new in R so be careful I should be asking a lt of questions!:teeth:


Re: [R] Presentation of data in Graphical format

2009-11-18 Thread Steve Lianoglou

Hi,

(Sorry, I didn't cc the r-help list)

On Nov 18, 2009, at 10:22 AM, Sunita Patil wrote:


I have been using R just very recently, I have gone through this
http://addictedtor.free.fr/graphiques/
a few weeks back but I am not able to understand as to how to choose the
graph amongst them? Can anyone guide me regarding this?


I'm not sure what you mean, exactly.

Many of those graphs there aren't just "normal" R functions. They are  
put together using several commands in order to build the final  
picture you see there.


For instance, say you like this graph:

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=145

At the bottom left of the page, you'll find a Source Code section  
under "Requirements".


Click the "view" link there:
http://addictedtor.free.fr/graphiques/graphcode.php?graph=145

And that's the code you need to make the graph (it's quite complex,  
but there are simpler ones).


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 |  Memorial Sloan-Kettering Cancer Center
 |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Presentation of data in Graphical format

2009-11-18 Thread hadley wickham
That is not enough information for anyone to suggest a useful plot.
For a start:

 * How many observations do you have?
 * How many different posts/departments/tasks?
 * Are the variables nested or crossed?
 * Have you successfully parsed the time representation into something
R can work with?  Is the representation inconsistent as in your
example?
 * What is the purpose of the study?  What do you want to find out?

Maybe you should meet with a local statistical consultant to discuss
these issues in person. WARNING: you might have to pay - good advice
is not always/seldom/ever free.

Hadley
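
On the time-parsing question in the list above, a minimal base R sketch follows. It assumes a small made-up data frame modelled on the sample rows in this thread; the column names and the helper to_seconds() are hypothetical, and the chron package is deliberately not used. The helper accepts both "00:00:09" and "00.00.30", since the sample mixes the two separators.

```r
## Made-up rows modelled on the sample shown in this thread
tasks <- data.frame(
  Post = c("Office Boy", "Office Boy", "Secretary"),
  Dept = c("HR", "HR", "Finance"),
  Task = c("Open the door", "Switch on the lights", "Fix an appointment"),
  Time = c("00:00:09", "00:00:10", "00.00.30"),
  stringsAsFactors = FALSE
)

## Convert "hh:mm:ss" or "hh.mm.ss" strings to seconds
to_seconds <- function(x) {
  parts <- strsplit(x, "[:.]")
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

tasks$Seconds <- to_seconds(tasks$Time)

## Average time and spread per post (sd is NA for a single observation)
means <- tapply(tasks$Seconds, tasks$Post, mean)
sds   <- tapply(tasks$Seconds, tasks$Post, sd)
cbind(mean = means, sd = sds)
```

With the times in seconds, boxplot(Seconds ~ Post, data = tasks) is one simple way to look at variability per group once there are enough observations in each.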

On Wed, Nov 18, 2009 at 9:53 AM, Sunita Patil  wrote:
> Hello Sir
>
> I had given a sample of my data, As I cannot disclose whole of my data this
> is just a sample given
>
> 1st column: Posts (GM, Secretary, AM, Office Boy)
> 2nd Column: Dept (Finance, HR, ...)
> 3rd column: Tasks (Open the door, Fix an appointment, Fill the register,
> etc.) depending on the post
> 4th column: Average Time required to do the task
>
> So the sample data would look like
> Posts            Dept        Task                     Average time
> Office Boy      HR           Open the door          00:00:09
> Office Boy  HR   Switch on the lights  00:00:10
> Secretary       Finance    Fix an appointment   00.00.30
>                             .                             .
>
>                             .                             .
>                             .                             .
>
> in my data the 1st column is the main category say suppose "Secretary" the
> second column is the sub category "HR Dept" the 3rd column is the list of
> duties performed by the Secretary from HR dept and 4th column is time
> required to perform the duty
>
> so there are many such posts and dept with varied duties and times resp
>
>
> Regards
>
> Our Thoughts have the Power to Change our Destiny.
> Sunita
>
>
> On Wed, Nov 18, 2009 at 9:15 PM, hadley wickham  wrote:
>>
>> > Yes I tried all the basic ones like box plot, pie chart, etc but the
>> > data
>> > representation isnt that clear.
>>
>> Given that you have neither provided your data, nor explained what you
>> are trying to uncover from it, what sort of advice do you expect to
>> get?
>>
>> Hadley
>>
>> --
>> http://had.co.nz/
>
>



-- 
http://had.co.nz/



Re: [R] foor loop - undefined columns selected error

2009-11-18 Thread Peter Ehlers



Michela Leonardi wrote:

Dear R-Help Members,

I am trying to read and analyse a set of 100 csv files.
I need to work on only some columns in each of them, so I decided to use
a for loop and isolate the column to work on in each file, but then an
error message appears:
"undefined columns selected"

Here is my code:

setwd("F:/Data/")
a<-list.files()
for (x in a) {
  u<-read.csv(x, header=T, sep=",", check.names=FALSE)
#it give me the same problem using read.table
  h<-u[,2]
}

Error in `[.data.frame`(u, , 2) : undefined columns selected

It does not give me any problem when selecting an entire row (e.g. u[2,])
or a single value (e.g. u[5,2]).
If I select a column after the for loop, there is no problem, e.g.:

a<-list.files()
for (x in a) {
  u<-read.csv(x, header=T, sep=",", check.names=FALSE)
}
  h<-u[,2]

I would appreciate any suggestion or pointer to solve the problem or
to do the same thing in a different way.

Thanks for your consideration



Michela,

What are your loops supposed to accomplish? They
just give you the last file in 'a'.
When you get the error, that indicates that at least
one of your files has only one column.
You don't get the error when you take h<-u[,2] out
of the loop because the last file in 'a' does happen
to have 2 or more columns.
You need to rethink the loop with regard to what
you will do with 'h'. And you need to check what's
in those files.

 -Peter Ehlers
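
The two points above (keep what each iteration produces, and check what is in each file) can be sketched as follows. The temporary directory and the two small files are made up so that the example runs on its own; in practice 'files' would come from list.files() on the real data directory.

```r
## Set up two made-up csv files: one with two columns, one with only one
dir <- tempfile("csvdemo")
dir.create(dir)
write.csv(data.frame(a = 1:3, b = 4:6),
          file.path(dir, "two_cols.csv"), row.names = FALSE)
write.csv(data.frame(a = 1:3),
          file.path(dir, "one_col.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

second_cols <- list()   # one entry per file, so nothing is overwritten
for (f in files) {
  u <- read.csv(f, header = TRUE, check.names = FALSE)
  if (ncol(u) < 2) {
    ## report the offending file instead of stopping with an error
    message("Skipping ", basename(f), ": only ", ncol(u), " column(s)")
    next
  }
  second_cols[[basename(f)]] <- u[, 2]
}

str(second_cols)
```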


--
Michela Leonardi



Re: [R] Creating an excel file and manipulating it from R

2009-11-18 Thread Henrique Dallazuanna
Try the RDCOMClient [1] package.

[1]http://www.omegahat.org/RDCOMClient/

On Wed, Nov 18, 2009 at 1:31 PM, anna_l  wrote:
>
> Hello everybody, I´ve been looking for a function that would create an excel
> file in my working directory where I would write my dataframe but I only
> found the functions to write or read in an existing file that you gave me on
> my former post or on some websites. I can´t find either functions to
> manipulate those datas: for example, I would like some lines to be red or
> green according to their value. Thank you in advance!
>
>
> -
> Anna Lippel
> new in R so be careful I should be asking a lt of questions!:teeth:
> --
> View this message in context: 
> http://old.nabble.com/Creating-an-excel-file-and-manipulating-it-from-R-tp26408408p26408408.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] Presentation of data in Graphical format

2009-11-18 Thread Sunita Patil
Hello Sir

I gave a sample of my data. As I cannot disclose the whole of my data, this
is just a sample.

1st column: Posts (GM, Secretary, AM, Office Boy)
2nd column: Dept (Finance, HR, ...)
3rd column: Tasks (Open the door, Fix an appointment, Fill the register,
etc.), depending on the post
4th column: Average time required to do the task

So the sample data would look like
Posts       Dept     Task                  Average time
Office Boy  HR       Open the door         00:00:09
Office Boy  HR       Switch on the lights  00:00:10
Secretary   Finance  Fix an appointment    00.00.30
...

In my data the 1st column is the main category, say "Secretary"; the
second column is the sub-category, say "HR Dept"; the 3rd column is the list
of duties performed by the Secretary from the HR dept; and the 4th column is
the time required to perform each duty.

So there are many such posts and departments, with varied duties and times
respectively.


Regards

Our Thoughts have the Power to Change our Destiny.
Sunita


On Wed, Nov 18, 2009 at 9:15 PM, hadley wickham  wrote:

> > Yes I tried all the basic ones like box plot, pie chart, etc but the data
> > representation isnt that clear.
>
> Given that you have neither provided your data, nor explained what you
> are trying to uncover from it, what sort of advice do you expect to
> get?
>
> Hadley
>
> --
> http://had.co.nz/
>



Re: [R] parsing numeric values

2009-11-18 Thread Gabor Grothendieck
Thanks. This is now fixed in the development version so that it gives
an error rather than crashing:

> library(gsubfn)
Loading required package: proto
Loading required package: tcltk
Loading Tcl/Tk interface ... done
> source("http://gsubfn.googlecode.com/svn/trunk/R/gsubfn.R")
> strapply("test", as.numeric)
Error in as.character(pattern) :
  cannot coerce type 'builtin' to vector of type 'character'


On Wed, Nov 18, 2009 at 8:49 AM, baptiste auguie
 wrote:
> Thanks a lot, both of you.
>
> Incidentally, I made R crash when I forgot the X argument to strapply,
>
> library(gsubfn)
> Loading required package: tcltk
> Loading Tcl/Tk interface ... done
> strapply("test", as.numeric)
>
>  *** caught bus error ***
> address 0x13c, cause 'non-existent physical address'
>
> Traceback:
>  1: .External("dotTclcallback", ..., PACKAGE = "tcltk")
>  2: .Tcl.callback(x, e)
>  3: makeAtomicCallback(x, e)
>  4: makeCallback(get("value", envir = ref), get("envir", envir = ref))
>  5: FUN(X[[3L]], ...)
>  6: lapply(val, val2obj)
>  7: .Tcl.args.objv(...)
>  8: structure(.External("dotTclObjv", objv, PACKAGE = "tcltk"), class
> = "tclObj")
>  9: .Tcl.objv(.Tcl.args.objv(...))
> 10: tcl("set", "e", e)
> 11: strapply1(x, pattern, backref, ignore.case)
> 12: FUN("test"[[1L]], ...)
> 13: lapply(X, FUN, ...)
> 14: sapply(X, ff, simplify = is.logical(simplify) && simplify,
> USE.NAMES = USE.NAMES)
> 15: strapply("test", as.numeric)
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
>
> sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  grid      methods
> [8] base
>
> other attached packages:
> [1] ggplot2_0.8.3  reshape_0.8.3  plyr_0.1.9     proto_0.3-8    fortunes_1.3-6
>
> 2009/11/18 Gabor Grothendieck :
>> A minor variant might be the following:
>>
>>   library(gsubfn)
>>   strapply(input, "\\d+\\.\\d+E[-+]?\\d+", as.numeric, simplify = rbind)
>>
>> where:
>>
>> - as.numeric is used in place of c in which case we do not need combine
>> - \\d+ matches one or more digits
>> - \\. matches a decimal point
>> - [-+]? matches -, + or nothing (i.e. an optional sign).
>> - parentheses around the regular expression not needed
>>
>> On Wed, Nov 18, 2009 at 7:28 AM, Henrique Dallazuanna  
>> wrote:
>>> Try this:
>>>
>>> strapply(input, "([0-9]+\\.[0-9]+E-[0-9]+)", c, simplify = rbind,
>>> combine = as.numeric)
>>>
>>> On Wed, Nov 18, 2009 at 9:57 AM, baptiste auguie
>>>  wrote:
 Dear list,

 I'm seeking advice to extract some numeric values from a log file
 created by an external program. Consider the following example,

 input <-
 readLines(textConnection(
 "some text
   =    1.3770E-03      =    3.4644E-07
   =    1.9412E-04      =    4.8840E-08

 other text
    =    1.3770E-03      =    3.4644E-07
    =    1.9412E-04      =    4.8840E-08"))

 ## this is what I want
 results <- c(as.numeric(strsplit(grep("", input,val=T), " ")[[1]][8]),
             as.numeric(strsplit(grep("", input,val=T), " ")[[1]][8]),
             as.numeric(strsplit(grep("", input,val=T), " ")[[1]][9]),
             as.numeric(strsplit(grep("", input,val=T), " ")[[1]][9])
             )

 ## [1] 0.00137700 0.00019412 0.00137700 0.00019412

 The use of strsplit is not ideal here as there is a different number
 of space characters in the lines containing  and  for
 instance (hence the indices 8 and 9 respectively).

 I tried to use gsubfn for a cleaner construct,

 strapply(input, " += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)

 but I can't seem to find the correct regular expression to deal with
 the exponent.


 Any tips are welcome!


 Best regards,

 baptiste

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

>>>
>>>
>>>
>>> --
>>> Henrique Dallazuanna
>>> Curitiba-Paraná-Brasil
>>> 25° 25' 40" S 49° 16' 22" O
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
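
For readers without gsubfn, the regular expression discussed in this thread can also be applied with base R's gregexpr() and regmatches(). This is a sketch only: the labels in front of each "=" were lost from the example above, so the names alpha, beta, gamma and delta below are placeholders, not the original field names.

```r
## Stand-in for the log file; the labels are hypothetical placeholders
input <- c("some text",
           "   alpha =    1.3770E-03      beta =    3.4644E-07",
           "   alpha =    1.9412E-04      beta =    4.8840E-08",
           "",
           "other text",
           "    gamma =    1.3770E-03      delta =    3.4644E-07",
           "    gamma =    1.9412E-04      delta =    4.8840E-08")

## digits, a decimal point, digits, then E with an optional sign and digits
pattern <- "[0-9]+\\.[0-9]+E[-+]?[0-9]+"

hits <- regmatches(input, gregexpr(pattern, input))
hits <- hits[lengths(hits) > 0]                  # drop lines with no numbers
vals <- do.call(rbind, lapply(hits, as.numeric)) # one row per matching line

vals[, 1]
```

The first column of vals holds the values the original post was after; the second column holds the other number on each line.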


Re: [R] Presentation of data in Graphical format

2009-11-18 Thread hadley wickham
> Yes I tried all the basic ones like box plot, pie chart, etc but the data
> representation isnt that clear.

Given that you have neither provided your data, nor explained what you
are trying to uncover from it, what sort of advice do you expect to
get?

Hadley

-- 
http://had.co.nz/



[R] mann-whitney test with more groups

2009-11-18 Thread Kim Vanselow
Dear r-helpers,
I want to test groups of samples for significant differences.
Question: Does group 1 differ significantly from group 2?
This is a question to be answered by the Mann-Whitney U test.

I know that I can use wilcox.test with 2 samples.

My problem: how can R perform the test automatically if there are more
than 2 groups in my data frame, i.e. test group 1 vs. 2, 1 vs. 3,
1 vs. 4, etc.?


This is my script:
Deckung <- read.table("Gesamtdeckung.csv", sep=";", header=TRUE, dec=",", 
row.names=1)

x <- Deckung$Gesamtdeckung
y <- Deckung$Klasse

#U-Test
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("1", "2"))
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("1", "3"))
wilcox.test(x ~ y, paired = FALSE, subset = y %in% c("2", "3"))

Any help would be greatly appreciated!

Thanks
Kim 


Re: [R] Writing a data frame in an excel file

2009-11-18 Thread anna_l

Hi Charlie, I've been trying to use sqlSave the way you showed me, but it
gives me this error message, which I couldn't resolve:
Error in sqlSave(xlsFile, strategy, tablename = "Result", rownames = FALSE) : 
  table ‘Result’ already exists

I would like to save the data frame in a specified worksheet, but I couldn't
find how to do it in the help for sqlSave.


cls59 wrote:
> 
> 
> anna_l wrote:
>> 
>> Hello, I am having trouble by using the write.table function to write a
>> data frame of 4 columns and 7530 rows. I don´t  know if I should just use
>> a sep="\n" and change the .xls file into a .csv file. Thanks in advance
>> 
> 
> 
> Base R cannot write .xls files by itself.  You should output CSV using
> write.csv():
> 
>   write.csv( dataFrame, file = 'results.csv' )
> 
> If you are using R on windows, then the RODBC package provides a mechanism
> for dumping data frames directly to Excel files, possibly with multiple
> sheets:
> 
>   require( RODBC )
> 
>   xlsFile <- odbcConnectExcel( 'results.xls', readOnly = F )
> 
>   sqlSave( xlsFile, dataFrame, tablename = 'R Results', rownames = F )
> 
>   odbcCloseAll()
> 
> 
> The tablename argument to sqlSave allows you to assign a name to the excel
> sheet that will contain the data.frame.
> 
> 
> -Charlie
> 


-
Anna Lippel
new in R so be careful I should be asking a lt of questions!:teeth:


Re: [R] Presentation of data in Graphical format

2009-11-18 Thread Sunita Patil
Yes, in my data the 1st column is the main category, say "Secretary";
the second column is the sub-category, say "HR Dept"; the 3rd column is the
list of duties performed by the Secretary from the HR dept; and the 4th
column is the time required to perform each duty.

So there are many such posts and departments, with varied duties and times
respectively.

Regards

Our Thoughts have the Power to Change our Destiny.
Sunita


On Wed, Nov 18, 2009 at 8:42 PM, Petr PIKAL  wrote:

> Hi
>
> r-help-boun...@r-project.org napsal dne 18.11.2009 16:01:27:
>
> > Yes I tried all the basic ones like box plot, pie chart, etc but the
> data
> > representation isnt that clear.
> >
>
> I agree with Tal. But it partly depends on your data. If you have many
> levels and only few time values in each boxplot would not look well. Maybe
> you could check also ?xtabs or ?table and/or R graph gallery
> http://addictedtor.free.fr/graphiques/ if you find suitable graph.
>
> Regards
> Petr
>
>
>
> >
> > Regards
> >
> > Our Thoughts have the Power to Change our Destiny.
> > Sunita
> >
> >
> > On Wed, Nov 18, 2009 at 7:20 PM, Tal Galili 
> wrote:
> >
> > > I would start with
> > > ?boxplot
> > >
> > >
> > > --
> > >
> > >
> > > My contact information:
> > > Tal Galili
> > > E-mail: tal.gal...@gmail.com
> > > Phone number: 972-52-7275845
> > > FaceBook: Tal Galili
> > > My Blogs:
> > > http://www.talgalili.com (Web and general, Hebrew)
> > > http://www.biostatistics.co.il (Statistics, Hebrew)
> > > http://www.r-statistics.com/ (Statistics,R, English)
> > >
> > >
> > >
> > >
> > > On Wed, Nov 18, 2009 at 2:47 PM, Sunita Patil 
> wrote:
> > >
> > >> Thanx
> > >>
> > >> but I am not able to find a graph that wud suit my data
> > >>
> > >> Regards
> > >>
> > >> Our Thoughts have the Power to Change our Destiny.
> > >> Sunita
> > >>
> > >>
> > > >> On Sun, Nov 15, 2009 at 8:54 PM, milton ruser wrote:
> > > >>
> > > >> > Google "R graph gallery"
> > >> > Google "R ggplot2"
> > >> > Google "R lattice"
> > >> >
> > >> > and good luck
> > >> >
> > >> > milton
> > > >> > On Sun, Nov 15, 2009 at 7:48 AM, Sunita22 wrote:
> > >> >
> > >> >>
> > >> >> Hello
> > >> >>
> > >> >> My data contains following columns:
> > >> >>
> > >> >> 1st column: Posts (GM, Secretary, AM, Office Boy)
> > >> >> 2nd Column: Dept (Finance, HR, ...)
> > > >> >> 3rd column: Tasks (Open the door, Fix an appointment, Fill the register,
> > > >> >> etc.) depending on the post
> > > >> >> 4th column: Average Time required to do the task
> > > >> >>
> > > >> >> So the sample data would look like
> > > >> >> Posts       Dept     Task                Average time
> > > >> >> Office Boy  HR       Open the door       00:00:09
> > > >> >> Secretary   Finance  Fix an appointment  00:00:30
> > > >> >> .           .        .
> > > >> >>
> > > >> >> I am trying to represent this data in graphical format. I tried graphs
> > > >> >> like the mosaic plot, etc., but they do not represent the data correctly.
> > > >> >> My aim is to check the "amount of time and its variability for groups of
> > > >> >> tasks".
> > >> >>
> > >> >> Thank you in advance
> > >> >> Regards
> > >> >> Sunita
> > >> >>
> > >> >> --
> > >> >> View this message in context:
> > >> >>
> > >> http://old.nabble.com/Presentation-of-data-in-Graphical-format-
> > tp26358857p26358857.html
> > >> >> Sent from the R help mailing list archive at Nabble.com.
> > >> >>
> > >> >> __
> > >> >> R-help@r-project.org mailing list
> > >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> >> PLEASE do read the posting guide
> > >> >> http://www.R-project.org/posting-guide.html<
> > >> http://www.r-project.org/posting-guide.html>
> > >>
> > >> >> and provide commented, minimal, self-contained, reproducible code.
> > >> >>
> > >> >
> > >> >
> > >>
> > >>[[alternative HTML version deleted]]
> > >>
> > >>
> > >> __
> > >> R-help@r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >
> > >
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Creating an excel file and manipulating it from R

2009-11-18 Thread anna_l

Hello everybody, I've been looking for a function that would create an Excel
file in my working directory and write my data frame to it, but I only
found functions that read or write an existing file, either in the replies to
my former post or on some websites. I also can't find functions to
format the data: for example, I would like some rows to be red or
green according to their value. Thank you in advance!
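
One possible way to do this, sketched with the openxlsx package. Nobody in this thread suggested that package, so its use here is an assumption, and it has to be installed separately. The data frame `df`, the file name `my_data.xlsx`, and the rule "negative values in red, the rest in green" are all invented for illustration.

```r
# Hypothetical data frame; column names and values are invented for illustration.
df <- data.frame(item = c("a", "b", "c", "d"), value = c(5, -2, 0.5, -7))

# Assumes the openxlsx package is installed; the block is skipped otherwise.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  # Create a new workbook with one sheet and write the data frame into it.
  wb <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(wb, "data")
  openxlsx::writeData(wb, sheet = "data", x = df)

  red   <- openxlsx::createStyle(fontColour = "#FF0000")
  green <- openxlsx::createStyle(fontColour = "#008000")

  # Sheet row 1 holds the header, so data row i is sheet row i + 1.
  neg_rows <- which(df$value < 0) + 1
  pos_rows <- which(df$value >= 0) + 1
  openxlsx::addStyle(wb, "data", style = red, rows = neg_rows,
                     cols = seq_along(df), gridExpand = TRUE)
  openxlsx::addStyle(wb, "data", style = green, rows = pos_rows,
                     cols = seq_along(df), gridExpand = TRUE)

  # Save the new file in the current working directory.
  out_file <- file.path(getwd(), "my_data.xlsx")
  openxlsx::saveWorkbook(wb, out_file, overwrite = TRUE)
}
```

The colours are applied once when the file is written; they do not update if the values are later edited in Excel.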


-
Anna Lippel
New to R, so be warned: I will probably be asking a lot of questions!
-- 
View this message in context: 
http://old.nabble.com/Creating-an-excel-file-and-manipulating-it-from-R-tp26408408p26408408.html


Re: [R] A combinatorial optimization problem: finding the best permutation of a complex vector

2009-11-18 Thread Erwin Kalvelagen
See also:
http://yetanothermathprogrammingconsultant.blogspot.com/2009/11/assignment-problem.html
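
For readers new to the linear sum assignment problem (LSAP) discussed in this thread, here is a brute-force sketch in base R for a tiny instance. It only illustrates what is being optimised: it is not the method used by solve_LSAP or Cplex, and it is hopeless for N = 1000. The vectors `a` and `b`, the helper `all_perms`, and the squared-distance cost are assumptions made for the example.

```r
# All permutations of a vector (fine for tiny inputs only: there are n! of them).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in all_perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

set.seed(1)
n <- 5
a <- complex(real = rnorm(n), imaginary = rnorm(n))
b <- complex(real = rnorm(n), imaginary = rnorm(n))

# cost[i, j]: squared distance between a[i] and b[j] (an assumed cost).
cost <- Mod(outer(a, b, "-"))^2

# Try every assignment of b's elements to a's positions and keep the cheapest.
perms  <- all_perms(seq_len(n))
totals <- sapply(perms, function(p) sum(cost[cbind(seq_len(n), p)]))
best   <- perms[[which.min(totals)]]

best         # b[best] is the reordering of b closest to a under this cost
min(totals)  # total cost of that assignment
```

A dedicated solver works from the same kind of cost matrix but avoids enumerating all n! permutations, which is why the timings quoted in this thread are possible at all.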


Erwin Kalvelagen
Amsterdam Optimization Modeling Group
er...@amsterdamoptimization.com
http://amsterdamoptimization.com



On Wed, Nov 18, 2009 at 9:49 AM, Ravi Varadhan  wrote:

> I just saw that Cplex is commercial software from ILOG/IBM, and that there
> is an R interface, Rcplex, for it.  While this is bad news, it is still
> encouraging to know that the LSAP problem can be solved faster.  I will keep
> looking for better/faster open source algorithms.
>
> Ravi.
>
>
> 
> ---
>
> Ravi Varadhan, Ph.D.
>
> Assistant Professor, The Center on Aging and Health
>
> Division of Geriatric Medicine and Gerontology
>
> Johns Hopkins University
>
> Ph: (410) 502-2619
>
> Fax: (410) 614-9625
>
> Email: rvarad...@jhmi.edu
>
> Webpage:
>
> http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
>
>
>
>
> 
> 
>
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Ravi Varadhan
> Sent: Wednesday, November 18, 2009 9:39 AM
> To: 'Erwin Kalvelagen'; r-h...@stat.math.ethz.ch
> Subject: Re: [R] A combinatorial optimization problem: finding the best
> permutation of a complex vector
>
> Hi Erwin,
>
> Thank you for the information about Cplex.  It seems quite impressive.  Is
> it proprietary software?  I saw that there is a Matlab interface to it.
> Is there an R interface?
>
>
> Thanks,
> Ravi.
>
>
> 
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Erwin Kalvelagen
> Sent: Wednesday, November 18, 2009 12:20 AM
> To: r-h...@stat.math.ethz.ch
> Subject: Re: [R] A combinatorial optimization problem: finding the best
> permutation of a complex vector
>
> Ravi Varadhan writes:
> >
> >
> > When I increased N = 1000, the time was about 1400 seconds!
> >
>
> Not sure if this is important for you: this can be solved much faster. A good
> solver can solve the n=1000 problem in less than 2 seconds. The Cplex network
> code shows:
>
> Network - Optimal:  Objective =   1.6173194067e+003
> Network time =1.58 sec.  Iterations = 209126 (102313)
>
> Even solved as an LP this takes about 150 seconds.
>
> (The solutions are the same as reported by solve_LSAP).
>
>
> 
> Erwin Kalvelagen
> Amsterdam Optimization Modeling Group
> er...@amsterdamoptimization.com
> http://amsterdamoptimization.com
>


Re: [R] Presentation of data in Graphical format

2009-11-18 Thread Sunita Patil
I have only recently started using R. I went through
http://addictedtor.free.fr/graphiques/
a few weeks back, but I don't understand how to choose among the
graphs there. Can anyone guide me on this?

Thanks in advance
Regards

Our Thoughts have the Power to Change our Destiny.
Sunita



