from:"Kevin Wright"

[R] How to auto-scale cex of y-axis labels in lattice dotplot?

2007-07-25 Thread Kevin Wright

When I create a dotplot in lattice, I frequently observe overplotting
of the labels along the vertical axis.  On my screen, this illustrates
overplotting of the letters:

windows()
reps=6
dat=data.frame(let=rep(letters,each=reps), grp=rep(1:reps, 26),
  y=runif(26*reps))
dotplot(let~y|grp, dat)

Is there a way to automatically scale the labels so that they are not
over-plotted?

I currently do something like this:
Calculate or guess the number of panel rows: NumPanelRows
cexLab <- min(1, .9*par()$pin[2]/
  (nlevels(dat$let)*NumPanelRows*strheight("A",units="in")))
dotplot(..., scales=list(y=list(cex=cexLab))

Is there an easier way?

Is there a function that I can call which calculates the layout of the
panels that will be used in the dotplot?

Any tips will be appreciated.

K Wright

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Confidence interval for coefficient of variation

2007-06-14 Thread Kevin Wright

This is a function I coded a few years ago to calculate a confidence
interval for a coefficient of variation.  The code is based on a paper
by Mark Vangel in The American Statistician.  I have not used the
function much, but it could be useful for comparing cv's from
different groups.

Kevin Wright


confint.cv <- function(x,alpha=.05, method="modmckay"){
  # Calculate the confidence interval of the cv of the vector x
  # Author: Kevin Wright
  # See: Vangel, Mark.  Confidence Intervals for a Normal Coefficient
  # of Variation. American Statistician, Vol 15, No1, p. 21--26.
  # x <- c(326,302,307,299,329)
  # confint.cv(x,.05,"modmckay")

  x <- na.omit(x)
  v <- length(x)-1
  mu <- mean(x)
  sigma <- sqrt(var(x))
  k <- sigma/mu
  # CV > .33 may give poor results, so warn the user
  if(k>.33) warning("Confidence interval may be very approximate.")

  method <- casefold(method) # In case we see "McKay"

  if(method=="mckay"){
# McKay method.  See equation 15.
t1 <- qchisq(1-alpha/2,v)/v
t2 <- qchisq(alpha/2,v)/v
u1 <- v*t1
u2 <- v*t2
lower <- k/sqrt(( u1/(v+1) -1)*k*k + u1/v)
upper <- k/sqrt(( u2/(v+1) -1)*k*k + u2/v)
  } else if (method=="naive"){
# Naive method.  See equation 17.
t1 <- qchisq(1-alpha/2,v)/v
t2 <- qchisq(alpha/2,v)/v
lower <- k/sqrt(t1)
upper <- k/sqrt(t2)
  } else {
# Modified McKay method. See equation 16.
u1 <- qchisq(1-alpha/2,v)
u2 <- qchisq(alpha/2,v)
lower <- k/sqrt(( (u1+2)/(v+1) -1)*k*k + u1/v)
upper <- k/sqrt(( (u2+2)/(v+1) -1)*k*k + u2/v)
  }
  ci <- c(lower,upper)
  attr(ci,"CV") <- k
  attr(ci,"alpha") <- alpha
  return(ci)
}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Sorting dataframe by different columns

2007-06-08 Thread Kevin Wright

On the R wiki site there is a general-purpose function
(sort.data.frame) that allows you to do this:

sort(df, by=~ x-z)

See: http://wiki.r-project.org/rwiki/doku.php?id=tips:data-frames:sort

Regards,

Kevin

On 6/8/07, Gunther Höning <[EMAIL PROTECTED]> wrote:
> Dear list,
>
> I have a very short question,
> Suggest a dataframe of four columns.
>
> df <- data.frame(w,x,y,z)
>
> I want this ordered the following way:
> first by :x, decreasing = FALSE
> and
> secondly by: z, decreasing =TRUE
>
> How can this be done ?
>
> Thanks
>
> Gunther
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] sorting data in R

2007-04-20 Thread Kevin Wright

Now that "sort" is a generic function, I have modified my original
sort.data.frame function to be s 'data.frame' method to sort.  (You
now have to specify the formula as the 'by' argument).

See the R Wiki:
http://wiki.r-project.org/rwiki/doku.php?id=tips:data-frames:sort&s=sort%20data

There are also some other tips on that page for sorting.

Kevin Wright


On 4/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> Best function for sorting which i have used and many in the community :-)
> HTH Cheers
>
> sort.data.frame(Oats, ~ -nitro + Variety)
>
> Feedback and improvements are welcome.
>
> sort.data.frame <- function(form,dat){
>   # Author: Kevin Wright
>   # Some ideas from Andy Liaw
>   #   http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html
>
>   # Use + for ascending, - for decending.
>   # Sorting is left to right in the formula
>
>   # Useage is either of the following:
>   # library(nlme); data(Oats)
>   # sort.data.frame(~-Variety+Block,Oats) # Note: levels(Oats$Block)
>   # sort.data.frame(Oats,~nitro-Variety)
>
>   # If dat is the formula, then switch form and dat
>   if(inherits(dat,"formula")){
> f=dat
> dat=form
> form=f
>   }
>   if(form[[1]] != "~")
> stop("Formula must be one-sided.")
>
>   # Make the formula into character and remove spaces
>   formc <- as.character(form[2])
>   formc <- gsub(" ","",formc)
>   # If the first character is not + or -, add +
>   if(!is.element(substring(formc,1,1),c("+","-")))
> formc <- paste("+",formc,sep="")
>
>   # Extract the variables from the formula
>   if(exists("is.R") && is.R()){
> vars <- unlist(strsplit(formc, "[\\+\\-]"))
>   }
>   else{
> vars <- unlist(lapply(unpaste(formc,"-"),unpaste,"+"))
>   }
>   vars <- vars[vars!=""] # Remove spurious "" terms
>
>   # Build a list of arguments to pass to "order" function
>   calllist <- list()
>   pos=1 # Position of + or -
>   for(i in 1:length(vars)){
> varsign <- substring(formc,pos,pos)
> pos <- pos+1+nchar(vars[i])
> if(is.factor(dat[,vars[i]])){
>   if(varsign=="-")
> calllist[[i]] <- -rank(dat[,vars[i]])
>   else
> calllist[[i]] <- rank(dat[,vars[i]])
> }
> else {
>   if(varsign=="-")
> calllist[[i]] <- -dat[,vars[i]]
>   else
> calllist[[i]] <- dat[,vars[i]]
> }
>   }
>   dat[do.call("order",calllist),]
>
> }
>
>
>
>
>
>
>
>
>
>
> elyakhlifi mustapha <[EMAIL PROTECTED]>
> Sent by: [EMAIL PROTECTED]
> 20-04-07 03:30 PM
>
> To
> R-help@stat.math.ethz.ch
> cc
>
> Subject
> [R] sorting data in R
>
>
>
>
>
>
> hello,
> I'd  like  know how to sort a data frame in R for example how I should do
> to sort by Catholic with swiss data frame like below
> thanks
>
>  Fertility Agriculture Examination Education Catholic
> Infant.Mortality
> Courtelary80.217.0  1512 9.9622.2
> Delemont  83.145.1   6 984.8422.2
> Franches-Mnt  92.539.7   5 593.4020.2
> Moutier   85.836.5  12 733.7720.3
> Neuveville76.943.5  1715 5.1620.6
> Porrentruy76.135.3   9 790.5726.6
> Broye 83.870.2  16 792.8523.6
> Glane 92.467.8  14 897.1624.9
> Gruyere   82.453.3  12 797.6721.0
> Sarine82.945.2  161391.3824.4
> Veveyse   87.164.5  14 698.6124.5
> Aigle 64.162.0  2112 8.5216.5
> Aubonne   66.967.5  14 7 2.2719.1
> Avenches  68.960.7  1912 4.4322.7
> Cossonay  61.769.3  22 5 2.8218.7
> Echallens 68.372.6  18 224.2021.2
> Grandson  71.734.0  17 8 3.3020.0
> Lausanne  55.719.4  262812.1120.2
> La Vallee 54.315.2  3120 2.1510.8
> Lavaux65.173.0  19 9 2.8420.0
> M

Re: [R] using alpha transparency for lines in levelplot - SUMMARY

2007-03-29 Thread Kevin Wright

I reported a similar issue with Adobe Reader in a thread starting here:
http://tolstoy.newcastle.edu.au/R/e2/devel/06/10/0706.html

K Wright



On 3/27/07, Michael Sumner <[EMAIL PROTECTED]> wrote:
> Hello, thanks to Deepayan Sarkar for sorting me out on this one.
>
> The problem with transparent lines affecting region colour in lattice
> plot appears
> when using Adobe Reader (v 8 in my case). I've only viewed the file on
> Windows XP.
>
> I've tried using Foxit Reader to view the file and there's no problem.
>
> Cheers, Mike.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Next useR conference date/location

2006-11-14 Thread Kevin Wright

Are there any tentative plans for the next useR conference?

I ask because I will soon need to develop an expense budget for 2007
and wondered if the conference is switching to an annual meeting or
will continue every two years.

Thanks for any information

K Wright

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] lme4 and lmeSplines

2006-08-02 Thread Kevin Wright

I'm trying to use the lmeSplines package together with lme4.

Below is (1) an example of lmeSplines together with nlme (2) an
attempt to use lmeSplines with lme4 (3) then a comparison of the
random effects from the two different methods.

(1)

require(lmeSplines)
data(smSplineEx1)
dat <- smSplineEx1
dat.lo <- loess(y~time, data=dat)
plot(dat.lo)
dat$all <- rep(1,nrow(dat))
times20 <- seq(1,100,length=20)
Zt20 <- smspline(times20)
dat$Zt20 <- approx.Z(Zt20, times20, dat$time)
fit1.20 <- lme(y~time, data=dat, random=list(all=pdIdent(~Zt20-1)))
# Loess model
dat.lo <- loess(y~time, data=dat)
plot(dat.lo)
# Spline model
with(dat, lines(fitted(fit1.20)~time, col="red"))
# Save random effects for later
ranef.nlme <- unlist(ranef(fit1.20))

(2) Now an attempt to use lme4:

library(lmeSplines)
detach(package:nlme)
library(lme4)
data(smSplineEx1)
# Use 20 spline in lme4
dat <- smSplineEx1
times20 <- seq(1,100,length=20)
Zt20 <- smspline(times20)
dat <- cbind(dat, approx.Z(Zt20, times20, dat$time))
names(dat)[4:21] <- paste("Zt",names(dat)[4:21],sep="")
dat$all <- rep(1, nrow(dat))
fit1.20 <- lmer(y~time
 
+(-1+Zt1|all)+(-1+Zt2|all)+(-1+Zt3|all)+(-1+Zt4|all)+(-1+Zt5|all)+(-1+Zt6|all)
 
+(-1+Zt7|all)+(-1+Zt8|all)+(-1+Zt9|all)+(-1+Zt10|all)+(-1+Zt11|all)+(-1+Zt12|all)
 
+(-1+Zt13|all)+(-1+Zt14|all)+(-1+Zt15|all)+(-1+Zt16|all)+(-1+Zt17|all)+(-1+Zt18|all),
 data=dat)
#summary(fit1)
# Plot the data and loess fit
dat.lo <- loess(y~time, data=dat)
plot(dat.lo)
# Fitting with splines
with(dat, lines(fitted(fit1.20)~time, col="red"))
ranef.lme4 <- unlist(ranef(fit1.20))

(3) Compare nlme lme4 random effects

plot(ranef.nlme~ranef.lme4)

The plot of fitted values from lme4 is visually appealing, but the
random effects from lme4 are peculiar--three are non-zero and the rest
are essentially zero.

Any help in getting lme4 + lmeSplines working would be appreciated.
It is not unlikely that I have the lmer syntax wrong.

Kevin Wright

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Mixing grid and base graphics--need help understanding this quirk

2006-06-27 Thread Kevin Wright

My setup: Windows 2000, R 2.3.1

When I start a brand new session of R and paste the code below into R,
the graphic device shows "Some text" in the lower left corner.  If I
paste the code into the command window again, then "Some text" does
not appear in the lower left corner.  Why is this?

require(grid)
par(mfrow=c(1,2))
plot(1:10)
plot(-10:1)
par(mfrow=c(1,1))
pushViewport(viewport(.04, .04, width=stringWidth("Some text"),
height=unit(2,"lines"),
  name="pagenum", gp=gpar(fontsize=10)))
grid.text("Some text", gp=gpar(col="gray30"),
  just=c("left","bottom"))
popViewport()


Kevin Wright

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] DSC 2005 proceedings?

2006-06-13 Thread Kevin Wright

This page:
http://depts.washington.edu/dsc2005/
says the proceedings for DSC 2005 will be published online and that
the papers were due in final form by September 2005.

Anyone know if the proceedings have been published yet or when (if?)
they will be published?

Thanks.

Kevin Wright

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] RODBC and Excel: Wrong Data Type Assumed on Import

2005-11-03 Thread Kevin Wright

>From my experience (somewhat of a guess):

1.

Excel uses the first 16 rows of data to determine if a column is numeric or
character. The data type which is most common in the first 16 rows will then
be used for the whole column. If you sort the data so that at least the
first 9 rows have character data, you may find this allows the data to be
interpreted as character. There is supposedly a registy setting that can
control how many lines to use (instead of 16), but I have not had success
with the setting. I suspect that ODBC uses JET4, which may be the real
source of the problem. See more here:
http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/

2.

The gregmisc bundle has a different read.xls function that uses a Perl
script (xls2csv) and seems to be safer with mixed-type columns.
Requires a working version of Perl.

Best,

Kevin Wright



The first column in my Excel sheet has mostly numbers but I need to treat it
as character data:

> library(RODBC)
<http://tolstoy.newcastle.edu.au/R/help/05/09/11324.html#14938qlink1>
*> channel <- odbcConnectExcel("U:/efg/lab/R/Plasmid/construct list.xls") *
*> plasmid <- sqlFetch(channel,"Sheet1", as.is=TRUE) *
*> odbcClose(channel) *

> names(plasmid)

[1] "Plasmid Number" "Plasmid" "Concentration" "Comments" "Lost"

# How is the type decided? I need a character type.
> class(plasmid$"Plasmid Number")

[1] "numeric"
> typeof(plasmid$"Plasmid Number")

[1] "double"

> plasmid$"Plasmid Number"[273:276]

[1] 274 NA NA 276

The two NAs are supposed to be 275a and 275b. I tried the "as.is=TRUE" but
that didn't help.

I consulted Section 4, Relational databases, in the R Data Import/Export
document (for Version 2.2.0).

Section 4.2.2, Data types, was not helpful. In particular, this did not seem
helpful: "The more comprehensive of the R interface packages hide the type
conversion issues from the user."

Section 4.3.2, Package RODBC, provided a "simple example of using ODBC ..
with a(sic) Excel spreadsheet" but is silent on how to control the data type
on import. Could the documentation be expanded to address this issue?

I really need to show "Plasmid 275a" and "Plasmid 275b" instead of "Plasmid
NA".

Thanks for any help with this.

efg

--
Earl F. Glynn
Scientific Programmer
Bioinformatics Department

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] How to create a new data.frame with all possible observations

2005-10-24 Thread Kevin Wright

I would like to use something like expand.grid to get a data.frame with all
possible combinations of model terms from a formula. For example:

R> dat=data.frame(y=rnorm(8),
+ trt=rep(c("A","B"),4),
+ state=c(rep(c('IA','NE'),each=4)),
+ county=c('P','P','S','S','J','J','N','N'))
R> dat=dat[-1,]
R> dat
y trt state county
2 0.36 B IA P
3 0.44 A IA S
4 2.26 B IA S
5 -0.12 A NE J
6 0.84 B NE J
7 -2.11 A NE N
8 -0.69 B NE N

If my model is
model=formula(y~trt+state + county%in%state)

Then I would like to create a data.frame that includes the missing
observation for
A + IA + P %in IA (trt + state + county %in% state)

I currently use expand.grid to get all possible combinations of variables,
then delete impossible combinations like "J %in IA". This works, but is a
bit clumsy and I wonder if there is a more general solution.

Ideas welcome.

Kevin Wright

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Function sort.data.frame

2004-09-24 Thread Kevin Wright


I can never remember how to use "order" to sort the rows of a data frame,
so like any good, lazy programmer, I decided to write my own function.

The idea is to specify a data.frame and a one-sided formula with +/- 
indicating ascending/descending.  For example:
  sort.data.frame(~ +nitro -Variety, Oats)

Since sorting of a data.frame is an oft-asked question on this list, I am
posting my function in hopes that others may find it useful.  

Computing 'on the language' (formulas) is not my strongest point, so the
function can probably be improved.  A similar idea could be used for
matrix objects.  Feedback is welcome.

Kevin Wright




sort.data.frame <- function(form,dat){
  # Author: Kevin Wright
  # Some ideas from Andy Liaw
  #   http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html

  # Use + for ascending, - for decending.  
  # Sorting is left to right in the formula
  
  # Useage is either of the following:
  # sort.data.frame(~Block-Variety,Oats)
  # sort.data.frame(Oats,~-Variety+Block)
  
  # If dat is the formula, then switch form and dat
  if(inherits(dat,"formula")){
f=dat
dat=form
form=f
  }
  if(form[[1]] != "~")
stop("Formula must be one-sided.")

  # Make the formula into character and remove spaces
  formc <- as.character(form[2]) 
  formc <- gsub(" ","",formc) 
  # If the first character is not + or -, add +
  if(!is.element(substring(formc,1,1),c("+","-")))
formc <- paste("+",formc,sep="")
  # Extract the variables from the formula
  vars <- unlist(strsplit(formc, "[\\+\\-]"))
  vars <- vars[vars!=""] # Remove spurious "" terms

  # Build a list of arguments to pass to "order" function
  calllist <- list()
  pos=1 # Position of + or -
  for(i in 1:length(vars)){
varsign <- substring(formc,pos,pos)
pos <- pos+1+nchar(vars[i])
if(is.factor(dat[,vars[i]])){
  if(varsign=="-")
calllist[[i]] <- -rank(dat[,vars[i]])
  else
calllist[[i]] <- rank(dat[,vars[i]])
}
else {
  if(varsign=="-")
calllist[[i]] <- -dat[,vars[i]]
  else
calllist[[i]] <- dat[,vars[i]]
}
  }
  dat[do.call("order",calllist),]

}


d = data.frame(b=factor(c("Hi","Med","Hi","Low"),levels=c("Low","Med","Hi"),
   ordered=TRUE),
   x=c("A","D","A","C"),y=c(8,3,9,9),z=c(1,1,1,2))
sort.data.frame(~-z-b,d)
sort.data.frame(~x+y+z,d)
sort.data.frame(~-x+y+z,d)
sort.data.frame(d,~x-y+z)

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Bookmarklet for searching R documentation

2004-06-21 Thread Kevin Wright

I recently posted a request for a Mozilla search engine plugin for the R 
java search applet.  Having recieved no response, I pursued this myself and 
came up with an alternative.  I found a "bookmarklet" that was used to 
submit searches to Amazon.  I modified the code to submit searches to the R 
search engine.  I tested the following using Firefox 0.9 on Windows 2000.

Create a bookmark called "search R doc" (or whatever) with the following 
text as the "location".  (Note, this is one long line)

javascript:(function(){ function getSearchString (promptString) { s = null;if 
(document.selection && document.selection.createRange) { s 
=document.selection.createRange().text; } else if (document.getSelection) { s= 
document.getSelection(); } if (! (s && s.length)) { s =prompt(promptString,''); } 
return s; } searchString = getSearchString('Search R help:'); if (searchString != 
null) { if(searchString.length) { location 
='file://c:/R/rw1090/doc/html/search/SearchObject.html?'+escape(searchString); } else 
{ location ='file://c:/R/rw1090/doc/html/search/SearchEngine.html'; } }  })();
 
 
 
Useage notes:

0. Modify the path (in the script) to SearchObject.html and 
SearchString.html depending on your operating system and R version.

1. If text is highlighted when the bookmark is clicked, the selected text 
is submitted to the search engine without any prompt.

2. If no text is selected, a small prompt dialog requests you to enter a 
search string.  Pressing 'enter' will simply open the search page while entering a 
search word will send the word to the search engine.

3. I prefer to modify one line in the SearchObject.html file so that 
titles, keywords, names are included in the search.  Use this line:
line = line + document.SearchEngine.search (searchstring,true,true,true);
(Note, with a fresh install of R 1.9.0 the only place "SearchObject.html" i
s used is in lattice's lset.html file).  Attention developers: would it 
make sense to modify this file in the CVS tree?  Maybe there is a 
better method for quick searches?

4. The search is a little slow the first time as the java applet loads.

5. Note: Firefox loses hyperlinks when clicking a hyperlink and then 
clicking 'back'.  This has been discussed on the email list.

6. Sit back, relax, and enjoy!  Your mileage may vary, but I like this a lot. 
Thanks to the creator of the original "Search Amazon" bookmarklet.
 
Best,  Kevin Wright

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Mozilla search engine plugin for R java search applet?

2004-06-04 Thread Kevin Wright

I'm interested in adding a toolbar search for my locally installed R 
documentation.  I've pursued doing it myself, but have gotten in over my 
head and thought to ask here if anybody else has done something like this
before I continue this path.

Some things that might be relevant:

Creating Mozilla search plugins
http://mycroft.mozdev.org/deepdocs/quickstart.html

An example of an R search engine
http://www.statslab.cam.ac.uk/~djw1005/Stats/Interests/searchtemplate.html


I'm using Firefox on Windows, but I suspect that's not relevant.

Javascript bookmarklets might also be relevant.

Kevin Wright

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Outlier identification according to Hardin & Rocke (1999)

2004-05-26 Thread Kevin Wright

I'm trying to use a paper by Hardin & Rocke: 
http://handel.cipic.ucdavis.edu/~dmrocke/Robdist5.pdf
as a guide for a function to identify outliers in multivariate data.  Attached below 
is a function that is my attempt to reproduce their method and also a test to see what 
fraction of the data are identified as outliers.  Using this function I am able to 
reproduce their results regarding the asymptotic chi-square method, but not their new 
method using an asymptotic F method.  In particular, the asymptotic F method (and 
adjusted F method) seem to give critical distances that are too large and therefore no 
(or few) points are identified as outliers.

I have tried unsuccessfully to contact the authors of the paper to seek further 
information and / or numerical examples.

I would be most interested if anybody has used the method of Hardin & Rocke or can 
take the function I have provided below and modify it to reproduce their results.  
Note: The mvtnorm package is required.

Best,

Kevin Wright




outliers.id <- function(x, alpha=.05){
  # See: Hardin & Rocke (1999), "The Distribution of Robust Distances"
  # http://handel.cipic.ucdavis.edu/~dmrocke/Robdist5.pdf

  # See: Hardin & Rocke (2002), "Outlier Detection in the Multiple Cluster
  # Setting Using the Minimum Covariance Determinant Estimator"
  # http://bioinfo.cipic.ucdavis.edu/publications/print_pub?pub_id=736&category=1
  
  # Drop factors first
  factors <- names(x)[sapply(x,is.factor)] 
  if(length(factors)>0)
x <- x[-factors]

  # Get the robust location/scale estimates
  require(MASS)
  covResult <- cov.rob(x)
  
  # Calculate the mahalanobis distance for each datum
  distance <- mahalanobis(x,covResult$center,covResult$cov)
  
  n <- nrow(x)
  p <- ncol(x)
  h <- floor((n+p+1)/2)

  # Asymptotic chi-square method (page 11)
  # Often identifies too many points as outliers
  critical.chi <- qchisq(1-alpha, p)
  cat("Chi square critical distance:", critical.chi,"\n")

  # Now the approximate F method.  First estimate c (page 19)
  c <- pchisq(qchisq(1-h/n, p),p+2) / (h/n)
  
  # Now to estimate m (page 22)
  a <- (n-h)/n
  qa <- qchisq(1-a,p)
  ca <- (1-a)/pchisq(qa,p+2)
  c2 <- -pchisq(qa,p+2)/2
  c3 <- -pchisq(qa,p+4)/2
  c4 <- 3 * c3
  b1 <- ca * (c3 - c4) / (1-a)
  b2 <- 0.5 + ca/(1-a) * (c3 - qa/p * (c2 + (1-a)/2))
  v1 <- (1-a) * b1^2 * (a * (ca * qa/p -1)^2 -1) -
2 * c3 * ca^2 * (3* (b1 - p * b2)^2 + (p+2) * b2 * (2*b1 - p*b2))
  v2 <- n * (b1 * (b1 - p*b2) * (1-a))^2 * ca^2
  v <- v1/v2
  m <- 2/(ca^2 * v)

  # (page 17)
  critical.F.asy <- p*m * qf(1-alpha, p, m-p+1) / (c * (m-p+1))
  cat("Asymptotic F critical distance:",critical.F.asy,"\n")
  
  # The small-sample (hundreds of points) adjustment to m.
  # Hardin & Rocke, 2002, page 631
  m <- m * exp(0.725 - 0.00663*p -0.0780 * log(n))
  # Finally, the critical point (using adjusted m)
  critical.F.adj <- p*m*qf(1-alpha, p, m-p+1) / (c * (m-p+1))
  cat("Adjusted asymptotic F critical distance:",critical.F.adj,"\n")
  
  #outliers <- as.integer(distance > critical)
  x$Distances <- distance
  #x$Outliers <- outliers
  attr(x,"critical.chi") <- critical.chi
  attr(x,"critical.F.asy") <- critical.F.asy
  attr(x,"critical.F.adj") <- critical.F.adj
  return(x)
}

# Try to reproduce the tables in the paper
if(FALSE){
  require(mvtnorm)
  # Simulate data
  n <- 500;p <- 5
  dat <- rmvnorm(n,rep(0,p),diag(p))

  # Identify outliers
  dat.out <- outliers.id(as.data.frame(dat),alpha=.05)
  
  # Now see what percent are identified as outliers
  cat("Chi-square \n")
  100*sum(dat.out$Distances>attr(dat.out,"critical.chi"))/n
  cat("Approximate F asymptotic \n")
  100*sum(dat.out$Distances>attr(dat.out,"critical.F.asy"))/n
  cat("Approximate F adjusted \n")
  100*sum(dat.out$Distances>attr(dat.out,"critical.F.adj"))/n
}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Opening help pages in new tab of existing Mozilla Firebird

2004-05-26 Thread Kevin Wright


Subject pretty much says it all.  I currently have options()$browser
set to open help pages in Mozilla Firebird, but it starts a new window 
each time and I would like a new 'tab' in an existing window.

Sorry if this is obvious, but I can't find anything.

Kevin Wright

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] PROC MIXED vs lme. Split-plot with heterogeneous variances.

2003-09-04 Thread Kevin Wright

> A curious difference between SAS and R.  I wonder if anyone can explain it.
> 
> Basic idea: Split-plot design (Male = whole plot, Trt = Sub plot).  Rep is random, 
> Rep*Male variance component is 0 and deleted.  Heterogeneous variances - each Trt 
> has different variance.
> 
> The model with only Male and Trt as fixed effects is the same in SAS and R.  
> 
> When I add Male:Trt interaction, the results of the tests of fixed effects are no 
> longer the same (comparing SAS and R) for Male, but are the same for Trt and 
> Male:Trt.
> 
> Is my model specification incorrect?
> 
> Kevin Wright.  Details follow.
> 
> 
> Model with Male, Trt as fixed effects
> 
> proc mixed data=pollen ;
>   class Trt Rep Male;
>   model Yield = Male Trt;
>   random Rep;
>   repeated / group = Trt;
>   lsmeans Trt Male;
> run;
> 
> Type 3 Tests of Fixed Effects
> 
>   Num Den
> Effect DF  DFF ValuePr > F
> 
> Male1  47   3.640.0624
> Trt 9  47   3.800.0012
> 
> 
> 
> > pollen.hetero<-lme(Yield~Male+Trt,pollen,random=list(Rep=~1),
> +weights=varIdent(form=~1|Trt))
> > 
> > anova(pollen.hetero)
> numDF denDF   F-value p-value
> (Intercept) 147 14.613222  0.0004
> Male147  3.640521  0.0625
> Trt 947  3.797328  0.0012
> 
> 
> 
> Now add Male:Trt interaction as a fixed effect.
> 
> proc mixed data=pollen ;
>   class Trt Rep Male;
>   model Yield = Male Trt Male*Trt;
>   random Rep;
>   repeated / group = Trt;
>   lsmeans Trt Male;
> run;
> 
> 
> Type 3 Tests of Fixed Effects
> 
>   Num Den
> Effect DF  DFF ValuePr > F
> 
> Male1  38   0.390.5384
> Trt 9  38   8.40<.0001
> Trt*Male9  38   2.970.0090
> 
> > pollen.hetero<-lme(Yield~Male+Trt+Male:Trt,pollen,random=list(Rep=~1),
> +weights=varIdent(form=~1|Trt))
> > 
> > anova(pollen.hetero)
> numDF denDF   F-value p-value
> (Intercept) 138 27.007016  <.0001
> Male138  8.431796  0.0061
> Trt 938  8.396913  <.0001
> Male:Trt938  2.964672  0.0090
> 
> 
>

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] Looking for graphics/par cheat sheet

2003-07-03 Thread Kevin Wright


I've searched for a while and have not been able to find a succinct, one-page
guide to the graphics parameters that control the layout of plots.  I was
thinking about creating one, but thought I would see if this has already
been done.  

I'm thinking more about a visual guide to the pare parameters than a text
description.

Kevin Wright

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] How to auto-scale cex of y-axis labels in lattice dotplot?

[R] Confidence interval for coefficient of variation

Re: [R] Sorting dataframe by different columns

Re: [R] sorting data in R

Re: [R] using alpha transparency for lines in levelplot - SUMMARY

[R] Next useR conference date/location

[R] lme4 and lmeSplines

[R] Mixing grid and base graphics--need help understanding this quirk

[R] DSC 2005 proceedings?

Re: [R] RODBC and Excel: Wrong Data Type Assumed on Import

[R] How to create a new data.frame with all possible observations

[R] Function sort.data.frame

[R] Bookmarklet for searching R documentation

[R] Mozilla search engine plugin for R java search applet?

[R] Outlier identification according to Hardin & Rocke (1999)

[R] Opening help pages in new tab of existing Mozilla Firebird

[R] PROC MIXED vs lme. Split-plot with heterogeneous variances.

[R] Looking for graphics/par cheat sheet

18 matches

Site Navigation

Mail list logo

Footer information