Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?
Hello Greg, Kevin, Jim and other R-help members,

Regarding text spacing: Drew Conway published a fascinating post today about "Building a Better Word Cloud". I don't know if his function can help you (or if any of you might help him with his code). Either way, I think his post is worth reading:
http://www.drewconway.com/zia/?p=2624

Cheers,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Fri, Jan 28, 2011 at 11:37 AM, Jim Lemon wrote:
> Alas, I tried thigmophobe.labels and there are just too many points.
> The best I could do was this:
>
> irisxy<-cluster.overplot(iris$Sepal.Width,iris$Sepal.Length)
> plot(irisxy)
> text(irisxy$x,irisxy$y-0.04,labels=1:150,cex=0.5)
>
> which, sad to say, ain't too good.
>
> Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?
On 01/28/2011 07:57 AM, Greg Snow wrote:
> Try:
>
> library(TeachingDemos)
>
> plot(Sepal.Length~Sepal.Width, data=iris)
>
> tmp.y <- iris$Sepal.Length
> for( i in unique(iris$Sepal.Width) ) {
>   tmp <- iris$Sepal.Width == i
>   tmp.y[ tmp ] <- spread.labs( tmp.y[tmp], .6*strheight('A'), maxiter=1000 )
> }
>
> # optional
> with(iris, segments(Sepal.Width, Sepal.Length, Sepal.Width+0.025, tmp.y) )
>
> with(iris, text(Sepal.Width+0.05, tmp.y, seq_along(tmp.y), cex=.5 ) )
>
> There is also thigmophobe.labels in the plotrix package which is simpler
> and works well for some plots

Alas, I tried thigmophobe.labels and there are just too many points.
The best I could do was this:

library(plotrix)  # provides cluster.overplot
irisxy <- cluster.overplot(iris$Sepal.Width, iris$Sepal.Length)
plot(irisxy)
text(irisxy$x, irisxy$y - 0.04, labels = 1:150, cex = 0.5)

which, sad to say, ain't too good.

Jim
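The "too many points" problem can be made concrete with base R alone. The sketch below is not something proposed in the thread; it assumes that points sharing exactly the same coordinates may share one combined label, and the names xy, merged and n_shared are made up for this illustration. It counts how many iris points sit on top of another point and collapses their row numbers into one label per location.

```r
# Coordinates used in the thread's scatter plot, plus the row number as label
xy <- data.frame(
  x  = iris$Sepal.Width,
  y  = iris$Sepal.Length,
  id = seq_len(nrow(iris))
)

# One row per distinct (x, y) location; ids of coincident points are pasted together
merged <- aggregate(id ~ x + y, data = xy, FUN = paste, collapse = ",")

# Number of points that share their location with at least one other point
n_here   <- ave(xy$id, xy$x, xy$y, FUN = length)
n_shared <- sum(n_here > 1)

# Plot only in an interactive session: one label per location
if (interactive()) {
  plot(y ~ x, data = merged, xlab = "Sepal.Width", ylab = "Sepal.Length")
  text(merged$x, merged$y, merged$id, pos = 4, cex = 0.5)
}
```

This reduces the number of labels but does nothing about labels of neighbouring, non-identical points, so it would complement rather than replace a spreading function.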
Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?
Try:

library(TeachingDemos)

plot(Sepal.Length~Sepal.Width, data=iris)

tmp.y <- iris$Sepal.Length
for( i in unique(iris$Sepal.Width) ) {
  tmp <- iris$Sepal.Width == i
  tmp.y[ tmp ] <- spread.labs( tmp.y[tmp], .6*strheight('A'), maxiter=1000 )
}

# optional
with(iris, segments(Sepal.Width, Sepal.Length, Sepal.Width+0.025, tmp.y) )

with(iris, text(Sepal.Width+0.05, tmp.y, seq_along(tmp.y), cex=.5 ) )

There is also thigmophobe.labels in the plotrix package, which is simpler and works well for some plots.

Also look at dynIdentify (Windows only) and TkIdentify (all platforms) in the TeachingDemos package for a way to interactively place the labels (a little more work, but labels end up where you think they look best).

I have experimented with spreading simultaneously in 2 directions, but what works well for one case works poorly in another, and what ends up working for the other doesn't work in the first case.

But I would argue against labeling all the points in a plot of that many points; the labels make it too busy and distract more than they help. HWidentify (Windows) and HTKidentify (all platforms) in TeachingDemos give another option. Sometimes just using different colors/symbols/etc. for groups of points gives more useful information than labels.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: Kevin Wright [mailto:kw.s...@gmail.com]
> Sent: Thursday, January 27, 2011 10:27 AM
> To: Tal Galili
> Cc: Greg Snow; r-help@r-project.org
> Subject: Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?
>
> My colleagues that use one of the .Net languages/libraries can make
> scatter plots that look better than R's because they have better
> spreading of the labels.
>
> If someone could spread these labels on the following graph, I would be
> impressed.
>
> plot(Sepal.Length~Sepal.Width, data=iris)
> with(iris,text(Sepal.Width, Sepal.Length, 1:nrow(iris), cex=.5))
>
> Kevin
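The idea of spreading labels along one axis can be sketched in a few lines of base R. This is only an illustration under stated assumptions and not the algorithm used by spread.labs in TeachingDemos: the helper name spread_1d and the gap of 0.08 are made up for this example. It sorts the values, pushes each one up until it is at least mindiff above its lower neighbour, then shifts the whole set so that its mean is unchanged.

```r
# Hypothetical helper (not part of TeachingDemos): enforce a minimum gap
# between label positions along one axis
spread_1d <- function(y, mindiff) {
  ord <- order(y)
  s <- y[ord]
  # Forward pass: push each value up until it clears its lower neighbour
  for (i in seq_along(s)[-1]) {
    s[i] <- max(s[i], s[i - 1] + mindiff)
  }
  # Recentre so the labels stay centred on the original values
  s <- s - (mean(s) - mean(y))
  # Return the positions in the original order of y
  out <- numeric(length(y))
  out[ord] <- s
  out
}

# Spread the label heights separately within each distinct Sepal.Width,
# mirroring the per-width loop in the message above (0.08 is an arbitrary gap)
label_y <- ave(iris$Sepal.Length, iris$Sepal.Width,
               FUN = function(v) spread_1d(v, 0.08))
```

Unlike the code in the message, the gap here is a fixed number in data units rather than a multiple of strheight('A'), so it does not adapt to the size of the plotting device.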
Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?
My colleagues that use one of the .Net languages/libraries can make scatter plots that look better than R's because they have better spreading of the labels.

If someone could spread these labels on the following graph, I would be impressed.

plot(Sepal.Length~Sepal.Width, data=iris)
with(iris,text(Sepal.Width, Sepal.Length, 1:nrow(iris), cex=.5))

Kevin

On Thu, Jan 27, 2011 at 9:52 AM, Tal Galili wrote:
> Thanks again for the pointer to spread.labs Greg.
>
> I implemented it into the function and also extended it to deal with
> formulas so it could behave just like boxplot.
> Code and examples are available here:
> http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/
>
> I'd be happy for any suggestions on how to improve it.
>
> Best,
> Tal
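One option raised elsewhere in this thread for a plot this crowded is to encode group membership with colour and plotting symbol instead of labelling every point. A minimal sketch for the iris plot above follows; the colour names, symbol codes and variable names are arbitrary choices made for this illustration, not anything prescribed in the thread.

```r
# Arbitrary colour and symbol per species (assumptions for illustration)
species_cols <- c(setosa = "steelblue", versicolor = "darkorange",
                  virginica = "forestgreen")
species_pch  <- c(setosa = 16, versicolor = 17, virginica = 15)

# Look up the colour and symbol of every point by its species name
pt_col <- unname(species_cols[as.character(iris$Species)])
pt_pch <- unname(species_pch[as.character(iris$Species)])

# Plot only in an interactive session
if (interactive()) {
  plot(Sepal.Length ~ Sepal.Width, data = iris, col = pt_col, pch = pt_pch)
  legend("topright", legend = names(species_cols),
         col = species_cols, pch = species_pch)
}
```

This drops the individual row numbers entirely, so it only helps when the group, rather than the identity of each point, is what the reader needs.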
Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?
Thanks again for the pointer to spread.labs, Greg.

I incorporated it into the function and also extended it to deal with formulas so it could behave just like boxplot.
Code and examples are available here:
http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/

I'd be happy for any suggestions on how to improve it.

Best,
Tal

On Thu, Jan 27, 2011 at 1:09 AM, Greg Snow wrote:
> For the last point (cluttered text), look at spread.labels in the plotrix
> package and spread.labs in the TeachingDemos package (I favor the latter, but
> could be slightly biased as well). Doing more than what those 2 functions
> do becomes really complicated really fast.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?
For the last point (cluttered text), look at spread.labels in the plotrix package and spread.labs in the TeachingDemos package (I favor the latter, but could be slightly biased as well). Doing more than what those 2 functions do becomes really complicated really fast.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Tal Galili
> Sent: Wednesday, January 26, 2011 4:05 PM
> To: r-help@r-project.org
> Subject: [R] boxplot - code for labeling outliers - any suggestions for improvements?
>
> Hello all,
> I wrote a small function to add labels for outliers in a boxplot.
> This function will only work on a simple boxplot/formula command (e.g.,
> something like boxplot(y~x)).
>
> Code + example follows in this e-mail.
>
> I'd be happy for any suggestions on how to improve this code, for example:
>
>    - Handle boxplot.matrix (which shouldn't be too hard to do)
>    - Handle cases of complex formulas (e.g., boxplot(y~a*b))
>    - Handle cases where there are many outliers leading to a clutter of text
>      (I have no idea how to solve this systematically)
>
> Best,
> Tal
>
> # the function
> boxplot.add.outlier.text <- function(DATA, x_name, y_name, label_name)
> {
>   # return the rows of one group whose y value lies beyond the whiskers
>   boxplot.outlier.data <- function(xx, y_name)
>   {
>     y <- xx[,y_name]
>     boxplot_range <- range(boxplot.stats(y)$stats)
>     ss <- (y < boxplot_range[1]) | (y > boxplot_range[2])
>     return(xx[ss,])
>   }
>
>   require(plyr)
>   # collect the outlier rows of every x_name group
>   txt_to_run <- paste("ddply(DATA, .(",x_name,"), boxplot.outlier.data, y_name = y_name)", sep = "")
>   ourlier_df <- eval(parse(text = txt_to_run))
>   # head(ourlier_df)
>   # build the formula y_name ~ x_name and get the outliers as boxplot sees them
>   txt_to_run <- paste("formula(",y_name,"~", x_name,")")
>   formu <- eval(parse(text = txt_to_run))
>   boxdata <- boxplot(formu , data = DATA, plot = FALSE)
>   boxdata_group_name <- boxdata$names[boxdata$group]
>   boxdata_outlier_df <- data.frame(group = boxdata_group_name, y = boxdata$out, x = boxdata$group)
>   # write each outlier's label to the right of its point
>   for(i in seq_len(dim(boxdata_outlier_df)[1]))
>   {
>     ss <- (ourlier_df[,x_name] %in% boxdata_outlier_df[i,]$group) & (ourlier_df[,y_name] %in% boxdata_outlier_df[i,]$y)
>     current_label <- ourlier_df[ss,label_name]
>     temp_x <- boxdata_outlier_df[i,"x"]
>     temp_y <- boxdata_outlier_df[i,"y"]
>     text(temp_x, temp_y, current_label,pos=4)
>   }
>
>   list(boxdata_outlier_df = boxdata_outlier_df, ourlier_df=ourlier_df)
> }
>
> # example:
> boxplot(decrease ~ treatment, data = OrchardSprays, log = "y", col = "bisque")
> boxplot.add.outlier.text(OrchardSprays, "treatment", "decrease", "colpos")
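As one possible simplification of the function quoted above, the same labelling can be sketched in base R without eval(parse()) and without plyr. This is a sketch under stated assumptions, not Tal's revised function: the names find_outlier_rows and boxplot_with_outlier_labels are made up for this example, it handles only a single grouping variable, and it relies on boxplot's default whisker rule (coef = 1.5) matching boxplot.stats. reformulate() from the stats package builds the y ~ x formula directly from the column names.

```r
# Hypothetical helper: rows of `data` whose y value lies beyond the whiskers
# of its own group
find_outlier_rows <- function(data, x_name, y_name) {
  idx_by_group <- split(seq_len(nrow(data)), data[[x_name]])
  out_idx <- lapply(idx_by_group, function(idx) {
    y <- data[[y_name]][idx]
    whiskers <- range(boxplot.stats(y)$stats)
    idx[which(y < whiskers[1] | y > whiskers[2])]
  })
  data[unlist(out_idx, use.names = FALSE), , drop = FALSE]
}

# Hypothetical wrapper: draw the boxplot, then label each outlier
boxplot_with_outlier_labels <- function(data, x_name, y_name, label_name, ...) {
  form <- reformulate(x_name, response = y_name)  # y ~ x from strings
  bp <- boxplot(form, data = data, ...)
  out <- find_outlier_rows(data, x_name, y_name)
  if (nrow(out) > 0) {
    # Boxes are drawn at x = 1, 2, ... in the order of bp$names
    x_pos <- match(as.character(out[[x_name]]), bp$names)
    text(x_pos, out[[y_name]], labels = out[[label_name]], pos = 4, cex = 0.8)
  }
  invisible(out)
}

# Same example as in the original post; plot only in an interactive session
if (interactive()) {
  boxplot_with_outlier_labels(OrchardSprays, "treatment", "decrease", "colpos",
                              log = "y", col = "bisque")
}
```

Because the outlier rows are taken straight from the data frame, each label is looked up by row rather than by matching on the y value, which avoids ambiguity when two outliers in a group have the same value.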