Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?

2011-01-28 Thread Jim Lemon

On 01/28/2011 07:57 AM, Greg Snow wrote:

There is also thigmophobe.labels in the plotrix package which is simpler and 
works well for some plots


Alas, I tried thigmophobe.labels and there are just too many points.
The best I could do was this:

library(plotrix)  # cluster.overplot() is in plotrix
irisxy <- cluster.overplot(iris$Sepal.Width, iris$Sepal.Length)
plot(irisxy)
text(irisxy$x, irisxy$y - 0.04, labels = 1:150, cex = 0.5)

which, sad to say, ain't too good.

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?

2011-01-28 Thread Tal Galili
Hello Greg, Kevin, Jim and other R-help members,

Regarding text spacing:
Drew Conway published today a fascinating post about Building a Better Word
Cloud.
I don't know if his function can help you (or if either of you might help
him with his code).
But either way, I think it's worth reading his post:
http://www.drewconway.com/zia/?p=2624


Cheers,
Tal


Contact Details:
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)






Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?

2011-01-27 Thread Tal Galili
Thanks again for the pointer to spread.labs, Greg.

I implemented it into the function and also extended it to deal with
formulas so it could behave just like boxplot.
Code and examples are available here:
http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/

I'd be happy for any suggestions on how to improve it.

Best,
Tal





Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?

2011-01-27 Thread Kevin Wright
My colleagues that use one of the .Net languages/libraries can make
scatter plots that look better than R's because they have better
spreading of the labels.

If someone could spread the labels on the following graph, I would be
impressed.

plot(Sepal.Length~Sepal.Width, data=iris)
with(iris,text(Sepal.Width, Sepal.Length, 1:nrow(iris), cex=.5))

Kevin






-- 
Kevin Wright


Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?

2011-01-27 Thread Greg Snow
Try:

library(TeachingDemos)

plot(Sepal.Length~Sepal.Width, data=iris)

# spread the labels vertically within each column of points sharing a Sepal.Width value
tmp.y <- iris$Sepal.Length
for( i in unique(iris$Sepal.Width) ) {
  tmp <- iris$Sepal.Width == i
  tmp.y[ tmp ] <- spread.labs( tmp.y[tmp], .6*strheight('A'),
                               maxiter=1000 )
}

# optional: leader lines from each point to its label
with(iris, segments(Sepal.Width, Sepal.Length, Sepal.Width+0.025, tmp.y) )

# write the row numbers at the spread positions
with(iris, text(Sepal.Width+0.05, tmp.y, seq_along(tmp.y), cex=.5 ) )
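The loop above only has to push labels apart vertically, within each column of points that share a Sepal.Width value. A minimal base-R sketch of that one-dimensional idea follows. spread_1d is a hypothetical helper written for illustration: it makes a single greedy pass and is not the algorithm TeachingDemos::spread.labs uses (spread.labs takes a maxiter argument, so it presumably iterates).

```r
# Hypothetical helper (not TeachingDemos::spread.labs): move label positions
# apart so that neighbouring labels are at least `mindiff` apart, keeping their order.
spread_1d <- function(y, mindiff) {
  o <- order(y)
  ys <- y[o]
  # one forward pass: bump each label up if it is too close to the one below it
  for (k in seq_along(ys)[-1]) {
    if (ys[k] - ys[k - 1] < mindiff) ys[k] <- ys[k - 1] + mindiff
  }
  # shift the whole group back so its mean matches the mean of the original positions
  ys <- ys - (mean(ys) - mean(y))
  out <- numeric(length(y))
  out[o] <- ys  # return the positions in the original order of y
  out
}

spread_1d(c(1, 1, 1), 0.5)  # returns 0.5 1.0 1.5
```

Swapping spread_1d(tmp.y[tmp], .6*strheight('A')) in for the spread.labs() call in the loop should give a cruder version of the same layout.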


There is also thigmophobe.labels in the plotrix package which is simpler and 
works well for some plots

Also look at dynIdentify (Windows only) and TkIdentify (all platforms) in the 
TeachingDemos package for a way to interactively place the labels (a little more 
work, but the labels end up where you think they look best).

I have experimented with spreading simultaneously in 2 directions, but what 
works well for one case does poorly in another, and what ends up working for the 
second case doesn't work in the first.

But I would argue against labeling all the points in a plot with that many 
points: the labels make it too busy and distract more than they help.  HWidentify 
(Windows only) and HTKidentify (all platforms) in TeachingDemos give another option. 
Sometimes just using different colors/symbols/etc. for groups of points gives 
more useful information than labels.
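For the iris plot above, a minimal base-R sketch of that last suggestion uses colour and plotting symbol to show Species instead of drawing 150 labels (using Species as the grouping is an assumption made for illustration):

```r
# Encode the grouping variable (Species) with colour and symbol instead of text labels
grp <- as.integer(iris$Species)  # 1 = setosa, 2 = versicolor, 3 = virginica
plot(Sepal.Length ~ Sepal.Width, data = iris, col = grp, pch = grp)
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 1:3, bty = "n")
```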

Hope this helps,





-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



Re: [R] boxplot - code for labeling outliers - any suggestions for improvements?

2011-01-26 Thread Greg Snow
For the last point (cluttered text), look at spread.labels in the plotrix 
package and spread.labs in the TeachingDemos package (I favor the latter, but 
could be slightly biased as well).  Doing more than what those two functions do 
becomes really complicated really fast.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Tal Galili
 Sent: Wednesday, January 26, 2011 4:05 PM
 To: r-help@r-project.org
 Subject: [R] boxplot - code for labeling outliers - any suggestions for
 improvements?
 
 Hello all,
 I wrote a small function to add labels for outliers in a boxplot.
 This function will only work on a simple boxplot/formula command (e.g.,
 something like boxplot(y~x)).
 
 Code and an example follow in this e-mail.
 
 I'd be happy for any suggestions on how to improve this code, for
 example:
 
    - Handle boxplot.matrix (which shouldn't be too hard to do)
    - Handle cases of complex formulas (e.g., boxplot(y~a*b))
    - Handle cases where there are many outliers leading to a clutter of
      text (I have no idea how to solve this systematically)
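For the first item, here is a minimal sketch of the matrix case, assuming one box per column (what boxplot() draws for a matrix) and row numbers as labels; iris[, 1:4] and the object names m and out_rows are only illustrative:

```r
# Sketch: label the outliers of a column-wise boxplot of a matrix with their row numbers
m <- as.matrix(iris[, 1:4])
boxplot(m)
# row indices of the outliers in each column (whiskers at 1.5 * IQR, as in boxplot())
out_rows <- lapply(seq_len(ncol(m)), function(j) {
  v <- m[, j]
  which(v %in% boxplot.stats(v)$out)
})
# box j is drawn at x = j, so write each outlier's row number just to its right
for (j in seq_along(out_rows)) {
  r <- out_rows[[j]]
  if (length(r) > 0) text(rep(j, length(r)), m[r, j], labels = r, pos = 4, cex = 0.7)
}
```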
 
 
 Best,
 Tal
 --
 
 
 # the function
 boxplot.add.outlier.text <- function(DATA, x_name, y_name, label_name)
 {
   # return the rows of xx whose y value lies outside the whiskers of its group
   boxplot.outlier.data <- function(xx, y_name)
   {
     y <- xx[, y_name]
     boxplot_range <- range(boxplot.stats(y)$stats)
     ss <- (y < boxplot_range[1]) | (y > boxplot_range[2])
     return(xx[ss, ])
   }
 
   require(plyr)
   # find the outlying rows within each level of x_name
   txt_to_run <- paste("ddply(DATA, .(", x_name, "), boxplot.outlier.data, y_name = y_name)", sep = "")
   ourlier_df <- eval(parse(text = txt_to_run))
   # head(ourlier_df)
   # build the formula y_name ~ x_name and recompute the boxplot statistics
   txt_to_run <- paste("formula(", y_name, "~", x_name, ")")
   formu <- eval(parse(text = txt_to_run))
   boxdata <- boxplot(formu, data = DATA, plot = FALSE)
   boxdata_group_name <- boxdata$names[boxdata$group]
   boxdata_outlier_df <- data.frame(group = boxdata_group_name, y = boxdata$out, x = boxdata$group)
   # label each outlier with the matching value(s) of label_name
   for(i in seq_len(dim(boxdata_outlier_df)[1]))
   {
     ss <- (ourlier_df[, x_name] %in% boxdata_outlier_df[i, ]$group) &
           (ourlier_df[, y_name] %in% boxdata_outlier_df[i, ]$y)
     current_label <- ourlier_df[ss, label_name]
     temp_x <- boxdata_outlier_df[i, "x"]
     temp_y <- boxdata_outlier_df[i, "y"]
     text(temp_x, temp_y, current_label, pos = 4)
   }
 
   list(boxdata_outlier_df = boxdata_outlier_df, ourlier_df = ourlier_df)
 }
 
 # example:
 boxplot(decrease ~ treatment, data = OrchardSprays, log = "y", col = "bisque")
 boxplot.add.outlier.text(OrchardSprays, "treatment", "decrease", "colpos")
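A possible variant that avoids building code strings with paste()/eval(parse()) and drops the plyr dependency is sketched below. It flags each value that boxplot.stats() reports as an outlier within its own group and labels those rows at the box positions. label_box_outliers is a hypothetical name, and the sketch assumes boxplot() places the boxes at x = 1, 2, ... in the level order of interaction() applied to the grouping factor(s), which matches how the formula method splits the data. Passing a list of factors is meant to cover the y ~ a*b case, though that relies on the same ordering assumption.

```r
# Hypothetical helper: label boxplot outliers without paste()/eval(parse()) or plyr.
#   y      numeric response
#   groups a factor, or a list of factors (intended to mirror y ~ a*b)
#   labels values to print next to each outlier (same length as y)
# Assumes boxes sit at x = 1, 2, ... in the level order of interaction(groups).
label_box_outliers <- function(y, groups, labels, ...) {
  g <- interaction(groups, drop = FALSE)
  # TRUE where y lies beyond the whiskers of its own group (coef = 1.5, as in boxplot())
  is_out <- as.logical(ave(y, g, FUN = function(v) v %in% boxplot.stats(v)$out))
  idx <- which(is_out)
  text(as.integer(g)[idx], y[idx], labels = labels[idx], pos = 4, ...)
  invisible(idx)  # row numbers of the labelled outliers
}

# example, with the same data as above
boxplot(decrease ~ treatment, data = OrchardSprays, log = "y", col = "bisque")
idx <- with(OrchardSprays, label_box_outliers(decrease, treatment, colpos))
```

Returning the row numbers invisibly leaves room to hand the same rows to spread.labs() or thigmophobe.labels() when the labels crowd each other.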
 
 
 
 