Re: [R] compare histograms
On Fri, Oct 15, 2010 at 2:47 AM, Michael Bedward michael.bedw...@gmail.com wrote:

> Hi Rainer,
>
> Great - many thanks for that. Yes, I'm using OSX.
>
> I initially tried to use install.packages to get a pre-built binary of earthmovdist from Rforge, but it failed with...
>
> In getDependencies(pkgs, dependencies, available, lib) :
>   package earthmovdist is not available

Yes - we had some problems getting the package built for OSX, but we (more specifically Dirk) are working on that.

> When I tried installing with type=source this was also failing. However, after reading your post I looked at the error messages properly and it turned out to be a trivial problem. The .First function defined in my .Rprofile was printing some text to the console with cat(), which was being incorrectly picked up by the package build as if it were a makefile argument. When I commented out the call to cat the package installed successfully. I haven't had this problem installing other packages from source so I think there must be a little problem with your setup (?)

Thanks for letting us know - could you send us the offending entry in your .Rprofile (or the whole .Rprofile), so that we can see whether it is an OSX problem or a general one?

> Now that it's installed I look forward to trying it out shortly.

Great - please give us some feedback on what you think about it.

Cheers,

Rainer

> Thanks again.
>
> Michael
Re: [R] compare histograms
On Thu, Oct 14, 2010 at 3:15 AM, Michael Bedward michael.bedw...@gmail.com wrote:

> Hi Juan,
>
> Yes, you can use EMD to quantify the difference between any pair of histograms regardless of their shape. The only constraint, at least the way that I've done it previously, is to have compatible bins. The original application of EMD was to compare images based on colour histograms which could have all sorts of shapes.
>
> I looked at the package that Dennis alerted me to on RForge but unfortunately it seems to be inactive

No - well, it depends how you define inactive: the functionality we wanted to include is included, so no further development was necessary.

> and the nightly builds are broken. I've downloaded the source code and will have a look at it sometime in the next few days.

Thanks for alerting us - we will look into that. But just don't use the nightly builds, as they are no different from the last release. Just download the package for your system (I assume Windows or Mac, as I just installed from source without problems under Linux).

Let me know if it doesn't work,

Cheers,

Rainer

> Meanwhile, let me know if you want a copy of my own code. It uses the lpSolve package.
>
> Michael

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] compare histograms
Hi Rainer,

Great - many thanks for that. Yes, I'm using OSX.

I initially tried to use install.packages to get a pre-built binary of earthmovdist from Rforge, but it failed with...

In getDependencies(pkgs, dependencies, available, lib) :
  package ‘earthmovdist’ is not available

When I tried installing with type=source this was also failing. However, after reading your post I looked at the error messages properly and it turned out to be a trivial problem. The .First function defined in my .Rprofile was printing some text to the console with cat(), which was being incorrectly picked up by the package build as if it were a makefile argument. When I commented out the call to cat the package installed successfully. I haven't had this problem installing other packages from source so I think there must be a little problem with your setup (?)

Now that it's installed I look forward to trying it out shortly.

Thanks again.

Michael

On 15 October 2010 03:17, Rainer M Krug r.m.k...@gmail.com wrote:

> On Thu, Oct 14, 2010 at 3:15 AM, Michael Bedward michael.bedw...@gmail.com wrote:
>
>> I looked at the package that Dennis alerted me to on RForge but unfortunately it seems to be inactive
>
> No - well, it depends how you define inactive: the functionality we wanted to include is included, therefore no further development was necessary.
>
>> and the nightly builds are broken. I've downloaded the source code and will have a look at it sometime in the next few days.
>
> Thanks for alerting us - we will look into that. But just don't use the nightly builds, as they are not different to the last release. Just download the package for your system (I assume Windows or mac, as I just installed from source without problems under Linux).
>
> Let me know if it doesn't work,
>
> Cheers,
>
> Rainer
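One possible way around this kind of clash, offered as an assumption rather than something confirmed in the thread, is to have .First print only in interactive sessions, so that any non-interactive R process started while installing from source writes nothing extra to the console. The greeting text below is purely illustrative.

```r
# Hypothetical .Rprofile fragment (not Michael's actual file):
# greet only when R is running interactively.
.First <- function() {
  if (interactive()) {
    cat("Welcome back\n")  # illustrative message, not from the thread
  }
  invisible(NULL)
}
```

Whether this is enough for the earthmovdist build was not tested here; commenting out the cat() call, as described above, is the fix that was actually reported to work.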
Re: [R] compare histograms
Hi:

This recent thread revealed that a package on R-forge for calculating earth movers distance is available:

http://r.789695.n4.nabble.com/Measure-Difference-Between-Two-Distributions-td2712281.html#a2713505

HTH,
Dennis

On Tue, Oct 12, 2010 at 7:39 PM, Michael Bedward michael.bedw...@gmail.com wrote:

> Just to add to Greg's comments: I've previously used 'Earth Movers Distance' to compare histograms. Note, this is a distance metric rather than a parametric statistic (i.e. not a test) but it at least provides a consistent way of quantifying similarity.
>
> It's relatively easy to implement the metric in R (formulating it as a linear programming problem). Happy to dig out the code if needed.
>
> Michael
Re: [R] compare histograms
Ah, that's interesting. I'll have a look because it's bound to be better than my effort.

Many thanks Dennis.

Michael

On 13 October 2010 22:36, Dennis Murphy djmu...@gmail.com wrote:

> Hi:
>
> This recent thread revealed that a package on R-forge for calculating earth movers distance is available:
>
> http://r.789695.n4.nabble.com/Measure-Difference-Between-Two-Distributions-td2712281.html#a2713505
>
> HTH,
> Dennis
Re: [R] compare histograms
Hi Juan,

Yes, you can use EMD to quantify the difference between any pair of histograms regardless of their shape. The only constraint, at least the way that I've done it previously, is that the two histograms have compatible bins. The original application of EMD was to compare images based on colour histograms, which could have all sorts of shapes.

I looked at the package that Dennis alerted me to on RForge but unfortunately it seems to be inactive and the nightly builds are broken. I've downloaded the source code and will have a look at it sometime in the next few days.

Meanwhile, let me know if you want a copy of my own code. It uses the lpSolve package.

Michael

On 14 October 2010 08:46, Juan Pablo Fededa jpfed...@gmail.com wrote:

> Hi Michael,
>
> I have the same challenge. Can you use this earth movers distance to compare bimodal distributions?
>
> Thanks,
> cheers,
>
> Juan
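To illustrate what "compatible bins" could look like in practice, here is a minimal sketch that bins a bimodal and a unimodal sample on one shared set of breaks. The simulated data, the bin width of 0.5 and all variable names are assumptions made for the example, not anything taken from Michael's code.

```r
set.seed(1)
x <- c(rnorm(200, mean = -2), rnorm(200, mean = 2))  # bimodal sample
y <- rnorm(400)                                      # unimodal sample

# One set of breaks that covers both samples
breaks <- seq(floor(min(x, y)), ceiling(max(x, y)), by = 0.5)

hx <- hist(x, breaks = breaks, plot = FALSE)
hy <- hist(y, breaks = breaks, plot = FALSE)

# Relative frequencies defined on identical bins
px <- hx$counts / sum(hx$counts)
py <- hy$counts / sum(hy$counts)
```

Because both histograms share the same breaks, px and py have the same length and line up bin for bin, which is what a bin-wise distance such as EMD needs.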
Re: [R] compare histograms
That depends a lot on what you mean by the histograms being equivalent.

You could just plot them and compare visually. It may be easier to compare them if you plot density estimates rather than histograms. Even better would be to do a qqplot comparing the 2 sets of data rather than the histograms.

If you want a formal test then the ks.test function can compare 2 datasets. Note that the null hypothesis is that they come from the same distribution. A significant result means that they are likely different (but the difference may not be of practical importance), whereas a non-significant result could mean that they are the same, or that you just do not have enough power to find the difference (or the difference is hard for the ks test to see). You could also use a chi-squared test to compare them this way.

Another approach would be to use the vis.test function from the TeachingDemos package. Write a small function that will either plot your 2 histograms (or density plots), or permute the data between the 2 groups and plot the equivalent histograms. The vis.test function then presents you with an array of plots, one of which is the original data and the rest based on permutations. If there is a clear, meaningful difference between the groups you will be able to spot the plot that does not match the rest; otherwise it will just be guessing (it might be best to have a fresh set of eyes that have not seen the data before see if they can pick out the real plot).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of solafah bh
> Sent: Monday, October 11, 2010 4:02 PM
> To: R help mailing list
> Subject: [R] compare histograms
>
> Hello
>
> How to compare two statistical histograms? How can I know if these histograms are equivalent or not?
>
> Regards
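The suggestions in Greg's reply can be sketched roughly as follows. The simulated samples, the shift of 0.5 and the bin width of 1 are assumptions chosen only to make the example run; the plots are drawn only in an interactive session.

```r
set.seed(42)
x <- rnorm(200)
y <- rnorm(200, mean = 0.5)

# Visual checks: overlaid density estimates and a QQ plot
if (interactive()) {
  plot(density(x), main = "Density estimates")
  lines(density(y), lty = 2)
  qqplot(x, y)
  abline(0, 1)
}

# Two-sample Kolmogorov-Smirnov test (null: same distribution)
ks <- ks.test(x, y)
print(ks$p.value)

# Chi-squared test on counts over shared bins; bins that are empty in
# both samples are dropped so that no expected count is zero
breaks <- seq(floor(min(x, y)), ceiling(max(x, y)), by = 1)
counts <- rbind(x = hist(x, breaks = breaks, plot = FALSE)$counts,
                y = hist(y, breaks = breaks, plot = FALSE)$counts)
counts <- counts[, colSums(counts) > 0, drop = FALSE]
chi <- chisq.test(counts)  # may warn if some expected counts are small
print(chi$p.value)
```

As noted above, a small p-value from either test says the samples are likely different, not that the difference matters in practice, and a large one may only reflect low power.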
Re: [R] compare histograms
Just to add to Greg's comments: I've previously used 'Earth Movers Distance' to compare histograms. Note that this is a distance metric rather than a parametric statistic (i.e. not a test), but it at least provides a consistent way of quantifying similarity.

It's relatively easy to implement the metric in R (formulating it as a linear programming problem). Happy to dig out the code if needed.

Michael

On 13 October 2010 02:44, Greg Snow greg.s...@imail.org wrote:

> That depends a lot on what you mean by the histograms being equivalent.
>
> You could just plot them and compare visually. It may be easier to compare them if you plot density estimates rather than histograms. Even better would be to do a qqplot comparing the 2 sets of data rather than the histograms.
>
> If you want a formal test then the ks.test function can compare 2 datasets. Note that the null hypothesis is that they come from the same distribution. A significant result means that they are likely different (but the difference may not be of practical importance), whereas a non-significant result could mean that they are the same, or that you just do not have enough power to find the difference (or the difference is hard for the ks test to see). You could also use a chi-squared test to compare them this way.
>
> Another approach would be to use the vis.test function from the TeachingDemos package. Write a small function that will either plot your 2 histograms (or density plots), or permute the data between the 2 groups and plot the equivalent histograms. The vis.test function then presents you with an array of plots, one of which is the original data and the rest based on permutations. If there is a clear, meaningful difference between the groups you will be able to spot the plot that does not match the rest; otherwise it will just be guessing (it might be best to have a fresh set of eyes that have not seen the data before see if they can pick out the real plot).
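Michael's own code was never posted in this thread, so the following is only a sketch of how the linear programming formulation might look, under the assumption that both histograms share the same bins and are normalised to total mass 1. The function names emd_1d and emd_lp are hypothetical; lp.transport comes from the lpSolve package, which is assumed to be installed for the second function. For one-dimensional histograms with equally spaced bins there is also a closed form (the summed absolute difference of the cumulative histograms times the bin width), included here as a cross-check.

```r
# Closed form for 1-D histograms on equally spaced, shared bins.
emd_1d <- function(p, q, width = 1) {
  stopifnot(length(p) == length(q))
  p <- p / sum(p)
  q <- q / sum(q)
  sum(abs(cumsum(p - q))) * width
}

# Transport-problem formulation, solved with lpSolve (assumed installed).
# mids are the bin midpoints; the cost of moving mass between two bins
# is the distance between their midpoints.
emd_lp <- function(p, q, mids) {
  stopifnot(length(p) == length(q), length(p) == length(mids))
  p <- p / sum(p)
  q <- q / sum(q)
  cost <- abs(outer(mids, mids, "-"))
  res <- lpSolve::lp.transport(cost, direction = "min",
                               row.signs = rep("=", length(p)), row.rhs = p,
                               col.signs = rep("=", length(q)), col.rhs = q,
                               integers = NULL)  # flows may be fractional
  stopifnot(res$status == 0)  # 0 means the solver found an optimum
  res$objval
}

p <- c(0.5, 0.25, 0.25, 0)
q <- c(0, 0.25, 0.25, 0.5)
print(emd_1d(p, q))  # 1.5: half the mass moves three bins to the right
if (requireNamespace("lpSolve", quietly = TRUE)) {
  print(emd_lp(p, q, mids = 1:4))  # should agree up to solver tolerance
}
```

The closed form only applies in one dimension; the transport formulation is the one that carries over to other ground distances, such as the colour histograms mentioned elsewhere in the thread.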