Re: [R] How to compare stacked histograms/datasets
Hi,

Sure, you could do a qqplot for each variable between two datasets. In a 2D graph it will be hard to reasonably compare more than two datasets at once. You can put many such graphs on a single page, but I think it would still be a set of pairwise comparisons. Perhaps you could plot multiple qqplots on top of each other, varying the point colour by dataset? I have not seen anything like this before, so I suppose it depends on what helps you understand your data.

Cheers,

Josh

On Sat, Jul 7, 2012 at 3:25 PM, Atulkakrana atulkakr...@gmail.com wrote:
> Hello Joshua,
>
> Thanks for taking the time to help me with my problem. Actually, the comparison is to be done among two (if possible, more than two) datasets, not within a dataset. Each dataset holds 5 variables (i.e. Red, Purple, Blue, Grey and Yellow) for 21 different positions, i.e. 1-21n. So we have 5 values for each of the 21 positions, which make up a single dataset or stacked histogram (see the plot in the original post).
>
> Initially I was comparing datasets by plotting a stacked histogram for each and analyzing them visually. But that doesn't give a statistical idea of how similar or different the datasets are. Therefore, I want to evaluate the datasets in order to quantify their difference/similarity. So the end result would be a plot showing similarity/difference among two or more datasets.
>
> Example datasets: http://pastebin.com/iYj1RNvt
>
> Can the method you explained be applied to multiple datasets? Can a qqplot be obtained in such a case?
>
> Awaiting your reply.
>
> Thanks,
> Atul

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
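A minimal sketch of the overlay idea Josh describes above, in base R. The data are simulated and purely hypothetical (`ref`, `others`, and the colours are illustrative names, not from the thread); in practice each vector would be one variable, e.g. RED, taken from a different dataset. The sketch relies on `qqplot(..., plot.it = FALSE)` returning the sorted values instead of drawing them.

```r
## Hypothetical data: one variable (say RED) measured at 21 positions
## in a reference dataset and in two other datasets
set.seed(42)
ref    <- runif(21, min = 5, max = 30)
others <- list(B = runif(21, min = 5, max = 30),
               C = runif(21, min = 15, max = 40))

## plot.it = FALSE makes qqplot() return the sorted values without drawing
qq <- lapply(others, function(y) qqplot(ref, y, plot.it = FALSE))

## empty plot on a common scale, with the y = x reference line
rng <- range(ref, unlist(others))
plot(NA, xlim = rng, ylim = rng,
     xlab = "Quantiles of reference dataset",
     ylab = "Quantiles of comparison dataset")
abline(0, 1, lty = 2)

## one colour per comparison dataset
cols <- c("red", "blue")
for (i in seq_along(qq)) {
  points(qq[[i]]$x, qq[[i]]$y, col = cols[i], pch = 19)
}
legend("topleft", legend = names(others), col = cols, pch = 19)
```

Points that stay close to the dashed line suggest the comparison dataset has a distribution similar to the reference for that variable; a systematic offset suggests a shift.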
Re: [R] How to compare stacked histograms/datasets
Hi,

It would probably be easier to work with the raw data, but whatever. If your data is in a data frame, dat:

    ## create row index
    dat$x <- 1:21

    ## load packages
    require(ggplot2)
    require(reshape2)

    ## melt the data frame to be long ("long dat", ldat for short)
    ldat <- melt(dat, id.vars = "x")

    ## plot the distributions
    ggplot(ldat, aes(x, value, colour = variable)) + geom_line()

    ## they don't really look on the same scale
    ## we could scale the data first to have equal mean and variance
    dat2 <- as.data.frame(scale(dat))
    ## remake index so it is not scaled
    dat2$x <- 1:21
    ldat2 <- melt(dat2, id.vars = "x")
    ggplot(ldat2, aes(x, value, colour = variable)) + geom_line()

which yields the attached PDF (maybe scrubbed on the official list, as most file extensions are, but it should go through to you personally via gmail). I'm not sure it's the greatest approach ever, but it gives you a sense of whether the variables go up and down together or at different points.

Cheers,

Josh

On Fri, Jul 6, 2012 at 1:55 PM, Atulkakrana atulkakr...@gmail.com wrote:
> Hello All,
>
> I have a couple of stacked histograms which I need to compare/evaluate for similarity or difference.
>
> http://r.789695.n4.nabble.com/file/n4635668/Selection_011.png
>
> I believe that, rather than evaluating the histograms, it will be easier to work with the dataset used to plot these stacked histograms, which is in this format:
>
> RED            PURPLE         BLUE           GREY           YELLOW
> 22.0640569395  16.9483985765  0              60.987544484   0
> 8.1850533808   8.8523131673   0              82.962633452   0
> 6.8505338078   6.8950177936   0.756227758    85.4982206406  0.5338078292
> 6.7615658363   5.2491103203   1.6459074733   86.3434163701  0.6672597865
> 5.8274021352   7.384341637    2.1352313167   84.653024911   1.1565836299
> 7.8736654804   6.628113879    1.5569395018   83.9412811388  1.2010676157
> 7.1619217082   8.1850533808   1.2455516014   83.4074733096  1.3790035587
> 5.5604982206   10.2758007117  1.0676156584   83.0960854093  1.0231316726
> 7.1174377224   7.6067615658   0.7117437722   84.5640569395  0.756227758
> 7.8736654804   3.9590747331   0.6672597865   87.5           0.3113879004
> 7.6512455516   7.8736654804   0.5338078292   83.9412811388  0.5338078292
> 7.6067615658   8.9857651246   1.4679715302   81.9395017794  0.3558718861
> 8.9412811388   8.0071174377   1.3790035587   81.6725978648  0.5782918149
> 19.0836298932  9.2081850534   2.1352313167   69.5729537367  1.3790035587
> 14.9911032028  11.0765124555  3.2028469751   70.7295373665  1.0676156584
> 15.3914590747  10.8985765125  3.024911032    70.6850533808  1.2900355872
> 17.4822064057  12.5444839858  2.4911032028   67.4822064057  1.334519573
> 15.8362989324  13.0338078292  2.0017793594   69.128113879   1.334519573
> 17.037366548   10.4537366548  2.4021352313   70.1067615658  1.2010676157
> 20.2846975089  10.0088967972  0              69.706405694   1.0676156584
> 28.7366548043  12.6334519573  0              58.6298932384  0
>
> Is there any way I can compare such datasets from multiple experiments (n=8) and visually show (plot) whether these datasets are in consensus or differ from each other?
>
> Awaiting reply,
> Atul

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

plots.pdf
Description: Adobe PDF document
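To illustrate what the scaling step in the reply above does, here is a small sketch with a made-up two-column data frame (the values are hypothetical, only loosely shaped like the RED and GREY columns). `scale()` centres each column and divides it by its standard deviation, so every column ends up with mean 0 and standard deviation 1, which is what puts the lines on a comparable scale.

```r
## Hypothetical data: two columns on very different scales
dat <- data.frame(RED  = c(22, 8, 7, 19, 29),
                  GREY = c(61, 83, 85, 70, 59))

## scale() returns a matrix, so convert back to a data frame
dat2 <- as.data.frame(scale(dat))

## after scaling, column means are (numerically) 0 and
## column standard deviations are 1
col_means <- colMeans(dat2)
col_sds   <- sapply(dat2, sd)
print(round(col_means, 10))
print(round(col_sds, 10))
```

Note that scaling removes differences in level and spread between variables, so it shows whether they rise and fall together, not whether their magnitudes agree.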
Re: [R] How to compare stacked histograms/datasets
Hello Joshua,

Thanks for taking the time to help me with my problem. Actually, the comparison is to be done among two (if possible, more than two) datasets, not within a dataset. Each dataset holds 5 variables (i.e. Red, Purple, Blue, Grey and Yellow) for 21 different positions, i.e. 1-21n. So we have 5 values for each of the 21 positions, which make up a single dataset or stacked histogram (see the plot in the original post).

Initially I was comparing datasets by plotting a stacked histogram for each and analyzing them visually. But that doesn't give a statistical idea of how similar or different the datasets are. Therefore, I want to evaluate the datasets in order to quantify their difference/similarity. So the end result would be a plot showing similarity/difference among two or more datasets.

Example datasets: http://pastebin.com/iYj1RNvt

Can the method you explained be applied to multiple datasets? Can a qqplot be obtained in such a case?

Awaiting your reply.

Thanks,
Atul

--
View this message in context: http://r.789695.n4.nabble.com/How-to-compare-stacked-histograms-datasets-tp4635668p4635744.html
Sent from the R help mailing list archive at Nabble.com.
[R] How to compare stacked histograms/datasets
Hello All,

I have a couple of stacked histograms which I need to compare/evaluate for similarity or difference.

http://r.789695.n4.nabble.com/file/n4635668/Selection_011.png

I believe that, rather than evaluating the histograms, it will be easier to work with the dataset used to plot these stacked histograms, which is in this format:

RED            PURPLE         BLUE           GREY           YELLOW
22.0640569395  16.9483985765  0              60.987544484   0
8.1850533808   8.8523131673   0              82.962633452   0
6.8505338078   6.8950177936   0.756227758    85.4982206406  0.5338078292
6.7615658363   5.2491103203   1.6459074733   86.3434163701  0.6672597865
5.8274021352   7.384341637    2.1352313167   84.653024911   1.1565836299
7.8736654804   6.628113879    1.5569395018   83.9412811388  1.2010676157
7.1619217082   8.1850533808   1.2455516014   83.4074733096  1.3790035587
5.5604982206   10.2758007117  1.0676156584   83.0960854093  1.0231316726
7.1174377224   7.6067615658   0.7117437722   84.5640569395  0.756227758
7.8736654804   3.9590747331   0.6672597865   87.5           0.3113879004
7.6512455516   7.8736654804   0.5338078292   83.9412811388  0.5338078292
7.6067615658   8.9857651246   1.4679715302   81.9395017794  0.3558718861
8.9412811388   8.0071174377   1.3790035587   81.6725978648  0.5782918149
19.0836298932  9.2081850534   2.1352313167   69.5729537367  1.3790035587
14.9911032028  11.0765124555  3.2028469751   70.7295373665  1.0676156584
15.3914590747  10.8985765125  3.024911032    70.6850533808  1.2900355872
17.4822064057  12.5444839858  2.4911032028   67.4822064057  1.334519573
15.8362989324  13.0338078292  2.0017793594   69.128113879   1.334519573
17.037366548   10.4537366548  2.4021352313   70.1067615658  1.2010676157
20.2846975089  10.0088967972  0              69.706405694   1.0676156584
28.7366548043  12.6334519573  0              58.6298932384  0

Is there any way I can compare such datasets from multiple experiments (n=8) and visually show (plot) whether these datasets are in consensus or differ from each other?

Awaiting reply,
Atul

--
View this message in context: http://r.789695.n4.nabble.com/How-to-compare-stacked-histograms-datasets-tp4635668.html
Sent from the R help mailing list archive at Nabble.com.