[R] ggplot2 barplot: extra markers in graph
Dear List,

(Self-contained example and version info at the bottom.)

I'm having trouble producing a barplot using the functions in ggplot2. When I use the position="dodge" option, the bars are plotted, but so are a number of spurious markers: black dots appear in the graph that should not be there. This behaviour is not seen when calling the same functions without position="dodge". Can someone shed some light on this? How can I avoid it?

    #self-contained example:
    library(ggplot2)
    D <- runif(30)
    N <- rep(c(1:10), 3)
    C <- rep(c(1:3), 10)
    DT <- data.frame(D=D, N=N, C=C)
    #works ok
    qplot(DT$N, DT$D, fill=factor(DT$C)) + geom_bar(stat="identity")
    #in the resulting plot, a number of black markers are added that should not be there
    qplot(DT$N, DT$D, fill=factor(DT$C)) + geom_bar(stat="identity", position="dodge")
    #end of example

Version info: Windows XP 64, R version 2.11.1 (64 bit), ggplot2 version 0.8.8

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2 barplot: extra markers in graph
Thanks, this indeed solved the problem.

Regards,
Dieter

On 4/08/2010 15:21, Shentu wrote:

The reason you see the extra markers is that the first part of the command, qplot(DT$N,DT$D,fill=factor(DT$C)), already plots the individual points. You didn't see them with geom_bar(stat="identity") simply because the stacked bars hid the previous layer. To see this, you can use the ggplot function to reproduce your graph (with the points):

    p <- ggplot(data=DT, aes(x=N, y=D)) + geom_point() + geom_bar(stat="identity", aes(fill=factor(C)), position="dodge")
    print(p)

It then becomes obvious that once you omit the geom_point(), the points are gone. This is IMO a feature of the ggplot2 system, not necessarily a bug.
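A minimal sketch of the fix Shentu describes, assuming a ggplot2 version in which stat and position are given as quoted strings: build the plot with ggplot() and add only the bar layer, so no point layer is ever drawn. Converting N and C to factors and the set.seed() call are illustrative choices, not part of the thread.

```r
library(ggplot2)

set.seed(1)  # reproducible random data
DT <- data.frame(D = runif(30), N = rep(1:10, 3), C = rep(1:3, 10))

# Only a bar layer is added; ggplot() by itself draws no geoms,
# so no stray points end up in the plot.
p <- ggplot(DT, aes(x = factor(N), y = D, fill = factor(C))) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

With qplot(), the default point layer is added before geom_bar(), which is why the dots show through once the bars are dodged.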
[R] comparing mixture models (mix function in mixdist package)
Dear List,

We are using the mix() function in the mixdist package to fit mixture models to some of our data. The package provides a function to compare the fits of nested models using an ANOVA function. However, we were wondering whether there are methods for comparing models that differ in the number of distributions fitted and/or the shape of the distributions. Is there a way, perhaps using likelihoods, to compare these kinds of fits in a statistically meaningful way?

Regards,
Dieter Vanderelst
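One common likelihood-based option for non-nested comparisons is an information criterion such as BIC. The sketch below is not the mixdist API: it assumes you can pull the fitted proportions, means and SDs out of each fit (the parameter values used here are made up) and that raw, ungrouped observations are available, whereas mix() works on grouped data. The helper names mix_loglik and bic are hypothetical.

```r
# Log-likelihood of raw data under a normal mixture with given parameters.
mix_loglik <- function(x, prop, mu, sigma) {
  dens <- numeric(length(x))
  for (k in seq_along(prop)) {
    dens <- dens + prop[k] * dnorm(x, mu[k], sigma[k])
  }
  sum(log(dens))
}

# BIC = k * log(n) - 2 * logLik; lower values indicate a better trade-off.
bic <- function(loglik, n_par, n_obs) n_par * log(n_obs) - 2 * loglik

set.seed(42)
x <- c(rnorm(150, 0, 1), rnorm(150, 5, 1))

# Hypothetical fitted parameters for a one- and a two-component model.
ll1 <- mix_loglik(x, prop = 1, mu = mean(x), sigma = sd(x))
ll2 <- mix_loglik(x, prop = c(0.5, 0.5), mu = c(0, 5), sigma = c(1, 1))

# Free parameters: 2 for one normal; 5 for two normals
# (2 means, 2 SDs, 1 mixing proportion).
bic1 <- bic(ll1, n_par = 2, n_obs = length(x))
bic2 <- bic(ll2, n_par = 5, n_obs = length(x))
c(one_component = bic1, two_components = bic2)
```

The same pattern applies to components of a different shape: swap dnorm() for the relevant density and adjust the parameter count.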
Re: [R] Kruskal's MDS results
A few people suggested taking a look at Ripley's book MASS. I know the formula listed there. The point is that the manual for the isoMDS function says its stress output is in percent. Does this mean that the stress reported by isoMDS is just the stress value in MASS (which ranges from 0 to 1) multiplied by 100? I haven't been able to find any resource that expresses stress in values from 0 to 100. So, would this be a convention introduced by the authors of the package?

In general, I think the R manuals could do with a bit more explanation of the output of the functions. I understand that some knowledge of statistics is assumed when working with R, but sometimes the documentation on the returned values is really sparse. Even for someone familiar with the domain, different authors follow different conventions, and the manual should make clear which one is used. I know a lot of hard work goes into writing software, but it seems people are sometimes less keen on documenting their hard work properly.

stephen sefick wrote:

You can look in MASS 4 for this formula on page 308.

Go to the source and ask the horse he'll give you an answer that you endorse.

On Thu, Apr 16, 2009 at 8:13 AM, Bob Green bgr...@dyson.brisnet.org.au wrote:

Dieter,

You could always try the Classification, clustering, and phylogeny estimation list which often includes posts regarding MDS: http://lists.sunysb.edu/index.cgi?A0=CLASS-L

regards
Bob
Re: [R] Kruskal's MDS results
Thank you for clearing this up.

Jari Oksanen wrote:

Dieter Vanderelst Dieter.Vanderelst at ua.ac.be writes:

The point is that the manual for the isoMDS function says its stress output is in percent. Does this mean that the stress reported by isoMDS is just the stress value in MASS (which ranges from 0 to 1) multiplied by 100? I haven't been able to find any resource that expresses stress in values from 0 to 100. So, would this be a convention introduced by the authors of the package?

A comment about the novelty of using percentages. I also had a look at some NMDS resources, and the first I found were two of Kruskal's papers that happened to be on my desk (Psychometrika 29, 1-27 and Psychometrika 29, 115-129, both from 1964). Both of these expressed stress in percent. Certainly this is not a convention introduced by the authors of the package, since they are much too young to have done that prior to 1964.

Cheers,
Jari Oksanen
[R] Kruskal's MDS results
Dear List,

I'm trying to interpret the results of Kruskal's Non-metric Multidimensional Scaling algorithm (isoMDS, MASS package). The 'goodness of fit' is reported as "The final stress achieved (in percent)". What does this mean exactly? I've tried to google for an answer but I've not come up with a definitive answer.

Regards,
Dieter

--
Dieter Vanderelst
PhD Student
Active Perception Lab
University of Antwerp
http://batbits.webnode.com/

Postal Address:
Prinsstraat 13
B-2000 Antwerp
Belgium
Re: [R] Kruskal's MDS results
Hi Michael,

Thanks for the reply. I understand that the stress is a measure of how well the algorithm managed to represent the ordinal distances between items. I also see why it depends on the number of dimensions. I was hoping someone could tell me exactly what the formula for the percentage stress is. To me, it's not clear how this metric is calculated.

Regards,
Dieter

Michael Denslow wrote:

Hi Dieter,

I'll take a shot at this. As I understand it, the stress tells you how the ordination distances compare with the original dissimilarities that you calculated. It is a measure of how well your ordination has done in representing the relationships among your sites. Note that the stress will differ depending on how many dimensions are used. I believe the default is k = 2 in isoMDS.

Hope this helps,
Michael

Dear List,

I'm trying to interpret the results of Kruskal's Non-metric Multidimensional Scaling algorithm (isoMDS, MASS package). The 'goodness of fit' is reported as "The final stress achieved (in percent)". What does this mean exactly? I've tried to google for an answer but I've not come up with a definitive answer.

Regards,
Dieter

--
Dieter Vanderelst
PhD Student
Active Perception Lab
University of Antwerp
http://batbits.webnode.com/

Postal Address:
Prinsstraat 13
B-2000 Antwerp
Belgium

Michael Denslow
Graduate Student
I.W. Carpenter Jr. Herbarium [BOON]
Department of Biology
Appalachian State University
Boone, North Carolina U.S.A.
-- AND --
Communications Manager
Southeast Regional Network of Expertise and Collections
sernec.org
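For reference, Kruskal's stress-1 is sqrt(sum((d - dhat)^2) / sum(d^2)), where d are the distances in the fitted configuration and dhat are the fitted values from the monotone regression of those distances on the original dissimilarities. The sketch below recomputes that quantity with MASS::Shepard() and puts it next to the value isoMDS() returns. The built-in eurodist data is only a stand-in, and the expectation that the two numbers agree (with isoMDS reporting the 0-1 value times 100) is my reading of the MASS documentation, not something stated in this thread.

```r
library(MASS)

# Fit a 2-D non-metric MDS to a built-in distance object (stand-in data).
fit <- isoMDS(eurodist, k = 2, trace = FALSE)

# Shepard() returns the configuration distances (y) and their
# isotonic-regression fitted values (yf) for the same pairs.
sh <- Shepard(eurodist, fit$points)

# Kruskal's stress-1 on a 0-1 scale, then expressed as a percentage.
stress01 <- sqrt(sum((sh$y - sh$yf)^2) / sum(sh$y^2))
c(manual_percent = 100 * stress01, isoMDS_percent = fit$stress)
```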
[R] interp() function output not continue
Dear List,

I'm using interp() to prepare 3d data for plotting with the contour() function. I have x, y and z data; all are arrays. X and Y are sampled in an orderly fashion on a grid (a circular sub-area of a grid, see plot). I'm trying to use interp() to get x and y arrays and a z matrix that can be fed to contour(). This is the command:

    interp(x,y,z,extrap=F,linear=FALSE,duplicate='mean')

The result consistently contains some discontinuities, always in the 'middle' of the data. I've uploaded a plot that might clarify the problem: http://examples.attic.sent.com/example.png

As you can see, the middle of the plot is discontinuous. When I look at the data, there is no particular reason why this should happen. The problem seems to be a single row in the z matrix returned by interp(), right in the middle of the matrix (row 30 of 60). Replacing this row with the mean of rows 29 and 31 seems to solve the problem. This results in this plot: http://examples.attic.sent.com/example_fix.png. This works, but it is not nice, of course.

Does this look familiar to anyone? Can I replace the interp() function with something else? Could this be due to the particular way my data is sampled?

Regards,
Dieter
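The workaround described in this message (replacing the suspect row with the mean of its neighbours) can be sketched in base R. The matrix here is synthetic, and the row index 30 is taken from the post; with real data, z would be the z component of the list returned by interp().

```r
# Synthetic 60 x 60 surface standing in for the z matrix returned by interp().
x <- seq(-1, 1, length.out = 60)
z <- outer(x, x, function(a, b) exp(-(a^2 + b^2)))

# Simulate a corrupted middle row.
bad <- 30
z[bad, ] <- NA

# Replace it with the element-wise mean of the two neighbouring rows.
z[bad, ] <- (z[bad - 1, ] + z[bad + 1, ]) / 2

contour(x, x, z)
```

This only papers over the symptom; it does not explain why interp() produces the bad row in the first place.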
[R] spherical plots?
Dear List,

I'm wondering whether it is possible to use R to make spherical plots. I have 3d data: azimuth, elevation and a certain variable Y. I want to plot Y in terms of azimuth and elevation such that it seems to be a contour plot overlaid on a sphere (but projected on a plane, of course). If my explanation is not clear, you can find an example of what I'm after here: http://www.mediafire.com/?in0fmnikzmg (created using MATLAB).

Regards,
Dieter Vanderelst

--
Dieter Vanderelst
PhD Student
Active Perception Lab
University of Antwerp
Koningstraat 8
B-2000 Antwerp
Belgium
Re: [R] lineplot.CI problem
Hi,

Thank you very much for this very rapid and helpful reply. I'm giving the code a spin around the block. It looks to be working fine.

There is actually something else on my lineplot.CI wish list, if I might be so bold as to ask. When plotting multiple data traces with relatively high standard deviations, the whiskers tend to overlap. This hampers the interpretation of the data somewhat. Is there a way to let the function plot data points with some horizontal displacement to prevent this?

Regards,
Dieter

Manuel Morales wrote:

Here's an updated version of lineplot.CI that will succeed even for cases where data are not present in all factor combinations. Also, this version has the option x.cont to specify that the x axis represents a continuous variable with proportional spacing. A new version of sciplot with these changes will be posted soon.

    ## Examples:
    source("lineplot.CI.R")
    ## Generate data
    time <- c(rep(c(21:30),3), rep(c(1:10),3))
    y <- time + rnorm(60,0,1)
    factors <- rep(c(1:2), each=30)
    ## Proportional spacing
    lineplot.CI(resp=y, x.factor=time, group=factors, x.cont=TRUE)
    ## Factorial spacing
    lineplot.CI(resp=y, x.factor=time, group=factors)

Manuel

On Fri, 2008-02-15 at 15:18 +0100, Dieter Vanderelst wrote:

Hi List,

I have a problem plotting data using the lineplot.CI command in the sciplot package. I want to plot the data of 2 experimental cases using different lines (traces). Time is on the X-axis. The tricky thing is that the data collection in the second case started later than for the first case. That is to say, the first n data points for the second case are missing. So far so good. However, when I plot the data using lineplot.CI, the standard error bars are not aligned correctly with the markers. I know that this might be difficult to imagine. Here you can find an example: http://i254.photobucket.com/albums/hh115/MarkerMe/example.png

So, has anybody experienced this problem and solved it before? I think I could try padding the data of the second case with zeros to eliminate the missing data. But I hope there is a better solution.

Regards,
Dieter

Dieter Vanderelst
dieter dot vanderelst at emailengine dot org
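The thread does not show an option in lineplot.CI for horizontal displacement, so the sketch below does it by hand in base graphics rather than through sciplot: it computes the mean and standard error per cell and shifts each group's x positions by a small offset. The simulated data, the offset values and all variable names are illustrative.

```r
set.seed(1)
time  <- rep(1:10, times = 6)
group <- rep(1:2, each = 30)
y     <- time + rnorm(60, 0, 3)

# Mean and standard error per time point and group.
m  <- tapply(y, list(time, group), mean)
se <- tapply(y, list(time, group), function(v) sd(v) / sqrt(length(v)))

tt     <- as.numeric(rownames(m))
offset <- c(-0.1, 0.1)  # horizontal shift applied to each group

plot(NA, xlim = range(tt) + c(-0.5, 0.5), ylim = range(m - se, m + se),
     xlab = "time", ylab = "response")
for (g in 1:2) {
  xg <- tt + offset[g]
  lines(xg, m[, g], type = "b", pch = g)
  # Error bars drawn as flat-headed arrows.
  arrows(xg, m[, g] - se[, g], xg, m[, g] + se[, g],
         angle = 90, code = 3, length = 0.03)
}
```

Because the offset is applied to numeric x positions, the same idea carries over to unequal sampling times in the two groups.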
[R] lineplot.CI problem
Hi List,

I have a problem plotting data using the lineplot.CI command in the sciplot package. I want to plot the data of 2 experimental cases using different lines (traces). Time is on the X-axis. The tricky thing is that the data collection in the second case started later than for the first case. That is to say, the first n data points for the second case are missing. So far so good. However, when I plot the data using lineplot.CI, the standard error bars are not aligned correctly with the markers. I know that this might be difficult to imagine. Here you can find an example: http://i254.photobucket.com/albums/hh115/MarkerMe/example.png

So, has anybody experienced this problem and solved it before? I think I could try padding the data of the second case with zeros to eliminate the missing data. But I hope there is a better solution.

Regards,
Dieter

Dieter Vanderelst
dieter dot vanderelst at emailengine dot org
[R] rearrange data: one line per subject, one column per condition
Dear R-list,

Is there a way to convert the typical long R data format to a one-line-per-subject format? I have data formatted as:

    Group  subj  condition  variable
    1      1     1          746.36625
    2      2     1          1076.152857
    1      3     1          1076.152857
    2      4     1          657.4263636
    1      5     1          854.127
    2      6     1          1191.676154
    1      7     1          1028.175385
    1      1     2          46.36625
    2      2     2          76.152857
    1      3     2          76.152857
    2      4     2          57.4263636
    1      5     2          54.127
    2      6     2          191.676154
    1      7     2          028.175385
    ...

Here, each line holds the value of VARIABLE for one subject, as a function of GROUP and CONDITION. However, I would like to rearrange the data so that the columns correspond to the 2 conditions and the lines to the subjects. Something like:

    subj  group  condition1   condition2
    1     1      746.36625    46.36625
    2     2      1076.152857  76.152857
    ...

I know it's possible the other way around, but that's not what I need (this time). Before anyone asks: yes, I want to do some analysis on my data in SPSS, so I need the rearranged format.

Regards and Thanks,
Dieter

--
Dieter Vanderelst
dieter _ vanderelst AT emailengine DOT org
d DOT vanderelst AT tue DOT nl
Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11
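One way to do this in base R is reshape() with direction = "wide". The sketch uses the first three subjects from the message; the data frame names long and wide and the renaming of the generated columns to condition1/condition2 are illustrative choices.

```r
long <- data.frame(
  Group     = c(1, 2, 1, 1, 2, 1),
  subj      = c(1, 2, 3, 1, 2, 3),
  condition = c(1, 1, 1, 2, 2, 2),
  variable  = c(746.36625, 1076.152857, 1076.152857,
                46.36625, 76.152857, 76.152857)
)

# One row per subject, one "variable" column per level of condition.
wide <- reshape(long, idvar = c("subj", "Group"), timevar = "condition",
                v.names = "variable", direction = "wide", sep = "")

# reshape() names the new columns variable1, variable2; rename them.
names(wide) <- sub("^variable", "condition", names(wide))
wide

# Optional: write a file that SPSS can import.
# write.csv(wide, "wide.csv", row.names = FALSE)
```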
[R] ylim in barplot2 function?
Hi list,

I'm using barplot2 from the gplots package to plot a few numbers (I want to add SD bars later). However, I would like the y-axis to start at 500 rather than 0. When I add the ylim parameter, something goes wrong. The graph is not 'cut off' at 500. Instead, the bars seem to sink through the bottom of the graph. Because it's a little hard to explain, here is a self-contained example:

    library(gplots)
    ABrt <- c(588,589,593,588)
    Wrt <- c(580,583,592,612)
    RT <- rbind(ABrt,Wrt)
    barplot2(RT, beside=T, col=c('black','white'), ylim=c(500,1000))

Does anybody know of a solution?

Regards,
Dieter

--
Dieter Vanderelst
dieter _ vanderelst AT emailengine DOT org
d DOT vanderelst AT tue DOT nl
Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11
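One way to stop the bars from hanging below the axis is to clip drawing to the plot region with xpd = FALSE. The sketch uses base barplot() so that it runs without gplots; I believe barplot2() accepts the same xpd argument, but treat that as an assumption to check against its help page.

```r
ABrt <- c(588, 589, 593, 588)
Wrt  <- c(580, 583, 592, 612)
RT   <- rbind(ABrt, Wrt)

# xpd = FALSE clips the bars at the edge of the plot region,
# so they start at the lower ylim instead of extending below it.
mids <- barplot(RT, beside = TRUE, col = c("black", "white"),
                ylim = c(500, 1000), xpd = FALSE)
box()  # draw a frame so the truncated baseline is visible
```

The returned mids matrix holds the bar midpoints, which is where SD bars could later be placed.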
Re: [R] Cluster Analysis
Take a look at hclust().

Dieter

Katia Freire wrote:

Dear all,

I would like to know if I can do a hierarchical cluster analysis in R using my own similarity matrix and how.

Thanks.
Katia Freire.

[[alternative HTML version deleted]]
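A small sketch of what that could look like. hclust() takes dissimilarities, so a similarity matrix has to be converted first; the 1 - similarity conversion assumes similarities scaled between 0 and 1, and the matrix itself is made up for illustration.

```r
# Hypothetical 4 x 4 similarity matrix (1 = identical, 0 = completely different).
sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.8,
                0.1, 0.2, 0.8, 1.0),
              nrow = 4, dimnames = list(letters[1:4], letters[1:4]))

# Convert similarities to a "dist" object of dissimilarities.
d  <- as.dist(1 - sim)
hc <- hclust(d, method = "average")
plot(hc)

# Cut the tree into two clusters.
cl <- cutree(hc, k = 2)
cl
```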
[R] adding aggregate data to data frames
Dear List,

I have a data frame containing reaction times of participants in some experiment. As usual, each line is a single trial in the experiment. Two factors denote the conditions in the experiment. Each participant completes several trials for each condition.

Now, the question: I want to calculate, per participant and per condition, the mean reaction time and its standard deviation. I can do this using aggregate(). However, I want to merge this info with the original data frame. That is, I want each line to contain the mean and SD of the reaction time for the participant and condition on that line.

I have tried to solve this by looping through the data frame. For each line, I use subset() to select the lines that belong to the same participant and condition. Then I calculate the average/SD. But this takes a long time.

BTW: I find that choosing a proper subject for r-help list mails is very hard. So, if anyone knows a set of better keywords...

Any ideas?

Thanks,
Dieter Vanderelst

--
Dieter Vanderelst
dieter _ vanderelst AT emailengine DOT org
d DOT vanderelst AT tue DOT nl
Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11
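Base R's ave() does this without an explicit loop: it returns a vector as long as its input in which every element is replaced by a summary of its group. The column names (subject, condA, condB, RT) and the simulated data are assumptions made for the example.

```r
set.seed(1)
rt <- data.frame(
  subject = rep(1:3, each = 8),
  condA   = rep(c("x", "y"), times = 12),
  condB   = rep(rep(c("p", "q"), each = 2), times = 6),
  RT      = round(rnorm(24, 600, 50))
)

# Each row gets the mean and SD of its participant-by-condition cell.
rt$meanRT <- ave(rt$RT, rt$subject, rt$condA, rt$condB, FUN = mean)
rt$sdRT   <- ave(rt$RT, rt$subject, rt$condA, rt$condB, FUN = sd)
head(rt)
```

An alternative with the same result is to compute the table with aggregate() and join it back with merge() on the participant and condition columns.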
[R] cumulative frequency plots for factors
Dear list,

I have a data frame with a number of events (factor) and the times at which they occurred (continuous variable):

    event  time
    A      10
    A      12
    B      15
    A      17
    C      13
    ...

Is it possible in R to plot the cumulative frequency of occurrence of each event against time? That would give a rising line for each factor level.

Regards,
Dieter
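A base R sketch of one way to do this, using the five rows from the message: sort by time, number the occurrences within each event type with ave(), and draw one step line per type. The names ev, cumfreq and types are illustrative.

```r
ev <- data.frame(event = c("A", "A", "B", "A", "C"),
                 time  = c(10, 12, 15, 17, 13))

# Sort by time, then count occurrences within each event type.
ev <- ev[order(ev$time), ]
ev$cumfreq <- ave(seq_along(ev$event), ev$event, FUN = seq_along)

plot(NA, xlim = range(ev$time), ylim = c(0, max(ev$cumfreq)),
     xlab = "time", ylab = "cumulative frequency")
types <- unique(as.character(ev$event))
for (i in seq_along(types)) {
  one <- ev[ev$event == types[i], ]
  lines(one$time, one$cumfreq, type = "s", col = i)  # step line per event
  points(one$time, one$cumfreq, col = i, pch = 19)
}
legend("topleft", legend = types, col = seq_along(types), lty = 1)
```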
Re: [R] Making a table: collapsing across sub-strings
Hi,

A sub-string can occur anywhere in the main string. I think I could use table() and then add the numbers. But I don't know how to access the numbers in the result of table().

Another problem is that there might be a hierarchy in the strings. That is, string a might be a sub-string of b while b might be a sub-string of c. So, when checking the strings, I would have to start with the longest string and find all sub-strings of that one. And then I should check the second longest string, and so on. But I cannot find a way of ordering strings by their length.

Regards,
Dieter

jim holtman wrote:

How do you determine if one string is a subset of another? Does it only match at the beginning, or anywhere? How large is your set of strings? Can you use table as you describe and then determine what the groupings of subsets are and then just add the numbers together? You can use grep/regexpr to determine if one string is a subset of another.

On 10/3/07, Dieter Vanderelst [EMAIL PROTECTED] wrote:

Hi list,

I'm currently processing textual data and I would really appreciate some help with one of my problems. I have a set of strings and I want to count how often each of these strings appears in this set. This is not very difficult and can be done as:

    TB <- table(my_set)
    plot(TB)

However, I also want to collapse across sub-strings. That is, I want a sub-string ss of string S to be counted as an occurrence of string S. So, 'abab' should be included in the count of 'ababaaa' and should not be listed as a separate entry in the frequency table.

Does somebody have a pointer to a way to do this? I have been checking out the CRAN packages for handling DNA sequences, but this has not really brought me closer to a solution.

Thanks,
Dieter Vanderelst

--
Dieter Vanderelst
Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11
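The pieces asked about here can be sketched in base R: counts in a table are accessed by name, strings are ordered by length with nchar() and order(), and containment is tested with grepl(..., fixed = TRUE). The example set is made up, and folding a string into the longest string that contains it is one possible rule, chosen here as an assumption; a string contained in several unrelated longer strings would need a decision the thread does not specify.

```r
my_set <- c("abab", "ababaaa", "abab", "ba", "ababaaa", "xyz")

TB <- table(my_set)
counts <- setNames(as.vector(TB), names(TB))  # plain named vector of counts
counts[["abab"]]                              # access a single count by name

# Distinct strings ordered from longest to shortest.
strs <- names(counts)[order(nchar(names(counts)), decreasing = TRUE)]

# Work from shortest to longest: fold each string's count into the
# longest string that contains it, so counts propagate up a hierarchy.
for (s in rev(strs)) {
  longer <- strs[nchar(strs) > nchar(s)]
  hits <- longer[grepl(s, longer, fixed = TRUE)]
  if (length(hits) > 0) {
    counts[hits[1]] <- counts[hits[1]] + counts[s]
    counts <- counts[names(counts) != s]
  }
}
counts
```

Using fixed = TRUE matters when the strings may contain characters that are special in regular expressions.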
[R] Making a table: collapsing across sub-strings
Hi list,

I'm currently processing textual data and I would really appreciate some help with one of my problems. I have a set of strings and I want to count how often each of these strings appears in this set. This is not very difficult and can be done as:

    TB <- table(my_set)
    plot(TB)

However, I also want to collapse across sub-strings. That is, I want a sub-string ss of string S to be counted as an occurrence of string S. So, 'abab' should be included in the count of 'ababaaa' and should not be listed as a separate entry in the frequency table.

Does somebody have a pointer to a way to do this? I have been checking out the CRAN packages for handling DNA sequences, but this has not really brought me closer to a solution.

Thanks,
Dieter Vanderelst

--
Dieter Vanderelst
Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11