[R] Calculating Minimum Absolute Difference of Two Numeric Vectors
Good day, What is a fast and efficient way to calculate the minimum absolute difference between two vectors of numbers? The two vectors have unequal length. I would also like to know the index of the first vector and the second vector which results in the minimum absolute difference. For example: x <- rpois(500, 100) y <- rpois(300, 30) Is there a much faster way than a nested for loop without resorting to Rcpp? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
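One vectorised way to approach this question, sketched under the assumption that the full 500 by 300 matrix of differences fits comfortably in memory, is to build all pairwise differences with outer() and then locate the smallest entry with which(). The names absDiff, minDiff and minPair are illustrative and do not come from the thread.

```r
set.seed(1)
x <- rpois(500, 100)
y <- rpois(300, 30)

# All pairwise absolute differences: rows index x, columns index y.
absDiff <- abs(outer(x, y, "-"))
minDiff <- min(absDiff)

# which(arr.ind = TRUE) reports row (index into x) and col (index into y);
# take the first pair if several pairs tie for the minimum.
minPair <- which(absDiff == minDiff, arr.ind = TRUE)[1, ]

minDiff
minPair
```

For much longer vectors the matrix may become too large, in which case sorting one vector and looking up neighbours with findInterval() would be a possible alternative.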
[R] Stats reshape non-Longitudinal Example Correctness
Good day, In the Examples section of the reshape function documentation is an example that reshapes the data frame state.x77. However, the resulting long data frame doesn't seem correct. It has three columns: Characteristic, Population and state. The second column probably shouldn't be named Population, because it stores the values of all of the variables and only one of the variables in the dataset is Population. I wouldn't expect to see values for Frost in the Population column, for example. Is it a bug? I think that a column name such as Value would be appropriate.

> sessionInfo()
R Under development (unstable) (2018-01-07 r74096)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
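As a possible workaround for the column name, here is a minimal sketch that passes v.names to reshape() so that the stacked column is called Value. It assumes the same arguments as the documentation example; the object names states and long are illustrative.

```r
# Work on a data frame copy so the built-in matrix is left untouched.
states <- as.data.frame(state.x77)

long <- reshape(states,
                idvar = "state", ids = row.names(states),
                times = names(states), timevar = "Characteristic",
                varying = list(names(states)),
                v.names = "Value",   # name for the stacked value column
                direction = "long")

head(long)
```

Without v.names, reshape() appears to fall back on the first name in varying, which would explain why the column is labelled Population.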
[R] Find Crossover Points of Two Spline Functions
Good day, I have two probability densities, each with a function determined by splinefun(densityResult[['x']], densityResult[['y']], "natural"), where densityResult is the output of the density function in stats. How can I determine all of the x values at which the densities cross ? -- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
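One way this could be approached, as a sketch only, is to evaluate the difference of the two spline functions on a fine grid, find the intervals where its sign changes, and refine each with uniroot(). The simulated data, the grid size of 1000 and all object names are assumptions made for illustration.

```r
set.seed(42)
densA <- density(rnorm(200, mean = 0))
densB <- density(rnorm(200, mean = 2))
splineA <- splinefun(densA[["x"]], densA[["y"]], "natural")
splineB <- splinefun(densB[["x"]], densB[["y"]], "natural")

difference <- function(x) splineA(x) - splineB(x)

# Search only the range on which both densities were estimated.
lower <- max(min(densA[["x"]]), min(densB[["x"]]))
upper <- min(max(densA[["x"]]), max(densB[["x"]]))
grid <- seq(lower, upper, length.out = 1000)

# A crossing lies between neighbouring grid points where the sign changes.
changes <- which(diff(sign(difference(grid))) != 0)
crossings <- vapply(changes,
                    function(i) uniroot(difference, grid[c(i, i + 1)])$root,
                    numeric(1))
crossings
```

Sign changes far out in the tails, where both densities are close to zero, may be numerical noise rather than meaningful crossings, so the result probably needs inspection.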
Re: [R] by Function Result Factor Levels
Good day, Yes, exactly. I found that aggregate is another alternative which doesn't require a package dependency, although the column formatting is less suitable, always prepending x.

aggregate(warpbreaks[, 1], warpbreaks[, 2:3], function(breaks) c(Min = min(breaks), Med = median(breaks), Max = max(breaks)))
  wool tension x.Min x.Med x.Max
1    A       L    25    51    70
2    B       L    14    29    44
3    A       M    12    21    36
4    B       M    16    28    42
5    A       H    10    24    43
6    B       H    13    17    28

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
[R] by Function Result Factor Levels
Good day, How is it possible to get a data.frame of factor levels used for obtaining each element of the result of the by function ? For example,

result <- by(warpbreaks[, 1], warpbreaks[, -1], summary)
> result
wool: A
tension: L
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  25.00   26.00   51.00   44.56   54.00   70.00
...

I'd like to obtain a data.frame of the two columns, wool and tension, specifying the level of each factor that corresponds to each element of result.

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
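A minimal sketch of one way to get such a table, assuming the by object keeps the factor levels in its dimnames (which the printed output suggests): expand.grid() applied to those dimnames yields one row per element of result, in the same order. The name levelsTable is illustrative.

```r
result <- by(warpbreaks[, 1], warpbreaks[, -1], summary)

# dimnames(result) is a named list holding the levels of wool and tension.
# expand.grid() varies the first factor fastest, which matches the order
# in which the elements of result are stored.
levelsTable <- expand.grid(dimnames(result))
levelsTable
```

Row i of levelsTable then describes result[[i]].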
Re: [R] Vennerable Plots for Publications
That is an adequate solution. It's always better if R package authors don't hard-code graphics parameters, though.
[R] Vennerable Plots for Publications
Does anyone make Venn diagrams for publication using Vennerable ? I found that the font size is too big when the plot is created at 300 DPI, and there's no option to change it, even when the point size argument to the device is changed.

library(Vennerable)
aVenn <- Venn(Sets = list(A = 1:5, B = 3:6))
png("forPublication.png", units = "in", h = 2.55, w = 2.4, res = 300) # Changing pointsize to a smaller number has no effect on size of the text.
plot(aVenn)
dev.off()

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
[R] yaxs Causes Boundary Line Colour to Change
Why is the bottom boundary plotted in a different colour to the other three sides ?

set.seed()
data <- rpois(10, 2)
plot(density(data), ann = FALSE, yaxs = 'i') # Grey bottom boundary.
plot(density(data), ann = FALSE) # All boundaries are black.

Ideally, there would be black lines on all four sides. The documentation doesn't say the colour will change.

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
Re: [R] yaxs Causes Boundary Line Colour to Change
Thanks for drawing my attention to the zero.line argument. I had only checked the help page for par. -- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
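For readers following the thread, here is a short sketch of how the zero.line argument of plot.density could be used. The seed value and the name counts are arbitrary choices for illustration.

```r
set.seed(1)
counts <- rpois(10, 2)

# zero.line = FALSE suppresses the thin grey line that plot.density draws
# at y = 0. With yaxs = "i" that line sits on the bottom edge of the box,
# so turning it off leaves only the black border visible there.
plot(density(counts), ann = FALSE, yaxs = "i", zero.line = FALSE)
```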
[R] Rmixmod Memory Leak
Hello, I would like to provide a helpful bug report to the maintainer of Rmixmod, but I'm not skilled in memory profiling. The following example illustrates the problem :

library(Rmixmod)
genes <- matrix(rnorm(5000*50, 9, 2), nrow = 5000, ncol = 50)
selected <- sample(5000, 25)
columns <- split(1:50, rep(1:10, each = 5))
lapply(1:100, function(index) # 100 resamples with replacement
{
    lapply(1:5, function(fold) # 5-fold cross validation
    {
        apply(genes[selected, columns[[fold]]], 1, function(aGene) mixmodCluster(aGene, nbCluster = 1:3))
        return(NULL)
    })
})

Even though no data was assigned to any variables, and even if I do gc() after the loop, 5 GB of RAM is used. This makes the software unusable in a loop, because the server freezes when it runs out of RAM. Could someone who is an expert help me ?

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
Re: [R] Vector of Numbers Not Output to Screen
It's a plausible use-case. For example, in the example section of a help file.

if(require(aPackage))
{
    # Do computations.
    # Show beginning of first result vector.
    # Show beginning of second result vector.
}
[R] Vector of Numbers Not Output to Screen
Hello, I have a block of code that has two head calls at the end, but only the second is shown on screen. If I manually execute the statement which is not showing, it works. I thought that if statements are not functions. It is behaving as one.

> if(1 < 2)
+ {
+     x <- rnorm(100)
+     y <- rpois(10, 5)
+     head(x)
+     head(y)
+ }
[1] 4 4 5 4 8 3
> head(x)
[1] -1.89083874  0.42442102  0.96114276  0.48004716  1.94358108 -0.02654324

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
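The behaviour described here is consistent with R auto-printing only the value of a complete top-level expression, which for a braced block is the value of its last statement. A minimal sketch of the usual remedy, calling print() explicitly for every result that should appear:

```r
if (1 < 2) {
  x <- rnorm(100)
  y <- rpois(10, 5)
  print(head(x))  # explicit print: shown even though it is not the last statement
  print(head(y))
}
```

The same applies inside for loops and functions, where intermediate values are likewise not printed automatically.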
Re: [R] Vector of Numbers Not Output to Screen
The example in the question was not inside a user function.
[R] Grouping on a Distance Matrix
Hello, I'm looking for a function that groups elements below a certain distance threshold, based on a distance matrix. In other words, I'd like to group samples without using a standard clustering algorithm on the distance matrix. For example, let the distance matrix be :

     A    B    C    D
A 0    0.03 0.77 1.12
B 0.03 0    1.59 1.11
C 0.77 1.59 0    0.09
D 1.12 1.11 0.09 0

Two clusters would be found with a cutoff of 0.1. The first contains A,B. The second has C,D. Is there an efficient function that does this ? I can think of how to do this recursively, but am hoping it's already been considered.

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
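Although the question asks to avoid a standard clustering algorithm, one hedged observation is that cutting a single-linkage tree at the threshold gives the same groups as chaining together all elements closer than that threshold. A sketch using the matrix from the post, with d and groups as illustrative names:

```r
d <- matrix(c(0,    0.03, 0.77, 1.12,
              0.03, 0,    1.59, 1.11,
              0.77, 1.59, 0,    0.09,
              1.12, 1.11, 0.09, 0),
            nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Single linkage merges two groups as soon as any pair of members is close,
# so cutting the tree at h = 0.1 keeps together everything linked by
# a chain of distances of at most 0.1.
groups <- cutree(hclust(as.dist(d), method = "single"), h = 0.1)
split(names(groups), groups)
```

Note that chaining means two members of a group can be further apart than the cutoff, as long as intermediate elements connect them.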
[R] Setting heatmap.2 Color Key Range Outside of Data Limits
Hello, There are many questions about making the limit of the colour key smaller than the data range, but I have the opposite problem. Assume one heatmap has data in the range 6 to 12 and another has data in the range 6 to 9. By providing the same breaks argument to both plots, the heatmaps are coloured as they should be, but for the second heatmap, the range of the colour key is just from 6 to 9. I'd like to force the second colour key to go up to 12 also. How can this be achieved ? My use case is that I have identified a number of clusters in a gene expression dataset, and I would like to avoid plotting them in one large heatmap, and instead plot them as multiple smaller heatmaps. Also, unless key = FALSE, having a heatmap with values in only one colour bin causes Error in axis(1, at = xv, labels = lv) : no locations are finite. Perhaps this could also be handled more gracefully. I am using R 3.0.2. -- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
[R] Tab Separated File Reading Error
Hello, I have a seemingly simple problem that a tab-delimited file can't be read in.

annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 5933 did not have 12 elements

However, all lines do have 12 columns.

lines <- readLines("matched.txt")
tabsPosns <- gregexpr("\t", lines)
table(sapply(tabsPosns, length))

    11
367274

system("wc -l matched.txt")
367274 matched.txt

You can obtain the file from https://dl.dropboxusercontent.com/u/37992150/matched.txt The line does not contain comment or quote characters. What can you suggest ?

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
 [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

loaded via a namespace (and not attached):
[1] tools_3.0.1

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
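One frequent cause of this kind of error, which may or may not apply to this particular file, is an apostrophe or a # on an earlier line: read.table treats both quote marks and # specially by default, so a line count based on tabs alone can still disagree with what scan sees. The sketch below uses a small made-up file (the contents and the name annot are purely illustrative) to show the arguments that switch that handling off.

```r
# Hypothetical three-line, tab-separated file containing apostrophes.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id1\t5' UTR\t10",
             "id2\tcoding\t20",
             "id3\t3' UTR\t30"), tmp)

# quote = "" stops ' and " from being treated as string delimiters;
# comment.char = "" stops # from starting a comment.
annot <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                    stringsAsFactors = FALSE)
dim(annot)
```

If the real file reads cleanly with these two arguments, a stray quote or comment character somewhere before line 5933 would be the likely explanation.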
[R] Declare BASH Array Using R System Function
Hello, It is difficult searching for previous posts about this since the keywords are short and ambiguous, so I hope this is not a duplicate question. I can easily declare an array on the command line.

$ names=(X Y)
$ echo ${names[0]}
X

I am unable to do the same from within R.

system("names=(X Y)")
sh: Syntax error: "(" unexpected

Reading the documentation for the system function, it appears to only be relevant for executing commands. What can I do instead to declare a BASH array ? Thanks.

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
Re: [R] Declare BASH Array Using R System Function
Thank you. This answers my question. I am using Linux, too.

From: arun [smartpink...@yahoo.com]
Sent: Monday, 29 July 2013 11:11 PM
To: Dario Strbenac
Cc: R help
Subject: Re: [R] Declare BASH Array Using R System Function

Hi,

system("names=(X Y); echo ${names[0]}")
#sh: 1: Syntax error: "(" unexpected

#this worked for me:
system("bash -c 'names=(X Y); echo ${names[0]}'")
#X

A.K.

- Original Message -
From: Dario Strbenac dstr7...@uni.sydney.edu.au
To: r-help@r-project.org r-help@r-project.org
Cc:
Sent: Sunday, July 28, 2013 10:00 PM
Subject: [R] Declare BASH Array Using R System Function

> Hello, It is difficult searching for previous posts about this since the keywords are short and ambiguous, so I hope this is not a duplicate question. I can easily declare an array on the command line.
>
> $ names=(X Y)
> $ echo ${names[0]}
> X
>
> I am unable to do the same from within R.
>
> system("names=(X Y)")
> sh: Syntax error: "(" unexpected
>
> Reading the documentation for the system function, it appears to only be relevant for executing commands. What can I do instead to declare a BASH array ? Thanks.
>
> -- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
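Building on the answer quoted above, the following sketch captures the shell output back into R with intern = TRUE. It assumes a Unix-alike system with bash available on the PATH; firstName is an illustrative name.

```r
# system() hands the command to sh, which on many Linux systems is not bash
# and has no arrays, so bash is invoked explicitly with -c.
firstName <- system("bash -c 'names=(X Y); echo ${names[0]}'", intern = TRUE)
firstName
```

The array only exists for the lifetime of that one bash process, so any commands that use it need to go into the same quoted string.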
[R] Assignment Operator in mclapply
I sometimes need to return multiple items from a loop. Is it possible to have the <<- operator work the same for mclapply as for lapply ?

extra <- list()
squares <- mclapply(1:10, function(x){extra[[x]] <<- x; x^2;})
extra
list()

squares <- lapply(1:10, function(x){extra[[x]] <<- x; x^2;})
extra
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
[[6]]
[1] 6
[[7]]
[1] 7
[[8]]
[1] 8
[[9]]
[1] 9
[[10]]
[1] 10

My question is like that of http://tolstoy.newcastle.edu.au/R/e6/help/09/03/8329.html which is not answered.

-- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia
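Because mclapply evaluates the function in forked child processes, assignments made there do not reach the parent session's variables. A common pattern, sketched here with illustrative names (results, square, extra), is to return every item needed in one list per iteration and separate them afterwards.

```r
library(parallel)

# Forking is unavailable on Windows, where mclapply only accepts one core.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Return both items from each iteration instead of assigning with <<-.
results <- mclapply(1:10, function(x) list(square = x^2, extra = x),
                    mc.cores = cores)

squares <- lapply(results, `[[`, "square")
extra   <- lapply(results, `[[`, "extra")
```

This form also behaves identically under lapply, so the code does not depend on whether it runs in parallel.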
[R] PAM Clustering Ignores Cluster Number Parameter
I am using PAM with k = 10 clusters, but I only get one cluster ID for all my observations. I couldn't find any discussion about this in the help file, or mailing lists. Is there a reasonable explanation for this result ?

cIDs <- pam(all, 10, cluster.only = TRUE, do.swap = FALSE)
table(cIDs)
cIDs
    0
16671

The matrix of observations can be found at : http://129.94.136.7/file_dump/dario/all.obj I'm using R version 2.13.0 (2011-04-13) on Platform: x86_64-unknown-linux-gnu (64-bit) and have cluster_1.13.3.

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
Re: [R] gridBase Base Plot Positioning
Thanks for this clarification.
[R] gridBase Base Plot Positioning
Hello, I'm trying to follow the documentation of how to use gridBase, and I've reached the minimal code example below as my best effort. Can someone explain how to keep the column of boxplots on the same page as the rectangles (even though I've tried new = TRUE) ? Also, would it be hard / possible to match up the middle of each boxplot to the middle of each rectangle ?

library(grid)
library(gridBase)
pdf("tmp.pdf", h = 6, w = 10)
pushViewport(plotViewport(c(5, 5, 4, 2)))
pushViewport(viewport(layout = grid.layout(4, 6)))
for(i in 1:5)
{
    for(i2 in 1:4)
    {
        pushViewport(viewport(layout.pos.row = i2, layout.pos.col = i))
        grid.rect()
        popViewport()
    }
}
pushViewport(viewport(layout.pos.col = 6))
plot.new()
par(plt = gridPLT(), new = TRUE)
randData <- lapply(1:4, function(x) sample(10, 10, TRUE))
boxplot(randData, horizontal = TRUE)
dev.off()

I'm using gridBase_0.4-3. Thanks, Dario.

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
[R] Sweave Executes Package Functions Twice
Hi, I've got something strange going on. I'm trying to compile a vignette using Sweave("vignette.Rnw"). In the first code chunk, which illustrates an example, the function appears to run twice: the sequence of message() statements inside it is output on screen twice, and the chunk takes twice as long to run. e.g.

Processing sample 1
Processing sample 2
Processing sample 3
Processing sample 1
Processing sample 2
Processing sample 3

If I open up a new R session, and copy and paste the lines of code from the .Rnw one by one, the function isn't called twice; only one complete set of progress outputs shows.

I tried using debug() to get more of an idea of what was happening. The function is called enrichmentPlot, so I did debug(enrichmentPlot), then Sweave("vignette.Rnw"). Execution pauses when it gets to the right code chunk, then I type in n at the Browse[2] prompt. I don't get any output from debug, like "debugging in:", but it runs for a while and I get all of the messages from within my function, then execution pauses again. Just before it pauses I get the debug output :

debugging in: enrichmentPlot(samples.list, 300)
debug: standardGeneric("enrichmentPlot")
Browse[2]>

Then I enter n, and it runs the same code all over again.

...
... # Same progress outputs from inside my function.
exiting from: enrichmentPlot(samples.list, 300)

I have the current 2.13.0 version of R. I'm sure I've done something wrong, I just can't figure out what. Thanks for any help.

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
Re: [R] (no subject)
If your data frame is called myDF,

max(myDF[myDF[, "condition"] == "GPR119a", "responce"])

Original message
Date: 01 Feb 2011 19:05:46 +
From: r-help-boun...@r-project.org (on behalf of A. Ramzan ar...@cam.ac.uk)
Subject: [R] (no subject)
To: r-help@r-project.org

> Hello I am trying to find a way to find the max value, for only a subset of a dataframe, depending on how the data is grouped. For example, how would I find the maximum responce, for all the GPR119a condition below:
>
> responce,mouce,condition
> 0.105902,KO,con
> 0.232018561,KO,con
> 0.335008375,KO,con
> 0.387025433,KO,GPR119a
> 0.576769897,KO,GPR119a
> 0.645120419,KO,GPR119a
> 0.2538608,KO,GPR119b
> 0.183061952,KO,GPR119b
> 0.824035587,KO,GPR119b
> 0.399201597,KO,GPR119c
> 0.417006618,KO,GPR119c
> 0.572958834,KO,GPR119c
> 0.229467444,KO,GPR119d
> 0.294089745,KO,GPR119d
> 0.309964445,KO,GPR119d
> 0.30474325,KO,GPR119e
> 0.159374839,KO,GPR119e
> 0.467726848,KO,GPR119e
> 1.01841912,KO,GPR119f
> 0.423028621,KO,GPR119f
> 0.223588597,KO,GPR119f
>
> Thank

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
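To make the suggestion above concrete, here is a small self-contained sketch using the first six rows of the quoted data. The column names keep the original spelling; maxA and maxByCondition are illustrative names, and the tapply() line is an addition showing how the maximum of every group could be obtained at once.

```r
myDF <- data.frame(
  responce  = c(0.105902, 0.232018561, 0.335008375,
                0.387025433, 0.576769897, 0.645120419),
  mouce     = "KO",
  condition = rep(c("con", "GPR119a"), each = 3)
)

# Maximum for one condition, by logical row subsetting.
maxA <- max(myDF[myDF[, "condition"] == "GPR119a", "responce"])

# Maximum for every condition in one call.
maxByCondition <- tapply(myDF$responce, myDF$condition, max)
```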
Re: [R] agnes clustering and NAs
Hello, Thank you for the clarification about the NAs. For your interest, thankfully my end goal was not to plot a dendrogram with 23371 elements, but just to use the output of the clustering to re-order the rows of a matrix before plotting it with image(). Since clara() and pam() are partitioning-based approaches, I suppose I could instead stay with hclust() after removing the offending rows, so that I have the ordering position of each gene, not its cluster membership. I have 12 GB RAM on my 64-bit system, so the time it takes to run should be my only problem. - Dario.

Original message
Date: Fri, 28 Jan 2011 12:34:26 +0100
From: Martin Maechler maech...@stat.math.ethz.ch
Subject: Re: [R] agnes clustering and NAs
To: gavin.simp...@ucl.ac.uk
Cc: d.strbe...@garvan.org.au, r-help@r-project.org, Uwe Ligges lig...@statistik.tu-dortmund.de

Gavin Simpson gavin.simp...@ucl.ac.uk on Fri, 28 Jan 2011 09:23:05 + writes:

> On Fri, 2011-01-28 at 10:00 +1100, Dario Strbenac wrote:
> Hello, Yes, that's right, it is a values matrix. Not a dissimilarity matrix. i.e.
>
> str(iMatrix)
>  num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr [1:56] "-8100" "-7900" "-7700" "-7500" ...

Ok, so in the end you want to draw a dendrogram for 23'371 observational units, really ? I think I would not use a hierarchical clustering method for so many units, but rather clara() or maybe pam() or then model based or other methods, rather than fully hierarchical ones ... but yes, that's not the issue here, and see further down ...

BTW: The object 'iMatrix' you provided for download has only 50 columns, not 56...

> For the snippet of checking for NAs, I get all TRUEs, so I have at least one NA in each column.

GS> Sorry, my bad. Try this:
GS>     apply(iMatrix, 1, function(x) all(is.na(x)))
GS> will check that you have no fully `NA` rows.
GS> Also look at str(iMatrix) for potential problems.
GS> Finally, try:
GS>     out <- dist(iMatrix)
GS>     any(is.na(out))
GS> should repeat what agnes is doing to compute the dissimilarity matrix. If that returns TRUE, go and find which samples are giving NA dissimilarity and why.
GS> The issue is not NA in the input data, but that your input data is leading to NA in the computed dissimilarities. This might be due to NA's in your input data, where a pair of samples has no common set of data for example.

Yes, that's right on spot, thank you Gavin. This is indeed true: It *does* allow for NA's (in the data matrix), but if the pattern of NA's is such that the dissimilarity between two observations becomes undefined, namely e.g. if they have no common non-missings, then ``that's too much''.

In general, I'd recommend to use dm <- daisy(,...) trying methods that are better with NAs, e.g. Gower's metric, until dm() has {nearly} no NAs, and then figure out some imputation to replace all NA's in dm by reasonable values, then do clustering with the resulting dissimilarity matrix dm.

HOWEVER, in your case, dm would correspond to a 23371 x 23371 dissimilarity matrix. Stored as a double precision matrix (on a 64-bit platform), that's an object of size 4.4 GBytes, not very convenient to work with. As a dissimilarity object it will only be about half of that size, but that's still ``a bit large''.. As I said above, for such data, I would never do fully hierarchical clustering, but rather something else.

Martin Maechler, ETH Zurich

GS> HTH
GS> G

> The part of the agnes documentation I was referring to is : In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed. So, I'm under the impression it handles NAs on its own ? - Dario.
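The point made in this thread, that missing values in the data are tolerated while missing values in the computed dissimilarities are not, can be reproduced on a toy matrix. The sketch below uses base R's dist() rather than agnes() itself, following the check suggested in the quoted reply; the matrix m and the name badPairs are made up for illustration.

```r
# Rows 1 and 2 share no column in which both are observed.
m <- matrix(c( 1, NA,  2, NA,
              NA,  3, NA,  4,
               1,  2,  3,  4), nrow = 3, byrow = TRUE)

d <- dist(m)
anyNA(d)  # the distance between rows 1 and 2 cannot be computed

# Locate the pairs of rows whose dissimilarity is undefined.
dMat <- as.matrix(d)
badPairs <- which(is.na(dMat) & upper.tri(dMat), arr.ind = TRUE)
badPairs
```

On a matrix of the size discussed here the full dissimilarity matrix is large, so this check may need to be run on subsets of rows.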
[R] Adding image to plotting area
Hello, I've drawn a black rectangle over the plotting area, and when I add an image() heatmap, it doesn't take up all the area, but is set inward from the black rectangle. Can anyone suggest how to make it stretch out to the entire area ? Minimal example :

y <- matrix(runif(2000*20), nrow = 2000)
y[100:200, 10] = NA
plot.new()
usr <- par('usr')
rect(usr[1], usr[3], usr[2], usr[4], col = "black")
image(t(y), add = TRUE)

Thanks, Dario.
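A possible explanation is that plot.new() leaves the default axis style in place, which pads the 0 to 1 range by 4 percent on each side, while image() centres its cells on evenly spaced points within 0 to 1. The sketch below, offered as one option rather than the definitive fix, sets up an unpadded coordinate system and gives image() explicit cell boundaries so that the cells tile the region exactly. The names nx and ny are illustrative.

```r
y <- matrix(runif(2000 * 20), nrow = 2000)
y[100:200, 10] <- NA

plot.new()
# xaxs/yaxs = "i" removes the padding, so the plot region is exactly [0, 1].
plot.window(xlim = c(0, 1), ylim = c(0, 1), xaxs = "i", yaxs = "i")
usr <- par("usr")
rect(usr[1], usr[3], usr[2], usr[4], col = "black")

# Supplying one more boundary than there are cells makes image() treat
# x and y as cell edges, so the cells span 0 to 1 in both directions.
nx <- ncol(y)  # t(y) has ncol(y) rows, drawn along the x axis
ny <- nrow(y)
image(x = seq(0, 1, length.out = nx + 1),
      y = seq(0, 1, length.out = ny + 1),
      z = t(y), add = TRUE)
```

The NA cells are left undrawn, so the black rectangle shows through at those positions.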
Re: [R] agnes clustering and NAs
Hello, Yes, that's right, it is a values matrix. Not a dissimilarity matrix. i.e.

str(iMatrix)
 num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:56] "-8100" "-7900" "-7700" "-7500" ...

For the snippet of checking for NAs, I get all TRUEs, so I have at least one NA in each column. The part of the agnes documentation I was referring to is : In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed. So, I'm under the impression it handles NAs on its own ? - Dario.

Original message
Date: Thu, 27 Jan 2011 12:53:27 +
From: Gavin Simpson gavin.simp...@ucl.ac.uk
Subject: Re: [R] agnes clustering and NAs
To: Uwe Ligges lig...@statistik.tu-dortmund.de
Cc: d.strbe...@garvan.org.au, r-help@r-project.org

On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
> On 27.01.2011 05:00, Dario Strbenac wrote:
>> Hello, In the documentation for agnes in the package 'cluster', it says that NAs are allowed, and sure enough it works for a small example like :
>>
>> m <- matrix(c( 1, 1, 1, 2, 1, NA, 1, 1, 1, 2, 2, 2), nrow = 3, byrow = TRUE)
>> agnes(m)
>> Call: agnes(x = m)
>> Agglomerative coefficient: 0.1614168
>> Order of objects:
>> [1] 1 2 3
>> Height (summary):
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   1.155   1.247   1.339   1.339   1.431   1.524
>> Available components:
>> [1] order height ac merge diss call method data
>>
>> But I have a large matrix (23371 rows, 50 columns) with some NAs in it and it runs for about a minute, then gives an error :
>>
>> agnes(iMatrix)
>> Error in agnes(iMatrix) : No clustering performed, NA-values in the dissimilarity matrix.
>>
>> I've also tried getting rid of rows with all NAs in them, and it still gave me the same error. Is this a bug in agnes() ? It doesn't seem to fulfil the claim made by its documentation.
>
> I haven't looked in the file, but you need to get rid of all NA, or in other words, all rows that contain *any* NA values.

If one believes the documentation, then that only applies to the case where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw data matrix or data frame. The only way the OP could have gotten that error with the call shown is if iMatrix were not a dissimilarity matrix inheriting from class dist, so `NA`s should be allowed. My guess would be that the OP didn't get rid of all the `NA`s.

Dario: what does:

sapply(iMatrix, function(x) any(is.na(x)))

or if iMatrix is a matrix:

apply(iMatrix, 2, function(x) any(is.na(x)))

say?

G

> Uwe Ligges
>
>> The matrix I'm using can be obtained here : http://129.94.136.7/file_dump/dario/iMatrix.obj
>> -- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
Re: [R] sapply puzzlement
R works by going down the columns. If you make the rows into columns, it then does what you want. You just have to make the columns back into rows to get the original shape of your matrix. So the code in one line is : t(t(z) - means) Original message Date: Fri, 28 Jan 2011 01:16:45 +0100 From: r-help-boun...@r-project.org (on behalf of nfdi...@gmail.com (Ernest Adrogué i Calveras)) Subject: [R] sapply puzzlement To: r-help@r-project.org Hi, I have this data.frame with two variables in it, z V1 V2 1 10 8 2 NA 18 3 9 7 4 3 NA 5 NA 10 6 11 12 7 13 9 8 12 11 and a vector of means, means - apply(z, 2, function (col) mean(na.omit(col))) means V1V2 9.67 10.714286 My intention was substracting means from z, so instictively I tried z-means V1 V2 1 0.333 -1.667 2 NA 7.2857143 3 -0.667 -2.667 4 -7.7142857 NA 5 NA 0.333 6 0.2857143 1.2857143 7 3.333 -0.667 8 1.2857143 0.2857143 But this is completely wrong. sapply() gives the same result: sapply(z, function(row) row - means) V1 V2 [1,] 0.333 -1.667 [2,] NA 7.2857143 [3,] -0.667 -2.667 [4,] -7.7142857 NA [5,] NA 0.333 [6,] 0.2857143 1.2857143 [7,] 3.333 -0.667 [8,] 1.2857143 0.2857143 So, what is going on here? The following appears to work z-matrix(means,ncol=2)[rep(1, dim(z)[1]),] V1 V2 1 0.333 -2.7142857 2 NA 7.2857143 3 -0.667 -3.7142857 4 -6.667 NA 5 NA -0.7142857 6 1.333 1.2857143 7 3.333 -1.7142857 8 2.333 0.2857143 but I think it's rather cumbersome, surely there must be a cleaner way to do it. -- Ernest __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
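Another option for the same column-wise subtraction, not mentioned in the thread, is base R's sweep(). The sketch below rebuilds z from the values quoted in the message; colMeans(z, na.rm = TRUE) is used here as an assumed equivalent of the apply() call, and centred is an illustrative name.

```r
# The data frame from the quoted message.
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))

# Column means, ignoring NAs.
means <- colMeans(z, na.rm = TRUE)

# Subtract means[j] from every entry of column j (MARGIN = 2, FUN defaults to "-").
centred <- sweep(z, 2, means)

# The double transpose from the reply, for comparison (returns a matrix).
viaTranspose <- t(t(z) - means)
```

sweep() keeps the data frame as a data frame, whereas the double transpose returns a matrix; which is preferable depends on what happens to the result next.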
[R] agnes clustering and NAs
Hello, In the documentation for agnes in the package 'cluster', it says that NAs are allowed, and sure enough it works for a small example like:

> m <- matrix(c( 1, 1, 1, 2, 1, NA, 1, 1, 1, 2, 2, 2), nrow = 3, byrow = TRUE)
> agnes(m)
Call: agnes(x = m)
Agglomerative coefficient: 0.1614168
Order of objects:
[1] 1 2 3
Height (summary):
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.155   1.247   1.339   1.339   1.431   1.524
Available components:
[1] "order"  "height" "ac"     "merge"  "diss"   "call"   "method" "data"

But I have a large matrix (23371 rows, 50 columns) with some NAs in it and it runs for about a minute, then gives an error:

> agnes(iMatrix)
Error in agnes(iMatrix) : No clustering performed, NA-values in the dissimilarity matrix.

I've also tried getting rid of rows with all NAs in them, and it still gave me the same error. Is this a bug in agnes()? It doesn't seem to fulfil the claim made by its documentation. The matrix I'm using can be obtained here: http://129.94.136.7/file_dump/dario/iMatrix.obj

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
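One possible explanation, which the thread does not confirm, is that some pair of rows has no column observed in both, so no dissimilarity can be computed for that pair even though neither row is entirely NA. The sketch below looks for such pairs in a made-up matrix. It uses stats::dist() as a stand-in to show the effect; that agnes() behaves the same way internally is an assumption. The names observed, shared and noOverlap are illustrative.

```r
# Made-up data: rows 1 and 2 are never observed in the same column.
m <- rbind(c( 1, NA,  3, NA),
           c(NA,  2, NA,  4),
           c( 1,  2,  3,  4))

# For every pair of rows, count the columns observed in both.
observed <- (!is.na(m)) * 1
shared <- tcrossprod(observed)

# Row pairs with nothing in common (upper triangle, so each pair appears once).
noOverlap <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)

# dist() returns NA for such a pair and a number for the others.
d <- as.matrix(dist(m))
```

If noOverlap has any rows for the real data, removing or imputing one member of each such pair would be one thing to try before clustering.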
[R] Nested layout()
Hello, Is it possible to call a graphing function that uses layout() multiple times and lay out those outputs? Here's a minimal example:

myplot <- function() {
  layout(matrix(1:2, nrow=1), widths = c(1, 1))
  plot(1:10)
  plot(10:1)
}

layout(matrix(1:2), heights = c(1, 2))
myplot()
myplot()

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
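layout() applies to the whole device, so the call inside myplot() replaces the outer layout rather than nesting within it. One possible workaround, sketched here under the assumption that the plotting function can be changed, is to describe all four panels in a single layout matrix and let the function only draw. drawPair and nPanels are illustrative names, and pdf(NULL) is used only so that the sketch writes no file.

```r
# Variant of myplot() that leaves the page layout to its caller.
drawPair <- function() {
  plot(1:10)
  plot(10:1)
}

pdf(NULL)  # null PDF device: nothing is written to disk

# One layout for the whole page: 2 rows by 2 columns,
# with the second row twice as tall as the first.
nPanels <- layout(matrix(1:4, nrow = 2, byrow = TRUE),
                  widths = c(1, 1), heights = c(1, 2))

drawPair()  # fills panels 1 and 2 (top row)
drawPair()  # fills panels 3 and 4 (bottom row)

dev.off()
```

When the function cannot be changed, packages built on grid offer nestable viewports, but that is a different graphics system from the one used in the question.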
Re: [R] [Rd] S4 Method Rd Warning
Thanks for this. I was thinking the spaces rule only applied within the \alias{} statements. I'm not sure why I thought this.

---- Original message ----
Date: Mon, 30 Aug 2010 18:43:35 +0100 (BST)
From: Prof Brian Ripley rip...@stats.ox.ac.uk
Subject: Re: [Rd] S4 Method Rd Warning
To: Duncan Murdoch murdoch.dun...@gmail.com
Cc: d.strbe...@garvan.org.au

On Mon, 30 Aug 2010, Duncan Murdoch wrote:

> On 30/08/2010 1:00 AM, Dario Strbenac wrote:
>> Hello, I am using R 2.11.0. I have a curious problem where I get a warning in R CMD check which is seemingly not relevant to my Rd file.
>
> 2.11.0 isn't the current release, and there have been fixes to this stuff since 2.11.1 was released. Could you try 2.11.1-patched or the devel version of 2.12.0 and see if you still get the warnings?

But there are incorrect spaces at e.g. {GRanges, BSgenome}. Classes in signatures are comma-separated, not comma+space separated. If we've changed it, we are now accepting erroneous files AFAICS.

> Duncan Murdoch
>
>> The warning says:
>>
>> * checking Rd \usage sections ... WARNING
>> Bad \usage lines found in documentation object 'enrichmentCalc':
>>   unescaped bkslS4method{enrichmentCalc}{GenomeDataList, BSgenome}(rs, organism, seqLen=NULL, ...)
>>   unescaped bkslS4method{enrichmentCalc}{GenomeData, BSgenome}(rs, organism, seqLen=NULL, do.warn=FALSE)
>>   unescaped bkslS4method{enrichmentCalc}{GRanges, BSgenome}(rs, organism, seqLen=NULL)
>> Functions with \usage entries need to have the appropriate \alias entries, and all their arguments documented. The \usage entries must correspond to syntactically valid R code. See the chapter 'Writing R documentation files' in manual 'Writing R Extensions'.
>>
>> But, I have documented all the arguments, and I have all the aliases. What else could it be warning me about?
>>
>> The Rd file contents are:
>>
>> \name{enrichmentCalc}
>> \alias{enrichmentCalc}
>> \alias{enrichmentCalc,GenomeDataList,BSgenome-method}
>> \alias{enrichmentCalc,GenomeData,BSgenome-method}
>> \alias{enrichmentCalc,GRanges,BSgenome-method}
>> \title{Calculate sequencing enrichment}
>> \description{A description}
>> \usage{
>> \S4method{enrichmentCalc}{GenomeDataList, BSgenome}(rs, organism, seqLen=NULL, ...)
>> \S4method{enrichmentCalc}{GenomeData, BSgenome}(rs, organism, seqLen=NULL, do.warn=FALSE)
>> \S4method{enrichmentCalc}{GRanges, BSgenome}(rs, organism, seqLen=NULL)
>> }
>> \arguments{
>> \item{rs}{jjj}
>> \item{organism}{ghi}
>> \item{seqLen}{def}
>> \item{do.warn}{abc}
>> \item{...}{xyz}
>> }
>> \details{ Details. }
>> \value{ Text here. }
>> \author{A Person}
>> \examples{ #See the manual }
>>
>> Thanks, Dario.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
[R] Documenting S4 Methods
I'm in the process of converting some S3 methods to S4 methods. I have this function:

setGeneric("enrichmentCalc", function(rs, organism, seqLen, ...){standardGeneric("enrichmentCalc")})

setMethod("enrichmentCalc", c("GenomeDataList", "BSgenome"), function(rs, organism, seqLen, ...) { ... ... ... })

setMethod("enrichmentCalc", c("GenomeData", "BSgenome"), function(rs, organism, seqLen=NULL, do.warn=FALSE) { ... ... ... })

setMethod("enrichmentCalc", c("GRanges", "BSgenome"), function(rs, organism, seqLen=NULL) { ... ... ... })

and a part of my Rd file is:

\name{enrichmentCalc}
\docType{methods}
\alias{enrichmentCalc,GenomeDataList,BSgenome-method}
\alias{enrichmentCalc,GenomeData,BSgenome-method}
\alias{enrichmentCalc,GRanges,BSgenome-method}
... ... ...
\usage{
enrichmentCalc(rs, organism, seqLen, ...)
enrichmentCalc(rs, organism, seqLen=NULL, do.warn=FALSE)
enrichmentCalc(rs, organism, seqLen=NULL)
}
... ... ...

Can anyone suggest why I'm seeing this error:

* checking for code/documentation mismatches ... WARNING
Codoc mismatches from documentation object 'enrichmentCalc':
enrichmentCalc
  Code: function(rs, organism, seqLen, ...)
  Docs: function(rs, organism, seqLen = NULL, do.warn = FALSE)
  Argument names in code not in docs:
    ...
  Argument names in docs not in code:
    do.warn
  Mismatches in argument names:
    Position: 4 Code: ... Docs: do.warn
  Mismatches in argument default values:
    Name: 'seqLen' Code: Docs: NULL
enrichmentCalc
  Code: function(rs, organism, seqLen, ...)
  Docs: function(rs, organism, seqLen = NULL)
  Argument names in code not in docs:
    ...
  Mismatches in argument default values:
    Name: 'seqLen' Code: Docs: NULL

* checking Rd \usage sections ... WARNING
Objects in \usage without \alias in documentation object 'enrichmentCalc':
  enrichmentCalc

Also, what is the difference between

... ... ...
\docType{methods}
... ... ...
\alias{methodName,class-method}
... ... ...
\usage{methodName(arg1)}
... ... ...

and

... ... ...
\alias{methodName,class-method}
... ... ...
\usage { \S4method{methodName}{class}(arg1) }
... ... ...

I've seen both ways used for S4 methods and don't know what the underlying difference is. I haven't been able to find any good tutorials for the new S4 architecture (written post 2006), so I'm not sure where to start with S4. Thanks, Dario.

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia
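The "Code:" lines in the warning show the argument list of the generic, which suggests that a plain usage line such as enrichmentCalc(...) is compared with the generic and not with any one method. The sketch below illustrates that distinction with toy classes. ToyReads and ToyGenome are hypothetical stand-ins for the Bioconductor classes in the message, and the method body is invented for illustration.

```r
library(methods)

# Hypothetical stand-ins for the GenomeData and BSgenome classes.
setClass("ToyReads", representation(n = "numeric"))
setClass("ToyGenome", representation(name = "character"))

setGeneric("enrichmentCalc", function(rs, organism, seqLen, ...) {
  standardGeneric("enrichmentCalc")
})

# A method is registered under the generic's name plus a class signature.
setMethod("enrichmentCalc", c("ToyReads", "ToyGenome"),
          function(rs, organism, seqLen, ...) {
            paste(rs@n, "reads on", organism@name)
          })

# The object called enrichmentCalc is the generic, and these are its arguments.
genericArgs <- names(formals(enrichmentCalc))

# Dispatch picks the method from the classes of rs and organism.
result <- enrichmentCalc(new("ToyReads", n = 5),
                         new("ToyGenome", name = "toy"),
                         seqLen = 36)
```

Here genericArgs holds the same four names as the "Code:" line of the warning, whatever arguments the individual methods declare.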
[R] identical() mystery
Hello, I have 2 vectors of the same mode and the same contents but identical() still returns FALSE. Any ideas?

> reference <- c(11, 14, 16, 5, 4, 2, 0, 15, 9, 0)
> reference
 [1] 11 14 16  5  4  2  0 15  9  0
> cpgDensity
 [1] 11 14 16  5  4  2  0 15  9  0
> identical(cpgDensity, reference)
[1] FALSE
> mode(cpgDensity)
[1] "numeric"
> mode(reference)
[1] "numeric"
> cpgDensity == reference
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Thank you, Dario.
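One common cause of this pattern is a difference in storage type: mode() reports "numeric" for both integer and double vectors, while identical() treats them as different. The message does not show how cpgDensity was created, so the sketch below assumes it is an integer vector; differing attributes such as names would be another candidate.

```r
reference <- c(11, 14, 16, 5, 4, 2, 0, 15, 9, 0)

# Assumption for illustration: cpgDensity holds the same values as integers.
cpgDensity <- as.integer(reference)

sameObject <- identical(cpgDensity, reference)   # FALSE: the types differ
modes <- c(mode(cpgDensity), mode(reference))    # both "numeric"
types <- c(typeof(cpgDensity), typeof(reference))  # "integer" and "double"

# Comparisons that ignore the storage type.
sameValues <- isTRUE(all.equal(cpgDensity, reference))
sameAfterCoercion <- identical(as.numeric(cpgDensity), reference)
```

Comparing typeof() and attributes() of the two real vectors would show whether this is what happened.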
[R] Flattening Graphics
Hello, This question is a nightmare to search for, as I get so many irrelevant results. I have many pages of plots that I want to keep together in one document, say a PDF. Is there a way to flatten all the dot plots and graphics from within R, without using external programs, so that they don't take a long time to load in Adobe Reader on a slow computer? Thanks, Dario.

-- Dario Strbenac Research Assistant Cancer Epigenetics Garvan Institute of Medical Research Darlinghurst NSW 2010 Australia