[R] IRanges::unlist in package
Dear all,

I am writing a package containing some of my favorite custom functions so that I can share them with others. I do not have much experience building packages, and I apologize if this is a trivial question.

The issue is with the generic function unlist() as applied to a GRangesList object (unlist(GRL), from the IRanges package). Function A in myPkg calls function B in the same package (myPkg::A{ myPkg::B; }), and B calls unlist() on a GRangesList object (myPkg::B{ unlist(GRL); }). If I define the two functions at the top level (in the global environment), everything works, but when they are loaded from the package (library(myPkg); A(GRL)), the call breaks at the unlist() step. However, if I fully qualify unlist() in myPkg::B (myPkg::B{ IRanges::unlist(GRL); }), then calling A(GRL) after loading myPkg works.

So, are we expected to always fully qualify unlist(), i.e. to call it with its package name as in myPkg::B{ IRanges::unlist(GRL) }? I have tried every combination of Depends: and Imports: in my DESCRIPTION file, and nothing works unless I fully qualify the function. What is the best practice? I tried using only Imports:, as suggested by Chambers, but it breaks. Using Depends: does not help either. Do I have a namespace clash?

Here is my Depends: (or Imports:) line:

    Depends: Rsamtools, GenomicFeatures, parallel, rtracklayer, edgeR

Am I simply missing something?

Thanks

--
Marco Blanchette, Ph.D.
Stowers Institute for Medical Research
1000 East 50th Street
Kansas City MO 64110
www.stowers.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Following progress in a lapply() function
Dear all,

I am processing a very long and complicated list with lapply() and a custom function, and I would like to generate some sort of progress report: for instance, print a dot on the screen every time 1000 items have been processed or, even better, report the percentage of the list that has been processed at every 10%. However, I can't figure out a way to achieve that.

For instance, I have a list of 50,000 slots:

    aList <- replicate(50000,list(rnorm(50)))

that needs to be processed through the following custom function:

    myFnc <- function(x){
      tTest <- t.test(x)
      return(list(p.value=tTest$p.value,t.stat=tTest$statistic))
    }

using an lapply() statement, as in:

    myResults <- lapply(aList, myFnc)

The goal is to report on the progress of the lapply() call during processing. Any suggestions would be greatly appreciated.

Thanks

Marco

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071 Cell: 816-726-8419 Fax: 816-926-2018
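One possible way to get the kind of progress report asked for above is to loop over indices and advance a text progress bar after each item. The following is only a minimal sketch under stated assumptions: it relies on txtProgressBar() and setTxtProgressBar() from the utils package, it uses a much smaller list (500 items) than the 50,000 in the post so that it runs quickly, and the helper name lapplyWithProgress is hypothetical.

```r
# Mock data: a smaller list than in the post, to keep the example fast.
set.seed(42)
aList <- replicate(500, rnorm(50), simplify = FALSE)

# The custom function from the post.
myFnc <- function(x) {
  tTest <- t.test(x)
  list(p.value = tTest$p.value, t.stat = tTest$statistic)
}

# Hypothetical helper: lapply() over indices, updating a text progress bar.
lapplyWithProgress <- function(X, FUN, ...) {
  # style = 3 draws a bar together with a percentage.
  pb <- txtProgressBar(min = 0, max = length(X), style = 3)
  on.exit(close(pb))
  res <- lapply(seq_along(X), function(i) {
    out <- FUN(X[[i]], ...)
    setTxtProgressBar(pb, i)   # advance the bar once item i is done
    out
  })
  names(res) <- names(X)
  res
}

myResults <- lapplyWithProgress(aList, myFnc)
```

Updating the bar on every item is cheap here; for very fast per-item functions it may be worth updating only every 1000th index.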
Re: [R] command files
Try

    source('myFirstScript.R')

where myFirstScript.R contains the following lines:

    x <- rnorm(100)
    y <- rnorm(100)
    plot(x,y)

You could also use an editor such as Emacs with ess-mode, where one buffer holds your script and a second buffer runs a live R session.

Good luck

On 12/2/08 7:21 AM, b g [EMAIL PROTECTED] wrote:

Since I'm a SAS programmer, I'm used to creating command files in an editor for submission later. Is there a way to do this in R? I'd need to retain an output listing and a log to check for errors.

--
Marco Blanchette, Ph.D.
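As a rough sketch of the workflow asked about in the quoted message (a script file in, an output listing out), source() can echo each command while sink() diverts printed output to a file. The file names below are temporary files created purely for illustration, and plot() is replaced by summary() so that the example produces text output only. As used here, sink() captures standard output; errors and warnings are not written to the file.

```r
# Write a small script to a temporary file (stand-in for myFirstScript.R).
script <- tempfile(fileext = ".R")
writeLines(c("x <- rnorm(100)",
             "y <- rnorm(100)",
             "print(summary(x + y))"),
           script)

# Divert printed output to a listing file while the script runs.
listing <- tempfile(fileext = ".log")
sink(listing)
source(script, echo = TRUE)   # echo = TRUE also records the commands themselves
sink()                        # restore normal output

# Read the listing back to inspect it.
listingLines <- readLines(listing)
```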
[R] Problem with the Rmpi package
Dear all,

I just started using the snow package to send multiple jobs to our cluster, with MPI and the Rmpi package as the communication method. However, the Rmpi package has been behaving strangely. When I try to detach the Rmpi package, I get the following error message:

    library(Rmpi)
    detach()
    Error in dyn.unload(file.path(libpath, libs, paste(Rmpi, .Platform$dynlib.ext, :
      dynamic/shared library '/Users/mab/Library/R/2.8/library/Rmpi/libs/Rmpi.so' was not loaded

After that error, the snow package seems unable to start a new cluster, whatever method is used. The only fix is to kill and restart my R session. Any suggestions as to what the problem is?

--
Marco Blanchette, Ph.D.
Re: [R] Snow and multi-processing
I think I found a solution. I do not like to use global variables, for fear of unpredictable side effects, but I think that in this case I don't have much choice. Here is a mock function that pushes the content of a variable evaluated within a function to the nodes of the cluster, does some computation on the nodes using that variable, and then returns the result after cleaning up the newly created global variable. Let me know what you people think:

    aTest <- function(x,n.nodes=2){
      library(snow)
      #initialize a cluster
      cl <- makeCluster(rep('localhost',n.nodes),type='SOCK')
      #create a global variable
      y <<- x
      #export the variable to the cluster
      clusterExport(cl,'y')
      #do some computation on the cluster
      c <- clusterEvalQ(cl,y+2)
      #remove the variable from the global environment
      rm(y, envir=.GlobalEnv)
      #stop the cluster
      stopCluster(cl)
      #exit and return the computation
      return(c)
    }

On 11/29/08 6:59 PM, Marco Blanchette [EMAIL PROTECTED] wrote:

Dear R gurus,

I have an embarrassingly parallel job that I am trying to speed up with snow on our local cluster. Basically, I am doing ~50,000 t.tests for a series of microarray experiments, one gene at a time, so I can easily spread the load across multiple processors and nodes. I have a master list object that tells me which rows to pick for each gene in order to run the t.test on a series of microarray experiments containing ~500,000 rows and x columns per experiment.

While trying to optimize my function using parLapply(), I quickly realized that I was not gaining any speed: every time a test was run on one of the items in the list, the 500,000-row by x-column matrix had to be shipped along with the item, and the transfer time was actually longer than the computing time.

However, if I first export the 500,000-row object to the spawned processes, as in this mock script

    cl <- makeCluster(nnodes,method)
    mArrayData <- getData(experiments)
    clusterExport(cl, 'mArrayData')
    Results <- parLapply(cl, theMapList, function(x) t.testFnc(x))

with a function that takes mArrayData as the default value of an argument, as in

    t.testFnc <- function(probeList, array=mArrayData){
      x <- array[probeList$A,]
      y <- array[probeList$B,]
      res <- doSomeTest(x,y)
      return(res)
    }

then I was able to take full advantage of the cluster and cut the analysis time by a factor equal to the number of nodes. The large data matrix was resident in each process and didn't have to travel over the network every time an item from the list was passed to t.testFnc().

However, I quickly realized that this (the call to clusterExport()) works only when I run the script one line at a time. When the process is enclosed in a function, the object mArrayData is not exported, presumably because it is not a global object in the master process.

So, what is the alternative for pushing the content of an object to the slaves? The documentation of the snow package is a bit light and I couldn't find a good example out there. I don't want getData() evaluated on each node, because the arguments to that function are humongous and that would cause way too much network traffic. I want the result of getData(), the object mArrayData, propagated to the cluster only once and made available to downstream functions.

I hope this is clear and that a solution is possible.

Many thanks

Marco

--
Marco Blanchette, Ph.D.
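For comparison with the global-variable workaround above, here is a minimal sketch of exporting a function-local variable without touching the global environment. It is an assumption-laden alternative, not the method used in the post: it uses the parallel package that ships with current versions of R (whose clusterExport() takes an envir argument) rather than snow as it existed at the time, and it starts a local two-worker cluster purely for illustration.

```r
library(parallel)

aTest <- function(x, n.nodes = 2) {
  # Start a socket cluster on the local machine.
  cl <- makeCluster(n.nodes)
  on.exit(stopCluster(cl))       # always shut the workers down

  y <- x                         # ordinary local variable, no global needed

  # Export 'y' from this function's environment instead of .GlobalEnv.
  clusterExport(cl, "y", envir = environment())

  # Each worker evaluates y + 2 using its own copy of y.
  clusterEvalQ(cl, y + 2)
}

res <- aTest(1:3)
```

The result is a list with one element per worker. Because y is exported once, later calls such as parLapply() on the same cluster can refer to it without shipping it again.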
[R] Snow and multi-processing
Dear R gurus,

I have an embarrassingly parallel job that I am trying to speed up with snow on our local cluster. Basically, I am doing ~50,000 t.tests for a series of microarray experiments, one gene at a time, so I can easily spread the load across multiple processors and nodes. I have a master list object that tells me which rows to pick for each gene in order to run the t.test on a series of microarray experiments containing ~500,000 rows and x columns per experiment.

While trying to optimize my function using parLapply(), I quickly realized that I was not gaining any speed: every time a test was run on one of the items in the list, the 500,000-row by x-column matrix had to be shipped along with the item, and the transfer time was actually longer than the computing time.

However, if I first export the 500,000-row object to the spawned processes, as in this mock script

    cl <- makeCluster(nnodes,method)
    mArrayData <- getData(experiments)
    clusterExport(cl, 'mArrayData')
    Results <- parLapply(cl, theMapList, function(x) t.testFnc(x))

with a function that takes mArrayData as the default value of an argument, as in

    t.testFnc <- function(probeList, array=mArrayData){
      x <- array[probeList$A,]
      y <- array[probeList$B,]
      res <- doSomeTest(x,y)
      return(res)
    }

then I was able to take full advantage of the cluster and cut the analysis time by a factor equal to the number of nodes. The large data matrix was resident in each process and didn't have to travel over the network every time an item from the list was passed to t.testFnc().

However, I quickly realized that this (the call to clusterExport()) works only when I run the script one line at a time. When the process is enclosed in a function, the object mArrayData is not exported, presumably because it is not a global object in the master process.

So, what is the alternative for pushing the content of an object to the slaves? The documentation of the snow package is a bit light and I couldn't find a good example out there. I don't want getData() evaluated on each node, because the arguments to that function are humongous and that would cause way too much network traffic. I want the result of getData(), the object mArrayData, propagated to the cluster only once and made available to downstream functions.

I hope this is clear and that a solution is possible.

Many thanks

Marco

--
Marco Blanchette, Ph.D.
Re: [R] editor for MacOS X
Carbon Emacs (http://homepage.mac.com/zenitani/emacs-e.html) with ess-mode (http://ess.r-project.org/). It offers amazingly good integration of different buffer types for different tasks. You can have your R session running in one buffer and a .R buffer where you edit your functions and sources, and with very simple keystrokes you can send lines, functions, or the full buffer to the running R session. The help pages are well integrated too. This has become my central environment, where I do all my computing work: programming (Perl, Python, etc.), shell work, R jobs, MySQL work, and so on.

In addition, your Mac can be configured to run Emacs remotely as a client from any other type of machine. For instance, at home, on my PC or on my Mac, I can open an SSH connection from either X11 or PuTTY to the Mac desktop in my office, then start Emacs from the terminal, et voila! I am running jobs on the computer in my office (which has 8 cores and 32 GB of RAM) from the same environment I normally use in my office (it can be a bit bandwidth intensive, though).

You should also check out the noweb mode (http://www.cs.tufts.edu/~nr/noweb/) for integrating code and documentation. Pretty cool.

Cheers,

Marco

On 11/28/08 7:55 AM, John Fox [EMAIL PROTECTED] wrote:

Dear Bunny,

I've been using Eclipse with the StatET plug-in http://www.walware.de/goto/statet under both Windows and Mac OS X. Eclipse + StatET provides much more than a code editor, such as the ability to check and build packages and to interact with an svn archive. On the downside, it requires quite a bit of configuration.

I hope this helps,
John

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bunny, lautloscrew.com
Sent: November-28-08 6:16 AM
To: r-help@r-project.org
Subject: [R] editor for MacOS X

Hi all, just wondered again, if there is some R editor for Mac OS X comparable to TINN-R on windows. thx in advance..

--
Marco Blanchette, Ph.D.
[R] 64bit R for Mac
Dear R gurus,

The CRAN website says that a 64-bit version for Mac OS Tiger will be released shortly. Do we know the expected dates? Will the packages also be compiled for 64-bit?

We are running large microarray analyses and we keep hitting the 3 GB memory limit. I saw that there is a version available on the development mirrors, but I am not too excited about replacing our very stable and reliable 32-bit version with a 64-bit binary that might not be as stable, and with packages that would need to be compiled for 64-bit on site...

Cheers

--
Marco Blanchette, Ph.D.
[R] Dataframe with single level column
Dear all,

I have a data frame with multiple observations and the levels in the last column, as in:

    d <- data.frame(A=sample(1:100,12),B=sample(1:100,12),levels=c(rep('A',4),rep('B',4),rep('C',4)))
    d
        A  B levels
    1  77 40      A
    2  14 18      A
    3  56  7      A
    4  46 27      A
    5  63 35      B
    6  80 21      B
    7   3 54      B
    8  93 76      B
    9   5 46      C
    10 16 53      C
    11 40 17      C
    12 25 31      C

I need to run an ANOVA on the groups in levels against the merged data from the first two columns. I can manually split and join the different columns, as in

    d.t <- rbind(data.frame(value=d[,1],ind=d[,3]),data.frame(value=d[,2],ind=d[,3]))

but I was wondering whether there is a more elegant and easier way, one that would spare me from hard-coding the different vectors that make up the data frame.

Thanks

--
Marco Blanchette, Ph.D.
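One way to avoid hard-coding the columns, sketched here in base R only, is to treat every column except the grouping column as a measurement, flatten those columns with unlist(), and repeat the grouping column once per measurement column. This is an illustrative sketch rather than a definitive answer; the name valueCols is introduced here, and the grouping column is assumed to be called levels as in the post.

```r
set.seed(1)
d <- data.frame(A = sample(1:100, 12),
                B = sample(1:100, 12),
                levels = rep(c("A", "B", "C"), each = 4))

# Every column except the grouping column is treated as a measurement.
valueCols <- setdiff(names(d), "levels")

# Flatten the measurement columns (column by column) and repeat the
# grouping column once for each of them.
d.t <- data.frame(
  value = unlist(d[valueCols], use.names = FALSE),
  ind   = rep(d$levels, times = length(valueCols))
)

# ANOVA of the merged values against the groups.
fit <- anova(lm(value ~ ind, data = d.t))
```

The same code works unchanged if more measurement columns are added to d.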
[R] Suppression anova message
Dear all,

I am running anova(lm()) on a series of different data frames and I am getting the following message:

    Using dataFrame$levels as id variables

1. Why am I getting that message?
2. How do I suppress it (or correct it)?

Thanks

Marco

--
Marco Blanchette, Ph.D.
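The post does not show which call produces the message. Its wording resembles what melt() from the reshape package prints when no id variables are specified, but that is only a guess. Whatever the source, output emitted through message() can be silenced by wrapping the call in suppressMessages(). In the sketch below, noisyStep is a hypothetical stand-in for the real analysis step, and the data frame is made up.

```r
# Hypothetical stand-in for whatever step emits the message.
noisyStep <- function(df) {
  message("Using levels as id variables")
  anova(lm(value ~ levels, data = df))
}

set.seed(1)
df <- data.frame(value = rnorm(12),
                 levels = rep(c("A", "B", "C"), each = 4))

# The message is swallowed; the return value is unaffected.
res <- suppressMessages(noisyStep(df))
```

Suppressing the message hides it without addressing its cause; if a reshaping step is responsible, naming the id variables explicitly in that step would be the cleaner fix.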
[R] More list to vector puzzle
Many thanks for the answers to my previous question; they got me started. Indeed, stack() was the function I was vaguely remembering. However, I didn't get very far, because my data set is far more complicated than I expected. In fact, I have a mixture of levels and lists within a list. Basically, it resembles the following list (named data), made of the levels H and the lists of vectors A and T. Within each level, the T[x]s are the same but the A[x]s are different.

    H <- c(rep('H1',3),rep('H2',3),rep('H3',3))
    A <- list(A1=round(runif(3,100,1000)),
              A2=round(runif(3,100,1000)),
              A3=round(runif(3,100,1000)),
              A4=round(runif(3,100,1000)),
              A5=round(runif(3,100,1000)),
              A6=round(runif(3,100,1000)),
              A7=round(runif(3,100,1000)),
              A8=round(runif(3,100,1000)),
              A9=round(runif(3,100,1000))
              )
    T1 <- round(runif(7,1,10))
    T2 <- round(runif(5,1,10))
    T3 <- round(runif(6,1,10))
    T <- list(T1,T1,T1,T2,T2,T2,T3,T3,T3)
    data <- list(H=H,A=A,T=T)

Basically, it can be represented as the following data structure:

    H   A            T
    H1  458 255 160  4 8 10 8 9 9 3
    H1  343 424 298  4 8 10 8 9 9 3
    H1  608 831 544  4 8 10 8 9 9 3
    H2  616 266 413  7 3 5 4 5
    H2  687 796 752  7 3 5 4 5
    H2  814 921 228  7 3 5 4 5
    H3  789 558 400  8 3 3 7 6 5
    H3  845 298 855  8 3 3 7 6 5
    H3  725 366 621  8 3 3 7 6 5

My goal is to get, for each level of H, a data frame holding the values of the As, with an index indicating which A each value comes from, plus a single copy of the Ts with a corresponding level, and so on for every H. I then want to fit a linear model of value ~ ind for each H (of course, the data are fake here), followed by an ANOVA for each H. Thus, for each level of H I need something similar to:

    $H1
    value ind
      458  A1
      255  A1
      160  A1
      343  A2
      424  A2
      298  A2
      608  A3
      831  A3
      544  A3
        4   T
        8   T
       10   T
        8   T
        9   T
        9   T
        3   T
    ...

As you might have guessed, we have several tens of thousands of Hs, so I cannot just do it manually one at a time. I tried breaking the problem down into small pieces but did not get very far. I was very excited when I got the following calls to produce the expected result:

    a <- tapply(data$A,data$H,function(x) stack(x))
    t <- tapply(data$T,data$H,function(x) x[1])
    tt <- lapply(t,function(x) data.frame(values=unlist(x),
                 ind=rep(1:length(x),sapply(x,length))))

    a
    $H1
      values ind
    1    458  A1
    2    255  A1
    3    160  A1
    4    343  A2
    5    424  A2
    6    298  A2
    7    608  A3
    8    831  A3
    9    544  A3
    ...

    tt
    $H1
      values ind
    1      4   1
    2      8   2
    3     10   3
    4      8   4
    5      9   5
    6      9   6
    7      3   7
    ...

However, I tried to rbind the data frames in a and tt (one pair per H level) using lapply or sapply, without any success. I am in need of some guru advice on this one... Also, I am not sure this is the most elegant way to produce the data structure I am trying to build. Any advice?

Thanks

--
Marco Blanchette, Ph.D.
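The per-level structure described above can be sketched in base R by splitting the indices of H, stacking the A vectors that belong to each level, and appending one copy of that level's T vector. This is an illustrative sketch, not necessarily the most elegant solution the post asks for. It assumes the fake data layout from the post; the list of T vectors is renamed Tl here to avoid masking T (shorthand for TRUE), and perH and anovas are names introduced for illustration.

```r
set.seed(1)
H <- rep(c("H1", "H2", "H3"), each = 3)
A <- lapply(1:9, function(i) round(runif(3, 100, 1000)))
names(A) <- paste0("A", 1:9)
T1 <- round(runif(7, 1, 10))
T2 <- round(runif(5, 1, 10))
T3 <- round(runif(6, 1, 10))
Tl <- list(T1, T1, T1, T2, T2, T2, T3, T3, T3)

# For each level of H: stack its A vectors, then append one copy of its T vector.
perH <- lapply(split(seq_along(H), H), function(idx) {
  a <- stack(A[idx])                 # columns: values, ind (A1, A2, ...)
  a$ind <- as.character(a$ind)       # plain character so "T" can be appended
  tvals <- data.frame(values = Tl[[idx[1]]], ind = "T")
  rbind(a, tvals)
})

# One linear model and ANOVA table per level of H.
anovas <- lapply(perH, function(df) anova(lm(values ~ ind, data = df)))
```

Working from the indices returned by split() keeps A and Tl aligned without building the intermediate a and tt lists.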
[R] Transforming a list to a vector with associated levels
I am pretty sure that I came across a function that creates a vector of levels from a list, but I just can't remember its name. Basically, I have something like

    t <- list(A=c(4,1,4),B=c(3,7,9,2))
    t
    $A
    [1] 4 1 4

    $B
    [1] 3 7 9 2

and I would like to get something like the following:

    t levels
    4      1
    1      1
    4      1
    3      2
    7      2
    9      2
    2      2

I tried unlist() without success. I also remember that there is a corresponding function that creates a list of t's according to the levels, starting from the matrix I drew above, but I have had no luck remembering it.

Thanks

--
Marco Blanchette, Ph.D.
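The pair of functions being described matches stack() and unstack() from the utils package. A small sketch follows, with the list renamed tl (an arbitrary choice made here) so that it does not mask the base function t():

```r
tl <- list(A = c(4, 1, 4), B = c(3, 7, 9, 2))

# stack() flattens the list into a data frame: 'values' holds the numbers,
# 'ind' is a factor recording which list element each value came from.
s <- stack(tl)

# Integer codes of the factor give the 1/2 "levels" column from the post.
s$levels <- as.integer(s$ind)

# unstack() goes the other way: from values plus grouping back to a list.
back <- unstack(s, values ~ ind)
```

Because the two groups have different lengths, unstack() returns a list here; with groups of equal length it would return a data frame instead.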