[R] IRanges::unlist in package

2013-08-07 Thread Blanchette, Marco
Dear all,

I am writing a package with some of my favorite custom functions so that I can 
share them with others. I do not have a lot of experience building these 
packages and I apologize if this is a trivial question.

The issue I am having is with the generic function unlist() used to unlist a 
GRangesList object (unlist(GRL), from the IRanges package).

I have a function A in myPkg that calls function B (myPkg::A{myPkg::B; …}), which 
is in the same package and calls unlist() on a GRangesList object 
(myPkg::B{ unlist(GRL); … }). For some reason, if I define the two functions at 
the top level (in the global environment), everything works, but when they are 
loaded from a package (library(myPkg); A(GRL)) it breaks at the unlist() step. 
However, if I fully qualify the unlist function in myPkg::B 
(myPkg::B{IRanges::unlist(GRL); …} ), then calling A(GRL) after loading myPkg 
works.

So, are we expected to always fully qualify the unlist() function (i.e., 
call it with its package name, myPkg::B{ IRanges::unlist(GRL) })? I have 
tried every combination of Depends: and Imports: in my DESCRIPTION file and 
nothing works unless I fully qualify this function.

What is the best practice? I tried using only Imports:, as suggested by Chambers, 
but it breaks. Using Depends: does not help either.
Do I have a namespace clash? Here is my Depends: (or Imports:) line: 
Depends: Rsamtools, GenomicFeatures, parallel, rtracklayer, edgeR

Am I simply missing something?
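
One thing worth checking (a sketch, not a confirmed diagnosis): code inside a 
package namespace looks up unlist in the package's own namespace, then in its 
imports, then in the base namespace, and only afterwards on the search path. 
Listing a package under Depends: attaches it to the search path but imports 
nothing, so unlist(GRL) inside myPkg may resolve to base::unlist, which does not 
see S4 methods defined for GRangesList. Importing the generic in the NAMESPACE 
file would avoid the full qualification. The fragment below assumes that IRanges 
exports the unlist generic (as IRanges::unlist in the post suggests) and that 
GRangesList comes from GenomicRanges; in other Bioconductor releases the generic 
may live in a different package.

```r
# NAMESPACE of myPkg (sketch; the package names are assumptions, see above)
import(methods)
import(IRanges)        # assumed to export the S4 generic unlist()
import(GenomicRanges)  # assumed to provide the GRangesList class and its methods
export(A, B)

# The imported packages would also need to be listed in DESCRIPTION, e.g.
#   Imports: methods, IRanges, GenomicRanges
```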

Thanks

--  Marco Blanchette, Ph.D.
Stowers Institute for Medical Research
1000 East 50th Street
Kansas City MO 64110
www.stowers.org



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Following progress in a lapply() function

2009-03-22 Thread Blanchette, Marco
Dear all,

I am processing a very long and complicated list using lapply through a custom 
function and I would like to generate some sort of progress report. For 
instance, print a dot on the screen every time 1000 items have been processed. 
Or, even better, report the percentage of the list that has been processed at 
every 10%. However, I can't seem to figure out a way to achieve that.

For instance, I have a list of 50,000 slots:

aList <- replicate(50000, list(rnorm(50)))

That needs to be processed through the following custom function:

myFnc <- function(x){
  tTest <- t.test(x)
  return(list(p.value=tTest$p.value, t.stat=tTest$statistic))
}

Using an lapply statement, as in:

myResults <- lapply(aList, myFnc)

The goal would be to report on the progress of the lapply() function during 
processing.
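
One possible approach (a sketch, not taken from the thread): iterate over the 
indices rather than the elements, so that the function knows its position in the 
list, and update a text progress bar from the utils package. The list size of 
2000 and the update interval of 100 are arbitrary choices to keep the example 
quick.

```r
# Smaller list than in the post so that the sketch runs quickly (assumption)
aList <- replicate(2000, list(rnorm(50)))

myFnc <- function(x) {
  tTest <- t.test(x)
  list(p.value = tTest$p.value, t.stat = unname(tTest$statistic))
}

# style = 3 draws a bar together with the percentage completed
pb <- txtProgressBar(min = 0, max = length(aList), style = 3)

myResults <- lapply(seq_along(aList), function(i) {
  res <- myFnc(aList[[i]])
  if (i %% 100 == 0) setTxtProgressBar(pb, i)  # refresh every 100 items
  res
})
close(pb)
```

Replacing the setTxtProgressBar() call with cat(".") would print one dot per 100 
items instead.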

Any suggestion would be greatly appreciated.

Thanks

Marco


--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



Re: [R] command files

2008-12-02 Thread Blanchette, Marco
Try

 source('myFirstScript.R')

Where myFirstScript.R has the following lines

 x <- rnorm(100)
 y <- rnorm(100)
 plot(x,y)

You could also use an editor like Emacs with ESS mode, where one buffer can 
be your script with a live R session in a second buffer.
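
For keeping an output listing, one option is to divert the console output to a 
file while the script is sourced. The sketch below is only an illustration: the 
file locations under tempdir() and the script contents are made up, and 
summary() stands in for plot() so that no graphics device is needed.

```r
# Write a small script to a temporary location (hypothetical file names)
script  <- file.path(tempdir(), "myFirstScript.R")
logFile <- file.path(tempdir(), "myFirstScript.log")
writeLines(c("x <- rnorm(100)",
             "y <- rnorm(100)",
             "summary(x)"), script)

sink(logFile)                # divert printed output to the log file
source(script, echo = TRUE)  # echo = TRUE prints each command before its result
sink()                       # restore output to the console

cat(readLines(logFile), sep = "\n")
```

From a shell, R CMD BATCH myFirstScript.R does something similar and writes the 
commands, their output and any error messages to myFirstScript.Rout.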

Good luck


On 12/2/08 7:21 AM, b g [EMAIL PROTECTED] wrote:



Since I'm a SAS programmer, I'm used to creating command files in an editor for 
submission later.  Is there a way to do this in R?  I'd need to retain an output 
listing and a log to check for errors.


--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



[R] Problem with the Rmpi package

2008-12-02 Thread Blanchette, Marco
Dear all,

I just started to use the snow package to send multiple jobs to our cluster 
using MPI, with the Rmpi package as the communication method.

However, the Rmpi package has been behaving strangely. When I try to detach 
the Rmpi package I get the following error message:

 library(Rmpi)
 detach()
Error in dyn.unload(file.path(libpath, libs, paste(Rmpi, 
.Platform$dynlib.ext,  :
  dynamic/shared library '/Users/mab/Library/R/2.8/library/Rmpi/libs/Rmpi.so' 
was not loaded

Following that error, the snow package seems to be unable to initiate a new 
cluster, whatever method is used. The only fix is to kill and restart my R session.

Any suggestions as to what the problem is?

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



Re: [R] Snow and multi-processing

2008-11-30 Thread Blanchette, Marco
I think I found a solution. I do not like to use global variables, for fear of 
unpredictable side effects, but I think that in this case I don't have much 
choice.

Here is a mock function that pushes the content of a variable evaluated within 
a function to the nodes of the cluster, does some computation on the nodes using 
that variable, and then returns the result after cleaning up the newly created 
global variable.

Let me know what you think:

aTest <- function(x,n.nodes=2){
  library(snow)

  #initialize a cluster on the local machine
  cl <- makeCluster(rep('localhost',n.nodes),type='SOCK')

  #create a global variable (note the <<- assignment)
  y <<- x

  #export the variable to the cluster
  clusterExport(cl,'y')

  #do some computation on the cluster
  res <- clusterEvalQ(cl,y+2)

  #remove the variable from the global environment
  rm(y, envir=.GlobalEnv)

  #stop the cluster
  stopCluster(cl)

  #exit and return the computation
  return(res)
}
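
For comparison, here is a sketch that avoids the temporary global variable. It 
assumes the parallel package bundled with current versions of R, whose 
clusterExport() takes an envir argument naming the environment to export from; 
aTest2 is a hypothetical name and the cluster runs on the local machine only.

```r
library(parallel)

aTest2 <- function(x, n.nodes = 2) {
  cl <- makeCluster(n.nodes)   # socket cluster on the local machine
  on.exit(stopCluster(cl))     # stop the cluster even if an error occurs

  y <- x                       # local to the function, not global

  # export y from this function's environment rather than from .GlobalEnv
  clusterExport(cl, "y", envir = environment())

  # each worker now has its own copy of y
  clusterEvalQ(cl, y + 2)
}

res <- aTest2(1:3)
res
```

The result is a list with one element per worker, and nothing is left behind in 
the master's global environment.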


[R] Snow and multi-processing

2008-11-29 Thread Blanchette, Marco
Dear R gurus,

I have an embarrassingly parallel job that I am trying to speed up with snow 
on our local cluster. Basically, I am doing ~50,000 t-tests for a series of 
microarray experiments, one gene at a time. Thus, I can easily spread the load 
across multiple processors and nodes.

So, I have a master list object that tells me which rows to pick up for each 
gene to do the t-test, from a series of microarray experiments containing 
~500,000 rows and x columns per experiment.

While trying to optimize my function using parLapply(), I quickly realized that 
I was not gaining any speed because every time a test was done on one of the 
items in the list, the 500,000-row by x-column matrix had to be shipped along 
with the item, and the transfer time was actually longer than the computing 
time.

However, if I first export the 500,000-row object to the spawned processes, as 
in this mock script

cl <- makeCluster(nnodes,method)
mArrayData <- getData(experiments)
clusterExport(cl, 'mArrayData')

Results <- parLapply(cl, theMapList, function(x) t.testFnc(x))

With a function that uses the mArrayData object as the default value of a 
parameter, as in

t.testFnc <- function(probeList, array=mArrayData){
  x <- array[probeList$A,]
  y <- array[probeList$B,]
  res <- doSomeTest(x,y)
  return(res)
}

Using this strategy, I was able to take full advantage of my cluster and cut 
the analysis time by a factor equal to the number of nodes in our cluster. The 
large data matrix was resident in each process and didn't have to travel over 
the network every time an item from the list was passed to the function 
t.testFnc().

However, I quickly realized that this (the call to clusterExport()) works only 
when I run the script one line at a time. When the process is enclosed in a 
function, the object mArrayData is not exported, presumably because it's not a 
global object in the master process.

So, what is the alternative for pushing the content of an object to the worker 
nodes? The documentation of the snow package is a bit light and I couldn't find 
good examples out there. I don't want to have the function getData() evaluated 
on each node because the arguments to that function are humongous and that would 
cause way too much traffic on the network. I want the result of getData(), the 
object mArrayData, propagated to the cluster only once and available to 
downstream functions.

Hope this is clear and that a solution will be possible.

Many thanks

Marco

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



Re: [R] editor for MacOS X

2008-11-28 Thread Blanchette, Marco
Carbon Emacs (http://homepage.mac.com/zenitani/emacs-e.html) using ESS mode 
(http://ess.r-project.org/). Amazingly good integration of different buffer 
types for different tasks. You can have your R session running in one buffer 
and a .R buffer where you edit your functions/sources, and with very simple 
keystrokes you can send lines, functions or the full buffer to the running R 
session. Great integration of the help pages too.

This has become my central environment where I do all my computing work: 
programming (perl, python, etc.), shell work, R jobs, MySQL work, etc. In 
addition, your Mac can be configured to run emacs remotely as a client on any 
other type of machine. For instance, at home, on my PC or on my Mac, I can open 
an SSH connection from either X11 or PuTTY to the Mac desktop in my office, 
then fire up emacs from the terminal, et voila! I am running jobs on the 
computer in my office (which has 8 cores and 32 GB of RAM) from the same 
environment as I normally use in my office (it can be a bit bandwidth 
intensive, though).

You should also check out noweb mode (http://www.cs.tufts.edu/~nr/noweb/) for 
integrating code and documentation; pretty cool.

Cheers,

Marco


On 11/28/08 7:55 AM, John Fox [EMAIL PROTECTED] wrote:

Dear Bunny,

I've been using Eclipse with the StatET plug-in
http://www.walware.de/goto/statet under both Windows and Mac OS X. Eclipse
+ StatET provides much more than a code editor, such as the ability to check
and build packages and to interact with an svn archive. On the downside, it
requires quite a bit of configuration.

I hope this helps,
 John

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox


 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bunny, lautloscrew.com
 Sent: November-28-08 6:16 AM
 To: r-help@r-project.org
 Subject: [R] editor for MacOS X

 Hi all,

 just wondered again, if there is some R editor for Mac OS X comparable
 to TINN-R on windows.

 thx in advance..



--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



[R] 64bit R for Mac

2008-11-24 Thread Blanchette, Marco
Dear R gurus,

On the CRAN website, it says that a 64-bit version for Mac OS Tiger will be 
released shortly. Do we know the expected date? Will the packages also be 
compiled for 64-bit?

We are running large microarray analyses and we keep hitting the 3 GB memory 
limit.

I saw that there is a version available on the development mirrors, but I am 
not too keen to replace our very stable and reliable 32-bit version with a 
64-bit binary that might not be as stable, and with packages that would need to 
be compiled for 64-bit on site...

Cheers
--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



[R] Dataframe with single level column

2008-11-21 Thread Blanchette, Marco
Dear all,

I have a dataframe with multiple observations and the levels as the last 
column, as in:
> d <- data.frame(A=sample(1:100,12),B=sample(1:100,12),levels=c(rep('A',4),rep('B',4),rep('C',4)))
> d
    A  B levels
1  77 40      A
2  14 18      A
3  56  7      A
4  46 27      A
5  63 35      B
6  80 21      B
7   3 54      B
8  93 76      B
9   5 46      C
10 16 53      C
11 40 17      C
12 25 31      C

I need to run an ANOVA on the groups in levels against the merged data from 
the first two columns. I can manually split and join the different columns, as in

> d.t <- rbind(data.frame(value=d[,1],ind=d[,3]),data.frame(value=d[,2],ind=d[,3]))

but I was wondering whether there is a more elegant and easier way that would 
spare me from hard-coding the different vectors making up the data frame.
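
One way to avoid the hard-coded column indices (a sketch using base R only): 
treat every column except the grouping column as a value column, concatenate 
those columns with unlist(), and repeat the grouping column once per value 
column. The name valueCols is introduced here for illustration, and the seed is 
set only to make the example reproducible.

```r
set.seed(1)  # only to make the sketch reproducible
d <- data.frame(A = sample(1:100, 12), B = sample(1:100, 12),
                levels = rep(c("A", "B", "C"), each = 4))

# Every column except the grouping column is treated as a value column
valueCols <- setdiff(names(d), "levels")

# Concatenate the value columns and repeat the groups once per column
d.t <- data.frame(value = unlist(d[valueCols], use.names = FALSE),
                  ind   = rep(d$levels, times = length(valueCols)))

anova(lm(value ~ ind, data = d.t))
```

This gives the same rows as the manual rbind() call and keeps working if more 
value columns are added to d.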

Thanks

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



[R] Suppression anova message

2008-11-21 Thread Blanchette, Marco
Dear all,

I am running anova(lm()) on a series of different data frames and I am getting 
the following message:

Using dataFrame$levels as id variables

 1.  Why am I getting that message?
 2.  How do I suppress it (or correct it)?
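
A note on the second point, with assumptions stated up front: anova() and lm() 
do not normally print such a line, and its wording resembles the note given by 
melt() from the reshape package when the id variables are not named, so it may 
come from an earlier reshaping step (the post does not show the full code, so 
this is a guess). If the note is emitted through message(), wrapping the call in 
suppressMessages() silences it. In the sketch below, noisyFit is a hypothetical 
stand-in for whichever step prints the note.

```r
# Hypothetical stand-in for the step that prints the note
noisyFit <- function(df) {
  message("Using levels as id variables")
  anova(lm(value ~ levels, data = df))
}

set.seed(42)  # fake data, for illustration only
df <- data.frame(value = rnorm(12),
                 levels = rep(c("A", "B", "C"), each = 4))

# suppressMessages() swallows anything emitted via message()
fit <- suppressMessages(noisyFit(df))
fit
```

If the note does come from a reshaping call, naming the id variables explicitly 
in that call would be the way to correct it rather than just hide it.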

Thanks

Marco

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



[R] More list to vector puzzle

2008-11-19 Thread Blanchette, Marco
Many thanks for the answers to my previous question; they got me started.
Indeed, stack() was the function I was vaguely remembering.

However, I didn't get very far because my data set is way more complicated
than I expected. In fact, I have a mixture of levels and lists within a list.
Basically, it resembles the following list (named data), made of the levels H
and the lists of vectors A and T. For each level, the T[x]s are the same but
the A[x]s are different.

> H <- c(rep('H1',3),rep('H2',3),rep('H3',3))
> A <- list(A1=round(runif(3,100,1000)),
+  A2=round(runif(3,100,1000)),
+  A3=round(runif(3,100,1000)),
+  A4=round(runif(3,100,1000)),
+  A5=round(runif(3,100,1000)),
+  A6=round(runif(3,100,1000)),
+  A7=round(runif(3,100,1000)),
+  A8=round(runif(3,100,1000)),
+  A9=round(runif(3,100,1000))
+  )
> T1 <- round(runif(7,1,10))
> T2 <- round(runif(5,1,10))
> T3 <- round(runif(6,1,10))
> T <- list(T1,T1,T1,T2,T2,T2,T3,T3,T3)
> data <- list(H=H,A=A,T=T)

Basically, it can be represented as the following data structure:
H    A              T
H1   458 255 160    4  8  10 8  9  9  3
H1   343 424 298    4  8  10 8  9  9  3
H1   608 831 544    4  8  10 8  9  9  3

H2   616 266 413    7  3  5  4  5
H2   687 796 752    7  3  5  4  5
H2   814 921 228    7  3  5  4  5

H3   789 558 400    8  3  3  7  6  5
H3   845 298 855    8  3  3  7  6  5
H3   725 366 621    8  3  3  7  6  5

My goal is to get, for each level of H, a data frame of the values of the As
with an index indicating which A each value comes from, plus a single copy of
the Ts with a corresponding level, and so on for every H. I then want to fit a
linear model of value~ind for each H (of course, the data are fake here),
followed by an ANOVA for each H. Thus, for each level of H I need something
similar to:

$H1
value ind
458 A1
255 A1
160 A1
343 A2
424 A2
298 A2
608 A3
831 A3
544 A3
4   T
8   T
10  T
8   T
9   T
9   T
3   T
...

As you might have guessed, we have several tens of thousands of Hs, so I
cannot just do it manually one at a time. I tried breaking the problem down
into small pieces but did not get very far.

I was very excited when I got the following call to produce the expected
result:

> a <- tapply(data$A,data$H,function(x) stack(x))
> t <- tapply(data$T,data$H,function(x) x[1])
> tt <- lapply(t,function(x) data.frame(values=unlist(x),
+ ind=rep(1:length(x),sapply(x,length))))
> a
$H1
  values ind
1    458  A1
2    255  A1
3    160  A1
4    343  A2
5    424  A2
6    298  A2
7    608  A3
8    831  A3
9    544  A3
...

> tt
$H1
  values ind
1      4   1
2      8   2
3     10   3
4      8   4
5      9   5
6      9   6
7      3   7
...

However, I tried to rbind the lists in a and tt (one element per H level)
using lapply or sapply, without any success.

I am in need of some expert advice on this one...

Also, I am not sure this is the most elegant way to produce the data
structure I am trying to build. Any advice?
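
One possible way to build the per-H data frames in a single pass (a sketch, not 
a reply from the list): split the positions 1..9 by level of H, then for each 
group stack the matching As and append the Ts once, labelled "T". The names 
perH and fits are introduced for illustration, and the data are random, as in 
the post.

```r
set.seed(1)  # fake data, as in the post
H <- rep(c("H1", "H2", "H3"), each = 3)
A <- setNames(replicate(9, round(runif(3, 100, 1000)), simplify = FALSE),
              paste0("A", 1:9))
T1 <- round(runif(7, 1, 10))
T2 <- round(runif(5, 1, 10))
T3 <- round(runif(6, 1, 10))
data <- list(H = H, A = A, T = list(T1, T1, T1, T2, T2, T2, T3, T3, T3))

# Group the positions 1..9 by level of H, then build one data frame per level
perH <- lapply(split(seq_along(data$H), data$H), function(idx) {
  a  <- stack(data$A[idx])                                # columns: values, ind
  tt <- data.frame(values = data$T[[idx[1]]], ind = "T")  # T taken once per level
  rbind(a, tt)
})

# One linear model and ANOVA table per level of H
fits <- lapply(perH, function(df) anova(lm(values ~ ind, data = df)))
perH$H1
```

Note that stack() names the value column values rather than value, so the model 
formula uses values ~ ind.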

Thanks

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018



[R] Transforming a list to a vector with associated levels

2008-11-18 Thread Blanchette, Marco
I am pretty sure that I came across a function that creates a vector of levels 
from a list, but I just can't remember its name.

Basically, I have something like

> t <- list(A=c(4,1,4),B=c(3,7,9,2))
> t
$A
[1] 4 1 4

$B
[1] 3 7 9 2

And I would like to get something like the following:
t levels
4 1
1 1
4 1
3 2
7 2
9 2
2 2

I tried unlist() without success. I also remember that there is a 
corresponding function that creates a list of t's according to the levels in 
the table shown above, but I have had no luck remembering it.
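
The follow-up post in this thread names stack() as the function in question. A 
minimal sketch of how it could be used here, with split() for the reverse 
direction; tList and long are names introduced for illustration.

```r
tList <- list(A = c(4, 1, 4), B = c(3, 7, 9, 2))

# stack() returns a data frame with columns values and ind
# (ind is a factor built from the list names)
long <- stack(tList)

# Integer codes of the factor give the 1/2 levels shown in the table above
long$levels <- as.integer(long$ind)
long

# Going back: split the values by the factor to recover the list
back <- split(long$values, long$ind)
```

unstack(long, values ~ ind) is the documented counterpart of stack() and could 
be used in place of split().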

Thanks

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
