Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

2018-01-19 Thread Ludwig Geistlinger
Thanks for these explanations, Martin.

What I actually want to do is send each probe-level expression dataset to a 
worker, where probe-level expression is summarized to gene-level expression 
according to a chosen summarization function (such as 'mean').

The worker should then return the gene-level expression dataset to the 
manager.

Taking into account Vince's suggestion and your considerations on workload, 
it seems best if I first collect the probe2gene mappings for the unique set of 
annotation packages (which would only be 2 in the example) in serial mode on 
the manager.
Then, I can send the datasets together with the respective mapping as a simple 
table to the workers, which carry out the actual computational part, i.e., the 
summarization.

This would avoid parallel access to the database and would also spare 
retrieving the mappings on each worker over and over again, as it is likely that 
some datasets of the compendium share the same annotation db.
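A minimal base-R sketch of this plan, under stated assumptions: `summarizeToGenes()` is a hypothetical helper, the mappings are toy stand-ins for what `AnnotationDbi::mapIds(..., keytype = "PROBEID", column = "ENTREZID")` would return on the manager (a named character vector with probe IDs as names), and `Map()` stands in for `BiocParallel::bpmapply()` so the example runs without Bioconductor.

```r
## Hypothetical helper: collapse a probe-level expression matrix to gene level.
## 'probe2gene' is a named character vector (names = probe IDs, values = gene
## IDs), the shape AnnotationDbi::mapIds() returns. Unmapped probes are dropped.
summarizeToGenes <- function(expr, probe2gene, fun = mean) {
    genes <- probe2gene[rownames(expr)]
    keep <- !is.na(genes)
    expr <- expr[keep, , drop = FALSE]
    genes <- unname(genes[keep])
    ## row indices of 'expr' grouped by gene (groups come out sorted by gene ID)
    idx <- split(seq_along(genes), genes)
    ## apply 'fun' to every sample column within each gene's group of probes
    rows <- lapply(idx, function(i) apply(expr[i, , drop = FALSE], 2, fun))
    do.call(rbind, rows)
}

## Manager, serial step: one mapping per distinct annotation package.
## Toy stand-ins for mapIds(<pkg>, keys(<pkg>), keytype = "PROBEID",
## column = "ENTREZID"); p4 maps to no gene, as happens for some probes.
mappings <- list(
    chipA = c(p1 = "g1", p2 = "g1", p3 = "g2", p4 = NA),
    chipB = c(q1 = "g2", q2 = "g3")
)

## Probe-level datasets (probes x samples) and the package each one uses.
datasets <- list(
    matrix(c(1, 3, 10, 100, 5, 7, 20, 100), ncol = 2,
           dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2"))),
    matrix(c(4, 8, 2, 6), ncol = 2,
           dimnames = list(c("q1", "q2"), c("s3", "s4")))
)
annos <- c("chipA", "chipB")

## Worker step: each task receives a dataset plus a plain R vector and never
## touches a database. With BiocParallel this could become bpmapply().
geneLevel <- Map(summarizeToGenes, datasets, mappings[annos])
```

Because each distinct annotation package is queried only once, datasets that share a chip reuse the same mapping vector, and the workers never open the SQLite files. `summarizeToGenes()` uses only its arguments and base R, so it does not depend on anything being loaded on the workers.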

Thanks,
Ludwig
 
--
Dr. Ludwig Geistlinger
CUNY School of Public Health


From: Martin Morgan <martin.mor...@roswellpark.org>
Sent: Friday, January 19, 2018 4:10 PM
To: Ludwig Geistlinger; Gabe Becker; Vincent Carey
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image 
is malformed

On 01/19/2018 02:24 PM, Ludwig Geistlinger wrote:
> I apologize if I haven't been specific enough; however, I am also having 
> trouble reliably reproducing the error.
> It does not seem to be exclusively related to the combination of 
> AnnotationDbi and parallel computation, but also to some other packages I 
> load.
>
> While I am still trying to produce a minimal reproducible example, here is 
> code that returns the error quite reliably on an 8-core Linux machine:
>
> # loading some dependencies of my package
> library(org.Hs.eg.db)
> library(pathview)
> library(graph)
> library(BiocParallel)
>
> # annotation packages for the datasets in the compendium
> pkgs <- rep(c("hgu133plus2.db","hgu133a.db"), 42)
>
> #
> getSymbols <- function ( anno.pkg )
> {
>     require(anno.pkg, character.only=TRUE)
>     anno.pkg <- get(anno.pkg)
>     syms <- AnnotationDbi::mapIds(anno.pkg, keys=keys(anno.pkg),
>         keytype="PROBEID", column="ENTREZID")
>     return(syms)
> }
>
>> x <- bplapply( pkgs , getSymbols )   ### sometimes I have to run this 2 or 3 times in a row to produce this error
> Loading required package: hgu133plus2.db
>
> Error: BiocParallel errors
>   element index: 29, 30, 31, 32, 33, 34, ...
>   first error: database disk image is malformed

My guess is that the database is being accessed by multiple processes
simultaneously and, even though the databases are opened read-only,
this causes a corruption of some sort in the access. You can avoid
multiple processes accessing the database at the same time by using a 'lock':

getSymbols <- function(anno.pkg, id)
{
    nmspc <- loadNamespace(anno.pkg)
    anno.pkg <- get(anno.pkg, nmspc)

    BiocParallel::ipclock(id)
    ## release the lock when the function exits, even on error
    on.exit(BiocParallel::ipcunlock(id))
    syms <- suppressMessages({
        AnnotationDbi::mapIds(
            anno.pkg, keys=AnnotationDbi::keys(anno.pkg), keytype="PROBEID",
            column="ENTREZID"
        )
    })

    length(syms)
}

id <- ipcid()
x <- bplapply(pkgs, getSymbols, id=id)
ipcremove(id)

There are two additional considerations here.

The first is that you want to weigh the amount of data transferred
between worker and manager against the amount of time spent in
computation. In your previous formulation you sent back all the
symbols; this is relatively expensive compared to the amount of
work done in the function (reading the ids from the database), and you
would rather do more work and transmit less (both to and from the
worker) in each call to getSymbols().

The second is similar, but from the lock perspective: since the lock
imposes essentially serial evaluation through that portion of the code,
you'd like the locked portion of the worker's task to be just a small
portion of the total work done by the worker.

I guess a cleverer use of locks would be one per database (generate
two ipcid()s in the manager, and pass these to the workers in such a way
that a worker always uses the same lock for a given database).

Martin

>
>
>> sessionInfo()
> R version 3.4.1 (2017-06-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: SUSE Linux Enterprise Desktop 12 SP3
>
> Matrix products: default
> BLAS: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRblas.so
> LAPACK: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRlapack.so
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERI

Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

2018-01-19 Thread Martin Morgan

On 01/19/2018 02:24 PM, Ludwig Geistlinger wrote:

I apologize if I haven't been specific enough; however, I am also having 
trouble reliably reproducing the error.
It does not seem to be exclusively related to the combination of AnnotationDbi 
and parallel computation, but also to some other packages I load.

While I am still trying to produce a minimal reproducible example, here is code 
that returns the error quite reliably on an 8-core Linux machine:
  
# loading some dependencies of my package

library(org.Hs.eg.db)
library(pathview)
library(graph)
library(BiocParallel)

# annotation packages for the datasets in the compendium
pkgs <- rep(c("hgu133plus2.db","hgu133a.db"), 42)

#
getSymbols <- function ( anno.pkg )
{
    require(anno.pkg, character.only=TRUE)
    anno.pkg <- get(anno.pkg)
    syms <- AnnotationDbi::mapIds(anno.pkg, keys=keys(anno.pkg),
        keytype="PROBEID", column="ENTREZID")
    return(syms)
}


x <- bplapply( pkgs , getSymbols )   ### sometimes I have to run this 2 or 3 times in a row to produce this error

Loading required package: hgu133plus2.db

Error: BiocParallel errors
   element index: 29, 30, 31, 32, 33, 34, ...
   first error: database disk image is malformed


My guess is that the database is being accessed by multiple processes 
simultaneously and, even though the databases are opened read-only, 
this causes a corruption of some sort in the access. You can avoid 
multiple processes accessing the database at the same time by using a 'lock':


getSymbols <- function(anno.pkg, id)
{
    nmspc <- loadNamespace(anno.pkg)
    anno.pkg <- get(anno.pkg, nmspc)

    BiocParallel::ipclock(id)
    ## release the lock when the function exits, even on error
    on.exit(BiocParallel::ipcunlock(id))
    syms <- suppressMessages({
        AnnotationDbi::mapIds(
            anno.pkg, keys=AnnotationDbi::keys(anno.pkg), keytype="PROBEID",
            column="ENTREZID"
        )
    })

    length(syms)
}

id <- ipcid()
x <- bplapply(pkgs, getSymbols, id=id)
ipcremove(id)

There are two additional considerations here.

The first is that you want to weigh the amount of data transferred 
between worker and manager against the amount of time spent in 
computation. In your previous formulation you sent back all the 
symbols; this is relatively expensive compared to the amount of 
work done in the function (reading the ids from the database), and you 
would rather do more work and transmit less (both to and from the 
worker) in each call to getSymbols().
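To make the first consideration concrete, the size of what a worker sends back can be approximated by serializing it, since results travel between processes in serialized form. The sketch below uses a made-up vector of 50,000 probe-to-gene entries (an assumption, not real annotation data) and compares it with returning only its length, as the lock-based getSymbols() in this message does.

```r
## Made-up stand-in for a full probe -> Entrez ID mapping: 50,000 named entries.
set.seed(1)
syms <- setNames(as.character(sample.int(20000L, 50000L, replace = TRUE)),
                 sprintf("probe%05d_at", seq_len(50000L)))

## Approximate payload sizes (in bytes) for the two possible return values.
fullBytes    <- length(serialize(syms, NULL))          # all symbols
summaryBytes <- length(serialize(length(syms), NULL))  # just the count
c(full = fullBytes, summary = summaryBytes)
```

When the payload dwarfs the computation, as here, returning a summary (or doing more of the downstream work inside the task) is the cheaper design.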


The second is similar, but from the lock perspective: since the lock 
imposes essentially serial evaluation through that portion of the code, 
you'd like the locked portion of the worker's task to be just a small 
portion of the total work done by the worker.


I guess a cleverer use of locks would be one per database (generate 
two ipcid()s in the manager, and pass these to the workers in such a way 
that a worker always uses the same lock for a given database).
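Following the one-lock-per-database idea, here is a minimal sketch of the manager-side bookkeeping. `lockIdsFor()` is a hypothetical helper; its `newId` argument defaults to `BiocParallel::ipcid()` and exists only so the logic can be checked without BiocParallel installed. It returns one id per task, shared by all tasks that touch the same annotation package.

```r
## Hypothetical helper: one interprocess lock id per distinct annotation package,
## repeated so that every task on the same database gets the same id.
lockIdsFor <- function(pkgs, newId = function() BiocParallel::ipcid()) {
    dbs <- unique(pkgs)
    ids <- vapply(dbs, function(db) newId(), character(1))  # named by package
    ids[pkgs]                                               # one id per task
}

## Sketch of use (not run here; assumes BiocParallel is attached and the
## two-argument getSymbols(anno.pkg, id) from this message):
## ids <- lockIdsFor(pkgs)
## x <- bpmapply(getSymbols, pkgs, ids)
## for (id in unique(ids)) ipcremove(id)
```

With separate locks, workers reading hgu133a.db no longer wait for workers reading hgu133plus2.db, while access to each individual SQLite file stays serialized.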


Martin





sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Desktop 12 SP3

Matrix products: default
BLAS: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRblas.so
LAPACK: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] BiocParallel_1.12.0  graph_1.56.0         pathview_1.18.0
[4] org.Hs.eg.db_3.5.0   AnnotationDbi_1.40.0 IRanges_2.12.0
[7] S4Vectors_0.16.0     Biobase_2.38.0       BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14      KEGGgraph_1.38.1  XVector_0.18.0    zlibbioc_1.24.0
 [5] bit_1.1-12        R6_2.2.2          rlang_0.1.6       blob_1.1.0
 [9] httr_1.3.1        tools_3.4.1       grid_3.4.1        png_0.1-7
[13] DBI_0.7           bit64_0.9-7       digest_0.6.13     tibble_1.4.1
[17] Rgraphviz_2.22.0  KEGGREST_1.18.0   memoise_1.1.0     RSQLite_2.0
[21] compiler_3.4.1    pillar_1.1.0      Biostrings_2.46.0 XML_3.98-1.9
[25] pkgconfig_2.0.1


--
Dr. Ludwig Geistlinger
CUNY School of Public Health


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Martin Morgan 
<martin.mor...@roswellpark.org>
Sent: Friday, January 19, 2018 1:54 PM
To: Gabe Becker; Vincent Carey
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image 
is malformed

On 01/19/2018 12:37 PM, Gabe Becker wrote:

It seems like you could also force a copy of the reference object via
$copy() a

Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

2018-01-19 Thread Ludwig Geistlinger
I apologize if I haven't been specific enough; however, I am also having 
trouble reliably reproducing the error.
It does not seem to be exclusively related to the combination of AnnotationDbi 
and parallel computation, but also to some other packages I load.

While I am still trying to produce a minimal reproducible example, here is code 
that returns the error quite reliably on an 8-core Linux machine:
 
# loading some dependencies of my package
library(org.Hs.eg.db)
library(pathview)
library(graph)
library(BiocParallel)

# annotation packages for the datasets in the compendium 
pkgs <- rep(c("hgu133plus2.db","hgu133a.db"), 42)

#
getSymbols <- function ( anno.pkg )
{
    require(anno.pkg, character.only=TRUE)
    anno.pkg <- get(anno.pkg)
    syms <- AnnotationDbi::mapIds(anno.pkg, keys=keys(anno.pkg),
        keytype="PROBEID", column="ENTREZID")
    return(syms)
}

> x <- bplapply( pkgs , getSymbols )   ### sometimes I have to run this 2 or 3 times in a row to produce this error
Loading required package: hgu133plus2.db

Error: BiocParallel errors
  element index: 29, 30, 31, 32, 33, 34, ...
  first error: database disk image is malformed


> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Desktop 12 SP3

Matrix products: default
BLAS: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRblas.so
LAPACK: /mnt/raidbio/biosoft/software/R/R-3.4.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] BiocParallel_1.12.0  graph_1.56.0         pathview_1.18.0
[4] org.Hs.eg.db_3.5.0   AnnotationDbi_1.40.0 IRanges_2.12.0
[7] S4Vectors_0.16.0     Biobase_2.38.0       BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14      KEGGgraph_1.38.1  XVector_0.18.0    zlibbioc_1.24.0
 [5] bit_1.1-12        R6_2.2.2          rlang_0.1.6       blob_1.1.0
 [9] httr_1.3.1        tools_3.4.1       grid_3.4.1        png_0.1-7
[13] DBI_0.7           bit64_0.9-7       digest_0.6.13     tibble_1.4.1
[17] Rgraphviz_2.22.0  KEGGREST_1.18.0   memoise_1.1.0     RSQLite_2.0
[21] compiler_3.4.1    pillar_1.1.0      Biostrings_2.46.0 XML_3.98-1.9
[25] pkgconfig_2.0.1


--
Dr. Ludwig Geistlinger
CUNY School of Public Health


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Martin Morgan 
<martin.mor...@roswellpark.org>
Sent: Friday, January 19, 2018 1:54 PM
To: Gabe Becker; Vincent Carey
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image 
is malformed

On 01/19/2018 12:37 PM, Gabe Becker wrote:
> It seems like you could also force a copy of the reference object via
> $copy() and then force a refresh of the conn slot by assigning a
> new db connection into it.
>
> I'm having trouble confirming that this would work, however, because I
> actually can't reproduce the error. The naive way works for me on my Mac
> laptop (which is running an old R and Bioconductor) and on the Linux
> cluster I have access to (running Bioc 3.6):
>
>
> (cluster)
>
>> getSymbol <- function ( x ) {
> + return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )
> + }

Pass the database connection to the function

getSymbol <- function ( x, db ) {
    ## olde schoole
    AnnotationDbi::mget(x, db)
    ## AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
}

and arrange for the general case, i.e., distinct processes with data
serialized between them

 > cl = parallel::makePSOCKcluster(2)
 > parLapply(cl, x, getSymbol, hgu95av2SYMBOL)
Error in checkForRemoteErrors(val) :
   2 nodes produced errors; first error: external pointer is not valid

(getSymbol as originally written would fail in the serialized case, since
the workers would not have access to hgu95av2SYMBOL.)

The workaround is to open the connection on the node, e.g.,

getSymbol <- function ( x, dbname ) {
 nmspc <- loadNamespace(dbname)
 db <- get(dbname, nmspc)
 AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
}

lapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db", BPPARAM = SnowParam())
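The same pattern, passing the name of a resource and opening it inside the task, can be illustrated in base R without AnnotationDbi. This is an analogy under stated assumptions: a temporary tab-delimited file stands in for the annotation database, and `lookupSymbol()` is a hypothetical helper. Like the external pointer behind an AnnotationDbi connection, an R connection object only refers to something in the process that opened it, so each task opens and closes its own.

```r
## A temporary tab-delimited file stands in for the annotation SQLite database.
path <- tempfile(fileext = ".txt")
writeLines(c("36090_at\tTBL2", "38785_at\tMUC1"), path)

## Hypothetical helper: receives the path (a plain string), not an open handle.
lookupSymbol <- function(probe, path) {
    con <- file(path, open = "r")   # opened by the process running this task
    on.exit(close(con))             # ...and closed there too
    tab <- read.delim(con, header = FALSE, col.names = c("probe", "symbol"),
                      colClasses = "character")
    tab$symbol[match(probe, tab$probe)]
}

res <- lapply(list("36090_at", "38785_at"), lookupSymbol, path = path)
## The call shape is the same for parallel::parLapply(cl, X, lookupSymbol,
## path = path) or BiocParallel::bplapply(X, lookupSymbol, path = path, ...).
unlink(path)  # clean up the stand-in file
```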

Martin

>>
>
>> x <- list( "36090_at" , "38785_at" )

Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

2018-01-19 Thread Martin Morgan

On 01/19/2018 12:23 PM, Vincent Carey wrote:

good question

some of the discussion on

http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html

seems relevant.

converting the relatively small annotation package content to pure R
read-only tables on the master before parallelizing
might be very simple?

On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
ludwig.geistlin...@sph.cuny.edu> wrote:


Hi,

Within a package I am developing, I would like to enable parallel
probe-to-gene mapping for a compendium of microarray datasets.

This accordingly makes use of annotation packages such as hgu133a.db,
which in turn connect to the SQLite database via AnnotationDbi.

When running in multicore mode (i.e., using a MulticoreParam with
BiocParallel) with more than 2 cores, this causes the error:

database disk image is malformed


In a very similar problem:

https://support.bioconductor.org/p/38541/

Adi Tarca and Dan Tenenbaum identified and resolved this problem by
ensuring that each process has its own unique database connection, i.e.,
AnnotationDbi is not loaded before sending the job to the workers.

This solution was easy to realize there, as that analysis was carried out
within a script and not a package.

However, within my package, AnnotationDbi is loaded as a dependency of my
package's imports.

How can I resolve this here?


Can you be a little more specific here? The problem isn't likely to be with 
AnnotationDbi per se, but with the annotation package you use. Also, the 
connection on the worker is bad, but it could be re-created (using, e.g., 
dbfile(org.Hs.eg.db)...); probably a toy example would help.


Martin


I am not sure whether I fully understand the underlying mechanisms,
but is there a way to make my workers load their own copy of
AnnotationDbi instead of using the one from the parent process?
Or am I supposed to unload all packages depending on AnnotationDbi, and
AnnotationDbi itself, before sending the job to the workers (and reload all
of them after the job has finished)?

Thanks a lot,
Ludwig



--
Dr. Ludwig Geistlinger
CUNY School of Public Health


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

2018-01-19 Thread Gabe Becker
It seems like you could also force a copy of the reference object via
$copy() and then force a refresh of the conn slot by assigning a
new db connection into it.

I'm having trouble confirming that this would work, however, because I
actually can't reproduce the error. The naive way works for me on my Mac
laptop (which is running an old R and Bioconductor) and on the Linux
cluster I have access to (running Bioc 3.6):


(cluster)

> getSymbol <- function ( x ) {
+ return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )
+ }
>
> x <- list( "36090_at" , "38785_at" )
>
> mclapply( x , getSymbol )
[[1]]
[[1]]$`36090_at`
[1] "TBL2"


[[2]]
[[2]]$`38785_at`
[1] "MUC1"


>
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

Matrix products: default
BLAS: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so
LAPACK: /gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] hgu95av2.db_3.2.3    org.Hs.eg.db_3.5.0   AnnotationDbi_1.40.0
[4] IRanges_2.12.0       S4Vectors_0.16.0     Biobase_2.38.0
[7] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14    digest_0.6.14   DBI_0.7         RSQLite_2.0
 [5] pillar_1.1.0    rlang_0.1.6     blob_1.1.0      bit64_0.9-8
 [9] bit_1.1-13      compiler_3.4.3  pkgconfig_2.0.1 memoise_1.1.0
[13] tibble_1.4.1
>


~G

On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey 
wrote:

> good question
>
> some of the discussion on
>
> http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html
>
> seems relevant.
>
> converting the relatively small annotation package content to pure R
> read-only tables on the master before parallelizing
> might be very simple?
>
> On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
> ludwig.geistlin...@sph.cuny.edu> wrote:
>
> > Hi,
> >
> > Within a package I am developing, I would like to enable parallel
> > probe-to-gene mapping for a compendium of microarray datasets.
> >
> > This accordingly makes use of annotation packages such as hgu133a.db,
> > which in turn connect to the SQLite database via AnnotationDbi.
> >
> > When running in multicore mode (i.e., using a MulticoreParam with
> > BiocParallel) with more than 2 cores, this causes the error:
> >
> > database disk image is malformed
> >
> >
> > In a very similar problem:
> >
> > https://support.bioconductor.org/p/38541/
> >
> > Adi Tarca and Dan Tenenbaum identified and resolved this problem by
> > ensuring that each process has its own unique database connection, i.e.,
> > AnnotationDbi is not loaded before sending the job to the workers.
> >
> > This solution was easy to realize there, as that analysis was carried out
> > within a script and not a package.
> >
> > However, within my package, AnnotationDbi is loaded as a dependency of my
> > package's imports.
> >
> > How can I resolve this here?
> > I am not sure whether I fully understand the underlying mechanisms,
> > but is there a way to make my workers load their own copy of
> > AnnotationDbi instead of using the one from the parent process?
> > Or am I supposed to unload all packages depending on AnnotationDbi, and
> > AnnotationDbi itself, before sending the job to the workers (and reload
> > all of them after the job has finished)?
> >
> > Thanks a lot,
> > Ludwig
> >
> >
> >
> > --
> > Dr. Ludwig Geistlinger
> > CUNY School of Public Health
> >


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research



Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

2018-01-19 Thread Vincent Carey
good question

some of the discussion on

http://sqlite.1065341.n5.nabble.com/Parallel-access-to-read-only-in-memory-database-td91814.html

seems relevant.

converting the relatively small annotation package content to pure R
read-only tables on the master before parallelizing
might be very simple?

On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
ludwig.geistlin...@sph.cuny.edu> wrote:

> Hi,
>
> Within a package I am developing, I would like to enable parallel
> probe-to-gene mapping for a compendium of microarray datasets.
>
> This accordingly makes use of annotation packages such as hgu133a.db,
> which in turn connect to the SQLite database via AnnotationDbi.
>
> When running in multicore mode (i.e., using a MulticoreParam with
> BiocParallel) with more than 2 cores, this causes the error:
>
> database disk image is malformed
>
>
> In a very similar problem:
>
> https://support.bioconductor.org/p/38541/
>
> Adi Tarca and Dan Tenenbaum identified and resolved this problem by
> ensuring that each process has its own unique database connection, i.e.,
> AnnotationDbi is not loaded before sending the job to the workers.
>
> This solution was easy to realize there, as that analysis was carried out
> within a script and not a package.
>
> However, within my package, AnnotationDbi is loaded as a dependency of my
> package's imports.
>
> How can I resolve this here?
> I am not sure whether I fully understand the underlying mechanisms,
> but is there a way to make my workers load their own copy of
> AnnotationDbi instead of using the one from the parent process?
> Or am I supposed to unload all packages depending on AnnotationDbi, and
> AnnotationDbi itself, before sending the job to the workers (and reload all
> of them after the job has finished)?
>
> Thanks a lot,
> Ludwig
>
>
>
> --
> Dr. Ludwig Geistlinger
> CUNY School of Public Health
>



[Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

2018-01-19 Thread Ludwig Geistlinger
Hi,

Within a package I am developing, I would like to enable parallel probe-to-gene 
mapping for a compendium of microarray datasets.

This accordingly makes use of annotation packages such as hgu133a.db, which in 
turn connect to the SQLite database via AnnotationDbi.

When running in multicore mode (i.e., using a MulticoreParam with BiocParallel) 
with more than 2 cores, this causes the error:

database disk image is malformed


In a very similar problem:

https://support.bioconductor.org/p/38541/

Adi Tarca and Dan Tenenbaum identified and resolved this problem by ensuring 
that each process has its own unique database connection, i.e., AnnotationDbi 
is not loaded before sending the job to the workers.

This solution was easy to realize there, as that analysis was carried out within 
a script and not a package.

However, within my package, AnnotationDbi is loaded as a dependency of my 
package's imports.

How can I resolve this here?
I am not sure whether I fully understand the underlying mechanisms, but is 
there a way to make my workers load their own copy of AnnotationDbi instead 
of using the one from the parent process?
Or am I supposed to unload all packages depending on AnnotationDbi, and 
AnnotationDbi itself, before sending the job to the workers (and reload all of 
them after the job has finished)?

Thanks a lot,
Ludwig



--
Dr. Ludwig Geistlinger
CUNY School of Public Health
