[Bioc-devel] memory inefficiency problem of building MSPC packages

Jurat Shayidin Mon, 01 Aug 2016 23:55:05 -0700

Bioc-devel:
I have been developing a Bioconductor package for multiple-sample peak
calling (MSPC), and all unit tests for my package run efficiently. However,
I have one minor problem that causes memory inefficiency when building the
package on my machine. To be specific, I need to find overlaps among
multiple GRanges objects simultaneously and carry out a joint analysis of
multiple ChIP-Seq samples, so that weakly enriched regions can be rescued
using the co-localized evidence from the other GRanges. After reviewing my
source code, I found that some pairwise overlaps are computed many times,
which causes unnecessary memory usage.
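To put a number on the redundancy, assume the custom function in this post is called once with each sample as the query (idx = 1, ..., n). Then every pair of samples is intersected twice, once in each direction. The base R sketch below uses a hypothetical n = 4 and compares that count with enumerating each unordered pair exactly once via combn():

```r
## Hypothetical number of ChIP-seq samples, for illustration only
n <- 4L

## One call per sample as query (idx = 1..n) visits every ordered pair
## (i, j) with i != j, so each pair of samples is intersected twice.
ordered.calls <- n * (n - 1L)

## combn() lists each unordered pair (i < j) once, one pair per column.
pairs <- combn(n, 2L)
unordered.calls <- ncol(pairs)

ordered.calls    # 12
unordered.calls  # 6
```

In general this is n(n - 1) ordered comparisons versus n(n - 1)/2 unordered ones, so computing each pair only once halves the number of pairwise findOverlaps() calls.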
This is the custom function I developed. It works correctly in my current
workflow, but it causes the memory inefficiency problem.


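library(GenomicRanges)

## one GRanges per sample, each with a p.value metadata column
grs <- GRangesList(gr1, gr2, gr3, gr4)  # , ... further samples

overlap <- function(grs, idx=1L, FUN=which.min) {
  chosen <- grs[[idx]]
  ## self-overlaps of the query sample, grouped by query range
  que.hit <- as(findOverlaps(chosen), "List")
  sup.hit <- lapply(grs[-idx], function(ele_) {
    ## for each query range: indices of overlapping ranges in ele_
    ans <- as(findOverlaps(chosen, ele_), "List")
    ## per query range, the hit chosen by FUN (default: smallest p.value)
    out.idx0 <- as(FUN(extractList(ele_$p.value, ans)), "List")
    ## query ranges without any hit give NA; drop them
    out.idx0 <- out.idx0[!is.na(out.idx0)]
    ans[out.idx0]
  })
  res <- c(list(que.hit), sup.hit)
  return(res)
}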
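One way to avoid the repeated pairwise work is sketched below, assuming the repetition comes from intersecting the same two samples in both directions. The helpers pairwiseHits() and getHits() are hypothetical: they call findOverlaps() once per unordered pair i < j and derive the j-versus-i direction on demand with t(), which swaps the query and subject sides of a Hits object. The cache itself holds n(n - 1)/2 Hits objects, so this trades stored results for fewer repeated computations rather than guaranteeing a smaller peak memory footprint.

```r
library(GenomicRanges)

## Hypothetical helper: one findOverlaps() call per unordered pair i < j,
## stored in a list keyed "i_j".
pairwiseHits <- function(grs) {
  stopifnot(length(grs) >= 2L)
  pairs <- combn(length(grs), 2L)
  keys <- paste(pairs[1L, ], pairs[2L, ], sep = "_")
  hits <- lapply(seq_len(ncol(pairs)), function(k)
    findOverlaps(grs[[pairs[1L, k]]], grs[[pairs[2L, k]]]))
  setNames(hits, keys)
}

## Hypothetical accessor: the reverse direction (j vs i) is not stored;
## t() transposes the cached Hits object instead of recomputing it.
getHits <- function(cache, i, j) {
  if (i < j) {
    cache[[paste(i, j, sep = "_")]]
  } else {
    t(cache[[paste(j, i, sep = "_")]])
  }
}
```

Inside overlap(), the call findOverlaps(chosen, ele_) could then be replaced by getHits(cache, idx, j), looping over the other sample indices j instead of over grs[-idx].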

How can I optimize my custom function to avoid this memory inefficiency?
How can I avoid recomputing the same pairwise overlaps between GRanges
objects? Can anyone suggest ideas for solving this problem? Thanks a lot.
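A different strategy, not the approach taken in the function above, avoids pairwise comparisons altogether: pool all peaks, merge them into non-overlapping regions with reduce(), and count how many samples support each merged region with one countOverlaps() call per sample (n calls instead of one findOverlaps() call per pair). The helper name sampleSupport() is hypothetical, the sketch assumes unstranded peaks, and it keeps only per-region sample counts, not the p.value-based hit selection that overlap() performs.

```r
library(GenomicRanges)

## Hypothetical helper: per merged region, the number of samples with at
## least one overlapping peak.
sampleSupport <- function(grs) {
  ## union of all peaks, merged into non-overlapping regions
  merged <- reduce(unlist(grs, use.names = FALSE))
  ## one column per sample: peaks of that sample hitting each region
  counts <- do.call(cbind, lapply(seq_along(grs), function(i)
    countOverlaps(merged, grs[[i]])))
  mcols(merged)$n.samples <- rowSums(counts > 0L)
  merged
}
```

Regions with n.samples above some threshold would be candidates for co-localized evidence; the threshold itself is a modelling choice this sketch leaves open.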



-- 
Jurat Shahidin
Ph.D. candidate
Dipartimento di Elettronica, Informazione e Bioingegneria
Politecnico di Milano


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
