If the BAM files are each processed independently, and each processing task 
takes a while, then it is probably 'good enough' to use R-level parallel 
evaluation using BiocParallel (currently the recommendation for Bioconductor 
packages) or another evaluation framework. Also, presumably you will use Rhtslib, 
which provides C-level access to the hts library. This will require writing C 
/ C++ code to interface between R and the hts library, and will of course be a 
significant undertaking.
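
For comparison, here is a rough sketch of the same "one independent task per 
file" pattern at the C++ level, using only the standard library (std::async) 
rather than OpenMP, so it does not depend on OpenMP support in the compiler. 
The names process_file and process_all are hypothetical, and process_file is 
only a placeholder: real code would read the BAM file through the hts library 
there. It assumes the files really are independent, so the tasks share no 
mutable state.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the real per-file work (e.g. reading a BAM file
// through the hts library). Here it only returns the length of the path.
std::size_t process_file(const std::string& path) {
    return path.size();
}

// Launch one asynchronous task per file and collect the results in input
// order. Assumes the files are independent of each other.
std::vector<std::size_t> process_all(const std::vector<std::string>& paths) {
    std::vector<std::future<std::size_t>> tasks;
    tasks.reserve(paths.size());
    for (const auto& path : paths) {
        // std::launch::async requests that each task run on its own thread.
        tasks.push_back(std::async(std::launch::async, process_file, path));
    }

    std::vector<std::size_t> results;
    results.reserve(paths.size());
    for (auto& task : tasks) {
        results.push_back(task.get());  // blocks until that task has finished
    }
    return results;
}
```

Calling process_all({"a.bam", "bb.bam"}) returns one result per path, in the 
same order. Two caveats: this starts one thread per file, so a long file list 
would want a cap on the number of concurrent tasks, and the worker threads 
must not call back into R, because the R API is not thread-safe.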

It might be worth outlining in a bit more detail what your task is and how (not 
too much detail!) you've tried to implement this in Rsamtools.

Martin Morgan

On 5/24/21, 10:01 AM, "Bioc-devel on behalf of Oleksii Nikolaienko" 
<bioc-devel-boun...@r-project.org on behalf of oleksii.nikolaie...@gmail.com> 
wrote:

    Dear Bioc team,
    I'd like to ask for your advice on the parallelization within a Bioc
    package. Please point me to a better place if this mailing list is not
    appropriate.
    After a bit of thinking I decided that I'd like to parallelize processing
    at the level of C++ code. Would you strongly recommend not to and use an R
    approach instead (e.g. "future")?
    If parallel C++ is ok, what would be the best solution for all major OSs?
    My initial choice was OpenMP, but then it seems that Apple has something
    against it (https://mac.r-project.org/openmp/). My own dev environment is
    mostly Big Sur/ARM64, but I wouldn't want to drop its support anyway.

    (On the actual task: loading and specific processing of very large BAM
    files, ideally significantly faster than by means of Rsamtools as a backend)

    Best,
    Oleksii Nikolaienko

        [[alternative HTML version deleted]]

    _______________________________________________
    Bioc-devel@r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/bioc-devel
