On 4 Feb 2021, at 10:14, Jonathan Aquilina via Beowulf
<[email protected]> wrote:
> I am curious, though: to chunk out such large data, is something like
> Hadoop/HBase or a similar platform what's being used?
It’s a combination of our home-grown sequencing pipeline which we use across
the board, and then a specific COG-UK analysis of the genomes themselves. This
pipeline is common to all consortium members who are contributing sequence
data. It’s a Nextflow pipeline, and the code is here:
https://github.com/connor-lab/ncov2019-artic-nf
Because it is Nextflow, you can run it on any system for which Nextflow has a
backend scheduler (an "executor" in Nextflow terms). It supports data from both
Illumina and Oxford Nanopore sequencers.
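
As a rough illustration of what "run it on anything" means in practice, here is
a minimal Python sketch that only builds the command line and does not execute
it. `nextflow run <repo>` and the `-profile` option are standard Nextflow. The
profile names "slurm" and "lsf" and the helper `nextflow_command` are
assumptions for illustration; check the pipeline's own configuration for the
profiles it actually defines and for the pipeline-specific arguments it needs.

```python
import shlex

PIPELINE = "connor-lab/ncov2019-artic-nf"  # repository named in this thread


def nextflow_command(profile, pipeline=PIPELINE, extra_args=()):
    """Build (but do not run) a `nextflow run` command for one config profile.

    `profile` is passed to Nextflow's standard `-profile` option. Which
    profile names exist depends on the pipeline's configuration, so the
    names used below are hypothetical.
    """
    return ["nextflow", "run", pipeline, "-profile", profile, *extra_args]


# Same pipeline, different scheduler backend: only the profile changes.
for profile in ("slurm", "lsf"):  # hypothetical profile names
    print(shlex.join(nextflow_command(profile)))
```

The point of the sketch is that the workflow definition stays the same and
only the selected configuration changes from site to site.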
Tim
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.