On 4 Feb 2021, at 10:14, Jonathan Aquilina via Beowulf <[email protected]> wrote:

I am curious, though: to chunk out such large data, is something like Hadoop/HBase 
(or platforms like them) what's being used?


It’s a combination of our home-grown sequencing pipeline which we use across 
the board, and then a specific COG-UK analysis of the genomes themselves.  This 
pipeline is common to all consortium members who are contributing sequence 
data.  It’s a Nextflow pipeline, and the code is here:

https://github.com/connor-lab/ncov2019-artic-nf

Being Nextflow, you can run it on anything for which Nextflow has a backend 
scheduler. It supports data from both Illumina and Oxford Nanopore sequencers.
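To give a sense of what "backend scheduler" means in practice (this is a generic Nextflow sketch, not taken from the COG-UK pipeline; the queue name and resource figures are placeholders), a site-local nextflow.config can point the same pipeline at an HPC scheduler:

```groovy
// nextflow.config -- illustrative only; partition/queue names are site-specific
process {
    executor = 'slurm'      // Nextflow also supports 'sge', 'lsf', 'pbs', 'local', ...
    queue    = 'general'    // placeholder partition name
    cpus     = 4            // example per-process resource requests
    memory   = '8 GB'
}
```

With a config like this in place, the same `nextflow run` invocation that works on a laptop submits its tasks to the cluster scheduler instead, which is what makes the pipeline portable across consortium members' sites.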

Tim



-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf