Introduction to Linux workflows for biologists (IBUL03)

https://www.prinformatics.com/course/introduction-to-linux-workflows-for-biologists-ibul03/

This course will run from 1st to 5th October 2018 in Glasgow city 
centre and will be delivered by Dr Martin Jones.

Course Overview:
Most high-throughput bioinformatics work these days takes place on the 
Linux command line. The programs which do the majority of the computational 
heavy lifting — genome assemblers, read mappers, and annotation tools — are 
designed to work best when used with a command-line interface. Because the 
command line can be an intimidating environment, many biologists learn the 
bare minimum needed to get their analysis tools working. This means that 
they miss out on the power of Linux to customize their environment and 
automate many parts of the bioinformatics workflow. This course will 
introduce the Linux command line environment from scratch and teach 
students how to make the most of its tools to achieve a high level of 
productivity when working with biological data.

Monday 1st
Module 1: The design of Linux.
In the first session we briefly cover the design of Linux: how is it 
different from Windows/OSX and how is it best used? We’ll then jump 
straight onto the command line and learn about the layout of the Linux file 
system and how to navigate it. We’ll describe Linux’s file permission 
system (which often trips up beginners), how paths work, and how we 
actually run programs on the command line. We’ll learn a few tricks for 
using the command line more efficiently, and how to deal with programs that 
are misbehaving. We’ll finish this session by looking at the built-in help 
system and how to read and interpret manual pages.

Module 2: System management.
We’ll first look at a few command line tools for monitoring the status of 
the system and keeping track of what’s happening to processor power, 
memory, and disk space. We’ll go over the process of installing new 
software from the built-in repositories (which is easy) and from source 
code downloads (which is trickier). We’ll also introduce some tools for 
benchmarking software (measuring the time/memory requirements of processing 
large datasets).

Tuesday 2nd
Module 3: Manipulating tabular data.
Many data types we want to work with in bioinformatics are stored as 
tabular plain text files, and here we learn all about manipulating tabular 
data on the command line. We’ll start with simple things like extracting 
columns, filtering, sorting, and searching for text before moving on to 
more complex tasks like searching for duplicated values, summarizing large 
files, and combining simple tools into long commands.

Module 4: Constructing pipelines.
In this session we will look at the various tools Linux has for 
constructing pipelines out of individual commands. Aliases, shell 
redirection, pipes, and shell scripting will all be introduced here. We’ll 
also look at a couple of specific tools to help with running programs on 
multiple processors, and for monitoring the progress of long-running tasks.

Wednesday 3rd – Classes from 09:00 to 17:00
Module 5: EMBOSS.
EMBOSS is a suite of bioinformatics command-line tools explicitly designed 
to work in the Linux paradigm. We’ll get an overview of the different 
sequence data formats that we might expect to work with, and put what we 
learned about shell scripting to biological use by building a pipeline to 
compare codon usage across two collections of DNA sequences.

Module 6: Using a Linux server.
Often in bioinformatics we’ll be working on a Linux server rather than our 
own computer, typically because we need access to more computing power or 
to specialized tools and datasets. In this session we’ll learn how to 
connect to a Linux server and how to manage sessions. We’ll also consider 
the various ways of moving data between a server and your own computer, 
and finish with a discussion of the considerations we have to make when 
working on a shared computer.

Thursday 4th
Module 7: Combining methods.
In the next two sessions — i.e. one full day — we’ll put everything we have 
learned together and implement a workflow for next-gen sequence analysis. 
In this first session we’ll carry out quality control on some paired-end 
Illumina data and map these reads to a reference genome. We’ll then look at 
various approaches to automating this pipeline, allowing us to quickly do 
the same for a second dataset.

Module 8: Combining methods.
The second part of the next-gen workflow is to call variants to identify 
SNPs between our two samples and the reference genome. We’ll look at the 
VCF file format and figure out how to filter SNPs for read coverage and 
quality. By counting the number of SNPs between each sample and the 
reference we will try to figure out something about the biology of the two 
samples. We’ll attempt to automate this analysis in various ways so that we 
could easily repeat the pipeline for additional samples.

Friday 5th
Module 9: Customization.
Part of the Linux design is that everything can be customized. This can be 
intimidating at first but, given that bioinformatics work is often fairly 
repetitive, can be used to good effect. Here we’ll learn about environment 
variables, custom prompts, soft links, and ssh configuration: a 
collection of tools with modest capabilities, but which together can make 
life on the command line much more pleasant. In this last session there 
will also be time to continue working on the next-gen sequencing pipeline.
The afternoon of Friday 5th is reserved for finishing off the next-gen 
workflow exercise, working on your own datasets, or leaving early for 
travel.

Email oliverhoo...@prinformatics.com with any questions

Check out our sister sites,
www.PRstatistics.com (Ecology and Life Sciences)
www.PRinformatics.com (Bioinformatics and data science)
www.PSstatistics.com (Behaviour and cognition)

Upcoming courses

1.      April 9th – 13th 2018 
NETWORK ANALYSIS FOR ECOLOGISTS USING R (NTWA02)
Glasgow, Scotland, Dr. Marco Scotti   
www.prstatistics.com/course/network-analysis-ecologists-ntwa02/

2.      April 16th – 20th 2018
INTRODUCTION TO STATISTICAL MODELLING FOR PSYCHOLOGISTS USING R (IPSY01)
Glasgow, Scotland, Dr. Dale Barr, Dr Luc Bussierre   
http://www.psstatistics.com/course/introduction-to-statistics-using-r-for-psychologists-ipsy01/

3.      April 23rd – 27th 2018
MULTIVARIATE ANALYSIS OF ECOLOGICAL COMMUNITIES USING THE VEGAN PACKAGE 
(VGNR01)
Glasgow, Scotland, Dr. Peter Solymos, Dr. Guillaume Blanchet             
www.prstatistics.com/course/multivariate-analysis-of-ecological-communities-in-r-with-the-vegan-package-vgnr01/

4.      April 30th – May 4th 2018
QUANTITATIVE GEOGRAPHIC ECOLOGY: MODELING GENOMES, NICHES, AND COMMUNITIES 
(QGER01)
Glasgow, Scotland, Dr. Dan Warren, Dr. Matt Fitzpatrick
www.prstatistics.com/course/quantitative-geographic-ecology-using-r-modelling-genomes-niches-and-communities-qger01/

5.      May 7th – 11th 2018
ADVANCES IN MULTIVARIATE ANALYSIS OF SPATIAL ECOLOGICAL DATA USING R (MVSP02)
CANADA (QUEBEC), Prof. Pierre Legendre, Dr. Guillaume Blanchet
www.prstatistics.com/course/advances-in-spatial-analysis-of-multivariate-ecological-data-theory-and-practice-mvsp03/

6.      May 14th - 18th 2018
INTRODUCTION TO MIXED (HIERARCHICAL) MODELS FOR BIOLOGISTS (IMBR01)
CANADA (QUEBEC), Prof Subhash Lele 
www.prstatistics.com/course/introduction-to-mixed-hierarchical-models-for-biologists-using-r-imbr01/

7.      May 21st - 25th 2018
INTRODUCTION TO PYTHON FOR BIOLOGISTS (IPYB05)
SCENE, Scotland, Dr. Martin Jones
http://www.prinformatics.com/course/introduction-to-python-for-biologists-ipyb05/

8.      May 21st - 25th 2018
INTRODUCTION TO REMOTE SENSING AND GIS FOR ECOLOGICAL APPLICATIONS (IRMS01)
Glasgow, Scotland, Prof. Duccio Rocchini, Dr. Luca Delucchi
www.prinformatics.com/course/introduction-to-remote-sensing-and-gis-for-ecological-applications-irms01/

9.      May 28th – 31st 2018
STABLE ISOTOPE MIXING MODELS USING SIAR, SIBER AND MIXSIAR (SIMM04)
CANADA (QUEBEC), Dr. Andrew Parnell, Dr. Andrew Jackson
www.prstatistics.com/course/stable-isotope-mixing-models-using-r-simm04/

10.     May 28th – June 1st 2018
ADVANCED PYTHON FOR BIOLOGISTS (APYB02)
SCENE, Scotland, Dr. Martin Jones
www.prinformatics.com/course/advanced-python-biologists-apyb02/

11.     June 12th - 15th 2018
SPECIES DISTRIBUTION MODELLING (DBMR01)
Myuna Bay sport and recreation, Australia, Prof. Jane Elith, Dr. Gurutzeta 
Guillera
www.prstatistics.com/course/species-distribution-models-using-r-sdmr01/

12.     June 18th – 22nd 2018
STRUCTURAL EQUATION MODELLING FOR ECOLOGISTS AND EVOLUTIONARY BIOLOGISTS 
USING R (SEMR02)
Myuna Bay sport and recreation, Australia, Dr. Jon Lefcheck
www.prstatistics.com/course/structural-equation-modelling-for-ecologists-and-evolutionary-biologists-semr02/

13.     June 25th – 29th 2018
SPECIES DISTRIBUTION/OCCUPANCY MODELLING USING R (OCCU01)
Glasgow, Scotland, Dr. Darryl McKenzie
www.prstatistics.com/course/species-distributionoccupancy-modelling-using-r-occu01/

14.     July 2nd - 5th 2018
SOCIAL NETWORK ANALYSIS FOR BEHAVIOURAL SCIENTISTS USING R (SNAR01)
Glasgow, Scotland, Prof James Curley
http://www.psstatistics.com/course/social-network-analysis-for-behavioral-scientists-snar01/

15.     July 8th – 12th 2018
MODEL BASED MULTIVARIATE ANALYSIS OF ABUNDANCE DATA USING R (MBMV02)
Glasgow, Scotland, Prof David Warton
www.prstatistics.com/course/model-base-multivariate-analysis-of-abundance-data-using-r-mbmv02/

16.     July 16th – 20th 2018
PRECISION MEDICINE BIOINFORMATICS: FROM RAW GENOME AND TRANSCRIPTOME DATA 
TO CLINICAL INTERPRETATION (PMBI01)
Glasgow, Scotland, Dr Malachi Griffith, Dr. Obi Griffith
www.prinformatics.com/course/precision-medicine-bioinformatics-from-raw-genome-and-transcriptome-data-to-clinical-interpretation-pmbi01/

17.     July 23rd – 27th 2018
EUKARYOTIC METABARCODING (EUKB01)
Glasgow, Scotland, Dr. Owen Wangensteen
http://www.prinformatics.com/course/eukaryotic-metabarcoding-eukb01/

18.     October 8th – 12th 2018
INTRODUCTION TO SPATIAL ANALYSIS OF ECOLOGICAL DATA USING R (ISAE01)
Glasgow, Scotland, Prof. Subhash Lele
https://www.prstatistics.com/course/introduction-to-spatial-analysis-of-ecological-data-using-r-isae01/

19.     October 15th – 19th 2018
APPLIED BAYESIAN MODELLING FOR ECOLOGISTS AND EPIDEMIOLOGISTS (ABME04)
Glasgow, Scotland, Dr. Matt Denwood, Emma Howard
http://www.prstatistics.com/course/applied-bayesian-modelling-ecologists-epidemiologists-abme04/

20.     October 29th – November 2nd 2018
PHYLOGENETIC COMPARATIVE METHODS FOR STUDYING DIVERSIFICATION AND 
PHENOTYPIC EVOLUTION (PCME01)
Glasgow, Scotland, Prof. Subhash Lele, Dr. Antigoni Kaliontzopoulou
https://www.prstatistics.com/course/phylogenetic-comparative-methods-for-studying-diversification-and-phenotypic-evolution-pcme01/

21.     November 26th – 30th 2018
FUNCTIONAL ECOLOGY FROM ORGANISM TO ECOSYSTEM: THEORY AND COMPUTATION 
(FEER01)
Glasgow, Scotland, Dr. Francesco de Bello, Dr. Lars Götzenberger, Dr. 
Carlos Carmona
http://www.prstatistics.com/course/functional-ecology-from-organism-to-ecosystem-theory-and-computation-feer01/

22.     February 2018 TBC
MOVEMENT ECOLOGY (MOVE02)
Margam Discovery Centre, Wales, Dr Luca Borger, Dr Ronny Wilson, Dr 
Jonathan Potts
www.prstatistics.com/course/movement-ecology-move01/
