Is this issue wholly addressed by having a URI for the reference? Or is there some subtlety that I am missing here?
i.e. I would expect a minor version of a reference genome to have a different URI from another minor version of the same major version of that reference genome … Am I naive?

I have noticed reference declarations that fall short of my ideal … e.g.

##reference=GRCh37.p5
##reference="hg19"
##reference=GRCh37
##reference=GRCh37
##reference=file:///humgen/1kg/reference/human_g1k_v37.fasta
##reference=file:///humgen/gsa-hpprojects/GATK/data/ucsc.hg19/ucsc.hg19.fasta
##reference=file:///humgen/gsa-hpprojects/GATK/data/ucsc.hg19/ucsc.hg19.fasta

My take is that ##reference="hg19" is a bug and should read ##reference=hg19, and that somewhere I need some heuristics that convert these into URIs … (and which convert file:/// URIs into something more useful).

Any hints as to how to interpret these would be welcome.

Jeremy

On Mar 20, 2013, at 3:19 PM, Joachim Baran <joachim.ba...@gmail.com> wrote:

> Hello,
>
> On 20 March 2013 18:09, Jerven Bolleman <m...@jerven.eu> wrote:
>> So instead of chromosome M you are really talking about assembly X of
>> a set of reads R mapped via some (variant calling) processes to
>> reference chromosome C that is also really an assembly of a different
>> set of reads.
>
> Just to add to Jerven's comment: even when referring to a reference
> assembly, it is best to add "Which version?".
>
> Even when talking about reference genome assemblies, you have multiple
> versions (including "patches"). Additionally, when interpreting the genomes,
> you will also get different results from various institutes (genes from UCSC
> are not the same as Ensembl).
>
> I think my point here is that chromosomes (or anything, really) have
> provenance that needs to be explicitly denoted.
>
> Best,
> Joachim
>
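The heuristics Jeremy asks about could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not an established tool: the function name parse_reference_header, the ALIASES table and its labels are all hypothetical and illustrative, and the sketch resolves declarations to assembly labels rather than to URIs, since no authoritative URI scheme is given in this thread. It strips stray quotes (as in ##reference="hg19"), keeps only the file name from file:/// paths (the directories are site-specific), and splits a GRC patch suffix such as .p5 from the assembly name so that patch-level provenance is not lost.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical alias table: maps a lowercased token (a bare name or a
# FASTA file stem) to an illustrative assembly label. Extend per site.
ALIASES = {
    "grch37": "GRCh37",
    "hg19": "hg19",
    "ucsc.hg19": "hg19",
    "human_g1k_v37": "GRCh37 (1000 Genomes b37)",
}

# Matches a GRC-style name with an optional patch suffix, e.g. "GRCh37.p5".
GRC_PATTERN = re.compile(r"^(GRC[hm]\d+)(?:\.p(\d+))?$", re.IGNORECASE)


def parse_reference_header(line):
    """Heuristically interpret a '##reference=' header line.

    Returns a dict with the cleaned value, the URI scheme (None if the
    value is not a URI), the assembly label (None if unrecognized) and
    the GRC patch number (None if absent).
    """
    prefix = "##reference="
    if not line.startswith(prefix):
        raise ValueError("not a ##reference header: %r" % line)
    # Strip whitespace and surrounding quotes, e.g. ##reference="hg19".
    value = line[len(prefix):].strip().strip('"')

    parsed = urlparse(value)
    scheme = parsed.scheme or None
    if scheme == "file":
        # Only the file name is portable; drop the site-specific directory.
        name = PurePosixPath(parsed.path).name
        # Remove a FASTA extension (optionally gzipped) to get a lookup token.
        token = re.sub(r"\.(fa|fasta)(\.gz)?$", "", name, flags=re.IGNORECASE)
    else:
        token = value

    patch = None
    match = GRC_PATTERN.match(token)
    if match:
        # Look up the base assembly and keep the patch level separately.
        assembly = ALIASES.get(match.group(1).lower())
        patch = int(match.group(2)) if match.group(2) else None
    else:
        assembly = ALIASES.get(token.lower())

    return {"value": value, "scheme": scheme, "assembly": assembly, "patch": patch}


if __name__ == "__main__":
    headers = [
        "##reference=GRCh37.p5",
        '##reference="hg19"',
        "##reference=file:///humgen/1kg/reference/human_g1k_v37.fasta",
        "##reference=file:///humgen/gsa-hpprojects/GATK/data/ucsc.hg19/ucsc.hg19.fasta",
    ]
    for header in headers:
        print(parse_reference_header(header))
```

The sketch deliberately keeps hg19 and the 1000 Genomes b37 build as separate labels, since they are not interchangeable (for example, they name chromosomes differently, "chr1" versus "1"); collapsing them would lose exactly the provenance Joachim says needs to be explicit.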