Hi,
I have an all-sites VCF produced by GATK.
I use bgzip to compress it and tabix 0.2.5 (r1005) to index it.
(tabix -p vcf filename.gz)
When I retrieve a region, I get not only the entries in that region but
also every entry whose position is smaller than the requested start, if
that entry represents a deletion and its reference allele extends to the
requested start position.
For example, if I query 'Chr1:25-45', I get:
Chr1 12 . AAAAAAAAACAAAAC A ...
Chr1 19 . AACAAAAC A ...
Chr1 20 . ACAAAAC A,AAAAAC ...
Chr1 23 . AAAC A ...
Chr1 25 . A . ...
...
...
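To make the behavior concrete, here is a minimal Python sketch of the overlap rule I assume tabix applies with -p vcf. This is inferred from the output above, not taken from the tabix source: each record is treated as spanning POS to POS + len(REF) - 1, and any record whose span overlaps the query is returned. The helpers parse_record(), overlaps() and pos_in_region() are hypothetical names for illustration. The records are the example lines above, tab-separated and truncated after ALT.

```python
# Sketch: reproduce the observed tabix output under an ASSUMED overlap rule
# (a record spans POS .. POS+len(REF)-1, 1-based inclusive).

def parse_record(line):
    """Return (chrom, pos, ref) from a tab-separated VCF data line."""
    fields = line.split("\t")
    return fields[0], int(fields[1]), fields[3]


def overlaps(line, chrom, start, end):
    """True if the record's REF span overlaps [start, end] (assumed -p vcf rule)."""
    c, pos, ref = parse_record(line)
    ref_end = pos + len(ref) - 1  # last reference base covered by REF
    return c == chrom and pos <= end and ref_end >= start


def pos_in_region(line, chrom, start, end):
    """True only if POS itself lies in [start, end] (the behavior I want)."""
    c, pos, _ = parse_record(line)
    return c == chrom and start <= pos <= end


# Example records from above, tab-separated and truncated after ALT.
records = [
    "Chr1\t12\t.\tAAAAAAAAACAAAAC\tA",
    "Chr1\t19\t.\tAACAAAAC\tA",
    "Chr1\t20\t.\tACAAAAC\tA,AAAAAC",
    "Chr1\t23\t.\tAAAC\tA",
    "Chr1\t25\t.\tA\t.",
]

hits_overlap = [r.split("\t")[1] for r in records if overlaps(r, "Chr1", 25, 45)]
hits_pos = [r.split("\t")[1] for r in records if pos_in_region(r, "Chr1", 25, 45)]
print(hits_overlap)  # ['12', '19', '20', '23', '25']
print(hits_pos)      # ['25']
```

Every deletion above ends at position 26, so under this rule all of them overlap Chr1:25-45 even though their POS is before 25.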
Is there a way to get only the entries whose POS (column 2) lies within
the interval?
I can imagine that the current behavior is sometimes desirable, but it is
a problem in my case: if I use tabix to split a VCF into regions, I get
duplicate entries when I join the pieces again.
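The duplicate problem can be reproduced with a small Python sketch, again assuming the REF-length overlap rule. Splitting the example records into two adjacent regions (Chr1:1-24 and Chr1:25-45, chosen only for illustration) puts every deletion that crosses the boundary into both chunks. Keeping only records whose POS falls inside each region assigns every record to exactly one chunk. The functions ref_span(), split() and duplicates_after_join() are hypothetical helpers, not part of tabix.

```python
from collections import Counter


def ref_span(line):
    """Return (chrom, pos, ref_end) for a tab-separated VCF data line."""
    fields = line.split("\t")
    pos = int(fields[1])
    return fields[0], pos, pos + len(fields[3]) - 1


def split(records, regions, by_pos_only):
    """Return one list of records per (chrom, start, end) region."""
    chunks = []
    for chrom, start, end in regions:
        chunk = []
        for line in records:
            c, pos, ref_end = ref_span(line)
            if c != chrom:
                continue
            if by_pos_only:
                hit = start <= pos <= end  # POS must lie in the region
            else:
                hit = pos <= end and ref_end >= start  # assumed tabix overlap rule
            if hit:
                chunk.append(line)
        chunks.append(chunk)
    return chunks


def duplicates_after_join(chunks):
    """Records that appear more than once after concatenating all chunks."""
    counts = Counter(line for chunk in chunks for line in chunk)
    return sorted(line for line, n in counts.items() if n > 1)


records = [
    "Chr1\t12\t.\tAAAAAAAAACAAAAC\tA",
    "Chr1\t19\t.\tAACAAAAC\tA",
    "Chr1\t20\t.\tACAAAAC\tA,AAAAAC",
    "Chr1\t23\t.\tAAAC\tA",
    "Chr1\t25\t.\tA\t.",
]
regions = [("Chr1", 1, 24), ("Chr1", 25, 45)]

print(len(duplicates_after_join(split(records, regions, by_pos_only=False))))  # 4
print(len(duplicates_after_join(split(records, regions, by_pos_only=True))))   # 0
```

Post-filtering tabix output on POS, as the by_pos_only branch does, would be one workaround, at the cost of re-checking every returned line.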
Could I just solve the problem by using tabix -s 1 -b 2 -e 2 <filename.gz>,
or does -p vcf do anything more sophisticated I should be aware of?
Thanks for your help,
Hannes
--
Dr. Hannes Svardal
Postdoctoral researcher
Nordborg group
Gregor Mendel Institute
Dr. Bohr-Gasse 3
1030 Vienna, Austria
phone: +436803252197
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help