Hi,

I have an all-sites VCF produced by GATK.
I compress it with bgzip and index it with tabix 0.2.5 (r1005):
tabix -p vcf filename.gz

When I retrieve a region, I get not only the entries within that region but
also entries whose position is smaller than the requested start position,
whenever the entry represents a deletion and its reference allele extends
to or past the requested start position.

E.g. if I query for 'Chr1:25-45', I get:

Chr1   12   .   AAAAAAAAACAAAAC   A          ...
Chr1   19   .   AACAAAAC          A          ...
Chr1   20   .   ACAAAAC           A,AAAAAC   ...
Chr1   23   .   AAAC              A          ...
Chr1   25   .   A                 .          ...
...
...
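
To make the overlap concrete, here is a small Python sketch that computes the
reference span of each record listed above. It assumes the span runs from POS
to POS + len(REF) - 1, which is my reading of the behaviour I observe and not
something I have confirmed in the tabix source. The names ref_span and
overlaps are made up for illustration.

```python
def ref_span(pos, ref):
    """Return the 1-based inclusive (start, end) covered by the REF allele.

    Assumption: the record ends at POS + len(REF) - 1.
    """
    return pos, pos + len(ref) - 1


def overlaps(span, start, end):
    """True if the inclusive span shares at least one base with [start, end]."""
    return span[0] <= end and span[1] >= start


# (POS, REF) pairs copied from the query output above.
records = [
    (12, "AAAAAAAAACAAAAC"),
    (19, "AACAAAAC"),
    (20, "ACAAAAC"),
    (23, "AAAC"),
    (25, "A"),
]

for pos, ref in records:
    span = ref_span(pos, ref)
    # Report each record's span and whether it overlaps the query Chr1:25-45.
    print(pos, span, overlaps(span, 25, 45))
```

Under that assumption, every record shown reaches position 25 or beyond, which
would explain why all of them are returned even though only the last one has
its POS inside the interval.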

Is there a way to only get entries for which the pos (column 2) is in the
interval?

I can imagine that the current behaviour is sometimes desired, but it is
problematic in my case: if I use tabix to split a VCF into regions, a record
that starts before a region boundary but reaches across it is returned for
both regions, so I get duplicate entries when joining the pieces again.

Could I solve the problem simply by indexing with
tabix -s 1 -b 2 -e 2 <filename.gz>,
or does -p vcf do anything more sophisticated that I should be aware of?
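
As a stopgap I could post-filter the tabix output on column 2. A minimal
Python sketch of what I have in mind, assuming tab-separated VCF lines as
input (for example the lines printed by the tabix query); the function name
filter_by_pos is hypothetical and nothing here is part of tabix itself:

```python
def filter_by_pos(lines, start, end):
    """Yield VCF lines whose POS (column 2) lies in [start, end], inclusive.

    Header lines (starting with '#') are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t", 2)
        if len(fields) < 2:
            # Skip blank or malformed lines rather than failing.
            continue
        if start <= int(fields[1]) <= end:
            yield line


# Example with a few records shaped like the ones above.
sample = [
    "Chr1\t23\t.\tAAAC\tA\n",
    "Chr1\t25\t.\tA\t.\n",
    "Chr1\t46\t.\tA\t.\n",
]
kept = list(filter_by_pos(sample, 25, 45))
print("".join(kept), end="")
```

This only looks at POS, so a deletion starting at 23 would be dropped from the
25-45 chunk and kept in the chunk that contains position 23, which is what I
need to avoid duplicates when joining.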

Thanks for your help,
Hannes



-- 
Dr. Hannes Svardal
Postdoctoral researcher
Nordborg group

Gregor Mendel Institute
Dr. Bohr-Gasse 3
1030 Vienna, Austria
phone: +436803252197
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
