Hi Hannes, yes, it does what you want. It has no side effects other than that the index then works by position (column 2) only and does not take indels into account, i.e. a deletion is returned only if its POS lies inside the queried region:
tabix -s 1 -b 2 -e 2

Petr

On Mon, 2015-05-18 at 22:19 +0200, Hannes Svardal wrote:
> Hi,
>
> I have a vcf with all sites produced by GATK.
> I use bgzip to compress it and tabix 0.2.5 (r1005) to index it.
> (tabix -p vcf filename.gz)
>
> When I retrieve a region I get not only the entries in this region
> but also all entries with a position smaller than the desired start
> position if the entry represents a deletion and the length of the
> reference allele reaches the desired start position.
>
> E.g. if I query for 'Chr1:25-45'
>
> Chr1 12 . AAAAAAAAACAAAAC A ...
> Chr1 19 . AACAAAAC A ...
> Chr1 20 . ACAAAAC A,AAAAAC ...
> Chr1 23 . AAAC A ...
> Chr1 25 . A . ...
> ...
> ...
>
> Is there a way to only get entries for which the pos (column 2) is in
> the interval?
>
> I can imagine that the current behavior is sometimes desired, but it
> is problematic in my case. The current behaviour means that if I use
> tabix to split a VCF I will get duplicate entries on joining it again.
>
> Could I just solve the problem by using tabix -s 1 -b 2 -e 2
> <filename.gz>, or does -p vcf do anything more sophisticated I should
> be aware of?
>
> Thanks for your help,
> Hannes
>
> --
> Dr. Hannes Svardal
> Postdoctoral researcher
> Nordborg group
>
> Gregor Mendel Institute
> Dr. Bohr-Gasse 3
> 1030 Vienna, Austria
> phone: +436803252197
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > _______________________________________________ Samtools-help mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ------------------------------------------------------------------------------ One dashboard for servers and applications across Physical-Virtual-Cloud Widest out-of-the-box monitoring support with 50+ applications Performance metrics, stats and reports that give you Actionable Insights Deep dive visibility with transaction tracing using APM Insight. http://ad.doubleclick.net/ddm/clk/290420510;117567292;y _______________________________________________ Samtools-help mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/samtools-help
