Hi Hannes,

sorry for misleading you. I believed this used to work, but having tried
a few tabix versions, I see that it does not. Unfortunately, at the moment
I can't offer a solution other than clumsy piping through awk
(awk '$2>=25'). We may want to add an option for this in bcftools and
tabix, so that the user can control this behaviour.
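
As a rough sketch of that awk workaround, assuming tab-separated VCF
records with POS in column 2: the function name filter_by_pos and its two
arguments are made up for illustration and are not part of tabix or
bcftools. The upper bound is only a safeguard, since the records that
leak in are the ones starting before the requested interval.

```shell
# Keep only records whose POS (column 2) lies inside [lo, hi].
# Header lines (starting with '#') are passed through unchanged.
# filter_by_pos is a hypothetical helper, not a tabix/bcftools command.
filter_by_pos() {
    awk -F'\t' -v lo="$1" -v hi="$2" \
        '/^#/ || ($2 >= lo && $2 <= hi)'
}

# Possible usage, assuming an indexed file named filename.vcf.gz:
#   tabix filename.vcf.gz 'Chr1:25-45' | filter_by_pos 25 45
```

Filtering each chunk this way when splitting a VCF should avoid the
duplicate entries on joining, because every record is then assigned to
exactly one interval by its POS.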

Petr


On Tue, 2015-05-19 at 14:46 +0200, Hannes Svardal wrote:
> Hi Petr,
> 
> 
> Thanks for the reply. 
> Strangely, I tried this but it gives me exactly the same result, i.e.,
> overlapping indels are again included even if the position is outside
> the interval.
> 
> 
> How can this be? Does tabix somehow detect that this is a VCF and
> overrides the tabix -s 1 -b 2 -e 2 setting?
> 
> 
> I do:
> rm filename.vcf.gz.tbi
> tabix -s 1 -b 2 -e 2 filename.vcf.gz
> tabix filename.vcf.gz 'CAE1:25-45'
> 
> 
> And get positions starting with CAE1 12
> 
> 
> Any ideas?
> 
> 
> Thanks,
> Hannes
> 
> 
> 
> On 19 May 2015 at 14:34, Petr Danecek <[email protected]> wrote:
>         Hi Hannes,
>         
>         it does what you want and has no side effects other than that
>         the index works by position only and does not take indels into
>         account
>         
>         tabix -s 1 -b 2 -e 2
>         
>         Petr
>         
>         On Mon, 2015-05-18 at 22:19 +0200, Hannes Svardal wrote:
>         > Hi,
>         >
>         >
>         > I have a vcf with all sites produced by GATK.
>         > I use bgzip to compress it and tabix 0.2.5 (r1005) to index it.
>         > (tabix -p vcf filename.gz)
>         >
>         >
>         > When I retrieve a region I do not only get the entries in
>         > this region but also all entries with a position smaller than
>         > the desired start position if the entry represents a deletion
>         > and the length of the reference allele reaches the desired
>         > start position.
>         >
>         >
>         > E.g. if I query for 'Chr1:25-45'
>         >
>         >
>         > Chr1   12 .       AAAAAAAAACAAAAC A      ...
>         > Chr1   19 .       AACAAAAC        A     ...
>         > Chr1   20 .       ACAAAAC A,AAAAAC       ...
>         > Chr1   23 .       AAAC    A     ...
>         > Chr1   25 .       A       .      ...
>         > ...
>         > ...
>         >
>         >
>         > Is there a way to only get entries for which the pos (column
>         > 2) is in the interval?
>         >
>         >
>         > I can imagine that the current behavior is sometimes desired,
>         > but it is problematic in my case. The current behaviour means
>         > that if I use tabix to split a VCF I will get duplicate entries
>         > on joining it again.
>         >
>         >
>         > Could I just solve the problem by using tabix -s 1 -b 2 -e 2
>         > <filename.gz>, or does -p vcf do anything more sophisticated
>         > I should be aware of?
>         >
>         >
>         > Thanks for your help,
>         > Hannes
>         >
>         >
>         >
>         >
>         >
>         >
>         > --
>         > Dr. Hannes Svardal
>         > Postdoctoral researcher
>         > Nordborg group
>         >
>         > Gregor Mendel Institute
>         > Dr. Bohr-Gasse 3
>         > 1030 Vienna, Austria
>         > phone: +436803252197
>         >
>         >
>         > _______________________________________________
>         Samtools-help mailing list [email protected]
>         https://lists.sourceforge.net/lists/listinfo/samtools-help
>         
>         
>         
>         
>         --
>          The Wellcome Trust Sanger Institute is operated by Genome
>          Research Limited, a charity registered in England with number
>          1021457 and a company registered in England with number
>          2742969, whose registered office is 215 Euston Road, London,
>          NW1 2BE.
> 
> 




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help
