On 18 May 2016, at 18:23, Arti Tandon wrote:
> I am using bcftools-1.3.1/htslib-1.3.1/tabix* to index the dbNSFP file to be 
> used by the program SnpSift, using the following commands and am getting an 
> error:
> 
> $ (head -n 1 dbNSFP3.2c_variant.chr1 ; cat dbNSFP3.2c_variant.chr* | grep -v 
> "^#" ) > dbNSFP3.2c.txt
> # Compress using block-gzip algorithm
> bgzip dbNSFP3.2c.txt
> # Create tabix index
> tabix -s 1 -b 2 -e 2 dbNSFP3.2c.txt.gz
> The first two steps work fine, but the tabix gives me an error:
> [E::hts_idx_push] unsorted positions
> tbx_index_build failed: dbNSFP3.2c.txt.gz

The dbNSFP files appear to be sorted by position, but in fact they are not.
For example, dbNSFP3.2c_variant.chr1 contains the following consecutive lines,
in which position 248918363 is followed by 182709:

1       248918362       G       C       [...snipped further columns]
1       248918362       G       T
1       248918363       A       C
1       248918363       A       G
1       248918363       A       T
1       182709  A       C
1       182709  A       G
1       182709  A       T
1       182710  T       A
1       182710  T       C
1       182710  T       G

So to fix this you'll have to sort the files, e.g., with the Unix sort(1) 
command.  It would also be worth pointing this out to the dbNSFP people, as it 
looks a lot like they're intended to be sorted by position.
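
As a minimal sketch of the sorting step, assuming the concatenated file is 
tab-separated with a single header line at the top, something like the 
following shell function should do.  The name sort_dbnsfp is made up for 
illustration; it is not part of tabix or dbNSFP.

```shell
# Hypothetical helper: print the header line unchanged, then the remaining
# lines sorted by sequence name (column 1) and numerically by position
# (column 2).  Assumes tab-separated input whose first line is the header.
sort_dbnsfp() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Possible usage, reusing the file names from the original commands:
#   sort_dbnsfp dbNSFP3.2c.txt | bgzip -c > dbNSFP3.2c.txt.gz
#   tabix -s 1 -b 2 -e 2 dbNSFP3.2c.txt.gz
```

LC_ALL=C is there only to make the ordering of sequence names independent of 
the locale; the numeric key on column 2 is what tabix cares about within each 
sequence.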

I've extended tabix's error message so that it reports where the problem 
occurs, which should make such cases easier to track down:

$ tabix -s 1 -b 2 -e 2 chr1.txt.gz 
[E::hts_idx_push] unsorted positions on sequence #1: 248918363 followed by 
182709

(Note that "sequence #1" here means the first sequence in the file; 
unfortunately the actual sequence names are not available to this part of the 
code.)
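
If you would like to locate such breaks by sequence name before running 
tabix, a small awk filter can do a similar check on the uncompressed text.  
This is only a sketch, assuming tab-separated input with the sequence name in 
column 1 and the position in column 2; find_unsorted is a made-up name and the 
output format is my own, not tabix's.

```shell
# Hypothetical helper: report every place where the position (column 2)
# decreases while the sequence name (column 1) stays the same.
# Lines starting with "#" are treated as headers and skipped.
find_unsorted() {
    awk -F '\t' '
        /^#/ { next }
        $1 == chr && $2 + 0 < pos {
            printf "%s: %d followed by %d (line %d)\n", $1, pos, $2, NR
        }
        { chr = $1; pos = $2 + 0 }
    ' "$1"
}

# Possible usage on one of the per-chromosome files:
#   find_unsorted dbNSFP3.2c_variant.chr1
```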

    John

