On 18 May 2016, at 18:23, Arti Tandon <[email protected]> wrote:
> I am using bcftools-1.3.1/htslib-1.3.1/tabix* to index the dbNSFP file to be
> used by the program SnpSift, using the following commands and am getting an
> error:
>
> $ (head -n 1 dbNSFP3.2c_variant.chr1 ; cat dbNSFP3.2c_variant.chr* | grep -v
> "^#" ) > dbNSFP3.2c.txt
> # Compress using block-gzip algorithm
> bgzip dbNSFP3.2c.txt
> # Create tabix index
> tabix -s 1 -b 2 -e 2 dbNSFP3.2c.txt.gz
> The first two steps work fine, but the tabix gives me an error:
> [E::hts_idx_push] unsorted positions
> tbx_index_build failed: dbNSFP3.2c.txt.gz
The dbNSFP files appear to be sorted by position, but in fact they are not.
For example, in dbNSFP3.2c_variant.chr1 there are the following lines:
1 248918362 G C [...snipped further columns]
1 248918362 G T
1 248918363 A C
1 248918363 A G
1 248918363 A T
1 182709 A C
1 182709 A G
1 182709 A T
1 182710 T A
1 182710 T C
1 182710 T G
So to fix this you'll have to sort the files, e.g., with the Unix sort(1)
command. It would also be worth pointing this out to the dbNSFP people, as the
files look very much as though they are intended to be sorted by position.
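
As a rough sketch of the sort step (not tested against the real dbNSFP files),
something like the following should work. It assumes the file is
tab-separated, that header lines start with "#", and that column 1 is the
chromosome and column 2 the position, matching the tabix options above. The
function name sort_variants is made up for illustration.

```shell
# Hypothetical helper: print header lines first, then the data lines sorted
# by chromosome (column 1, as text) and position (column 2, numerically).
# $1: tab-separated input file whose header lines start with '#'.
sort_variants() {
    # Header lines are passed through unchanged; '|| true' tolerates a
    # file that has no header at all.
    grep '^#' "$1" || true
    # LC_ALL=C gives a locale-independent ordering. tabix only needs each
    # chromosome's lines to be contiguous and ordered by position.
    grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Possible usage, before running bgzip and tabix as in the original commands:
#   sort_variants dbNSFP3.2c.txt > dbNSFP3.2c.sorted.txt
```

With LC_ALL=C the chromosomes come out in text order (1, 10, 11, ..., 2, ...)
rather than numeric order, which should not matter for indexing.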
I've extended tabix's error message so that it shows where such problems occur:
$ tabix -s 1 -b 2 -e 2 chr1.txt.gz
[E::hts_idx_push] unsorted positions on sequence #1: 248918363 followed by 182709
(Note that "sequence #1" here means the first sequence in the file;
unfortunately the actual sequence names are not available to this part of the
code.)
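
If you want the actual sequence names, a check along these lines can be run on
the uncompressed text before indexing. This is only a sketch under the same
assumptions as the tabix command (tab-separated, chromosome in column 1,
position in column 2, header lines starting with "#"); the function name
find_unsorted is hypothetical and this is not part of tabix.

```shell
# Hypothetical helper: report every place where the position decreases
# while the chromosome name stays the same.
# $1: tab-separated input file whose header lines start with '#'.
find_unsorted() {
    awk -F '\t' '
        /^#/ { next }                       # skip header lines
        {
            cur = $2 + 0                    # force numeric comparison
            if ($1 == chrom && cur < pos)
                printf "%s: %d followed by %d (line %d)\n", chrom, pos, cur, NR
            chrom = $1
            pos = cur
        }
    ' "$1"
}

# Possible usage:
#   find_unsorted dbNSFP3.2c_variant.chr1
```

No output means that no decreasing positions were found within any run of
lines sharing the same chromosome name.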
John