OK, I had totally missed that I needed each tag in a different column of the tab file.
Thanks again for your time and attention.

Kind regards,
Max

Massimiliano S. Tagliamonte
Graduate Student
University of Florida
College of Veterinary Medicine
Department of Infectious Diseases and Pathology

________________________________________
From: Petr Danecek <[email protected]>
Sent: Friday, November 6, 2015 9:50 AM
To: Tagliamonte,Massimiliano S
Cc: John Marshall; [email protected]
Subject: Re: [Samtools-help] bcftools annotate could not parse header line

Hit enter too quickly: yes, each of the annotations will be in a separate
column. For example:

##INFO=<ID=MyFeature1,Number=1,Type=STRING,Description="PF3D7_0100100">
##INFO=<ID=MyFeature2,Number=1,Type=STRING,Description="PF3D7_0100100">

annots.tab:
CHROM  POS  MyFeature1  MyFeature2
1      1    SomeText1   SomeText2

and the INFO column of the VCF will be annotated like this:

MyFeature1=SomeText1;MyFeature2=SomeText2

Petr

On Fri, 2015-11-06 at 15:48 +0100, Petr Danecek wrote:
> Your header should look like this:
>
> ##INFO=<ID=MyFeature,Number=1,Type=STRING,Description="PF3D7_0100100">
>
> rather than this:
>
> ##FEATURE=<ID=STRING_TAG,Number=1,Type=STRING,Description="PF3D7_0100100">
>
> After the annotation is added, it will appear in the INFO column as
> MyFeature=SomeString
>
> Petr
>
>
> On Fri, 2015-11-06 at 14:45 +0000, Tagliamonte,Massimiliano S wrote:
> > Thank you for the follow up, I'm still learning my way through the SNP
> > analysis.
> >
> > I checked again the vcf specifications and the bcftools annotate
> > instruction. At this point I am not sure I understand: does each tag
> > (i.e. WebId, LocusTag, etc) need to be in a different column of my
> > tab-delimited file?
> >
> > Regards,
> > Max
> >
> > Massimiliano S. Tagliamonte
> > Graduate Student
> > University of Florida
> > College of Veterinary Medicine
> > Department of Infectious Diseases and Pathology
> >
> > ________________________________________
> > From: Petr Danecek <[email protected]>
> > Sent: Friday, November 6, 2015 5:35 AM
> > To: Tagliamonte,Massimiliano S
> > Cc: John Marshall; [email protected]
> > Subject: Re: [Samtools-help] bcftools annotate could not parse header line
> >
> > Hi Massimiliano,
> >
> > your FEATURE tag is defined as neither INFO nor FORMAT tag, please check
> > the VCF specification
> > http://samtools.github.io/hts-specs/
> >
> > Best wishes,
> > Petr
> >
> >
> > On Thu, 2015-11-05 at 15:58 +0000, Tagliamonte,Massimiliano S wrote:
> > > OK, sorry to bother again.
> > >
> > > I replaced all the underscores, but now I am getting 'The tag "FEATURE"
> > > is not defined in my_file.tab.gz'
> > >
> > > This is my command:
> > >
> > > bcftools annotate -a my_file.tab.gz \
> > > -c CHROM,FROM,TO,FEATURE \
> > > -h bcftools_annots.hdr \
> > > -O v -o ./filtering/my_snps_bcftools_annotated.vcf \
> > > my_snps.vcf.gz
> > >
> > > The tab file has no header, and only 4 columns (chrom name, gene start ,
> > > gene end, annotation ('FEATURE') column. I have checked the instructions
> > > on http://www.htslib.org/doc/bcftools.html#annotate but I am not sure
> > > what I am doing wrong. This is the tab file first line:
> > >
> > > Pf3D7_01_v3 29510 37126
> > > ID=PF3D7_0100100;Name=PF3D7_0100100;description=erythrocyte+membrane+protein+1%2C+PfEMP1+%28VAR%29;size=7617;WebId=PF3D7_0100100;LocusTag=PF3D7_0100100;size=7617;Alias=VAR-UPSB1,124505645,MAL1P4.01,VAR,PF3D7_0100100,7670005,PFA0005w
> > >
> > > Thanks again for your time and kind attention,
> > > Max
> > >
> > >
> > > Massimiliano S. Tagliamonte
> > > Graduate Student
> > > University of Florida
> > > College of Veterinary Medicine
> > > Department of Infectious Diseases and Pathology
> > >
> > >
> > > ________________________________________
> > > From: Tagliamonte,Massimiliano S
> > > Sent: Thursday, November 5, 2015 9:50 AM
> > > To: John Marshall
> > > Cc: [email protected]
> > > Subject: Re: [Samtools-help] bcftools annotate could not parse header line
> > >
> > > Great, I'll replace the underscores then.
> > >
> > > Thanks for your help,
> > > Max
> > >
> > > Massimiliano S. Tagliamonte
> > > Graduate Student
> > > University of Florida
> > > College of Veterinary Medicine
> > > Department of Infectious Diseases and Pathology
> > >
> > > ________________________________________
> > > From: John Marshall <[email protected]>
> > > Sent: Thursday, November 5, 2015 6:47 AM
> > > To: Tagliamonte,Massimiliano S
> > > Cc: [email protected]
> > > Subject: Re: [Samtools-help] bcftools annotate could not parse header line
> > >
> > > On 4 Nov 2015, at 21:25, Tagliamonte,Massimiliano S
> > > <[email protected]> wrote:
> > > > I am trying to add an annotation column to my vcf file, after calling
> > > > variants with the Samtools 1.x pipeline. I am using bcftools annotate,
> > > > but I keep getting the same error regarding one of the headers:
> > > > Could not parse the header line:
> > > > "##FEATURE=<web_id=STRING_TAG,Number=1,Type=STRING,Description="PF3D7_0100100">"
> > >
> > > It's complaining about the underscore in your "web_id" key. Prior to VCF
> > > v4.3, the spec gave no hints about what characters might be in INFO et al
> > > field keys [1], and somewhat unfortunately htslib/bcftools allowed for
> > > only letters and digits. This has been relaxed on the develop branch in
> > > GitHub [2] and underscores and (non-leading) dots will be accepted by the
> > > next bcftools release.
> > >
> > > In the meantime, you could either build htslib and bcftools from the
> > > development branches in their GitHub repositories, or remove the
> > > underscores from your web_id and locus_tag to get this to work with
> > > bcftools 1.2.
> > >
> > > John
> > >
> > > [1] In the v4.3 spec, see §1.6.1/8
> > > [2]
> > > https://github.com/samtools/htslib/commit/30fb9eee41953958923c56f7ea0af5a5b0376b94
> > >
> > > --
> > > The Wellcome Trust Sanger Institute is operated by Genome Research
> > > Limited, a charity registered in England with number 1021457 and a
> > > company registered in England with number 2742969, whose registered
> > > office is 215 Euston Road, London, NW1 2BE.
> > >
> > > ------------------------------------------------------------------------------
> > > _______________________________________________
> > > Samtools-help mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/samtools-help
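
Editor's note: the thread settles on one annotation per column, each backed by its own ##INFO header line. As a rough sketch of how the original four-column file (chrom, start, end, and a single "key=value;key=value" field) might be reshaped to fit that layout, the Python below pulls selected tags into separate columns and prints matching header lines. The function names, the choice of WebId and LocusTag, the shortened sample row, and the Description text are all assumptions made for illustration; none of this comes from the posters. The header lines use the capitalisation "String", which is how the VCF specification spells that type.

```python
def parse_attributes(field):
    """Split 'key=value;key=value' into a dict, keeping the first value per key."""
    attrs = {}
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        if sep and key not in attrs:
            attrs[key] = value
    return attrs


def split_row(line, tags):
    """Turn 'chrom<TAB>start<TAB>end<TAB>attrs' into one column per requested tag."""
    chrom, start, end, field = line.rstrip("\n").split("\t")
    attrs = parse_attributes(field)
    # "." is the VCF convention for a missing value.
    values = [attrs.get(tag, ".") for tag in tags]
    return "\t".join([chrom, start, end] + values)


def info_header_lines(tags):
    """One ##INFO definition per tag; the Description text is a placeholder."""
    return [
        f'##INFO=<ID={tag},Number=1,Type=String,Description="{tag} from annotation file">'
        for tag in tags
    ]


if __name__ == "__main__":
    tags = ["WebId", "LocusTag"]  # hypothetical selection of tags to keep
    # Shortened version of the first row quoted in the thread.
    row = ("Pf3D7_01_v3\t29510\t37126\t"
           "ID=PF3D7_0100100;WebId=PF3D7_0100100;LocusTag=PF3D7_0100100;size=7617")
    print("\n".join(info_header_lines(tags)))
    print(split_row(row, tags))
```

With a file reshaped this way, the column list in the command quoted above would presumably become -c CHROM,FROM,TO,WebId,LocusTag, with the printed ##INFO lines saved to the file passed via -h. Values containing spaces, semicolons or equals signs would need extra handling that this sketch does not attempt.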
