Thanks very much Petr. This is very helpful! Best, Zhihao
> On 19 May 2015, at 16:01, Petr Danecek <[email protected]> wrote: > > Correction: the gt[] values are coded and cannot be used as is. The > bcf_gt_allele() and bcf_gt_is_phased() macros can be used to get the 0/1 > etc. values. > > petr > > On Tue, 2015-05-19 at 14:34 +0200, Petr Danecek wrote: >> Hi Zhihao, >> >> the get_format function returns the total number of values, this is how >> it is used: >> >> int ngts = bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr); >> ngts /= nsmpl; >> for (i=0; i<nsmpl; i++) >> { >> for (j=0; j<ngts; j++) >> printf(" %d",gt[i*ngts+j]); // todo: check for missing >> // values and different ploidy >> } >> >> >> One can avoid using the C api completely and do instead >> >> bcftools query -f'%CHROM\t%POS[\t%GT]\n' | ./my-script >> >> Petr >> >> >> On Tue, 2015-05-19 at 12:20 +0100, Zhihao Ding wrote: >>> Dear samtools developers, >>> >>> I’ve been trying to learn htslib for some simple analyses. I’d >>> like to parse 1KG phase3 vcfs to obtain a haplotype matrix >>> consisting of 1s and 0s, for a given region. After reading >>> comments in the header files, below is where I’ve got to. >>> I haven’t got the matrix yet, but I am getting genotypes >>> of samples one by one. >>> >>> Could I ask if I am on the right direction? Are there better >>> ways of doing it? >>> >>> Any help would be very appreciated. >>> >>> Best wishes, >>> Zhihao >>> >>> >>> >>> >>> const char* vcf = “test/test.vcf.gz"; >>> >>> int n = 0; // number of records in file >>> int nsnp = 0; // number of SNPs in file >>> int ngt_arr = 0; >>> int ngt = 0; >>> int *gt = NULL; >>> >>> htsFile * inf = bcf_open(vcf, "r"); >>> if (inf == NULL) { >>> return EXIT_FAILURE; >>> } >>> >>> bcf_hdr_t *hdr = bcf_hdr_read(inf); >>> fprintf(stderr, "File %s contains %i samples\n", vcf, >>> bcf_hdr_nsamples(hdr)); >>> >>> bcf1_t *rec = bcf_init(); >>> >>> while (bcf_read(inf, hdr, rec) == 0) { >>> n++; >>> if (bcf_is_snp(rec)) { >>> nsnp++; >>> } else { >>> continue; >>> } >>> >>> if ( bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr) > 0 >>> ){ >>> for (int i=0; i<bcf_hdr_nsamples(hdr); i++){ >>> printf("chr%s\t%i\t%s\t%s\t%s\n", >>> seqnames[rec->rid], >>> rec->pos, >>> rec->d.allele[0], >>> rec->d.allele[bcf_gt_allele(gt[0])], >>> hdr -> samples[i]); >>> } >>> } >>> } >>> >>> free(gt); >>> bcf_hdr_destroy(hdr); >>> bcf_close(inf); >>> bcf_destroy(rec); >>> >>> >>> ------------------------------------------------------------------------------ >>> One dashboard for servers and applications across Physical-Virtual-Cloud >>> Widest out-of-the-box monitoring support with 50+ applications >>> Performance metrics, stats and reports that give you Actionable Insights >>> Deep dive visibility with transaction tracing using APM Insight. >>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>> _______________________________________________ >>> Samtools-help mailing list >>> [email protected] >>> https://lists.sourceforge.net/lists/listinfo/samtools-help >> >> >> >> > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. ------------------------------------------------------------------------------ One dashboard for servers and applications across Physical-Virtual-Cloud Widest out-of-the-box monitoring support with 50+ applications Performance metrics, stats and reports that give you Actionable Insights Deep dive visibility with transaction tracing using APM Insight. http://ad.doubleclick.net/ddm/clk/290420510;117567292;y _______________________________________________ Samtools-help mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/samtools-help
