Hi Zhihao,

the get_format function returns the total number of values, this is how
it is used:

int ngts = bcf_get_format_int32(hdr, rec, "GT", &gt, &ngt_arr);
ngts /= nsmpl;
for (i=0; i<nsmpl; i++)
{
   for (j=0; j<ngts; j++) 
       printf(" %d",gt[i*ngts+j]); // todo: check for missing 
                                   // values and different ploidy
}


One can avoid using the C api completely and do instead

bcftools query -f'%CHROM\t%POS[\t%GT]\n' | ./my-script

Petr


On Tue, 2015-05-19 at 12:20 +0100, Zhihao Ding wrote:
> Dear samtools developers, 
> 
> I’ve been trying to learn htslib for some simple analyses. I’d
> like to parse 1KG phase3 vcfs to obtain a haplotype matrix
> consisting of 1s and 0s, for a given region. After reading 
> comments in the header files, below is where I’ve got to. 
> I haven’t got the matrix yet, but I am getting genotypes 
> of samples one by one. 
> 
> Could I ask if I am on the right direction? Are there better 
> ways of doing it? 
> 
> Any help would be very appreciated. 
> 
> Best wishes, 
> Zhihao
> 
> 
> 
> 
>   const char* vcf = “test/test.vcf.gz";
> 
>   int n    = 0;  // number of records in file
>   int nsnp = 0;  // number of SNPs in file
>   int ngt_arr = 0;
>   int ngt     = 0;
>   int *gt     = NULL;
> 
>         htsFile * inf = bcf_open(vcf, "r");
>         if (inf == NULL) {
>           return EXIT_FAILURE;
>         }
> 
>         bcf_hdr_t *hdr = bcf_hdr_read(inf);
>         fprintf(stderr, "File %s contains %i samples\n", vcf, 
> bcf_hdr_nsamples(hdr));
> 
>         bcf1_t *rec = bcf_init();
>       
>         while (bcf_read(inf, hdr, rec) == 0) {
>                 n++;
>                 if (bcf_is_snp(rec)) {
>                         nsnp++;
>                 } else {
>                         continue;
>                 }
> 
>                 if ( bcf_get_format_int32(hdr, rec, "GT", &gt, &ngt_arr) > 0 
> ){
>                   for (int i=0; i<bcf_hdr_nsamples(hdr); i++){
>                     printf("chr%s\t%i\t%s\t%s\t%s\n",
>                            seqnames[rec->rid],
>                            rec->pos,
>                            rec->d.allele[0],
>                            rec->d.allele[bcf_gt_allele(gt[0])],
>                            hdr -> samples[i]);
>                   }
>                 }
>         }
>         
>         free(gt);
>         bcf_hdr_destroy(hdr);
>         bcf_close(inf);
>         bcf_destroy(rec);
> 
> 
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud 
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Samtools-help mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/samtools-help




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to