Hi Zhihao,
the get_format function returns the total number of values, this is how
it is used:
int ngts = bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr);
ngts /= nsmpl;
for (i=0; i<nsmpl; i++)
{
for (j=0; j<ngts; j++)
printf(" %d",gt[i*ngts+j]); // todo: check for missing
// values and different ploidy
}
One can avoid using the C api completely and do instead
bcftools query -f'%CHROM\t%POS[\t%GT]\n' | ./my-script
Petr
On Tue, 2015-05-19 at 12:20 +0100, Zhihao Ding wrote:
> Dear samtools developers,
>
> I’ve been trying to learn htslib for some simple analyses. I’d
> like to parse 1KG phase3 vcfs to obtain a haplotype matrix
> consisting of 1s and 0s, for a given region. After reading
> comments in the header files, below is where I’ve got to.
> I haven’t got the matrix yet, but I am getting genotypes
> of samples one by one.
>
> Could I ask if I am on the right direction? Are there better
> ways of doing it?
>
> Any help would be very appreciated.
>
> Best wishes,
> Zhihao
>
>
>
>
> const char* vcf = “test/test.vcf.gz";
>
> int n = 0; // number of records in file
> int nsnp = 0; // number of SNPs in file
> int ngt_arr = 0;
> int ngt = 0;
> int *gt = NULL;
>
> htsFile * inf = bcf_open(vcf, "r");
> if (inf == NULL) {
> return EXIT_FAILURE;
> }
>
> bcf_hdr_t *hdr = bcf_hdr_read(inf);
> fprintf(stderr, "File %s contains %i samples\n", vcf,
> bcf_hdr_nsamples(hdr));
>
> bcf1_t *rec = bcf_init();
>
> while (bcf_read(inf, hdr, rec) == 0) {
> n++;
> if (bcf_is_snp(rec)) {
> nsnp++;
> } else {
> continue;
> }
>
> if ( bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr) > 0
> ){
> for (int i=0; i<bcf_hdr_nsamples(hdr); i++){
> printf("chr%s\t%i\t%s\t%s\t%s\n",
> seqnames[rec->rid],
> rec->pos,
> rec->d.allele[0],
> rec->d.allele[bcf_gt_allele(gt[0])],
> hdr -> samples[i]);
> }
> }
> }
>
> free(gt);
> bcf_hdr_destroy(hdr);
> bcf_close(inf);
> bcf_destroy(rec);
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Samtools-help mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/samtools-help
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help