Correction: the gt[] values are coded and cannot be used as is. The
bcf_gt_allele() and bcf_gt_is_phased() macros can be used to get the 0/1
etc. values.
petr
On Tue, 2015-05-19 at 14:34 +0200, Petr Danecek wrote:
> Hi Zhihao,
>
> the get_format function returns the total number of values, this is how
> it is used:
>
> int ngts = bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr);
> ngts /= nsmpl;
> for (i=0; i<nsmpl; i++)
> {
> for (j=0; j<ngts; j++)
> printf(" %d",gt[i*ngts+j]); // todo: check for missing
> // values and different ploidy
> }
>
>
> One can avoid using the C api completely and do instead
>
> bcftools query -f'%CHROM\t%POS[\t%GT]\n' | ./my-script
>
> Petr
>
>
> On Tue, 2015-05-19 at 12:20 +0100, Zhihao Ding wrote:
> > Dear samtools developers,
> >
> > I’ve been trying to learn htslib for some simple analyses. I’d
> > like to parse 1KG phase3 vcfs to obtain a haplotype matrix
> > consisting of 1s and 0s, for a given region. After reading
> > comments in the header files, below is where I’ve got to.
> > I haven’t got the matrix yet, but I am getting genotypes
> > of samples one by one.
> >
> > Could I ask if I am on the right direction? Are there better
> > ways of doing it?
> >
> > Any help would be very appreciated.
> >
> > Best wishes,
> > Zhihao
> >
> >
> >
> >
> > const char* vcf = “test/test.vcf.gz";
> >
> > int n = 0; // number of records in file
> > int nsnp = 0; // number of SNPs in file
> > int ngt_arr = 0;
> > int ngt = 0;
> > int *gt = NULL;
> >
> > htsFile * inf = bcf_open(vcf, "r");
> > if (inf == NULL) {
> > return EXIT_FAILURE;
> > }
> >
> > bcf_hdr_t *hdr = bcf_hdr_read(inf);
> > fprintf(stderr, "File %s contains %i samples\n", vcf,
> > bcf_hdr_nsamples(hdr));
> >
> > bcf1_t *rec = bcf_init();
> >
> > while (bcf_read(inf, hdr, rec) == 0) {
> > n++;
> > if (bcf_is_snp(rec)) {
> > nsnp++;
> > } else {
> > continue;
> > }
> >
> > if ( bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr) >
> > 0 ){
> > for (int i=0; i<bcf_hdr_nsamples(hdr); i++){
> > printf("chr%s\t%i\t%s\t%s\t%s\n",
> > seqnames[rec->rid],
> > rec->pos,
> > rec->d.allele[0],
> > rec->d.allele[bcf_gt_allele(gt[0])],
> > hdr -> samples[i]);
> > }
> > }
> > }
> >
> > free(gt);
> > bcf_hdr_destroy(hdr);
> > bcf_close(inf);
> > bcf_destroy(rec);
> >
> >
> > ------------------------------------------------------------------------------
> > One dashboard for servers and applications across Physical-Virtual-Cloud
> > Widest out-of-the-box monitoring support with 50+ applications
> > Performance metrics, stats and reports that give you Actionable Insights
> > Deep dive visibility with transaction tracing using APM Insight.
> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> > _______________________________________________
> > Samtools-help mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/samtools-help
>
>
>
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help