Dear samtools developers,
I’ve been trying to learn htslib for some simple analyses. I’d
like to parse 1KG phase3 vcfs to obtain a haplotype matrix
consisting of 1s and 0s, for a given region. After reading
comments in the header files, below is where I’ve got to.
I haven’t got the matrix yet, but I am getting genotypes
of samples one by one.
Could I ask if I am on the right direction? Are there better
ways of doing it?
Any help would be very appreciated.
Best wishes,
Zhihao
const char* vcf = “test/test.vcf.gz";
int n = 0; // number of records in file
int nsnp = 0; // number of SNPs in file
int ngt_arr = 0;
int ngt = 0;
int *gt = NULL;
htsFile * inf = bcf_open(vcf, "r");
if (inf == NULL) {
return EXIT_FAILURE;
}
bcf_hdr_t *hdr = bcf_hdr_read(inf);
fprintf(stderr, "File %s contains %i samples\n", vcf,
bcf_hdr_nsamples(hdr));
bcf1_t *rec = bcf_init();
while (bcf_read(inf, hdr, rec) == 0) {
n++;
if (bcf_is_snp(rec)) {
nsnp++;
} else {
continue;
}
if ( bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr) > 0 ){
for (int i=0; i<bcf_hdr_nsamples(hdr); i++){
printf("chr%s\t%i\t%s\t%s\t%s\n",
seqnames[rec->rid],
rec->pos,
rec->d.allele[0],
rec->d.allele[bcf_gt_allele(gt[0])],
hdr -> samples[i]);
}
}
}
free(gt);
bcf_hdr_destroy(hdr);
bcf_close(inf);
bcf_destroy(rec);
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help