Edmon Begoli created CALCITE-2025:
-------------------------------------
Summary: Create adapter(s) for standard bioinformatics database
files
Key: CALCITE-2025
URL: https://issues.apache.org/jira/browse/CALCITE-2025
Project: Calcite
Issue Type: New Feature
Reporter: Edmon Begoli
Assignee: Edmon Begoli
Priority: Minor
Common bioinformatics file formats, used mostly in genomic medicine and life
sciences research, are VCF, SAM, and FASTQ/FASTA [1,2,3,4].
They are structured text files with metadata headers, and queries against them
are (generally) column-oriented.
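
To make the "metadata header plus columns" shape concrete, here is a minimal
sketch in Python of how a VCF file separates into header metadata, column
names, and rows. It is an illustration only, not a proposed adapter design:
the read_vcf helper and the sample records are hypothetical, and a real
adapter would be written against Calcite's own adapter interfaces.

```python
import io

# Tiny illustrative VCF fragment; the record values are made up for this sketch.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "##source=exampleCaller\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t12345\trs001\tA\tG\t50\tPASS\tDP=14\n"
    "chr2\t67890\t.\tT\tC\t99\tPASS\tDP=30\n"
)


def read_vcf(stream):
    """Split a VCF text stream into (metadata, columns, rows).

    Hypothetical helper for illustration: "##" lines are metadata,
    the single "#CHROM..." line names the columns, and every other
    non-empty line is a tab-separated record.
    """
    metadata = []  # list of (key, value); keys such as INFO may repeat
    columns = []
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            key, _, value = line[2:].partition("=")
            metadata.append((key, value))
        elif line.startswith("#"):
            columns = line[1:].split("\t")
        else:
            rows.append(dict(zip(columns, line.split("\t"))))
    return metadata, columns, rows


metadata, columns, rows = read_vcf(io.StringIO(SAMPLE_VCF))

# Column-oriented access, roughly:
#   SELECT CHROM, POS FROM vcf WHERE FILTER = 'PASS'
selected = [(r["CHROM"], int(r["POS"])) for r in rows if r["FILTER"] == "PASS"]
```

The column names recovered from the header line are what an adapter could
expose as a table's fields, which is why these formats map naturally onto a
relational front end.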
Having Calcite support for these formats would enable it to serve as the front
end for processing a very large body of important data, and would facilitate
the integration of these datasets into downstream frameworks that incorporate
or use Calcite.
This issue will serve as the parent issue for each format that will be
implemented (SAM, VCF, etc.).
1. SAM file format, https://en.wikipedia.org/wiki/SAM_(file_format)
2. VCF file format, https://en.wikipedia.org/wiki/Variant_Call_Format
3. FASTQ file format, https://en.wikipedia.org/wiki/FASTQ_format
4. Other formats,
https://bioinf.comav.upv.es/courses/sequence_analysis/sequence_file_formats.html