[ https://issues.apache.org/jira/browse/CALCITE-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edmon Begoli updated CALCITE-2025:
----------------------------------
    Description: 
Common bioinformatics file formats, used mostly in genomic medicine and life 
sciences research, are VCF, SAM, and FASTQ/FASTA [1,2,3,4]. 

They are structured text files with metadata headers and a (generally) 
column-oriented layout.
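
To make that layout concrete, here is a minimal Java sketch of how a VCF-style 
file could be split into its metadata header, column names, and rows. The 
class VcfSketch and its Parsed holder are hypothetical names used only for 
illustration; they are not part of Calcite. The sketch assumes the usual VCF 
conventions: metadata lines start with "##", the column header line starts 
with "#", and data records are tab-separated. A real adapter would also need 
to handle typing, INFO fields, and per-sample columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Calcite: splits VCF-style text into header and rows. */
class VcfSketch {
  /** Holds the three parts of a parsed file. */
  static final class Parsed {
    final List<String> metadata = new ArrayList<>();   // "##key=value" lines, prefix stripped
    final List<String> columns = new ArrayList<>();    // names from the "#CHROM ..." line
    final List<List<String>> rows = new ArrayList<>(); // tab-separated data records
  }

  /** Classifies each line by its prefix; assumes the VCF "##" / "#" conventions. */
  static Parsed parse(List<String> lines) {
    Parsed parsed = new Parsed();
    for (String line : lines) {
      if (line.isEmpty()) {
        continue; // skip blank lines
      }
      if (line.startsWith("##")) {
        parsed.metadata.add(line.substring(2));
      } else if (line.startsWith("#")) {
        parsed.columns.addAll(Arrays.asList(line.substring(1).split("\t")));
      } else {
        // limit -1 keeps trailing empty fields so rows stay aligned with columns
        parsed.rows.add(Arrays.asList(line.split("\t", -1)));
      }
    }
    return parsed;
  }

  public static void main(String[] args) {
    Parsed p = parse(Arrays.asList(
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT",
        "20\t14370\trs6054257\tG\tA"));
    System.out.println(p.columns + " -> " + p.rows.get(0));
  }
}
```

The column names map naturally onto a relational row type, which is what 
makes a table-style adapter plausible for these formats.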

Having Calcite support for these formats would enable it to serve as the front 
end for processing a very large body of important data, and would facilitate 
the integration of these datasets into downstream frameworks that incorporate 
or use Calcite. 

This issue will serve as the parent issue for each format that will be 
implemented (SAM, VCF, etc.).

1. SAM file format, https://en.wikipedia.org/wiki/SAM_(file_format) 
2. VCF file format, https://en.wikipedia.org/wiki/Variant_Call_Format 
3. FASTQ file format, https://en.wikipedia.org/wiki/FASTQ_format 
4. Other, https://bioinf.comav.upv.es/courses/sequence_analysis/sequence_file_formats.html
 


  was:
Common bioinformatics files, used mostly in genomic medicine, and life sciences 
research are VCF, SAM, and FASTQ/FASTA files [1,2,3,4]. 

They are structured text files, with metadata headers, and (generally) column 
oriented queries.

Having calcite support for these formats would enable it to serve as the front 
end for processing of a very large body of important data, and to facilitate 
the integration of these datasets into a downstream frameworks that incorporate 
or use calcite. 

This issue will serve as the parent issues for each format that will be 
implemented (SAM, VCF, etc.)

1. SAM file format, https://en.wikipedia.org/wiki/SAM_(file_format) 
2. VCF file format, https://en.wikipedia.org/wiki/Variant_Call_Format 
3. FASTQ file format, https://en.wikipedia.org/wiki/FASTQ_format 
4.Other, 
https://bioinf.comav.upv.es/courses/sequence_analysis/sequence_file_formats.html
 



> Create adapter(s) for standard bioinformatics database files
> ------------------------------------------------------------
>
>                 Key: CALCITE-2025
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2025
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Edmon Begoli
>            Assignee: Edmon Begoli
>            Priority: Minor
>              Labels: features
>   Original Estimate: 8,736h
>  Remaining Estimate: 8,736h



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
