[ https://issues.apache.org/jira/browse/TIKA-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Giuseppe Totaro updated TIKA-1580:
----------------------------------
    Attachment: TIKA-1580.patch

> ISA-Tab parsers
> ---------------
>
>                 Key: TIKA-1580
>                 URL: https://issues.apache.org/jira/browse/TIKA-1580
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Giuseppe Totaro
>            Priority: Minor
>         Attachments: TIKA-1580.patch
>
>
> We are going to add parsers for ISA-Tab data formats.
> ISA-Tab files are related to [ISA Tools|http://www.isa-tools.org/], which help
> to manage an increasingly diverse set of life science, environmental and
> biomedical experiments that employ one or a combination of technologies.
> The ISA tools are built upon the _Investigation_, _Study_, and _Assay_ tabular
> format. Therefore, the ISA-Tab data format includes three types of file:
> Investigation file ({{i_xxxx.txt}}), Study file ({{s_xxxx.txt}}), and Assay file
> ({{a_xxxx.txt}}). These files are organized as a [top-down
> hierarchy|http://www.isa-tools.org/format/specification/]: an Investigation
> file includes one or more Study files, and each Study file includes one or more
> Assay files.
> Essentially, the Investigation file contains high-level information about
> the related studies, so it provides only metadata about the ISA-Tab files.
> More details on the file format specification are [available
> online|http://isatab.sourceforge.net/docs/ISA-TAB_release-candidate-1_v1.0_24nov08.pdf].
> The attached patch provides a preliminary version of the ISA-Tab parsers
> (there are three parsers, one for each ISA-Tab file type):
> * {{ISATabInvestigationParser.java}}: parses Investigation files. It extracts
> only metadata.
> * {{ISATabStudyParser.java}}: parses Study files.
> * {{ISATabAssayParser.java}}: parses Assay files.
> The most important improvements still to be made are:
> * Combine these three parsers in order to parse an ISArchive.
> * Provide a better mapping of both study and assay data to XHTML.
> Currently, {{ISATabStudyParser}} and {{ISATabAssayParser}} provide a naive
> mapping function relying on [Apache Commons
> CSV|https://commons.apache.org/proper/commons-csv/].
> Thanks for supporting me on this work [~chrismattmann].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
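
The file naming convention described in the issue can be illustrated with a small sketch that decides which of the three parsers a file would be routed to. This is not code from the attached patch: the class name {{IsaTabFileTypes}}, the enum {{IsaTabFileType}} and the method {{fromFileName}} are hypothetical, and the sketch assumes the {{i_}}, {{s_}} and {{a_}} prefixes that the ISA-Tab specification uses for Investigation, Study and Assay files, plus a {{.txt}} extension. It uses only the Java standard library.

```java
import java.util.Locale;
import java.util.Optional;

public class IsaTabFileTypes {

    /** The three ISA-Tab file types (hypothetical enum, not part of the patch). */
    enum IsaTabFileType {
        INVESTIGATION("i_"), // metadata only
        STUDY("s_"),
        ASSAY("a_");

        private final String prefix;

        IsaTabFileType(String prefix) {
            this.prefix = prefix;
        }

        /**
         * Classifies a bare file name (no directory part) by its prefix.
         * Returns an empty Optional when the name does not look like ISA-Tab.
         */
        static Optional<IsaTabFileType> fromFileName(String fileName) {
            String name = fileName.toLowerCase(Locale.ROOT);
            if (!name.endsWith(".txt")) {
                return Optional.empty();
            }
            for (IsaTabFileType type : values()) {
                if (name.startsWith(type.prefix)) {
                    return Optional.of(type);
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] names = {"i_investigation.txt", "s_study.txt", "a_assay.txt", "readme.md"};
        for (String name : names) {
            String label = IsaTabFileType.fromFileName(name)
                    .map(Enum::name)
                    .orElse("not ISA-Tab");
            System.out.println(name + " -> " + label);
        }
    }
}
```

A real detector would probably also inspect the file content, since a name prefix alone is a weak signal; the sketch only mirrors the naming scheme given in the description.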