Re: [BIORDF] Re: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?)

Alf Eaton Thu, 23 Feb 2006 10:01:58 -0800

To follow up on this, do you think it would be possible to create a generic GRDDL transformation that would extract information from any well-structured XHTML table, using the scoped <th> row and column headers?
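
To make the question concrete, here is a rough sketch of the extraction logic such a transformation would need. A real GRDDL transformation would normally be written in XSLT; this uses Python only to illustrate the idea. The sample table, its values, and the function name table_to_triples are made up for illustration, and the sketch assumes a simple grid with no colspan or rowspan.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample table: one header row, one header cell per data row.
SAMPLE = """<table xmlns="http://www.w3.org/1999/xhtml">
  <tr><th scope="col">Probe</th><th scope="col">Control</th><th scope="col">Treated</th></tr>
  <tr><th scope="row">probe_1</th><td>1.5</td><td>2.5</td></tr>
  <tr><th scope="row">probe_2</th><td>0.5</td><td>4.0</td></tr>
</table>"""


def table_to_triples(xhtml_table):
    """Return (row header, column header, cell text) for every data cell."""
    table = ET.fromstring(xhtml_table)
    col_headers = {}  # column index -> header text
    triples = []
    for tr in table.iter(XHTML + "tr"):
        row_header = None
        for index, cell in enumerate(tr):
            text = "".join(cell.itertext()).strip()
            if cell.tag == XHTML + "th" and cell.get("scope") == "col":
                col_headers[index] = text
            elif cell.tag == XHTML + "th" and cell.get("scope") == "row":
                row_header = text
            elif (cell.tag == XHTML + "td"
                  and row_header is not None
                  and index in col_headers):
                # Row header acts as subject, column header as property.
                triples.append((row_header, col_headers[index], text))
    return triples


for triple in table_to_triples(SAMPLE):
    print(triple)
```

Each triple pairs a row header with a column header and the cell between them; turning those strings into URIs would still need some per-table convention.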


alf.

On 19 Feb 2006, at 15:07, Alf Eaton wrote:

I've been trying to decide on a good way to provide tabular data in papers using XHTML, for presentation online. The best options seem to be either just embedding the data as an array using JSON, or using tables with class and id markup and allowing them to be processed with GRDDL or JavaScript to transform the data. Has there been any work on presenting spreadsheets in XHTML?
alf.

On 19 Feb 2006, at 12:17, Eric Neumann wrote:
Matt,
Spreadsheets are indeed useful as formatted sources that can be readily converted into RDF. We've used them as the primary source of expression data for BioDash (see attached averages; full GeneLogic data at http://www.samsi.info/200304/dmml/web-internal/bio/data/data_rsvd.xls ). It almost seems a mapping tool could be written to take any Excel file and apply a GRDDL-like conversion of column headers, row headers, and cells to produce RDF from it (see the example).
In our example, we wrote the conversion scripts directly into the Excel file. The resulting (adenine/N3) file is shown as well, with symbol strings mapped to URIs. The cool thing here is that if you add a DB query using the symbol strings (we did this within BioDash), you can take the returned gene information, convert it to RDF, and connect it to the expression graph through the probes for each row (see resulting adenine file).
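
As a rough illustration of this kind of header-and-cell mapping (not the BioDash scripts themselves, which lived inside the Excel file), the sketch below reads CSV text and emits N-Triples lines, turning one column of symbol strings into URIs. The http://example.org/ namespace, the column names, the sample values, and the function names are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the thread

# Made-up sample data: one row per probe.
SAMPLE_CSV = (
    "probe,symbol,mean_expression\n"
    "probe_1,TP53,1.5\n"
    "probe_2,BRCA1,4.0\n"
)


def uri(kind, name):
    """Build a URI reference such as <http://example.org/gene/TP53>."""
    return "<%s%s/%s>" % (BASE, kind, quote(name, safe=""))


def literal(value):
    """Quote a plain string literal (simplified escaping only)."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return '"%s"' % escaped


def csv_to_ntriples(text, key_column="probe", uri_columns=("symbol",)):
    """Row key -> subject, column header -> predicate, cell -> object."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = uri("probe", row[key_column])
        for column, value in row.items():
            if column == key_column:
                continue
            predicate = uri("column", column)
            # Symbol strings become URIs so other graphs can link to them.
            obj = uri("gene", value) if column in uri_columns else literal(value)
            lines.append("%s %s %s ." % (subject, predicate, obj))
    return lines


print("\n".join(csv_to_ntriples(SAMPLE_CSV)))
```

Because the symbol column becomes a URI rather than a string, gene information fetched separately can attach to the same node and so join the expression graph through the probe rows.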
Perhaps the BIORDF group should include using sdf sources as part of their overall strategy for producing RDF from current structured files (e.g., gene expression, screening, and clinical data in sdf). Many published papers have data tables, and this would be a great way to auto-convert them to RDF!
Eric

--- Matthew Cockerill <[EMAIL PROTECTED]> wrote:
I couldn't agree more.

Spreadsheets (and equivalently, CSV files) are a large fraction of the 'additional datafiles' that BioMed Central receives from authors.

What would be great would be to be able to define some simple standards and/or templates which authors could follow in their spreadsheets, to allow the automatic recognition of key life science identifiers and quantitative attributes, and so the generation of RDF.

From my point of view, that's the most basic, practical and prevalent example of semi-structured data, and so seems like a good starting point.

Matt

On 15 Feb 2006, at 5:42, Cutler, Roger (RogerCutler) wrote:
That's too deep for me. I'll be satisfied, at least in an immediate sense, with a demonstration of how to generate RDF from an Excel spreadsheet. I think I'll just start saying "Excel spreadsheet" and forget about the term that we use internally to categorize the kinds of problems we have. Spreadsheets are pretty much the 80-20 of that problem, so why not call a spade a spade. I'm really not very good at generalizing and categorizing.
