To follow up on this, do you think it would be possible to create a
generic GRDDL transformation that would extract information from any
well-structured XHTML table, using the scoped <th> row and column
headers?
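
As a rough illustration of the extraction logic only (not GRDDL itself, which would normally be expressed as an XSLT stylesheet), here is a minimal Python sketch. It assumes well-formed XHTML in the XHTML namespace, column headers marked <th scope="col">, row headers marked <th scope="row">, and no colspan, rowspan, or nested tables. The function name and the sample table are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"  # namespace prefix as ElementTree writes it

def table_to_triples(xhtml_text):
    """Return (row header, column header, cell text) for every <td> cell."""
    root = ET.fromstring(xhtml_text)
    triples = []
    for table in root.iter(XHTML + "table"):
        col_headers = {}  # cell position within a row -> column header text
        for tr in table.iter(XHTML + "tr"):
            row_header = None
            for index, cell in enumerate(tr):
                text = "".join(cell.itertext()).strip()
                scope = cell.get("scope")
                if cell.tag == XHTML + "th" and scope == "col":
                    col_headers[index] = text
                elif cell.tag == XHTML + "th" and scope == "row":
                    row_header = text
                elif (cell.tag == XHTML + "td"
                      and row_header is not None
                      and index in col_headers):
                    triples.append((row_header, col_headers[index], text))
    return triples

# Hypothetical sample table, invented for illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><th></th><th scope="col">liver</th><th scope="col">kidney</th></tr>
<tr><th scope="row">probe1</th><td>1.5</td><td>2.0</td></tr>
</table></body></html>"""

for triple in table_to_triples(SAMPLE):
    print(triple)
```

Each tuple pairs a row header with a column header and a cell value, which is the shape a real transformation would need before minting URIs for the headers.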
alf.
On 19 Feb 2006, at 15:07, Alf Eaton wrote:
I've been trying to decide on a good way to provide tabular data in
papers using XHTML, for presentation online. The best options seem
to be either just embedding the data as an array using JSON, or
using tables with class and id markup and allowing them to be
processed with GRDDL or JavaScript to transform the data. Has there
been any work on presenting spreadsheets in XHTML?
alf.
On 19 Feb 2006, at 12:17, Eric Neumann wrote:
Matt,
Spreadsheets are indeed useful as formatted sources that can be
readily converted into RDF. We've used them as the primary source
of expression data for BioDash (see attached averages; full
GeneLogic data at
http://www.samsi.info/200304/dmml/web-internal/bio/data/data_rsvd.xls ).
It almost seems a mapping tool could be written that takes any Excel
file and applies a GRDDL-like conversion of column headers, row
headers, and cells to produce RDF (see the example).
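
To make the header-and-cell mapping concrete, here is a minimal Python sketch that works on CSV text (for example, a spreadsheet exported to CSV) and is not the conversion script from the attached file. It assumes the first row holds column headers and the first column holds row headers, and it emits N-Triples-style lines. The http://example.org/ namespace, the function name, and the sample data are all hypothetical.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the thread

def csv_to_ntriples(csv_text, base=BASE):
    """Turn each data cell into one 'subject predicate literal .' line."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, body = rows[0], rows[1:]
    lines = []
    for row in body:
        # The row header (first column) becomes the subject URI.
        subject = "<%srow/%s>" % (base, quote(row[0]))
        for column, value in zip(header[1:], row[1:]):
            # Each column header becomes a predicate URI.
            predicate = "<%scolumn/%s>" % (base, quote(column))
            # Escape backslashes and quotes; other characters are not handled here.
            literal = '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append("%s %s %s ." % (subject, predicate, literal))
    return lines

# Hypothetical sample data, invented for illustration.
SAMPLE_CSV = "probe,liver,kidney\nprobe1,1.5,2.0\n"

for line in csv_to_ntriples(SAMPLE_CSV):
    print(line)
```

A real tool would map symbol strings to existing URIs rather than minting new ones under a placeholder namespace, and would type the numeric literals.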
In our example, we wrote the conversion scripts directly into the
Excel file. The resulting (adenine/N3) file is shown as well, with
symbol strings mapped to URIs. The cool thing here is that if
you add a DB query using the symbol strings (we did this within
BioDash), you can take the returned gene information, convert it
to RDF, and connect it to the expression graph through the probes
for each row (see resulting adenine file).
Perhaps the BIORDF group should include using sdf sources as part
of their overall strategy for producing RDF from current
structured files (e.g., gene expression, screening, and clinical
data in sdf). Many published papers have data tables, and this
would be a great way to auto convert them to RDF!
Eric
--- Matthew Cockerill wrote:
I couldn't agree more.
Spreadsheets (and equivalently, CSV files) are a large fraction of
the 'additional datafiles' that BioMed Central receives from authors.
What would be great would be to be able to define some simple
standards and/or templates which authors could follow in their
spreadsheets, to allow the automatic recognition of key life science
identifiers, and quantitative attributes, and so the generation of RDF.
From my point of view, that's the most basic, practical and
prevalent example of the whole semi-structured data, and so seems
like a good starting point.
Matt
On 15 Feb 2006, at 5:42, Cutler, Roger (RogerCutler) wrote:
That's too deep for me. I'll be satisfied, at least in an immediate
sense, with a demonstration of how to generate RDF from an Excel
spreadsheet. I think I'll just start saying "Excel spreadsheet" and
forget about the term that we use internally to categorize the kinds of
problems we have. Spreadsheets are pretty much the 80-20 of that
problem, so why not call a spade a spade. I'm really not very good at
generalizing and categorizing.