I couldn't agree more.

Spreadsheets (and equivalently, CSV files) are a large fraction of the 'additional datafiles' that BioMed Central receives from authors.

It would be great to define some simple standards and/or templates that authors could follow in their spreadsheets, allowing automatic recognition of key life science identifiers and quantitative attributes, and hence the generation of RDF.

From my point of view, that's the most basic, practical and prevalent example of semi-structured data, and so seems like a good starting point.
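
To make the template idea concrete, here is a rough sketch in Python of what such a conversion might look like. Everything specific in it is an assumption for illustration: the convention that the first column header names the identifier scheme, the example.org URI prefixes, and the function name csv_to_ntriples are all hypothetical, not an agreed standard. It uses only the standard library and emits N-Triples, treating every non-identifier column as a numeric attribute.

```python
import csv
import io

# Hypothetical template convention: the first column holds an identifier and
# its header names the identifier scheme; every other column is a quantitative
# attribute. The URI prefixes below are illustrative placeholders only.
ID_PREFIXES = {"uniprot": "http://example.org/uniprot/"}
ATTR_PREFIX = "http://example.org/attr/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def csv_to_ntriples(text):
    """Turn template-conforming CSV text into a list of N-Triples lines."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    scheme = header[0].strip().lower()
    prefix = ID_PREFIXES[scheme]  # KeyError means an unrecognised scheme
    triples = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank rows
        subject = "<%s%s>" % (prefix, row[0].strip())
        for name, value in zip(header[1:], row[1:]):
            value = value.strip()
            if not value:
                continue  # empty cell: emit no statement
            float(value)  # raises ValueError if the cell is not numeric
            predicate = "<%s%s>" % (ATTR_PREFIX, name.strip().replace(" ", "_"))
            triples.append(
                '%s %s "%s"^^<%s> .' % (subject, predicate, value, XSD_DOUBLE)
            )
    return triples


sample = "uniprot,mass_kda,length\nP12345,47.4,430\nQ99999,,120\n"
for line in csv_to_ntriples(sample):
    print(line)
```

The point of the sketch is only that a fixed header convention is enough to decide mechanically which cells become subjects and which become typed literals; a real template would need agreed identifier schemes and predicate vocabularies.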

Matt

On 15 Feb 2006, at 5:42, Cutler, Roger (RogerCutler) wrote:


That's too deep for me.  I'll be satisfied, at least in an immediate
sense, with a demonstration of how to generate RDF from an Excel
spreadsheet.  I think I'll just start saying "Excel spreadsheet" and
forget about the term that we use internally to categorize the kinds of
problems we have.  Spreadsheets are pretty much the 80-20 of that
problem, so why not call a spade a spade?  I'm really not very good at
generalizing and categorizing.

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christopher Cavnor
Sent: Tuesday, February 14, 2006 3:54 PM
To: public-semweb-lifesci@w3.org
Subject: Re: Unstructured vs. Structured (was: HL7 and patient records
in RDF/OWL?)


I'd argue that most information resources are indeed semi-structured.
The human brain is only able to meta-categorize a resource based on its
structured aspects (markup and structural metadata), its informational
content (its aboutness), and its context (environmental metadata).

"Structured" data is only structured once we have a common understanding
of its meaning. In this regard, data is never "raw" (except for randomly
generated data), as even structured database tables have metadata to
add meaning. So the term "semi-structured" is always adequate as far as
I am concerned. You'd have to prove to me that there is any other type
of data ;)


--
Christopher Cavnor


On 2/14/06 10:54 AM, "Cutler, Roger (RogerCutler)"
<[EMAIL PROTECTED]>
wrote:


OK, then is there a preferred term for what we call "semi-structured
data"?  That is, information that is structured but where the structure
is not easily determined and perhaps has not been formalized at all, but
for which a formalized structure could be defined?  For example, tables
in a spreadsheet?  We really care about this kind of thing, but I don't
want to confuse the issue by using terms that most people understand
differently.

Incidentally, from my personal experience, using the term
"semi-structured" to mean binary blobs in structured databases is not
very common. Frankly, this is the first I have heard the term used in
that sense, but maybe I just don't run in the right circles.

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jim Hendler
Sent: Monday, February 13, 2006 3:43 PM
To: Pat Hayes; Gao, Yong
Cc: public-semweb-lifesci@w3.org
Subject: Re: Unstructured vs. Structured (was: HL7 and patient records
in RDF/OWL?)


At 14:46 -0600 2/13/06, Pat Hayes wrote:

The point I'm trying to make is this: The concept of "structuredness"
is relative and context-sensitive.

Hear, hear. Well said.

Pat Hayes



FWIW, structured, unstructured and semi-structured, although non-precise
concepts in common language and (esp) philosophy, have well-defined and
precise meanings in database jargon -- most database books have decent
definitions that are consistent with:
  unstructured - NL text
  semi-structured - unstructured fields within a structured DB context
  structured - relational model (or similar)
(those papers with technical definitions tend to get ugly and resort to
relational calculus, so these overly simplified definitions should
suffice for now)
That said, in the spirit of this particular thread, I think we should be
careful and, if we mean to use it in a DB context, make it clear in any
document that uses the term (i.e. "structured database" v. "structured
data", which are very different in some contexts)
    -JH







