I was thinking about the following scenario:
Local triple store with SEER data <l>
Demo store <nc>
SELECT ...
FROM <l>
FROM <nc>
WHERE {... }
(which might be a nice use case for Eric's federation stuff)
Or integrating it into a local installation of the demo.
But I agree it is suboptimal.
Speaking of statistical analysis, we need an R SPARQL interface.
Anyone up for writing one?
There are a few SQL packages at
http://lib.stat.cmu.edu/R/CRAN/src/contrib/PACKAGES.html
SPARQL should be easier because it can be built off of
http://lib.stat.cmu.edu/R/CRAN/src/contrib/Descriptions/httpRequest.html
and
http://lib.stat.cmu.edu/R/CRAN/src/contrib/Descriptions/XML.html
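To illustrate why an HTTP client plus an XML parser is about all such an interface needs, here is a minimal sketch. It is written in Python rather than R, purely for illustration; the endpoint URL, the function names, and the canned response are all assumptions, not part of any existing package. It encodes a query as a GET request URL and reads a result document in the SPARQL Query Results XML Format.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET URL using the protocol's 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_results(xml_text):
    """Turn a SPARQL XML results document into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            # Each binding wraps exactly one child: <uri>, <literal>, or <bnode>.
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return rows


# Hypothetical endpoint and a canned response, so the sketch runs offline.
url = build_query_url("http://example.org/sparql", "SELECT ?s WHERE { ?s ?p ?o }")
sample = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="s"/></head>
  <results>
    <result><binding name="s"><uri>http://example.org/a</uri></binding></result>
    <result><binding name="s"><literal>42</literal></binding></result>
  </results>
</sparql>"""
rows = parse_results(sample)
```

A real client would fetch `url` over HTTP and pass the response body to `parse_results`; an R version could plausibly follow the same two steps with the packages above. This sketch drops datatypes and language tags, which a statistical interface would probably want to keep.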
-Alan
On Oct 5, 2007, at 8:14 AM, Matt Williams wrote:
Being able to do it, and do something useful with it would be good,
and might act as a good demonstrator. Again, I think the crucial
question is what it is *linking to* that gives it the added value:
I doubt that anyone would choose to do simple statistical analysis
on the data set in rdf (although I would be glad to be shot down).
Therefore if someone knows something we could link it with, I'd be
interested. I have done something similar (in non-rdf) linking SEER
with genomic data, but it's not big enough to make use of this.
It may also be that clear demonstration of the utility of this
might encourage them to relax the licensing restrictions.
I have an idea for a different data set which I will send as a
separate email.
Matt
Alan Ruttenberg wrote:
[cc changed to public-semweb-lifesci]
We could distribute a script that does the conversion to RDF so
that individuals who wanted to use it could still get it
themselves and put it into a local store.
There are two possible benefits of working with the data:
1) Learning something from it
2) Adding it to the pool of RDF that is in the demo
We can still perhaps benefit from 1), even if 2) is not possible -
but you tell us whether you think that is of value...
-Alan
On Oct 5, 2007, at 4:26 AM, Matt Williams wrote:
I've had a very quick look at this. It might be salutary to read
some parts of the data-user agreement.
1. You will not use nor permit others to use the data in any way
other than for statistical reporting and analysis for research
purposes. The SEER Program must be notified if it is discovered
that there has been any other use of the data.
<snip>
3. You will not attempt to link nor permit others to link the
data with individually identified records in another data base.
<snip>
6. You will not release nor permit others to release the data in
full or in part to any person except with the written approval of
the SEER Program. In particular, all members of the research team
who have access to the data must have signed data-use agreements.
<snip>
7. You will use appropriate safeguards to prevent use or
disclosure of the information other than as provided for by this
data-use agreement. If accessing the data from a centralized
location on a time sharing computer system or LAN with SEER*Stat
or another statistical package, you will not share your logon
name and password with any other individuals. You will also not
allow any other individuals to use your computer account after
you have logged on with your logon name and password.
I don't know to what extent this therefore causes problems with
the idea of sharing the data; while it can still be copied into
an rdf format, doing so and then keeping it on a local server
seems (mostly) pointless.
--
http://acl.icnet.uk/~mw
http://adhominem.blogsome.com/
+44 (0)7834 899570