Re: provenance questionnaire, v2

2011-09-06 Thread Egon Willighagen
On Tue, Sep 6, 2011 at 11:18 AM, Deus, Helena  wrote:
> I will forward your concerns to the provenance workgroup.

Well, authorization is going to be a big thing in our EU project, for
various reasons: social, contractual, political. That's just the way it
is. I can elaborate further on our needs, if that is useful to the WG.

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: provenance questionnaire, v2

2011-09-06 Thread Egon Willighagen
On Thu, Sep 1, 2011 at 11:42 PM, Deus, Helena  wrote:
> For those of you who haven’t answered and would like to give your 2c about
> how provenance should be dealt with on the semantic web, here’s your chance!

Authorization would probably not be considered provenance, but I was
wondering if the WG has been talking about that, and if there is an
existing ontology that would be suitable for that, compatible with the
provenance ontology... it's clear that at least the depositors
(provenance) have authorization, so compatibility at that level seems
needed... Or?

Egon





Re: create HTML based on RDF?

2011-05-06 Thread Egon Willighagen
On Fri, May 6, 2011 at 2:17 PM, Kingsley Idehen  wrote:
> On 5/6/11 7:53 AM, Egon Willighagen wrote:
>> A very simple approach is to use standard web software, which everyone
>> has installed already: a normal webbrowser, a normal webserver,
>> RDF/XML, and XSLT.
>
> What is a normal Web Server? Linked Data is about a different Web dimension.

Sorry... linguistic differences... I was thinking about 'common'
rather than 'normal'... in Dutch they are close together, and I am
clearly tired :)

> Even on the conventional front, how many people manage their own Web Server?

Indeed, that's one reason why I like the RDF/XML+XSLT approach: any
HTML hosting service will do.

>> E.g. check out this resource:
>>
>> http://rdf.openmolecules.net/?InChI=1/CH4/h1H4
>
> What's increasingly lost re. Linked Data is that every resource needs to be
> a structured data source where information and data sources are decoupled.

That was in fact understood to be the setting. I am using unstructured
data sources with RDF too, e.g. using a Semantic MediaWiki. The
suggested RDF/XML+XSLT solution is by no means the only one. And, as
you know, I am using Virtuoso for certain use cases too :)

Egon




Re: create HTML based on RDF?

2011-05-06 Thread Egon Willighagen
Dear Frans,

On Fri, May 6, 2011 at 1:02 PM, Frans Knibbe  wrote:
> I notice that it would be really helpful if I could automatically generate
> HTML files based on the RDF files. That way I can focus on just keeping the
> RDF file in good shape. After creating or editing an RDF file I could run
> something that makes an HTML representation.

A very simple approach is to use standard web software, which everyone
has installed already: a normal webbrowser, a normal webserver,
RDF/XML, and XSLT.

E.g. check out this resource:

http://rdf.openmolecules.net/?InChI=1/CH4/h1H4

The web server will always return RDF/XML, and this XML document has
an associated XSLT stylesheet which is used by your web browser to
create human-targeted HTML, by using this line in the document:

<?xml-stylesheet type="text/xsl" href="http://rdf.openmolecules.net/html.xsl"?>
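The contents of html.xsl are not shown in this thread. For Frans's workflow of regenerating static HTML after each edit, a rough offline alternative is sketched below in Python (standard library only). It is an assumption-laden stand-in for an XSLT transform, not the stylesheet itself: it only understands flat rdf:Description blocks, and the sample document and the example.org link are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdfxml_to_html(rdfxml: str) -> str:
    """Render rdf:Description blocks of a simple RDF/XML document as HTML.

    Only the flat form is handled: rdf:Description elements with rdf:about,
    whose children hold either a literal (element text) or a link
    (rdf:resource). Nested descriptions, typed nodes and blank nodes are
    ignored in this sketch.
    """
    root = ET.fromstring(rdfxml)  # processing instructions such as xml-stylesheet are skipped
    parts = ["<html><body>"]
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about", "")
        parts.append(f"<h2>{escape(subject)}</h2>")
        parts.append("<table>")
        for prop in desc:
            # ElementTree spells qualified names '{namespace}local'; dropping
            # the braces gives back the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:
                value = f'<a href="{escape(resource)}">{escape(resource)}</a>'
            else:
                value = escape(prop.text or "")
            parts.append(f"<tr><td>{escape(predicate)}</td><td>{value}</td></tr>")
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)


# Hypothetical input, modelled on the methane resource mentioned above.
sample = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://rdf.openmolecules.net/html.xsl"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://rdf.openmolecules.net/?InChI=1/CH4/h1H4">
    <dc:title>methane</dc:title>
    <dc:source rdf:resource="http://example.org/source"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    print(rdfxml_to_html(sample))
```

Serving RDF/XML with the stylesheet line keeps a single source and leaves rendering to the browser; generating HTML offline, as in this sketch, trades that for output that also works in clients that do not apply XSLT.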

Egon




Re: Design issues 5-star data section tidy up

2011-03-09 Thread Egon Willighagen
Hi Christopher,

On Thu, Mar 10, 2011 at 8:27 AM, Christopher Gutteridge
 wrote:
> "Is that bad? For Linked Data to be useful, you need to be able to mix and
> share.". Sorry but that's simply not true. For it to be useful *to you*,
> perhaps, but (Closed) Linked Data still has massive value as a technology
> and not all data should or can be fully open!

Data consumption is indeed a 'use' too. Like watching the Simpsons.
Sorry for being sloppy there. There most certainly is a place and use
for Linked (Closed) Data.

> Linking and Openness are two unrelated, but great, things to do but you can
> do them independently. There is still value in data which is Linked but not
> entirely or even slightly open.
>
> Open is the gold standard, but it's not the only form of Linked Data.

Indeed not. And apologies for implying that Linked Data is bad in
itself. It simply disallows certain important use cases, which is what
I wanted to say.

> There's a massive value to companies to produce Linked Intranets which will
> link and use open data from outside, but certainly not be open.

Linked Data often needs dedicated, often individual licensing to keep
things going. While inefficient, it is a valid choice.

> At the heart of our university are lectures. From a Linked data perspective,
> these are a motherlode of linkage. A lecture is the nexus point joining: A
> room, eg.  with a lecturer, eg.
>  with a number of students, with the
> URI of a Module
>  and the
> specific instance of that module
>  and resources
> for that lecture  . However,
> unlike most of our other data, it would take a huge policy decision to make
> this information freely available, but I can still make it available in a
> closed form to a student or staff member, upon authentication, which means
> that they can still have it on an iphone app / google calendar etc.

So, can a student actually start a cool web service where students can
mash up their classes with Facebook? They would be redistributing the
data. Are they allowed to? Are they allowed to fix errors and share
those fixes? Are they allowed to make some profit out of it, to pay for the
Amazon EC2 hosting? If your data is not Open, they cannot.

> Linked is a technology.
> Open is an ideology.

I do not think that is true. Instead, I see them both as technologies:
they are both inventions to make things possible.

> Right now  is technically
> should get ZERO stars as it's very complex to work out what license we have
> the right to use.

And why is that? It sounds to me like this is because your upstream data
provider is zero-star? Should a star-rating system fail (or ideals
change) because the UK legal system is, umm, awkward?

> Some of the abstracts of papers may legally belong to
> publishers and it may be OK for us to publish and distribute them as data,
> but not to grant licenses on something we don't own.

Well, I'd be the last to say the current publishing practices are
technologically efficient :) I've ranted enough about that
on my blog.

> This dataset is on two
> journeys, one ends with an open license (silver to gold), one with it
> getting fully linked into the data web (* to *). They converge at the
> heady heights of 5 gold-star fully linked and open data.

I fully understand how hard it is not to be able to join the party
because your data providers are not cooperating and limit what you
can do with their data. But I feel uneasy about letting that decide
what our ideals should be.

Instead, I would suggest that SOTON split its data sets, make the parts
that can be Open available as Open, and make the Closed bits available
separately as Closed. That way, you still get your FIVE stars.

See, 'Open' is a technology: the fact that some closed data
"copylefts" the whole package doesn't sound like an ideological problem
to me, but really a technological (legal) one. And it can simply be
overcome by making the parts available separately, I think, just like
Bio2RDF and others do.

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: Design issues 5-star data section tidy up

2011-03-09 Thread Egon Willighagen
On Wed, Mar 9, 2011 at 10:03 PM, Martin Hepp
 wrote:
>> ★     Available on the web (whatever format), but with an open licence
>
> I fear that the "open" requirement as the entrance gate for the star schema 
> means that e-commerce data will be excluded.
>
> Most providers of e-commerce data (offers, model data, images,...) will
> - want to put some constraints on the usage of their data or
> - cannot release the data under an open license because they are bound by 
> their licensing conditions to the actual creator of the page.

Is that bad? For Linked Data to be useful, you need to be able to mix
and share. You need to be able to fix and redistribute. The 'open'
nature of LOD is for me the difference between LOD and LD... let this
closed e-commerce data be happy in the LD world, no?

Dear Tim,

may I request some clarification on what kind of 'Open' you are
referring to in the first star? Is this the Open Knowledge Foundation
kind of Open, which excludes the Non-Commercial clause (as on CKAN)?
Or is that allowed, and do only modification and redistribution matter
to you? Or perhaps you do have a completely different view on these
things.

Egon




Re: Introducing Vocabularies of a Friend (VOAF)

2011-01-15 Thread Egon Willighagen
Hi Bernard,

On Fri, Jan 14, 2011 at 6:51 PM, Bernard Vatant
 wrote:
> VOAF is of course a clear homage to FOAF, which is the hub of the network :
> more than half of the listed vocabularies rely on it one way or another.
> I've asked Dan Brickley a couple of days ago if he did not mind this
> friendly hack. Without answer from him, I just went ahead following the
> adage "Qui ne dit mot consent".

Maybe it's just Saturday morning, but what exactly is the goal of your
VOAF effort? What problems with existing ontologies does it address?
Just curious, as it sounds interesting...

Egon




Re: isDefinedBy and isDescribedBy, Tale of two missing predicates

2010-11-07 Thread Egon Willighagen
Hi Kingsley,

On Sun, Nov 7, 2010 at 4:50 PM, Kingsley Idehen  wrote:
> Seen this mail kinda late, hence late response. Some examples:

No worries!

> Links:
>
> 1. http://goo.gl/MG5iS -- shows a descriptor page, , and "Link"
> headers putting wdrs:describedBy to use
> 2.
> http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fpurl.org%2Fgoodrelations%2Fv1&p=12002
> -- rdfs:isDefinedBy and its effects .

Thanks! Very much appreciated!
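As a minimal sketch of the two mechanisms in Kingsley's links, the Python below writes rdfs:isDefinedBy and wdrs:describedby statements as N-Triples, and builds and reads an HTTP Link header with rel="describedby". The POWDER-S namespace URI and the lowercase describedby spelling are assumptions (the mail writes describedBy); check both against the POWDER specification. The descriptor page URL is hypothetical, and the GoodRelations term is only an example.

```python
import re

RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
# Assumed POWDER-S namespace for the wdrs: prefix; verify against the POWDER spec.
WDRS_DESCRIBEDBY = "http://www.w3.org/2007/05/powder-s#describedby"


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples line whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def link_header(descriptor_url: str) -> str:
    """HTTP Link header value pointing at a descriptor document."""
    return f'<{descriptor_url}>; rel="describedby"'


def describedby_targets(header: str) -> list:
    """Collect targets whose rel includes 'describedby'.

    Deliberately simplified: it does not handle commas or semicolons inside
    quoted parameter values.
    """
    targets = []
    for match in re.finditer(r"<([^>]*)>\s*;([^,]*)", header):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'rel="?([^";]+)"?', params)
        if rel and "describedby" in rel.group(1).lower().split():
            targets.append(url)
    return targets


term = "http://purl.org/goodrelations/v1#Offering"
page = "http://example.org/about/offering"  # hypothetical descriptor page
print(ntriple(term, RDFS_IS_DEFINED_BY, "http://purl.org/goodrelations/v1"))
print(ntriple(term, WDRS_DESCRIBEDBY, page))
print("Link:", link_header(page))
```

The Link header lets a client discover the descriptor before parsing any RDF, while the triples carry the same pointer inside the data itself.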

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Research Associate
University of Cambridge
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: isDefinedBy and isDescribedBy, Tale of two missing predicates

2010-11-05 Thread Egon Willighagen
Hi Kingsley,

On Fri, Nov 5, 2010 at 1:58 AM, Kingsley Idehen  wrote:
> As a best practice, common use of these predicates would increase
> navigability, link density, and overall cohesiveness of the burgeoning Web
> of Linked Data. It would truly demonstrate practicing what we preach,
> dog-food style!

So do you have some examples where these two predicates are used?

Egon




Re: New LOD Cloud

2010-09-24 Thread Egon Willighagen
2010/9/24 François Dongier :
> Would be nice to go a bit beyond this relatively crude categorisation.
> Enabling someone interested in, say, wine or Alabama farming, to highlight
> the datasets that are relevant to this interest.

Indeed, it would be great to be able to select the classification
ontology by which the nodes are colored, with the coloring then changing
dynamically... perhaps coloring by license could be interesting too?

Egon

-- 
Dr E.L. Willighagen
Post-doc @ Uppsala University (only until 2010-09-30)
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: New LOD Cloud

2010-09-23 Thread Egon Willighagen
On Wed, Sep 22, 2010 at 10:52 PM, Richard Cyganiak  wrote:
> On 22 Sep 2010, at 20:41, Egon Willighagen wrote:
> If you want to see ChEMBL in the next issue, better get started on those
> links ;-)

I am on the road right now, but there is low-hanging fruit... however,
at the same time, I need to update from ChEMBL 02 to 06... hope to get
this done early next week...

I will also try to make a dump of my http://rdf.openmolecules.net/
'node', even though a lot of that content is actually dynamically
generated (in principle, the number of triples is unbounded).

Egon




Re: New LOD Cloud

2010-09-22 Thread Egon Willighagen
On Wed, Sep 22, 2010 at 8:50 PM, Anja Jentzsch  wrote:
> 215 data sets have been entered into CKAN and added to the lodcloud group. 
> 203 of those form a connected cloud of data sets, and are shown in the 
> picture. The data sets consist of over 25 billion RDF triples, which are 
> interlinked by around 395 million RDF links.

Nice!

I will try to update my ChEMBL data and make proper links out to the
other LODD sources...

Now that you build from CKAN, are you going to update the plot more often?

Egon




Re: [pedantic-web] RE: [ANN] RDFa Developer (1.0b1): RDFa extension for firefox

2010-08-11 Thread Egon Willighagen
On Wed, Jul 14, 2010 at 10:32 PM, Hondros, Constantine
 wrote:
> My only suggestion is that you try to detect whether the page contains
> deliberate RDFa, for example whether it uses the RDFa DTD, or contains
> version="XHTML+RDFa 1.x", and don’t report any triples if it doesn’t
> conform. Otherwise your tool reports a jumble of triples in the XHTML vocab
> namespace that are of questionable semantic value.

Agreed.
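The heuristic Constantine suggests can be sketched in a few lines of Python. This is not how the RDFa Developer extension works internally (its code is not discussed here); it simply checks for an XHTML+RDFa DOCTYPE or a version="XHTML+RDFa 1.x" attribute and, if neither is present, reports no triples. The helper names are hypothetical.

```python
import re

# Heuristics suggested above: an XHTML+RDFa DOCTYPE, or a version attribute
# such as version="XHTML+RDFa 1.0" on the <html> element.
RDFA_DOCTYPE = re.compile(r"<!DOCTYPE[^>]*XHTML\+RDFa", re.IGNORECASE)
RDFA_VERSION = re.compile(
    r"""<html\b[^>]*\bversion\s*=\s*["']XHTML\+RDFa\s+1\.\d+["']""", re.IGNORECASE
)


def declares_rdfa(markup: str) -> bool:
    """True if the page explicitly signals that it is XHTML+RDFa."""
    return bool(RDFA_DOCTYPE.search(markup) or RDFA_VERSION.search(markup))


def triples_to_report(markup: str, triples: list) -> list:
    """Suppress extracted triples for pages that never opted in to RDFa."""
    return triples if declares_rdfa(markup) else []
```

A page that uses RDFa attributes without either marker would be silently skipped, so a real tool might prefer to downgrade such triples to a warning rather than drop them.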

Also, I think the notice below is the wrong way around:

Non-conventional prefix: the namespace
'http://github.com/egonw/cheminformatics.classics/1/#' mapped to the
prefix 'cc' isn't commonly used, 'http://creativecommons.org/ns#'
could be a better choice

I am pretty sure I am using the right NS; the notice should instead
suggest that I use a different *prefix*... (BTW, perhaps it could be made
possible to configure which notice types one wants to see... I'd likely
turn this one off...)
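What the notice could say instead can be sketched as follows, assuming a small hypothetical table of conventional prefixes: when a well-known prefix is bound to a different namespace, the suggestion targets the prefix, never the namespace. Only the cc binding comes from the notice above; the other entries and the function name are illustrative.

```python
# Hypothetical table of conventional prefix -> namespace bindings; a real
# tool would load a curated list instead.
CONVENTIONAL = {
    "cc": "http://creativecommons.org/ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def prefix_notice(prefix: str, namespace: str):
    """Return a notice string, or None when the binding looks fine.

    The namespace is treated as the author's deliberate choice; only the
    prefix is ever suggested for change.
    """
    expected = CONVENTIONAL.get(prefix)
    if expected == namespace:
        return None  # conventional binding
    if expected is not None:
        # Prefix clash: keep the namespace, propose renaming the prefix.
        return (f"Prefix '{prefix}' usually stands for <{expected}>; "
                f"consider another prefix for <{namespace}>.")
    for conventional_prefix, conventional_ns in CONVENTIONAL.items():
        if conventional_ns == namespace:
            # Known namespace under an unusual prefix: suggest the usual one.
            return (f"<{namespace}> is usually bound to '{conventional_prefix}'; "
                    f"consider using that prefix.")
    return None  # unknown prefix for an unknown namespace


print(prefix_notice("cc", "http://github.com/egonw/cheminformatics.classics/1/#"))
```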

Otherwise, it's indeed a great tool, and I have started using it in my research:

http://chem-bla-ics.blogspot.com/2010/07/scripts-logs-as-htmlrdfa-mix-free-text.html

Egon


-- 
Dr E.L. Willighagen
Post-doc @ Uppsala University (only until 2010-09-30)
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: Announcement: Bio2RDF 0.3 released

2009-03-23 Thread Egon Willighagen
Hi Kei,

On Mon, Mar 23, 2009 at 3:37 PM, Kei Cheung  wrote:
> As part of the biordf query federation task, we are currently exploring a
> federation scenario involving integration of neuroreceptor-related
> information. For example, IUPHAR provides information for different classes
> of receptors. For example, in the table shown at
>  http://www.iuphar-db.org/GPCR/ReceptorListForward?class=class%20A, ligands
> are provided for receptors but not InChI codes ...

That's an interesting table... not Open, it seems... did you ask for
(and get) permission to redistribute it under a free license,
perhaps? The list is not overly long, and InChIs could be added
manually, though one would have to assume the compound names (btw,
some are compound classes!) are unique...

PubChem also has links to MeSH terms, and I also see a MeSH term in
the ChemBox on Wikipedia... that would be open data, and could provide
similar information.

I have been pondering setting up an open source semantic wiki for
linking data where no Open source for that is available, but
have not had time for that yet.

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: Announcement: Bio2RDF 0.3 released

2009-03-22 Thread Egon Willighagen
On Mon, Mar 23, 2009 at 12:09 AM, Peter Ansell  wrote:
> 2009/3/22 Egon Willighagen :
>> On Sun, Mar 22, 2009 at 1:42 AM, Peter Ansell  wrote:
>>> Do you also provide InChIKey resolution?
>>
>> No. That requires a lookup, so it only works against an existing database.
>> Chemspider is doing this, but it is not a general solution. InChIKeys
>> are not unique, though clashes are rare and have not been observed so far.
>
> I didn't think it required a lookup to derive an InChIKey given an
> InChI.

Ah, sorry. InChIKey can be computed, but I thought you meant resolving
which structure has a given InChIKey... going from InChIKey to
structure does require a lookup; generating the InChIKey from a structure
(or InChI) does not.
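A toy Python illustration of this asymmetry follows. The real InChIKey algorithm (SHA-256 based, with its own encoding and layer split) is not reproduced here; toy_key is a hypothetical stand-in that truncates a SHA-256 hex digest. Computing a key is a local, one-way step, while resolving one needs an index of structures already seen, and truncation means two inputs may in principle share a key, which is also why such a key cannot back an owl:sameAs statement.

```python
import hashlib


def toy_key(inchi: str, length: int = 14) -> str:
    """Fixed-length key for an InChI string.

    NOT the real InChIKey algorithm: a truncated SHA-256 hex digest, used
    only to show that computing a key is cheap, local and one-way.
    """
    return hashlib.sha256(inchi.encode("utf-8")).hexdigest()[:length].upper()


def build_index(known_inchis):
    """Map key -> InChIs; resolving a key is only possible via such a table."""
    index = {}
    for inchi in known_inchis:
        index.setdefault(toy_key(inchi), []).append(inchi)
    return index


def resolve(key, index):
    """Return every known InChI with this key (empty if never indexed).

    A list is returned because truncated hashes can collide, however rarely,
    so a key alone cannot prove two structures identical.
    """
    return index.get(key, [])


known = build_index(["InChI=1/CH4/h1H4"])  # toy database with one entry
key = toy_key("InChI=1/CH4/h1H4")
print(key, resolve(key, known))
```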

> I realise that clashes are rare but possible, just wondering
> whether it would be supported. Leaving them out altogether just seems
> like missing possibly extra information.

I'll add them where missing.

>>> [1] It is just that InChI's
>>> can get pretty long for complex molecules and it makes it harder for
>>> people to accurately copy and paste them around when needed.
>>
>> Indeed. However, InChIKey is less precise. As RDF allows us to do
>> things in an exact manner, I'd rather use InChI.
>>
>>> InChiKey's might be better for general use in RDF because they have a
>>> guaranteed identifier length and therefore won't become cumbersome for
>>> complex molecules.
>>
>> But can never be used for owl:sameAs like relations.
>
> Having them as properties could give someone a quick clue as to
> whether they are looking at the same molecule. Humans do interact with
> RDF (inevitably), and having short hash values can still be valuable.
> Given that hashes are usually designed to amplify small changes, it is
> easier than reading a 10 line InChI to determine whether there was
> a difference.

Agreed.

>>> Currently all of the InChI's that I have seen have been as Literals,
>>> but it would be relatively easy to also provide them as URI's to
>>> provide the link since you have a resolver for them set up.
>>
>> That was precisely the reason why I started the service.
>
> Good work.

Thanx for the feedback!

Egon




Re: Announcement: Bio2RDF 0.3 released

2009-03-22 Thread Egon Willighagen
Hi Peter,

On Sun, Mar 22, 2009 at 1:42 AM, Peter Ansell  wrote:
> owl:sameAs fits well for that purpose.
>
>> http://rdf.openmolecules.net/?InChI=1/C12H8O2S/c13-8-5-6-10-11(12(8)14)7-3-1-2-4-9(7)15-10/h1-6,13-14H
>>
>> Linking back to rdf.openmolecules.net can be done as shown above with the 
>> InChI.
>
> Okay. Will look into doing owl:sameAs links back either dynamically or
> by modifying the way ChEBI is converted to RDF.

Thanx.

> Do you also provide InChIKey resolution?

No. That requires a lookup, so it only works against an existing database.
Chemspider is doing this, but it is not a general solution. InChIKeys
are not unique, though clashes are rare and have not been observed so far.

> [1] It is just that InChI's
> can get pretty long for complex molecules and it makes it harder for
> people to accurately copy and paste them around when needed.

Indeed. However, InChIKey is less precise. As RDF allows us to do
things in an exact manner, I'd rather use InChI.

> InChiKey's might be better for general use in RDF because they have a
> guaranteed identifier length and therefore won't become cumbersome for
> complex molecules.

But can never be used for owl:sameAs like relations.

> Currently all of the InChI's that I have seen have been as Literals,
> but it would be relatively easy to also provide them as URI's to
> provide the link since you have a resolver for them set up.

That was precisely the reason why I started the service.

Egon




Re: Announcement: Bio2RDF 0.3 released

2009-03-20 Thread Egon Willighagen
Hi Peter,

On Fri, Mar 20, 2009 at 7:56 AM, Peter Ansell  wrote:
> * Some http://database.bio2rdf.org/database:identifier URI's are given
> by this, but these aren't standard, and are only shown where there is
> still at least one SPARQL endpoint available which uses them. People
> should utilise the http://bio2rdf.org/database:identifier versions
> when linking to Bio2RDF.

I'm using ChEBI IDs right now to link to your RDF with owl:sameAs:

http://rdf.openmolecules.net/?InChI=1/C12H8O2S/c13-8-5-6-10-11(12(8)14)7-3-1-2-4-9(7)15-10/h1-6,13-14H

Linking back to rdf.openmolecules.net can be done as shown above with the InChI.
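The sketch below builds such an owl:sameAs statement as an N-Triples line in Python. The rdf.openmolecules.net URI follows the example above, and the Bio2RDF URI follows the http://bio2rdf.org/database:identifier pattern quoted above; the percent-encoding rule, the 'chebi' database label and the identifier 12345 are assumptions for illustration (12345 is not the real ChEBI ID of this molecule).

```python
from urllib.parse import quote

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def openmolecules_uri(inchi: str) -> str:
    """URI in the style of the rdf.openmolecules.net example above.

    Assumption: the InChI (minus its 'InChI=' prefix) is appended to
    '?InChI=', with reserved characters other than '/', '(', ')' and ','
    percent-encoded, so that e.g. '+' in a charge layer is not read as a space.
    """
    body = inchi[len("InChI="):] if inchi.startswith("InChI=") else inchi
    return "http://rdf.openmolecules.net/?InChI=" + quote(body, safe="/(),")


def bio2rdf_uri(database: str, identifier: str) -> str:
    """Bio2RDF pattern http://bio2rdf.org/database:identifier."""
    return f"http://bio2rdf.org/{database}:{identifier}"


def same_as(inchi: str, database: str, identifier: str) -> str:
    """N-Triples line linking the InChI-based URI to a Bio2RDF URI."""
    return (f"<{openmolecules_uri(inchi)}> <{OWL_SAME_AS}> "
            f"<{bio2rdf_uri(database, identifier)}> .")


# Hypothetical identifier: 12345 is a placeholder, not this molecule's ChEBI ID.
print(same_as("InChI=1/C12H8O2S/c13-8-5-6-10-11(12(8)14)7-3-1-2-4-9(7)15-10/h1-6,13-14H",
              "chebi", "12345"))
```

Because the URI is computed from the InChI alone, anyone holding an InChI literal can mint the link without a lookup, which is what makes URIs rather than literals useful here.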

I'll hook up to your DrugBank and DBPedia later today. Do you already
make links between ChEBI and DBPedia? I created links by converting
SMILES into InChIs:

http://chem-bla-ics.blogspot.com/2009/02/dbpedia-enters-rdfopenmoleculesnet.html

Comments most welcome!

Egon




Re: [Dbpedia-discussion] [ANN] DBpedia Lookup

2009-02-11 Thread Egon Willighagen

On Tue, Feb 10, 2009 at 6:08 PM, Georgi Kobilarov
 wrote:
> Try the terms "Shakespeare", "EU", or "Cambridge" and see for yourself
> if the results you'd expect show up at the top. The result ranking is
> different - and supposed to be more useful - than a simple full-text
> search or SPARQL-Query with embedded regular expression for matching
> labels.

Or molecules... nice!

http://chem-bla-ics.blogspot.com/2009/02/dbpedia-lookup-and-autocomplete-of.html

Egon

-- 

http://chem-bla-ics.blogspot.com/