Hi Libby,
> That's rather fabulous! Can you give some information about how often
> this dataset is updated, and what's its geographical and product type
> reach?
Thanks! This particular data set is a rather static collection and has a
bias towards US products. It will soon be complemented by a more dynamic
and European-centric second data set.
In the long run, we will have to convince professional providers of
commodity master data (e.g. GS1) to release their data following our
structure. Currently, this is not possible due to licensing restrictions
(there are look-up services like GEPIR, but none of them allows
redistribution of the data).
The upcoming second data set will be based on a community process, i.e.,
shop owners enter labels for EAN/UPCs in a Wiki.
Since EAN/UPCs must (theoretically) not be reused, the current data set
should be pretty reliable, though not necessarily very complete.
I see the main benefit of the current data set in
- using it as a showcase of how small businesses can fetch product master
data from the Semantic Web and
- showing how data on the same commodity from multiple sources can be
easily linked on the basis of having the same
http://purl.org/goodrelations/v1.html#hasEAN_UCC-13
property value.
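
A minimal sketch of that linking step, assuming the descriptions from each
source have already been parsed into (subject, predicate, object) tuples.
The property URI is the one quoted above; the shop.example.org subjects and
the second EAN are made up for illustration.

```python
from collections import defaultdict

# Property URI as quoted in this message.
HAS_EAN = "http://purl.org/goodrelations/v1.html#hasEAN_UCC-13"


def link_by_ean(triples):
    """Group subjects that carry the same EAN/UCC-13 value."""
    by_ean = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == HAS_EAN:
            by_ean[obj].add(subject)
    # Keep only codes that are described by more than one subject.
    return {
        ean: sorted(subjects)
        for ean, subjects in by_ean.items()
        if len(subjects) > 1
    }


# Hypothetical sample data from two sources.
triples = [
    ("http://openean.kaufkauf.net/id/EanUpc_0001067792600", HAS_EAN, "0001067792600"),
    ("http://shop.example.org/item/42", HAS_EAN, "0001067792600"),
    ("http://shop.example.org/item/43", HAS_EAN, "4000000000006"),
]
links = link_by_ean(triples)
```

Here `links` maps each shared code to the subjects that can be treated as
descriptions of the same commodity.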
Individual commodity descriptions can be retrieved as follows:
http://openean.kaufkauf.net/id/EanUpc_<UPC/EAN>
Example:
http://openean.kaufkauf.net/id/EanUpc_0001067792600
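
As a rough sketch, the URI pattern can be wrapped in a small helper. Two
things here are assumptions, not something stated above: that shorter codes
(e.g. 12-digit UPCs) are left-padded with zeros to 13 digits, as the example
suggests, and that the server honours an Accept header for RDF/XML.

```python
import urllib.request

BASE = "http://openean.kaufkauf.net/id/EanUpc_"


def ean_uri(code):
    """Build the identifier URI for an EAN/UPC given as a digit string."""
    digits = code.strip()
    if not digits.isdigit() or len(digits) > 13:
        raise ValueError("expected at most 13 digits: %r" % code)
    # Assumption: shorter codes are zero-padded on the left to 13 digits.
    return BASE + digits.zfill(13)


def fetch_description(code, timeout=10):
    """Retrieve the RDF served for one code (performs a network request)."""
    request = urllib.request.Request(
        ean_uri(code),
        # Assumption: the server supports content negotiation for RDF/XML.
        headers={"Accept": "application/rdf+xml"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `ean_uri("0001067792600")` yields the example URI above;
`fetch_description` returns whatever bytes the server sends back.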
> This seems to give me multiple product descriptions - am I
> misunderstanding?
The whole data set is currently divided into 100 RDF files (soon to be
1000), which are served via a somewhat complicated .htaccess
configuration.
The reason is that the large amount of instance data would otherwise
require 1 million very small files (a few triples each), which may cause
problems with several file systems. Also, since we want as much of our
data as possible to stay within OWL DL (I know not everybody in the
community shares that), this would cause a lot of redundancy due to
ontology imports / header data in each single file.
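
To put numbers on the trade-off, here is a back-of-the-envelope sketch. It
takes the "1 million" figure above as a rough total and assumes the
descriptions are spread evenly across the files, which is a simplification.

```python
def instances_per_file(total_instances, n_files):
    """Upper bound on descriptions per file, using ceiling division."""
    return -(-total_instances // n_files)


TOTAL = 1_000_000  # rough figure taken from the message, not an exact count

current = instances_per_file(TOTAL, 100)    # 100 files today
planned = instances_per_file(TOTAL, 1000)   # 1000 files planned
```

Under these assumptions each file holds about 10,000 descriptions today and
about 1,000 after the change, and the ontology header is repeated only once
per file instead of once per description.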
But as far as I can see, the current approach should not have major side
effects - you get back additional triples, but the size of the files
being served is limited. Currently, we serve 4 MB file chunks. We will
shortly reduce that to 400 - 800 KB. That seems reasonable to me.
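
On the client side, the additional triples are easy to drop. A minimal
sketch, assuming the returned file has already been parsed into (subject,
predicate, object) tuples; the second subject and both label values are
made up for illustration.

```python
def describe(triples, subject):
    """Return only the triples whose subject is the requested resource."""
    return [t for t in triples if t[0] == subject]


WANTED = "http://openean.kaufkauf.net/id/EanUpc_0001067792600"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical content of one served file: it describes several products.
chunk = [
    (WANTED, LABEL, "Example product A"),
    ("http://openean.kaufkauf.net/id/EanUpc_0000000000017", LABEL, "Example product B"),
]

only_wanted = describe(chunk, WANTED)
```

`only_wanted` then contains just the description of the requested
EAN/UPC, regardless of what else was in the file.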
Best
Martin
Libby
begin:vcard
fn:Martin Hepp
n:Hepp;Martin
org:Bundeswehr University Munich;E-Business and Web Science Research Group
adr:;;Werner-Heisenberg-Weg 39;Neubiberg;;D-85577;Germany
email;internet:mh...@computer.org
tel;work:+49 89 6004 4217
tel;pager:skype: mfhepp
url:http://www.heppnetz.de
version:2.1
end:vcard