Re: ANN: GoodRelations - E-Commerce on the Web of Data - New Datasets and Applications

Martin Hepp (UniBW) Wed, 20 May 2009 02:36:43 -0700

Hi Libby,

That's rather fabulous! Can you give some information about how often this dataset is updated, and what's its geographical and product type reach?

Thanks! This particular data set is a rather static collection and has a bias towards US products. It will soon be complemented by a more dynamic and European-centric second data set.

In the long run, we will have to convince professional providers of commodity master data (e.g. GS1) to release their data following our structure. Currently, this is not possible due to licensing restrictions (there are look-up services like GEPIR, but none of them allows redistribution of the data).

The upcoming second data set will be based on a community process, i.e., shop owners enter labels for EAN/UPCs in a Wiki.

Since EAN/UPCs must (theoretically) not be reused, the current data set should be pretty reliable, though not necessarily very complete.


I see the main benefit of the current data set in

- using it as a showcase how small businesses can fetch product master data from the Semantic Web and
- showing how data on the same commodity from multiple sources can be easily linked on the basis of having the same


http://purl.org/goodrelations/v1.html#hasEAN_UCC-13

property value.
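
As a rough illustration of the second point, the sketch below groups records from two sources by their EAN value. The records, the shop name, the second EAN, and the `link_by_ean` helper are all hypothetical; real data would be RDF carrying the property named above rather than Python dictionaries.

```python
from collections import defaultdict

# Property URI as written in this message; used here only as a dictionary key.
HAS_EAN = "http://purl.org/goodrelations/v1.html#hasEAN_UCC-13"


def link_by_ean(*sources):
    """Group records from several sources by their shared EAN/UPC value."""
    linked = defaultdict(list)
    for source in sources:
        for record in source:
            ean = record.get(HAS_EAN)
            if ean is not None:
                linked[ean].append(record)
    return dict(linked)


# Hypothetical records from two independent sources.
master_data = [{HAS_EAN: "0001067792600", "label": "Example product"}]
shop_offers = [
    {HAS_EAN: "0001067792600", "shop": "shop-a", "price": 9.99},
    {HAS_EAN: "4000000000006", "shop": "shop-a", "price": 1.50},  # made-up code
]

linked = link_by_ean(master_data, shop_offers)
```

Here the master data entry and the first shop offer end up in the same group because they carry the same property value; no other shared identifier is needed.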

Individual commodity descriptions can be retrieved as follows:

http://openean.kaufkauf.net/id/EanUpc_<UPC/EAN>

Example:

http://openean.kaufkauf.net/id/EanUpc_0001067792600
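
A minimal sketch of building such a URI from a code follows. The function names are hypothetical, and the zero-padding of shorter codes to 13 digits is an assumption based on the example identifier, not a documented rule of the service. The check digit calculation is the standard EAN-13 one.

```python
BASE = "http://openean.kaufkauf.net/id/EanUpc_"


def ean13_check_digit(first12: str) -> int:
    """Standard EAN-13 check digit for the first 12 digits (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def commodity_uri(code: str) -> str:
    """Build the retrieval URI for an EAN/UPC code.

    Assumption: shorter codes (e.g. a 12-digit UPC) are left-padded with
    zeros to 13 digits, as the example identifier suggests.
    """
    if not (code.isascii() and code.isdigit()) or len(code) > 13:
        raise ValueError(f"not an EAN/UPC code: {code!r}")
    padded = code.zfill(13)
    if ean13_check_digit(padded[:12]) != int(padded[12]):
        raise ValueError(f"check digit mismatch: {code!r}")
    return BASE + padded
```

Calling `commodity_uri("0001067792600")` returns the example URI above; the check digit test merely catches mistyped codes before a request is made.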

This seems to give me multiple product descriptions - am I misunderstanding?

The whole data set is currently divided into 100 RDF files (this will be changed to 1000 soon), which are served via a somewhat complicated .htaccess configuration.

The reason is that the large amount of instance data would otherwise require 1 million very small files (a few triples each), which may cause problems with several file systems. Also, since we want as much of our data as possible to stay within OWL DL (I know not everybody in the community shares that), this would cause a lot of redundancy due to ontology imports / header data in each single file.

But as far as I can see, the current approach should not have major side effects - you get back additional triples, but the size of the files being served is limited. Currently, we serve 4 MB file chunks. We will shortly reduce that to 400 - 800 KB. That seems reasonable to me.
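
The figures above can be checked with a back-of-the-envelope calculation. This is only a sketch: it takes "1 million" and "4 MB" as round numbers, assumes the data is spread evenly over the files, and ignores the per-file ontology import / header overhead. The `chunk_stats` helper is hypothetical.

```python
ENTITIES = 1_000_000      # rough number of commodity descriptions
CURRENT_FILES = 100       # current number of RDF files
CURRENT_CHUNK_KB = 4_000  # "4 MB" taken as 4,000 KB for a rough estimate


def chunk_stats(num_files):
    """Return (descriptions per file, approximate KB per file) for a given split."""
    total_kb = CURRENT_FILES * CURRENT_CHUNK_KB
    return ENTITIES // num_files, total_kb / num_files


now = chunk_stats(100)     # current split
later = chunk_stats(1000)  # planned split
```

Under these assumptions the current split holds about 10,000 descriptions per 4 MB file, and a split into 1000 files gives about 1,000 descriptions in roughly 400 KB each, which is at the lower end of the range mentioned above.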


Best
Martin


Libby

begin:vcard
fn:Martin Hepp
n:Hepp;Martin
org:Bundeswehr University Munich;E-Business and Web Science Research Group
adr:;;Werner-Heisenberg-Weg 39;Neubiberg;;D-85577;Germany
email;internet:mh...@computer.org
tel;work:+49 89 6004 4217
tel;pager:skype: mfhepp
url:http://www.heppnetz.de
version:2.1
end:vcard
