Hi,

I am looking at DBpedia for nutritional facts. I tried running this query:

PREFIX dbpprop: <http://dbpedia.org/property/>
select distinct ?subject where { ?subject dbpprop:carbs ?value } limit 100

And I spotted a few issues (there might be more, but I stopped looking):

1. Take dbpedia:Squab_(food) as an example. The infobox on Wikipedia states
"There is some variation in nutritional content depending on the breed of
utility pigeon used for squabbing." This is copied into DBpedia as a
dbpprop:note, and I am not sure how to determine automatically whether it
belongs to the nutrition infobox or to something else. The citation attached
to it on Wikipedia is also missing.

2. Take dbpedia:Coconut as an example. The Wikipedia article has two
nutrition infoboxes, one for "Coconut-inner edible solid part, raw (fresh
kopra)" and another for "coconut water". In DBpedia all the values are
collected together, so every property has two values, and it is almost
impossible to figure out which value belongs to which infobox. Furthermore,
the only place I see the names of the two infoboxes is dbpprop:name, but it
also contains two extra, unrelated values.
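To illustrate why the values cannot be re-paired, here is a minimal Python sketch. It assumes the extraction simply pools every infobox's values under the same property of one resource, which is what the Coconut data appears to show; the property names and values are placeholders, not real DBpedia data.

```python
from collections import defaultdict


def flatten_infoboxes(infoboxes):
    # Assumed extraction behaviour: every infobox on the page writes to the
    # same resource, so values are pooled per property with no box marker.
    merged = defaultdict(set)
    for box in infoboxes:
        for prop, value in box.items():
            merged[prop].add(value)
    return dict(merged)


# Placeholder values only; not real nutrition data.
flesh = {"name": "Coconut-inner edible solid part", "carbs": "c1", "fat": "f1"}
water = {"name": "coconut water", "carbs": "c2", "fat": "f2"}
other = {"name": "unrelated name"}  # stands in for a non-nutrition infobox

merged = flatten_infoboxes([flesh, water, other])
print(sorted(merged["carbs"]))  # ['c1', 'c2']: no record of which box each came from
print(len(merged["name"]))      # 3: nutrition names mixed with an unrelated one
```

Once pooled, nothing distinguishes the pairing (c1, f1) from (c1, f2), which is exactly the ambiguity described above.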

3. Sometimes there is a source (e.g. the USDA Nutrient Database). I can look
at the property dbpprop:sourceUsda for an ID, but sometimes it just contains
1, when the infobox links only to the general USDA search website rather than
to the actual entry. Occasionally the value is simply wrong, as in
dbpedia:Orange_juice, where the USDA ID points to "classic sirloin steak
(10 oz)". Perhaps it was corrected on Wikipedia after the last import?
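A minimal Python sketch of how a consumer might triage dbpprop:sourceUsda values as they are now. It assumes that "1" always means "generic link to the USDA search site" and that any other all-digit value is an entry ID. The word-overlap check for IDs pointing at the wrong food is a crude, hypothetical heuristic; it needs the USDA description for the ID, which would have to be looked up separately.

```python
def classify_usda_source(value):
    """Triage a dbpprop:sourceUsda value (heuristic)."""
    text = str(value).strip()
    # Assumption: "1" means the infobox only links to the USDA search site.
    if text == "1":
        return "generic-link"
    # Assumption: any other all-digit value is a USDA entry ID (may still be wrong).
    if text.isdigit():
        return "candidate-id"
    return "unknown"


def looks_mismatched(resource_label, usda_description):
    """Flag an ID whose USDA description shares no word with the resource label."""
    label_words = set(resource_label.lower().replace("_", " ").split())
    desc_words = set(usda_description.lower().replace(",", " ").split())
    return not (label_words & desc_words)


print(classify_usda_source(1))  # generic-link
print(looks_mismatched("Orange_juice", "classic sirloin steak (10 oz)"))  # True
```

This cannot say whether an ID is correct, only flag obvious mismatches for manual review.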

4. Sometimes there is a note saying "Percentages are roughly approximated
using US recommendations for adults", with a link to further information.
This information is not copied to DBpedia.

How could this be improved? It might involve a lot of work, and I think the
following points are important to consider:

A. Create an ontology that covers all the ways the nutritional facts infobox
is used. Create a separate resource for each infobox.

B. Each nutritional facts resource must be named after its infobox. It should
contain notes if the values are uncertain in some way, and it should
reference sources if available. It should contain the percentage values and
a note about how they are calculated, including references for further
information where available.

C. Link each food/drink resource to one or more nutritional facts resources.

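Points A to C could be sketched as the following Python data model that emits Turtle. Everything in it is hypothetical: the namespace `http://example.org/nutrition#`, the properties `hasNutritionFacts`, `name`, `note`, `source` and `percentBasis`, the example IRIs, and the class `NutritionFacts` are illustrations, not part of any existing DBpedia ontology. It only shows the shape of the idea: one resource per infobox, with notes, sources and the percentage basis attached to that resource rather than to the food.

```python
from dataclasses import dataclass, field

# Hypothetical namespace and property names; not an existing DBpedia ontology.
NF = "http://example.org/nutrition#"


def literal(text):
    # Quote a string as a Turtle literal, escaping backslashes, quotes and newlines.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


@dataclass
class NutritionFacts:
    """One resource per nutrition infobox (points A and B)."""
    iri: str
    name: str                                    # infobox name, e.g. "coconut water"
    values: dict = field(default_factory=dict)   # property -> value, e.g. "carbs"
    notes: list = field(default_factory=list)    # remarks on uncertain values
    sources: list = field(default_factory=list)  # IRIs of sources such as USDA entries
    percent_basis: str = ""                      # how percentages were calculated


def to_turtle(food_iri, facts_list):
    """Link a food resource to its nutrition facts resources (point C) as Turtle."""
    lines = []
    for facts in facts_list:
        s = f"<{facts.iri}>"
        lines.append(f"<{food_iri}> <{NF}hasNutritionFacts> {s} .")
        lines.append(f"{s} <{NF}name> {literal(facts.name)} .")
        for prop, value in facts.values.items():
            lines.append(f"{s} <{NF}{prop}> {literal(str(value))} .")
        for note in facts.notes:
            lines.append(f"{s} <{NF}note> {literal(note)} .")
        for src in facts.sources:
            lines.append(f"{s} <{NF}source> <{src}> .")
        if facts.percent_basis:
            lines.append(f"{s} <{NF}percentBasis> {literal(facts.percent_basis)} .")
    return "\n".join(lines)


# Hypothetical IRIs for the two Coconut infoboxes.
coconut = "http://dbpedia.org/resource/Coconut"
boxes = [
    NutritionFacts("http://example.org/nf/Coconut_flesh",
                   "Coconut-inner edible solid part, raw (fresh kopra)"),
    NutritionFacts("http://example.org/nf/Coconut_water", "coconut water"),
]
print(to_turtle(coconut, boxes))
```

With this shape, each value, note and source stays attached to the infobox it came from, so the Coconut case no longer collapses into one set of values per property.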
Have I overlooked something? Is there any related work regarding this
topic? Any comment is appreciated.

Cheers,

Bjarke Walling
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
