Thank you, I read now through the paper - great work, congratulations!

I am glad to see DBpedia evolve. I am very much looking forward to seeing it
move out of Beta.

I am also very glad (and quietly amused) to see you moving to opaque
identifiers. I think that is the right decision.

A few questions on the paper:

- the releases for the chapters - the enriched sets - aren't they basically
just a subset of the complete fused set? What is the point of these? Why
would anyone prefer the Catalan release over the whole fused set? I am
missing something here.

- why would you load the English chapter into the main SPARQL endpoint
instead of the fused set?

- in Table 2, the Fusion dataset boasts 66M entities over Wikidata's 45M
entities. Where do the 21M additional entities come from? Shouldn't most of
the individual Wikipedias' articles already be matched to Wikidata IDs and
thus fused together?

My understanding is that if I want to compare Wikidata to the fused
dataset, I need to download both the mappings and the fused dataset, and
then translate the latter using the former. Or is there some way for the
databus to create me a fused dataset using Wikidata IDs ("canonicalized",
as it used to be called) instead of the new DBpedia IDs? I thought I read
something like that in your previous answer, but I couldn't find that, and
am now thinking of just doing it myself.
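
To make the question concrete: what I have in mind is a plain subject
rewrite. A minimal sketch (the in-memory triple and mapping formats here are
illustrative assumptions, not the actual Databus file layout):

```python
# Hypothetical sketch: rewrite fused-dataset subjects to Wikidata IRIs
# using a DBpedia-ID -> Wikidata-ID mapping. All IDs below are made up
# for illustration.

def rewrite_subjects(triples, mapping):
    """Replace each subject by its mapped IRI; keep unmapped subjects as-is."""
    return [(mapping.get(s, s), p, o) for (s, p, o) in triples]

mapping = {
    "https://global.dbpedia.org/id/ExampleId": "http://www.wikidata.org/entity/Q64",
}
triples = [
    ("https://global.dbpedia.org/id/ExampleId",
     "http://dbpedia.org/ontology/populationTotal", "3644826"),
]
print(rewrite_subjects(triples, mapping))
```

I would of course stream the actual dump files rather than hold everything
in memory.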

Finally, as Table 4 shows, the Fused dataset has an appreciable win
of +2.16% over Wikidata on a property such as birth date. Would you
consider publishing these diffs under a CC0 license so they could be
provided to Wikidata for consideration and to enrich the source itself?

Cheers,
Denny



On Mon, Jun 3, 2019 at 10:48 PM Sebastian Hellmann <
hellm...@informatik.uni-leipzig.de> wrote:

> Hi Denny,
> On 03.06.19 23:31, Denny Vrandečić wrote:
>
> Thank you a lot for the answer, that is super useful.
>
> I'll see if I can get the canonicalized version recreated :)
>
> One question though, is there a cleaned version of the DBpedia ontology
> mapping based data? I only found the uncleaned version.
>
>
> Two scripts are missing: type consistency checking and redirect resolution.
> We will run them after fixing the Unicode bug.
>
>
> Do you have any plans when the next release of DBpedia is going to be
> available?
>
>
> These are signed with the public key from
> http://webid.dbpedia.org/webid.ttl, so they are the next releases. The
> structure will stay the same, and each release should be a bit better than
> the previous one. We are just working on a handful of issues and a better
> way to comment on mistakes before we announce them on all channels.
>
> It is an open platform now. We will have core dataset releases (including
> raw data) and then the community can create their own additions.
>
> https://propi.github.io/webid.ttl will add the LHD dataset, Heiko Paulheim
> will add DBkWik, etc.
>
> If you do any analysis, you can get an account and publish the data on the
> bus. Links like https://databus.dbpedia.org/denny/analysis are stable
> redirects to files, just like purl.org.
>
>
> -- Sebastian
>
>
>
>
>
>
> On Mon, Jun 3, 2019 at 2:19 PM Sebastian Hellmann <
> hellm...@informatik.uni-leipzig.de> wrote:
>
>> Hi Denny,
>>
>> You didn't really find them, because they are not yet publicly released.
>> Please treat them as a beta.
>>
>> The main reason is that there are a handful of missing features and a
>> handful of stupid bugs.
>>
>> One example:
>>
>> - we discovered a Unicode issue in URIs which still allows valid
>> analysis, but prevents loading the data into dbpedia.org/sparql
>>
>> - we built the Databus to have a group changelog and a dataset/artifact
>> changelog; however, these can only be changed at release time, so we
>> cannot record reported errors, like the one above, after a release is
>> published.
>>
>> It is not hard, and Marvin has already produced new extractions:
>> https://databus.dbpedia.org/marvin - there is just a bit missing.
>>
>>
>> i.e. files such as
>> http://downloads.dbpedia.org/2016-10/core-i18n/de/mappingbased_objects_wkd_uris_de.ttl.bz2
>> - can you point me to where I can find the canonicalized versions in the
>> new files?
>>
>>
>> These are discontinued. Instead there is:
>>
>> https://databus.dbpedia.org/dbpedia/id-management/global-ids loaded into
>> this webservice:
>> https://global.dbpedia.org/same-thing/lookup/?uri=http://www.wikidata.org/entity/Q8087
>> where you can resolve many URIs against clusters.
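>>
>> For scripting such lookups, the IRI only needs to be percent-encoded into
>> the uri parameter. A minimal sketch (the helper is hypothetical; only the
>> endpoint path comes from the link above):

```python
from urllib.parse import urlencode

# Endpoint path as linked above; the helper itself is only a sketch.
SAME_THING = "https://global.dbpedia.org/same-thing/lookup/"

def lookup_url(iri):
    """Build a lookup URL with the IRI percent-encoded as the uri parameter."""
    return SAME_THING + "?" + urlencode({"uri": iri})

print(lookup_url("http://www.wikidata.org/entity/Q8087"))
```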
>>
>> and the fused and enriched versions as described in
>> https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf
>>
>> FlexiFusion is more systematic and can rewrite any dataset's subjects with
>> any other subjects from the ID management, so we could produce these
>> datasets in any of these variants.
>>
>>
>> Thanks for these pointers! I have run a few analyses, and now can rerun
>> them again with the actual current data :) I expect this to improve DBpedia
>> numbers by quite a bit.
>>
>> You could also try the fused version:
>> https://databus.dbpedia.org/dbpedia/fusion - this is the one we are
>> working on most, and it will aggregate a lot more data in the future.
>>
>>
>> I find it all a bit hard to navigate (although Databus has a few really
>> neat features, thanks for that).
>>
>> Any feedback is welcome; the issue tracker is linked at the top of the
>> website.
>>
>>
>> Yes, another missing feature. However, we thought that the pros would just
>> look at the DataId files and then write SPARQL queries at
>> https://databus.dbpedia.org/yasgui/
>>
>> -- Sebastian
>>
>>
>> On 03.06.19 19:49, Denny Vrandečić wrote:
>>
>> Oh, wow, thanks Sebastian, thanks Kingsley for the answers!
>>
>> I was entirely unaware of the DBpedia datasets over at
>> databus.dbpedia.org - when I search for "dbpedia downloads" that's not
>> where I get to. Also, when I go to dbpedia.org and then click on
>> "Downloads", I get to the 2016 datasets.
>>
>> https://wiki.dbpedia.org/Datasets
>>
>> https://wiki.dbpedia.org/develop/datasets
>>
>> I honestly thought that the 2016 dataset was the latest one, and I was
>> rather disappointed. Thank you for showing me that I was just looking in
>> the wrong place - but I would really suggest that you update your websites
>> to point to the Databus. I am sure I am not the only one who believes that
>> there has been no DBpedia update since 2016.
>>
>> Thanks for these pointers! I have run a few analyses, and now can rerun
>> them again with the actual current data :) I expect this to improve DBpedia
>> numbers by quite a bit.
>>
>> One question: I used to work with the canonicalized versions from
>> https://wiki.dbpedia.org/downloads-2016-10, i.e. files such as
>> http://downloads.dbpedia.org/2016-10/core-i18n/de/mappingbased_objects_wkd_uris_de.ttl.bz2
>> - can you point me to where I can find the canonicalized versions in the
>> new files? I find it all a bit hard to navigate (although the Databus has
>> a few really neat features - thanks for that).
>>
>> Cheers,
>> Denny
>>
>>
>>
>>
>>
>> On Sat, Jun 1, 2019 at 9:43 AM Kingsley Idehen <kide...@openlinksw.com>
>> wrote:
>>
>>> On 6/1/19 5:45 AM, Sebastian Hellmann wrote:
>>>
>>> Hi Denny,
>>>
>>> * the old system was like this:
>>>
>>> we load from here: http://downloads.dbpedia.org/2016-10/core/
>>>
>>> metadata is in
>>> http://downloads.dbpedia.org/2016-10/core/2016-10_dataid_core.ttl with
>>> void:sparqlEndpoint     <http://dbpedia.org/sparql>
>>> <http://dbpedia.org/sparql> ;
>>>
>>>
>>> Hi Sebastian,
>>>
>>>
>>> I will also have the TTL referenced above loaded into a named graph so
>>> that it becomes accessible from the query solution I shared in my prior
>>> post.
>>>
>>>
>>>
>>> * the new system is here: https://databus.dbpedia.org/dbpedia
>>>
>>> There are 6 new releases and the metadata is in the endpoint
>>> https://databus.dbpedia.org/repo/sparql
>>>
>>> Once the collection saving feature is finished, we will build a
>>> collection of datasets on the bus, which will then be loaded. It is
>>> basically a SPARQL query retrieving the download URLs, like this:
>>>
>>> http://dev.dbpedia.org/Data#example-application-virtuoso-docker
>>>
>>>
>>> Okay.
>>>
>>> Please install the Faceted Browser so that URIs like
>>> http://dev.dbpedia.org/Data#example-application-virtuoso-docker can
>>> also be looked up.
>>>
>>> As an aside, here's an Entity Type overview query results page
>>> <https://databus.dbpedia.org/repo/sparql?default-graph-uri=&query=SELECT+%28SAMPLE%28%3Fs%29+AS+%3Fsample%29+%28COUNT%281%29+AS+%3Fcount%29++%28%3Fo+AS+%3FentityType%29%0D%0AWHERE+%7B%0D%0A++++++++%3Fs+a+%3Fo.+%0D%0A%09%09FILTER+%28isIRI%28%3Fs%29%29+%0D%0A++++++++++++++++FILTER+%28%21+contains%28str%28%3Fs%29%2C%22virt%22%29%29%0D%0A++++++%7D+%0D%0AGROUP+BY+%3Fo%0D%0AORDER+BY+DESC+%28%3Fcount%29&format=text%2Fhtml&timeout=0&debug=on>
>>> for future use.
>>>
>>>
>>> Kingsley
>>>
>>>
>>>
>>>
>>> On 31.05.19 21:59, Denny Vrandečić wrote:
>>>
>>> Thank you for the answer!
>>>
>>> I don't see how the query solution page that you linked indicates that
>>> this is the English Wikipedia extraction. Where does it say that? How can I
>>> tell? I am trying to understand, thanks.
>>>
>>> Also, when I download the set of English extractions from here,
>>>
>>> http://downloads.dbpedia.org/2016-10/core-i18n/en/
>>>
>>> particularly this one,
>>>
>>>
>>> http://downloads.dbpedia.org/2016-10/core-i18n/en/mappingbased_objects_en.ttl.bz2
>>>
>>>
>>> it contains only about 17,467 people with parents, not 20,120, so that
>>> dataset seems to be out of sync with the one in the SPARQL endpoint.
>>>
>>> I am curious: where do you load the dataset from?
>>>
>>> Thank you!
>>>
>>>
>>> On Fri, May 31, 2019 at 11:49 AM Kingsley Idehen <kide...@openlinksw.com>
>>> wrote:
>>>
>>>> On 5/31/19 2:23 PM, Denny Vrandečić wrote:
>>>>
>>>> When I query the dbpedia.org/sparql endpoint asking "how many people
>>>> with a parent do you know?", i.e. select (count(distinct ?s) as ?c)
>>>> where { ?s dbo:parent ?o }, I get 20,120 as the answer.
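>>>>
>>>> As a toy illustration of counting distinct subjects with a dbo:parent
>>>> triple (the data below is made up; only the dbo:parent IRI is from this
>>>> thread), a subject with two parents is still counted once:

```python
# Toy data; only the dbo:parent IRI comes from this thread.
DBO_PARENT = "http://dbpedia.org/ontology/parent"

triples = [
    ("ex:Alice", DBO_PARENT, "ex:Bob"),
    ("ex:Alice", DBO_PARENT, "ex:Carol"),  # second parent, same subject
    ("ex:Dave", DBO_PARENT, "ex:Eve"),
]

# A distinct-subject count tallies Alice once despite her two parents.
distinct_subjects = {s for (s, p, o) in triples if p == DBO_PARENT}
print(len(distinct_subjects))  # 2
```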
>>>>
>>>> Where among the Downloads on wiki.dbpedia.org/downloads-2016-10 can I
>>>> find the dataset that the SPARQL endpoint actually serves? Is it the
>>>> English Wikipedia-based "Mappingbased" one? Or is it the "Infobox
>>>> Properties Mapped" one?
>>>>
>>>> Cheers,
>>>> Denny
>>>>
>>>>
>>>> The query solution page
>>>> <http://dbpedia.org/sparql?default-graph-uri=&query=prefix+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+%0D%0A%0D%0Aselect+%3Fg+%28count+%28distinct+%3Fs%29+as+%3Fc%29%0D%0Awhere+%7B+%0D%0A+++++++%0D%0A+++++++++graph+%3Fg+%7B%3Fs+dbo%3Aparent+%3Fo.%7D%0D%0A%0D%0A+++++%7D%0D%0Agroup+by+%3Fg&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on&run=+Run+Query+>
>>>> indicates this is the English Wikipedia dataset. That's what we've always
>>>> loaded into the Virtuoso instance from which DBpedia Linked Data and its
>>>> associated SPARQL endpoint are deployed.
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Kingsley Idehen    
>>>> Founder & CEO
>>>> OpenLink Software
>>>> Home Page: http://www.openlinksw.com
>>>> Community Support: https://community.openlinksw.com
>>>> Weblogs (Blogs):
>>>> Company Blog: https://medium.com/openlink-software-blog
>>>> Virtuoso Blog: https://medium.com/virtuoso-blog
>>>> Data Access Drivers Blog: 
>>>> https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>>>>
>>>> Personal Weblogs (Blogs):
>>>> Medium Blog: https://medium.com/@kidehen
>>>> Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
>>>>               http://kidehen.blogspot.com
>>>>
>>>> Profile Pages:
>>>> Pinterest: https://www.pinterest.com/kidehen/
>>>> Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
>>>> Twitter: https://twitter.com/kidehen
>>>> Google+: https://plus.google.com/+KingsleyIdehen/about
>>>> LinkedIn: http://www.linkedin.com/in/kidehen
>>>>
>>>> Web Identities (WebID):
>>>> Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
>>>>         : 
>>>> http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
>>>>
>>>> _______________________________________________
>>>> DBpedia-discussion mailing list
>>>> DBpedia-discussion@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>>
>>>
>>>
>>>
>>> --
>>> All the best,
>>> Sebastian Hellmann
>>>
>>> Director of Knowledge Integration and Linked Data Technologies (KILT)
>>> Competence Center
>>> at the Institute for Applied Informatics (InfAI) at Leipzig University
>>> Executive Director of the DBpedia Association
>>> Projects: http://dbpedia.org, http://nlp2rdf.org,
>>> http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
>>> <http://www.w3.org/community/ld4lt>
>>> Homepage: http://aksw.org/SebastianHellmann
>>> Research Group: http://aksw.org
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
