Thanks everyone for your suggestions and clarifications.

On Tue, Dec 15, 2015 at 10:18 PM, Tom Morris <tfmor...@gmail.com> wrote:
> Two other sources you might consider are Freebase and Wikidata. Using
> them together with DBpedia might give you better results.
>
> Tom
>
> On Tue, Dec 15, 2015 at 5:27 AM, Vihari Piratla <viharipira...@gmail.com>
> wrote:
>
>> Thanks Dimitris for a detailed response.
>> I see 2,945,956 unique titles in instance-types_en.nt.bz2 and 2,716,774
>> unique titles in instance-types-transitive_en.nt.bz2. The number of unique
>> titles in the two files together is 2,945,956.
>> Currently, Wikipedia contains 5,031,836 articles in English. I am
>> assuming the dump is missing 2 million or so titles because of the bug in
>> the extraction framework.
>>
>> When can we expect the 2016 release?
>>
>> Thanks
>>
>> On Mon, Dec 14, 2015 at 8:53 PM, Dimitris Kontokostas <jimk...@gmail.com>
>> wrote:
>>
>>> Hi Vihari,
>>>
>>> The main reason for the size reduction is due to the split between
>>> direct & transitive types [1]
>>> There was a bug [2] that indirectly affected some type assignments but
>>> is now fixed and the next release will not have this problem.
>>> Also note that besides SD-Types, in this release we published two
>>> additional type datasets, dbatx and LHD [3]
>>>
>>> Regarding your 2nd question ('__'). These resources are extracted from
>>> additional infoboxes in the same page but when they cannot be merged, we
>>> create additional resources.
>>> This is also a way to create intermediate node mappings
>>> <http://mappings.dbpedia.org/index.php/Template:IntermediateNodeMapping>
>>> through the mappings wiki e.g. in [4]
>>>
>>> [1]
>>> http://downloads.dbpedia.org/2015-04/core-i18n/en/instance-types-transitive_en.nt.bz2
>>> [2] https://github.com/dbpedia/extraction-framework/issues/404
>>> [3] http://wiki.dbpedia.org/dbpedia-data-set-2015-04
>>> [4]
>>> http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_officeholder
>>>
>>> On Mon, Dec 14, 2015 at 1:12 PM, Vihari Piratla
>>> <viharipira...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am a software developer, we use DBpedia instance type or
>>>> mapping-based type files in a pipeline to recognize entities.
>>>> We found that the latest instance-types resource available at
>>>> http://downloads.dbpedia.org/2015-04/core-i18n/en/instance-types_en.nt.bz2
>>>> is much smaller than the corresponding 2014 release
>>>> http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/instance_types_en.nt.bz2
>>>> .
>>>> As a result, the latest instance file is missing many entries present
>>>> on Wikipedia such as Taj_Mahal, J._Paul_Getty_Museum, Grand_Canyon.
>>>> What is the reason for the reduced size (110MB->35MB)
>>>> Is this a bug?
>>>> Are there some other files that we have to consider along with this
>>>> file?
>>>>
>>>> We also sometimes see entries with '__', as in "Abraham_Lincoln__1" in
>>>> the line
>>>> <http://dbpedia.org/resource/Abraham_Lincoln__1>
>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>> <http://dbpedia.org/ontology/TimePeriod>
>>>> What does '__' mean? Where can I find more information about these
>>>> things.
>>>>
>>>> Thanks
>>>> --
>>>> Vihari PIratla
>>>
>>> --
>>> Kontokostas Dimitris
>>
>> --
>> V

--
V
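For anyone who wants to reproduce title counts like the ones quoted in this thread, here is a minimal sketch in Python. It is not the method anyone in the thread used. It assumes the dump is plain N-Triples with one triple per line, and that resources created from additional infoboxes always end in `__` followed by a number, which is inferred only from the Abraham_Lincoln__1 example. The function names (`subject_titles`, `count_titles`, `titles_in_dump`) are made up for illustration.

```python
import bz2
import re

# Subject IRI at the start of an N-Triples line.
SUBJECT_RE = re.compile(r"^<([^>]*)>")
RESOURCE_PREFIX = "http://dbpedia.org/resource/"
# Assumption: additional-infobox resources end in "__<number>",
# as in Abraham_Lincoln__1. This pattern is inferred, not documented here.
INTERMEDIATE_RE = re.compile(r"__\d+$")


def subject_titles(lines):
    """Yield the title part of each DBpedia subject IRI in N-Triples lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SUBJECT_RE.match(line)
        if match and match.group(1).startswith(RESOURCE_PREFIX):
            yield match.group(1)[len(RESOURCE_PREFIX):]


def count_titles(lines):
    """Return (page_titles, intermediate_titles) as two sets of unique titles."""
    pages, intermediate = set(), set()
    for title in subject_titles(lines):
        if INTERMEDIATE_RE.search(title):
            intermediate.add(title)
        else:
            pages.add(title)
    return pages, intermediate


def titles_in_dump(path):
    """Read a .nt.bz2 dump from a local path and split its unique titles."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return count_titles(handle)
```

Calling `titles_in_dump("instance-types_en.nt.bz2")` and `titles_in_dump("instance-types-transitive_en.nt.bz2")` on local copies and taking the union of the returned page sets gives one way to compare against the English article count. Whether the `__` resources should be counted as titles is a choice left to the reader.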
------------------------------------------------------------------------------
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion