Well, what I was trying to do was figure out just which of the DBpedia
files I need to combine to get a maximal set of useful, high-quality data.

I had thought that this should be easy.  However, it is not.

First, there is the problem of getting the file table in the dataset section
to show up at all.

There is also the question of whether to look in the core directory or the
core-i18n directory.  I guess that the core-i18n directory is the place to
look, because the files in the dataset section of
http://wiki.dbpedia.org/downloads-2016-10 are all from there.

Then there is the question of whether to use the canonicalized names or the
localized names.  There are warnings that the files using canonicalized
names may be missing some information.  But how much information is missing?
Every useful Wikipedia page has a Wikidata item, so at first it seems that
no Wikipedia items are missing.  But then I remembered that pages with
multiple mapped infoboxes will produce multiple DBpedia items, so I guess
that the extra items are not present.  But how many of these are there?  My
guess is that there are not many, and that the benefits of the canonicalized
names outweigh the cost of missing some information.

Then there is the question of whether the simple or commons data is the way
to go, as one of them might have been in the past.  I guess not, because the
canonicalized names provide better integration.

Then there is the question of whether to use only mapping-based information
or to include other information.  As I'm interested in high-quality
information, I chose mapping-based information only.  Then there is the
question of how to get all the mapping-based information.  My guess is that
I need "Mappingbased Literals" and "Mappingbased Objects", which, based on
their descriptions, should be adequate to pick up all the non-instance
triples.  However, I guess that I also need "Geo Coordinates
Mappingbased" but that I don't need "Specific Mappingbased Properties".
Then I guess that I also need "Instance Types" and "Instance Types
Transitive".  I also want labels for the information, so I guess I need
"Labels" and labels_nmw, wherever that is.

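A minimal sketch of the dataset selection above, assuming the file layout
suggested by the URLs quoted later in this thread (e.g. the
interlanguage_links_en.ttl.bz2 preview link), i.e.
core-i18n/{lang}/{stem}_{lang}.ttl.bz2.  The file stems in DATASET_STEMS
are hypothetical guesses at how the dataset titles map to file names, the
canonicalized-name variants may be named differently, and labels_nmw is
left out; check all of these against the actual directory listing or the
DataID metadata.

```python
from typing import Iterable, List

# Base directory taken from the URLs in this thread.
BASE = "http://downloads.dbpedia.org/2016-10/core-i18n"

# HYPOTHETICAL mapping from dataset titles on the download page to file stems.
# ASSUMPTION: files are named BASE/{lang}/{stem}_{lang}.ttl.bz2.
DATASET_STEMS = {
    "Mappingbased Literals": "mappingbased_literals",
    "Mappingbased Objects": "mappingbased_objects",
    "Geo Coordinates Mappingbased": "geo_coordinates_mappingbased",
    "Instance Types": "instance_types",
    "Instance Types Transitive": "instance_types_transitive",
    "Labels": "labels",
}


def expected_urls(lang: str,
                  stems: Iterable[str] = tuple(DATASET_STEMS.values())) -> List[str]:
    """Return the expected download URL of each dataset stem for one language."""
    return [f"{BASE}/{lang}/{stem}_{lang}.ttl.bz2" for stem in stems]


if __name__ == "__main__":
    for url in expected_urls("en"):
        print(url)
```

Keeping the stems in one table makes it easy to add or drop a dataset
(for example "Specific Mappingbased Properties") without touching the
URL logic.
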
Then there is the question of which languages to include.  My guess is all
of them, as I'm using the canonicalized names and the mapping-based results
so everything should combine together correctly.  If I get some duplicates
(e.g., from labels) that should be benign.
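
If duplicates across languages are benign, they can also simply be dropped
while merging.  A minimal sketch, assuming each dump holds one triple per
line (N-Triples style) and that lines starting with '#' are comments.  It
removes only exact duplicate lines, so two spellings of the same triple
that differ in form would both survive.  The function names are
hypothetical.

```python
import bz2
from typing import Iterable, Iterator


def read_dump(path: str) -> Iterator[str]:
    """Stream the lines of one bzip2-compressed dump file."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        yield from f


def dedup_triples(sources: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield each distinct triple line once, in first-seen order.

    ASSUMPTION: one triple per line; blank lines and '#' comments are skipped.
    """
    seen = set()
    for source in sources:
        for line in source:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line not in seen:
                seen.add(line)
                yield line


# Usage (file names hypothetical):
# paths = ["labels_en.ttl.bz2", "labels_de.ttl.bz2"]
# for triple in dedup_triples(read_dump(p) for p in paths):
#     print(triple)
```

The in-memory set is the simplest choice; with every language combined it
may become too large, and an on-disk sort with duplicate removal would
scale better.
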

So I tried
wget -nc -r -np --cut-dirs=3 -A
which seems to do the trick, but I'm not very confident that I have
downloaded everything I need.
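
One way to gain some confidence in a recursive wget download is to compare
the files on disk against a list of expected file names (taken, say, from
the dataset table on the download page or from the DataID files).  A
minimal sketch; matching by base name only is an assumption made so that
the check does not depend on how --cut-dirs laid out the local
directories.  The function names are hypothetical.

```python
import os
from typing import Iterable, List, Set


def local_names(download_root: str) -> Set[str]:
    """Collect the base names of all files anywhere under download_root."""
    names = set()
    # os.walk yields nothing (rather than raising) if the root does not exist.
    for _dirpath, _dirnames, filenames in os.walk(download_root):
        names.update(filenames)
    return names


def missing(expected: Iterable[str], present: Set[str]) -> List[str]:
    """Return the expected base names that were not downloaded, sorted."""
    return sorted(set(expected) - present)


# Usage (directory and file names hypothetical):
# print(missing(["labels_en.ttl.bz2", "instance_types_en.ttl.bz2"],
#               local_names("downloads")))
```

An empty result only means every expected name is present; it says nothing
about whether a file actually has useful content.
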


On 08/29/2017 04:21 AM, Sebastian Hellmann wrote:
> Dear Paul, Mariano and Peter, 
> I thought about the emails for a while and I think that you were asking for
> "release policy" and "overview" not "roadmap".
> So here are the answers you are looking for:
> In general, DBpedia has a very high technical standard regarding metadata
> thanks to Markus Freudenberg's DataID. Please see his summary from the
> http://w3c.github.io/dwbp/dwbp-implementation-report.html :
> So you can find all the metadata for all releases in the DataID files. See
> e.g. here: http://downloads.dbpedia.org/2016-10/ and in all subfolders. This
> should still be one of the most extensive metadata descriptions, if not the
> most extensive, that you will find anywhere on the web.
> This was our foremost focus. In addition, we made the simple JSON widget on
> the download page of the website. Providing an additional HTML-only version
> is a good idea.
> @Paul, all: the data to produce such a visualisation and query interface is
> available. If any of you takes the effort to provide something more efficient
> or extra features, we are happy to include it on the DBpedia website.
> @Mariano: releases are stable. However, it might happen that we add some more
> datasets or different formats and additional files with fixes or community
> contributions to the folder after the release date. This should not affect the
> already published files. Do you have a problem with more data post release?
> This is not a strict policy as we really don't have one. I would be happy if
> somebody takes the lead on this to document it properly.
>> Concerning the 2016-10 datasets, I have just realized that the dataset
>> <http://wiki.dbpedia.org/downloads-2016-10> "*Instance Types Sdtyped Dbo*"
>> is available in MORE languages than the ones initially announced (en, de,
>> nl). I thought that release datasets do not change over time. Is there any
>> policy on this? Perhaps the changes could be noticed in the release page.
> Seems to be a small documentation erratum. We could also use people who
> proofread and verify the release note. Markus does a good job actually, but I
> am quite sloppy doing the release note review.
> All the best,
> Sebastian
> On 25.08.2017 19:52, Paul Houle wrote:
>> I can say the page at
>> http://wiki.dbpedia.org/downloads-2016-10
>> does not work well for me.  There is a really great intention that that page
>> renders RDF data in order to draw the download tables,  but I find that:
>> (1) It takes a long time (30+ seconds) for the page to display any text at all
>> (2) It takes more time (60+ seconds) for the download tables to appear
>> (3) there is no visual indication that loading is in progress,  what time it
>> should take,  etc.
>> I could stand (2) if it were not for (3).  As it is,  I don't know how long
>> the page will take to load,  if it will load.  Is it stuck in such a way
>> that a refresh will fix it?  Is it not compatible with
>> (Edge|Chrome|Firefox|Safari|...)?
>> I am left pining for the bad old days of HTML pages (no script) rendered
>> with as few bytes,  DNS lookups,  etc. as possible.  However,  it is
>> possible to make Javascript applications that are responsive.  See:
>> https://www.windy.com/?42.600,-75.562,5
>> In the meantime,  real RDF heads should fetch the RDF data and write SPARQL
>> queries against that.
>> ------ Original Message ------
>> From: "Peter F. Patel-Schneider" <pfpschnei...@gmail.com>
>> To: "Markus Freudenberg" <markus.freudenb...@gmail.com>
>> Cc: "DBpedia" <DBpedia-discussion@lists.sourceforge.net>
>> Sent: 8/25/2017 1:05:27 PM
>> Subject: Re: [DBpedia-discussion] roadmap to the 2016-10 dumps
>>> Aha.
>>> The table under 3. Datasets wasn't showing up for me, probably because I had
>>> my browser set on maximum paranoia.  After allowing third-party scripts (and
>>> doing some other fiddling) I can now see the table.
>>> Thanks,
>>> peter
>>> On 08/24/2017 11:51 PM, Markus Freudenberg wrote:
>>>> I think you are looking for the interlanguage_links dataset:
>>>> http://downloads.dbpedia.org/preview.php?file=2016-10_sl_core-i18n_sl_en_sl_interlanguage_links_en.ttl.bz2
>>>> or
>>>> http://downloads.dbpedia.org/preview.php?file=2016-10_sl_core-i18n_sl_wikidata_sl_sameas_all_wikis_wikidata.ttl.bz2
>>>> Did you have a look at the official download page?
>>>> http://wiki.dbpedia.org/downloads-2016-10
>>>> Here you can find a short summary of each dataset and can peek into the content
>>>> (click the question marks).
>>>> There are indeed some files in the wikidata folder which are temporary
>>>> files that need not be published.
>>>> Best,
>>>> Markus Freudenberg
>>>> Release Manager, DBpedia <http://wiki.dbpedia.org>
>>>> On Fri, Aug 25, 2017 at 3:37 AM, Peter F. Patel-Schneider
>>>> <pfpschnei...@gmail.com> wrote:
>>>>     Hi:
>>>>     Is there a roadmap to the 2016-10 dumps?  I'm having trouble finding
>>>>     some of the stuff that I think should be there (particularly the
>>>>     links to Wikidata).
>>>>     Or maybe there are files that should have content but don't.
>>>>     http://downloads.dbpedia.org/2016-10/core-i18n/wikidata/wikipedia_links_wikidata.ttl.bz2
>>>>     appears to have no useful content.
>>>>     peter
> -- 
> All the best,
> Sebastian Hellmann
> Director of Knowledge Integration and Linked Data Technologies (KILT)
> Competence Center
> at the Institute for Applied Informatics (InfAI) at Leipzig University
> Executive Director of the DBpedia Association
> Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org,
> https://www.w3.org/community/ld4lt
> Homepage: http://aksw.org/SebastianHellmann
> Research Group: http://aksw.org
