Thanks for the input, Paolo, and yes, I did use tdbloader2 on Ubuntu and then transferred the directory to Windows. Loading is a lot faster on Linux using tdbloader2. I am working on a mini-project that aims to build a system performing semantic search on DBpedia data, something like Kngine or Hakia on a very, very small scale. I think I have the natural language processing part done and am now working on forming SPARQL queries to run against the DBpedia dataset, based on the NLP output for the user's query. Do you have any links or resources in this regard? Thanks again, I appreciate it.
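For concreteness, here is a minimal sketch of what I mean by forming a query from the NLP output. Everything specific in it is an assumption: the buildEntityQuery helper is hypothetical, the NLP stage is assumed to have already resolved the user's entity mention to a DBpedia resource URI, and the imports use the com.hp.hpl.jena packages current at the time:

    import com.hp.hpl.jena.query.Query;
    import com.hp.hpl.jena.query.QueryFactory;

    public class NlpToSparql {
        // Hypothetical helper: assumes the NLP stage has already mapped the
        // user's entity mention to a DBpedia resource URI.
        static Query buildEntityQuery(String resourceUri) {
            String q = "SELECT ?p ?o WHERE { <" + resourceUri + "> ?p ?o . }";
            // QueryFactory.create parses the string, so syntax errors
            // surface here rather than at execution time.
            return QueryFactory.create(q);
        }

        public static void main(String[] args) {
            Query query = buildEntityQuery("http://dbpedia.org/resource/Mendelian_inheritance");
            System.out.println(query);
        }
    }

Since the URI is spliced into the query text, it would need to be validated or escaped first to avoid SPARQL injection from user input.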
________________________________
From: Paolo Castagna <[email protected]>
To: [email protected]
Sent: Thursday, April 5, 2012 8:36 AM
Subject: Re: Loading DBpedia datasets

prerak pradhan wrote:
> Hello there, I am just starting off with Jena and am pretty new to it. I am
> trying to load all the DBpedia datasets so that I can have a local version of
> DBpedia working on my station here. I used the TDB loader to load the
> datasets, and while doing so I specified a directory into which to load them.
> I used the following code to query the dataset:
>
>     String directory = "c:/dataset";
>     DatasetGraphTDB dataset = TDBFactory.createDatasetGraph(directory);
>     Graph g1 = dataset.getDefaultGraph();
>     Model newModel = ModelFactory.createModelForGraph(g1);
>     String q = "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Mendelian_inheritance> ?p ?o . }";
>     Query query = QueryFactory.create(q);
>     QueryExecution qexec = QueryExecutionFactory.create(query, newModel);
>     ResultSet results = qexec.execSelect();
>     while (results.hasNext()) {
>         QuerySolution result = results.nextSolution();
>         // only ?p and ?o are bound by this query; the subject is fixed
>         RDFNode p = result.get("p");
>         RDFNode o = result.get("o");
>         System.out.println(" { <http://dbpedia.org/resource/Mendelian_inheritance> " + p + " " + o + " . }");
>     }
>
> Now my question is: the DBpedia data dumps come in various files. Do I load
> all these files into the same directory using TDB to create one huge model, or
> do I need to load them into different directories, thus having to create
> different models to query the data? Please note that I do not plan to load the
> whole of the DBpedia datasets into the datastore, just the English version of
> Ontology Infobox Properties, Titles and Ontology Infobox Types. Forgive me
> for my very amateur question, but I am just getting started with it ;).

Hi Prerak,

first of all, welcome to the Jena mailing list.

DBpedia is one of the "not so small" RDF data dumps around, so it's better to check that you are using a 64-bit OS and JVM and that you have a decent amount of RAM on that machine. A few more details here:
http://incubator.apache.org/jena/documentation/tdb/jvm_64_32.html

Then, allow me to suggest you read about 'RDF datasets' here:
http://www.w3.org/TR/sparql11-query/#rdfDataset

TDB supports RDF datasets; the documentation is here:
http://incubator.apache.org/jena/documentation/tdb/datasets.html

So, you can load the entire DBpedia data into a single TDB location on disk (i.e. a single directory). This way, you can run SPARQL queries over all of it. This is, in my opinion, the best option.

You could also use named graphs; read more about the N-Quads serialization format here:
http://sw.deri.org/2008/07/n-quads/
And, in relation to DBpedia, here:
http://wiki.dbpedia.org/Datasets#h18-18

You might decide to create your own named graphs and load parts of DBpedia into them to support your data management needs, rather than taking the named graphs given to you by DBpedia to track provenance.

Finally, with datasets of the size of DBpedia, tdbloader2 should be a better choice than tdbloader, but you seem to be using Windows, and therefore tdbloader2 is not a good choice for you. You could have a look at tdbloader3 as well if you have problems with tdbloader, but try tdbloader first. You can also load the data on a server with a decent amount of RAM and then move the files around as you need.

What are you planning to do with DBpedia loaded locally?

I hope this helps; let me know how it goes,
Paolo
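To make the "single TDB location" suggestion above concrete, here is a minimal sketch of loading several dump files into one directory and querying them as a single dataset. The directory path and file names are placeholders, the imports use the com.hp.hpl.jena packages of the time, and it loads via FileManager only for brevity; for dumps of DBpedia's size, running tdbloader once over all the files is the better route, as discussed above:

    import com.hp.hpl.jena.query.*;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;
    import com.hp.hpl.jena.util.FileManager;

    public class LoadAndQueryTdb {
        public static void main(String[] args) {
            // One on-disk location holds everything; loading several files
            // into it yields a single queryable dataset.
            Dataset dataset = TDBFactory.createDataset("c:/dataset");
            Model model = dataset.getDefaultModel();
            FileManager.get().readModel(model, "infobox_properties_en.nt");  // placeholder file names
            FileManager.get().readModel(model, "labels_en.nt");
            FileManager.get().readModel(model, "instance_types_en.nt");

            String q = "SELECT ?p ?o WHERE { "
                     + "<http://dbpedia.org/resource/Mendelian_inheritance> ?p ?o . }";
            QueryExecution qexec = QueryExecutionFactory.create(QueryFactory.create(q), dataset);
            try {
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.nextSolution();
                    System.out.println(row.get("p") + " " + row.get("o"));
                }
            } finally {
                qexec.close();    // release query resources
                dataset.close();  // flush and close the TDB location
            }
        }
    }

Note that the query runs against the Dataset directly, rather than wrapping the default graph in a Model by hand as in the original code.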
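And a sketch of the named-graph alternative mentioned above: each dump file goes into its own named graph, so a query can target one source with the GRAPH keyword. The graph URI is invented for illustration, and dbo:Disease is just an example class:

    import com.hp.hpl.jena.query.*;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;
    import com.hp.hpl.jena.util.FileManager;

    public class NamedGraphsTdb {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("c:/dataset");
            // Hypothetical graph name: one named graph per dump file,
            // so the origin of each triple is preserved.
            Model types = dataset.getNamedModel("http://example.org/graph/instance-types");
            FileManager.get().readModel(types, "instance_types_en.nt");  // placeholder file name

            // GRAPH restricts the pattern to that one named graph.
            String q = "SELECT ?s WHERE { GRAPH <http://example.org/graph/instance-types> "
                     + "{ ?s a <http://dbpedia.org/ontology/Disease> . } } LIMIT 10";
            QueryExecution qexec = QueryExecutionFactory.create(QueryFactory.create(q), dataset);
            try {
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    System.out.println(results.nextSolution().get("s"));
                }
            } finally {
                qexec.close();
                dataset.close();
            }
        }
    }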
