prerak pradhan wrote:
> Thanks for the input, Paolo. Yes, I did use tdbloader2 on Ubuntu and then 
> transferred the directory to Windows. Loading seems a lot faster on Linux 
> using tdbloader2. 

Yep.

> I am working on a mini-project which aims to develop a tool that performs 
> semantic search on DBpedia data, something like Kngine or Hakia on a very, 
> very small scale. I think I have the natural language processing part 
> done and am now working on forming SPARQL queries to be run against the 
> DBpedia dataset based on the NLP output for the user's query. Do you have 
> any links or resources in this regard? Thanks again, appreciate it.

Oh, well... another "semantic search" project. :-)

I do not have more links than what you would find on the Wikipedia page
on "semantic search" and the references there.

By the way, which NLP library are you using?
How do you generate a SPARQL query from natural language?

In the distant past, I tried to exploit the prefix:keyword pattern
that many search engines support (and that people might therefore
already be used to) to do something vaguely similar, but much (much)
simpler (no NLP involved):

  type:car color:blue model:touran city:london ...

It works for very simple queries, but it quickly becomes impractical.
It would, however, be an improvement for certain types of searches.
Even supporting just type:{book|person|city|...} can be useful.
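
For what it's worth, the parsing side of that pattern is trivial; here is a rough
sketch in plain Java. The property URIs in the map are invented placeholders, and
a real version would resolve keywords to URIs or use a text index rather than
matching plain literals:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal sketch of turning prefix:keyword pairs into a SPARQL query.
// The property URIs below are placeholders, not real DBpedia mappings.
public class KeywordToSparql {

    // Map each supported prefix to a (hypothetical) property URI.
    private static final Map<String, String> PROPERTIES = new LinkedHashMap<>();
    static {
        PROPERTIES.put("type",  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
        PROPERTIES.put("color", "http://example.org/prop/color");
        PROPERTIES.put("model", "http://example.org/prop/model");
        PROPERTIES.put("city",  "http://example.org/prop/city");
    }

    public static String toSparql(String input) {
        StringBuilder where = new StringBuilder();
        for (String token : input.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon < 0) continue;                 // ignore bare keywords
            String prefix  = token.substring(0, colon);
            String keyword = token.substring(colon + 1);
            String property = PROPERTIES.get(prefix);
            if (property == null) continue;          // ignore unknown prefixes
            // Match the keyword as a plain literal; a real implementation
            // would map it to a URI or go through a text index instead.
            where.append("  ?s <").append(property).append("> \"")
                 .append(keyword).append("\" .\n");
        }
        return "SELECT ?s WHERE {\n" + where + "}";
    }

    public static void main(String[] args) {
        System.out.println(toSparql("type:car color:blue model:touran"));
    }
}
```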

Paolo

> 
> 
> ________________________________
>  From: Paolo Castagna <[email protected]>
> To: [email protected] 
> Sent: Thursday, April 5, 2012 8:36 AM
> Subject: Re: Loading DBpedia datasets
>  
> prerak pradhan wrote:
>> Hello there, I am just starting off with Jena and am pretty new to it. I am 
>> trying to load all the DBpedia datasets so that I can have a local version 
>> of DBpedia working on my station here. I used the TDB loader to load the 
>> datasets, specifying a directory in which to load them. I used the 
>> following code to query the dataset:
>>    String directory = "c:/dataset";
>>    Dataset dataset = TDBFactory.createDataset(directory);
>>    Model newModel = dataset.getDefaultModel();
>>    String q = "SELECT ?p ?o WHERE { "
>>        + "<http://dbpedia.org/resource/Mendelian_inheritance> ?p ?o . }";
>>    Query query = QueryFactory.create(q);
>>    QueryExecution qexec = QueryExecutionFactory.create(query, newModel);
>>    try {
>>      ResultSet results = qexec.execSelect();
>>      while (results.hasNext()) {
>>        QuerySolution result = results.nextSolution();
>>        // Only ?p and ?o are bound; ?s is not selected in the query.
>>        RDFNode p = result.get("p");
>>        RDFNode o = result.get("o");
>>        System.out.println(" { " + p + " " + o + " . }");
>>      }
>>    } finally {
>>      qexec.close();
>>    }
>> Now my question is: the DBpedia data dumps come in various files. Do I load 
>> all these files into the same directory using TDB to create one huge model, or 
>> do I need to load them into different directories, thus having to create 
>> different models to query the data? Please note that I do not plan to load 
>> the whole of the DBpedia datasets into the datastore, just the English version 
>> of Ontology Infobox Properties, Titles and Ontology Infobox Types. Forgive me 
>> for my very amateur question but I am just getting started with it ;). 
> 
> Hi Prerak,
> first of all, welcome on the Jena mailing list.
> 
> DBpedia is one of the "not so small" RDF data dumps around, so it's better to
> check that you are using a 64-bit OS and JVM and that you have a decent amount
> of RAM on that machine. A few more details here:
> http://incubator.apache.org/jena/documentation/tdb/jvm_64_32.html
> 
> Then, allow me to suggest that you read about 'RDF dataset' here:
> http://www.w3.org/TR/sparql11-query/#rdfDataset
> 
> TDB supports RDF datasets, documentation is here:
> http://incubator.apache.org/jena/documentation/tdb/datasets.html
> 
> So, you can load the entire DBpedia data into a single TDB location on disk
> (i.e. a single directory). This way, you can run SPARQL queries over it.
> This is in my opinion the best option.
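
As a concrete example, loading everything into one directory and then querying
it could look like the following (the directory path and dump file names here
are illustrative; use the files you actually downloaded):

```shell
# Load several DBpedia dump files into a single TDB directory.
tdbloader --loc /data/dbpedia-tdb \
    mappingbased_properties_en.nt labels_en.nt instance_types_en.nt

# Then query the whole store in one go:
tdbquery --loc /data/dbpedia-tdb \
    'SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Mendelian_inheritance> ?p ?o }'
```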
> 
> You could use named graphs; read more about the N-Quads serialization format here:
> http://sw.deri.org/2008/07/n-quads/
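
An N-Quads statement is simply a triple plus a fourth term naming the graph it
belongs to; for example (the graph URI here is invented for illustration):

```
<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en <http://example.org/graph/labels> .
```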
> 
> And, in relation to DBpedia, here:
> http://wiki.dbpedia.org/Datasets#h18-18
> 
> You might decide to create your own named graphs and load parts of DBpedia
> into them to support your data management needs, rather than taking the named
> graphs given to you by DBpedia to track provenance.
> 
> Finally, with datasets of the size of DBpedia, tdbloader2 should be a better
> choice than tdbloader, but you seem to be using Windows, so tdbloader2 is not
> a good option for you. You could have a look at tdbloader3 as well, if you
> have problems with tdbloader. But try tdbloader first. You can also load
> the data on a server with a decent amount of RAM and move the files around as
> you need.
> 
> What are you planning to do with DBpedia loaded locally?
> 
> I hope this helps and let me know how it goes,
> Paolo
