Thanks for the input, Paolo, and yes, I did use tdbloader2 on Ubuntu and then transferred the directory to Windows. Loading is a lot faster on Linux using tdbloader2. I am working on a mini-project that aims to build a system performing semantic search on DBpedia data, something like Kngine or Hakia on a very, very small scale. I think I have the natural language processing part done and am now working on forming SPARQL queries to run against the DBpedia dataset, based on the NLP output for the user's query. Do you have any links or resources in this regard? Thanks again, I appreciate it.
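For concreteness, here is a minimal sketch of what I mean by forming a query from the NLP output. Everything specific in it is an assumption: the buildEntityQuery helper is hypothetical, the NLP stage is assumed to have already resolved the user's entity mention to a DBpedia resource URI, and the imports use the com.hp.hpl.jena packages current at the time:

    import com.hp.hpl.jena.query.Query;
    import com.hp.hpl.jena.query.QueryFactory;

    public class NlpToSparql {
        // Hypothetical helper: assumes the NLP stage has already mapped the
        // user's entity mention to a DBpedia resource URI.
        static Query buildEntityQuery(String resourceUri) {
            String q = "SELECT ?p ?o WHERE { <" + resourceUri + "> ?p ?o . }";
            // QueryFactory.create parses the string, so syntax errors
            // surface here rather than at execution time.
            return QueryFactory.create(q);
        }

        public static void main(String[] args) {
            Query query = buildEntityQuery("http://dbpedia.org/resource/Mendelian_inheritance");
            System.out.println(query);
        }
    }

Since the URI is spliced into the query text, it would need to be validated or escaped first to avoid SPARQL injection from user input.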
________________________________
From: Paolo Castagna <[email protected]>
To: [email protected]
Sent: Thursday, April 5, 2012 8:36 AM
Subject: Re: Loading DBpedia datasets

prerak pradhan wrote:
> Hello there, I am just starting off with Jena and am pretty new to it. I am
> trying to load all the DBpedia datasets so that I can have a local version of
> DBpedia working on my station here. I used the TDB loader to load the
> datasets, and while doing so I specified a directory into which to load them.
> I used the following code to query the dataset:
>
>     String directory = "c:/dataset";
>     DatasetGraphTDB dataset = TDBFactory.createDatasetGraph(directory);
>     Graph g1 = dataset.getDefaultGraph();
>     Model newModel = ModelFactory.createModelForGraph(g1);
>     String q = "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Mendelian_inheritance> ?p ?o . }";
>     Query query = QueryFactory.create(q);
>     QueryExecution qexec = QueryExecutionFactory.create(query, newModel);
>     ResultSet results = qexec.execSelect();
>     while (results.hasNext()) {
>         QuerySolution result = results.nextSolution();
>         // only ?p and ?o are bound by this query; the subject is fixed
>         RDFNode p = result.get("p");
>         RDFNode o = result.get("o");
>         System.out.println(" { <http://dbpedia.org/resource/Mendelian_inheritance> " + p + " " + o + " . }");
>     }
>
> Now my question is: the DBpedia data dumps come in various files. Do I load
> all these files into the same directory using TDB to create one huge model, or
> do I need to load them into different directories, thus having to create
> different models to query the data? Please note that I do not plan to load the
> whole of the DBpedia datasets into the datastore, just the English version of
> Ontology Infobox Properties, Titles and Ontology Infobox Types. Forgive me
> for my very amateur question, but I am just getting started with it ;).

Hi Prerak,

first of all, welcome to the Jena mailing list.

DBpedia is one of the "not so small" RDF data dumps around, so it's better to check that you are using a 64-bit OS and JVM and that you have a decent amount of RAM on that machine. A few more details here:
http://incubator.apache.org/jena/documentation/tdb/jvm_64_32.html

Then, allow me to suggest you read about 'RDF datasets' here:
http://www.w3.org/TR/sparql11-query/#rdfDataset

TDB supports RDF datasets; the documentation is here:
http://incubator.apache.org/jena/documentation/tdb/datasets.html

So, you can load the entire DBpedia data into a single TDB location on disk (i.e. a single directory). This way, you can run SPARQL queries over all of it. This is, in my opinion, the best option.

You could also use named graphs; read more about the N-Quads serialization format here:
http://sw.deri.org/2008/07/n-quads/
And, in relation to DBpedia, here:
http://wiki.dbpedia.org/Datasets#h18-18

You might decide to create your own named graphs and load parts of DBpedia into them to support your data management needs, rather than taking the named graphs given to you by DBpedia to track provenance.

Finally, with datasets of the size of DBpedia, tdbloader2 should be a better choice than tdbloader, but you seem to be using Windows, and therefore tdbloader2 is not a good choice for you. You could have a look at tdbloader3 as well if you have problems with tdbloader, but try tdbloader first. You can also load the data on a server with a decent amount of RAM and then move the files around as you need.

What are you planning to do with DBpedia loaded locally?

I hope this helps; let me know how it goes,
Paolo
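To make the "single TDB location" suggestion above concrete, here is a minimal sketch of loading several dump files into one directory and querying them as a single dataset. The directory path and file names are placeholders, the imports use the com.hp.hpl.jena packages of the time, and it loads via FileManager only for brevity; for dumps of DBpedia's size, running tdbloader once over all the files is the better route, as discussed above:

    import com.hp.hpl.jena.query.*;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;
    import com.hp.hpl.jena.util.FileManager;

    public class LoadAndQueryTdb {
        public static void main(String[] args) {
            // One on-disk location holds everything; loading several files
            // into it yields a single queryable dataset.
            Dataset dataset = TDBFactory.createDataset("c:/dataset");
            Model model = dataset.getDefaultModel();
            FileManager.get().readModel(model, "infobox_properties_en.nt");  // placeholder file names
            FileManager.get().readModel(model, "labels_en.nt");
            FileManager.get().readModel(model, "instance_types_en.nt");

            String q = "SELECT ?p ?o WHERE { "
                     + "<http://dbpedia.org/resource/Mendelian_inheritance> ?p ?o . }";
            QueryExecution qexec = QueryExecutionFactory.create(QueryFactory.create(q), dataset);
            try {
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.nextSolution();
                    System.out.println(row.get("p") + " " + row.get("o"));
                }
            } finally {
                qexec.close();    // release query resources
                dataset.close();  // flush and close the TDB location
            }
        }
    }

Note that the query runs against the Dataset directly, rather than wrapping the default graph in a Model by hand as in the original code.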
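And a sketch of the named-graph alternative mentioned above: each dump file goes into its own named graph, so a query can target one source with the GRAPH keyword. The graph URI is invented for illustration, and dbo:Disease is just an example class:

    import com.hp.hpl.jena.query.*;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.tdb.TDBFactory;
    import com.hp.hpl.jena.util.FileManager;

    public class NamedGraphsTdb {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("c:/dataset");
            // Hypothetical graph name: one named graph per dump file,
            // so the origin of each triple is preserved.
            Model types = dataset.getNamedModel("http://example.org/graph/instance-types");
            FileManager.get().readModel(types, "instance_types_en.nt");  // placeholder file name

            // GRAPH restricts the pattern to that one named graph.
            String q = "SELECT ?s WHERE { GRAPH <http://example.org/graph/instance-types> "
                     + "{ ?s a <http://dbpedia.org/ontology/Disease> . } } LIMIT 10";
            QueryExecution qexec = QueryExecutionFactory.create(QueryFactory.create(q), dataset);
            try {
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    System.out.println(results.nextSolution().get("s"));
                }
            } finally {
                qexec.close();
                dataset.close();
            }
        }
    }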
