Hey Paolo,

Sorry for the late reply, got quite busy here. I have been using the Stanford
NLP library along with Stanford NER and their dependency parser. To map user
queries into SPARQL queries, the application I am working on basically
recognizes proper nouns or entities in the user input and uses the dependency
parser to guess which property/relationship of that entity the user is trying
to search for.

For example, if the user puts in the line "Managers of Manchester United with
their active date", the application would recognize Manchester United as a
proper noun or entity, and since the keyword "Manager" has a direct
relationship to it here, i.e. the dependency "prep_of", it would try to find
the managers with the first query and then, on a second run, would try to add
in their active date (the P.S. at the bottom of this mail shows roughly what
that query should end up looking like).
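To give you a rough idea of the CoreNLP side, the extraction step does
something like the sketch below. This is a stripped-down illustration rather
than the real code: the class name is made up, the sentence is hardcoded, the
query-building part is left out entirely, and the exact package/annotation
names can differ between CoreNLP versions.

    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    public class QuerySketch {
        public static void main(String[] args) {
            // tokenize, tag, run NER and the parser over the user's question
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("Managers of Manchester United with their active date");
            pipeline.annotate(doc);

            for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
                // entities = tokens whose NER tag is not "O"
                // ("Manchester"/"United" should come back tagged ORGANIZATION)
                for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                    String ner = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                    if (!"O".equals(ner)) {
                        System.out.println(token.word() + "\t" + ner);
                    }
                }
                // collapsed dependencies contain relations like prep_of(Managers, United),
                // which is the hook used to guess the property being asked for
                SemanticGraph deps = sentence.get(
                    SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
                System.out.println(deps);
            }
        }
    }

The entity tags plus the collapsed dependencies are what the query builder
then works from.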
The algorithm for this is still very much under development and I am currently
testing it on more complex queries. So a lot of hair pulling and head banging
still to go ;)

Prerak

________________________________
From: Paolo Castagna <[email protected]>
To: [email protected]
Sent: Friday, April 6, 2012 1:30 AM
Subject: Re: Loading DBpedia datasets

prerak pradhan wrote:
> Thanks for the input, Paolo, and yes I did use tdbloader2 on Ubuntu and then
> transferred the directory onto Windows. Loading seems a lot faster in Linux
> using tdbloader2.

Yep.

> I am working on a mini-project which aims to develop an application that
> performs semantic search on DBpedia data, something like Kngine or Hakia on
> a very, very small scale. I think I have got the natural language processing
> part done and am now trying to work on forming SPARQL queries to be run
> against the DBpedia dataset based on the NLP output for the user-entered
> query. Do you have any links or resources in this regard? Thanks again,
> appreciate it.

Oh, well... another "semantic search" project. :-)

I do not have more links than you would find from the Wikipedia page on
"semantic search" and the references there.

By the way, which NLP library are you using? How do you generate a SPARQL
query from natural language?

In a distant past, I tried to exploit the prefix:keyword pattern that many
search engines support, and therefore people might already be used to, to do
something vaguely similar but much (much) simpler (no NLP involved):

  type:car color:blue model:touran city:london ...

It works for very simple queries, but it quickly becomes impractical. It
would, however, be an improvement for certain types of searches. Even
supporting just type:{book|person|city|...} can be useful.

Paolo

>
> ________________________________
> From: Paolo Castagna <[email protected]>
> To: [email protected]
> Sent: Thursday, April 5, 2012 8:36 AM
> Subject: Re: Loading DBpedia datasets
>
> prerak pradhan wrote:
>> Hello there, I am just starting off with Jena and am pretty new to it. I am
>> trying to load all the DBpedia datasets so that I can have a local version
>> of DBpedia working on my station here. I used the TDB loader to load the
>> datasets, and while doing so I specified a directory into which to load the
>> dataset. I used the following code to query the dataset:
>>
>>   String directory = "c:/dataset";
>>   DatasetGraphTDB dataset = TDBFactory.createDatasetGraph(directory);
>>   Graph g1 = dataset.getDefaultGraph();
>>   Model newModel = ModelFactory.createModelForGraph(g1);
>>   String q = "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Mendelian_inheritance> ?p ?o . }";
>>   Query query = QueryFactory.create(q);
>>   QueryExecution qexec = QueryExecutionFactory.create(query, newModel);
>>   ResultSet results = qexec.execSelect();
>>   while (results.hasNext()) {
>>       QuerySolution result = results.nextSolution();
>>       RDFNode p = result.get("p");
>>       RDFNode o = result.get("o");
>>       System.out.println(" { " + p + " " + o + " . }");
>>   }
>>
>> Now my question is, the DBpedia data dumps come in various files: do I load
>> all these files into the same directory using TDB to create one huge model,
>> or do I need to load them into different directories, thus having to create
>> different models to query the data? Please note that I do not plan to load
>> the whole of the DBpedia datasets into the datastore, just the English
>> version of Ontology Infobox Properties, Titles and Ontology Infobox Types.
>> Forgive me for my very amateur question but I am just getting started with
>> it ;).
>
> Hi Prerak,
> first of all, welcome to the Jena mailing list.
>
> DBPedia is one of the "not so small" RDF data dumps around, so it's better
> you check that you are using a 64-bit OS and JVM and that you have a decent
> amount of RAM on that machine. A few more details here:
> http://incubator.apache.org/jena/documentation/tdb/jvm_64_32.html
>
> Then, allow me to suggest that you read about 'RDF dataset' here:
> http://www.w3.org/TR/sparql11-query/#rdfDataset
>
> TDB supports RDF datasets, documentation is here:
> http://incubator.apache.org/jena/documentation/tdb/datasets.html
>
> So, you can load the entire DBPedia data into a single TDB location on disk
> (i.e. a single directory). This way, you can run SPARQL queries over it.
> This is in my opinion the best option.
>
> You could use named graphs; read more about the N-Quads serialization format
> here:
> http://sw.deri.org/2008/07/n-quads/
>
> And, in relation to DBPedia, here:
> http://wiki.dbpedia.org/Datasets#h18-18
>
> You might decide to create your own named graphs and load parts of DBPedia
> into them to support your data management needs, rather than taking the
> named graphs given to you by DBPedia to track provenance.
>
> Finally, with datasets of the size of DBPedia, tdbloader2 should be a better
> choice than tdbloader, but you seem to be using Windows, therefore tdbloader2
> is not a good choice for you. You could have a look at tdbloader3 as well, if
> you have problems with tdbloader. But try tdbloader first. You can also load
> the data on a server with a decent amount of RAM and move the files around
> as you need.
>
> What are you planning to do with DBPedia loaded locally?
>
> I hope this helps and let me know how it goes,
> Paolo
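P.S. For the Manchester United example at the top of this mail, this is
roughly the query I want the generator to end up with on the second run,
executed against the local TDB copy of DBpedia. It is hand-written here, and
the property URIs (dbpprop:manager, dbpprop:years) are only placeholders that
I have not yet checked against the actual dumps:

    import com.hp.hpl.jena.query.*;
    import com.hp.hpl.jena.tdb.TDBFactory;

    public class ManagerQuery {
        public static void main(String[] args) {
            // the single TDB directory holding the local DBpedia copy
            Dataset dataset = TDBFactory.createDataset("c:/dataset");

            // NB: the dbpprop properties below are placeholders, not verified URIs
            String q =
                "PREFIX dbpprop: <http://dbpedia.org/property/> " +
                "SELECT ?manager ?years WHERE { " +
                "  <http://dbpedia.org/resource/Manchester_United_F.C.> dbpprop:manager ?manager . " +
                "  OPTIONAL { ?manager dbpprop:years ?years } " +
                "}";

            QueryExecution qexec = QueryExecutionFactory.create(q, dataset);
            try {
                ResultSetFormatter.out(qexec.execSelect());
            } finally {
                qexec.close();
            }
        }
    }

If this turns out to be the right shape, the second run only has to splice the
OPTIONAL block onto the first query.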
