Jon Phillips created JENA-275:
---------------------------------

             Summary: different query results for tdbloader and tdbloader3
                 Key: JENA-275
                 URL: https://issues.apache.org/jira/browse/JENA-275
             Project: Apache Jena
          Issue Type: Question
          Components: TDB
    Affects Versions: TDB 0.9.2
            Reporter: Jon Phillips
            Priority: Minor

I had intended to use tdbloader3 instead of tdbloader for loading some large data sets (> 100 million triples) because I was seeing higher sustained triples-per-second load rates. However, I am running into some immediate issues running basic queries on the resulting models, even on small toy test sets. In one simple case, a SPARQL query with a fixed predicate but unbound subject and object (excuse my novice grasp of terminology) fails to return any results for the model loaded with tdbloader3.

Here is the sequence of steps that I ran:

cat dbpedia.nt (list of 9 triples from dbpedia)

<http://dbpedia.org/resource/AccessibleComputing> <http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .
<http://dbpedia.org/resource/AfghanistanGeography> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanGeography"@en .
<http://dbpedia.org/resource/AfghanistanHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanHistory"@en .
<http://dbpedia.org/resource/AfghanistanPeople> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanPeople"@en .
<http://dbpedia.org/resource/AfghanistanCommunications> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanCommunications"@en .
<http://dbpedia.org/resource/AfghanistanTransportations> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransportations"@en .
<http://dbpedia.org/resource/AfghanistanMilitary> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanMilitary"@en .
<http://dbpedia.org/resource/AfghanistanTransnationalIssues> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransnationalIssues"@en .
<http://dbpedia.org/resource/AmoeboidTaxa> <http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .

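For reference, the behaviour expected from a pattern with a fixed predicate can be sketched outside of TDB. The Python below is only an illustration under stated assumptions: `match` and `TRIPLES` are hypothetical names, not Jena APIs, and only two of the triples from dbpedia.nt are included. Because every triple in the file uses the rdfs:label predicate, a pattern that fixes that predicate should select the same rows as a pattern with all three positions unbound.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Two of the triples from dbpedia.nt as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://dbpedia.org/resource/AccessibleComputing", RDFS_LABEL, '"AccessibleComputing"@en'),
    ("http://dbpedia.org/resource/AmoeboidTaxa", RDFS_LABEL, '"AmoeboidTaxa"@en'),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None stands for an unbound variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Corresponds to { ?x ?y ?z }: nothing is fixed, so every triple matches.
everything = match(TRIPLES)

# Corresponds to { ?x <...rdf-schema#label> ?z }: only the predicate is fixed.
by_label = match(TRIPLES, p=RDFS_LABEL)

# With this data the two patterns are expected to select the same triples.
print(everything == by_label)
```

This says nothing about how TDB evaluates the pattern internally; it only states the result a conforming store would be expected to produce for this data.
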
Build the model with tdbloader:

tdbloader --loc=dbpedia_tdbl1 dbpedia.nt

23:18:29 INFO loader :: -- Start triples data phase
23:18:29 INFO loader :: ** Load empty triples table
23:18:29 INFO loader :: Load: dbpedia.nt -- 2012/07/11 23:18:29 EDT
23:18:29 INFO loader :: -- Finish triples data phase
23:18:29 INFO loader :: 9 triples loaded in 0.04 seconds [Rate: 214.29 per second]
23:18:29 INFO loader :: -- Start triples index phase
23:18:29 INFO loader :: ** Index SPO->POS: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
23:18:29 INFO loader :: ** Index SPO->OSP: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
23:18:29 INFO loader :: -- Finish triples index phase
23:18:29 INFO loader :: ** 9 triples indexed in 0.00 seconds [Rate: 1,800.00 per second]
23:18:29 INFO loader :: -- Finish triples load
23:18:29 INFO loader :: ** Completed: 9 triples loaded in 0.05 seconds [Rate: 163.64 per second]

Now build the same model with tdbloader3:

tdbloader3 --loc=dbpedia_tdbl3 dbpedia.nt

23:18:38 INFO tdbloader3 :: Load: dbpedia.nt -- 2012/07/11 23:18:38 EDT
23:18:38 INFO tdbloader3 :: Node Table (1/3): building nodes.dat and sorting hash|id ...
23:18:38 INFO tdbloader3 :: Total: 27 tuples : 0.01 seconds : 1,928.57 tuples/sec [2012/07/11 23:18:38 EDT]
23:18:38 INFO tdbloader3 :: Node Table (2/3): generating input data using node ids...
23:18:38 INFO tdbloader3 :: Total: 8 tuples : 0.03 seconds : 275.86 tuples/sec [2012/07/11 23:18:38 EDT]
23:18:38 INFO tdbloader3 :: Node Table (3/3): building node table B+Tree index (i.e. node2id.dat and node2id.idn files)...
23:18:39 INFO tdbloader3 :: Total: 19 tuples : 0.08 seconds : 234.57 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating SPO index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.01 seconds : 1,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating GSPO index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for POS index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 4,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating POS index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.01 seconds : 1,125.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for OSP index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating OSP index...
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 1,800.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for GPOS index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating GPOS index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for GOSP index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating GOSP index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for POSG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating POSG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for OSPG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating OSPG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: sorting data for SPOG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Index: creating SPOG index...
23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.45 seconds : 20.18 tuples/sec [2012/07/11 23:18:39 EDT]

A simple query that returns the entire result set returns the same set of triples for both models (the row order differs). First, the model built with tdbloader:

./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x ?y ?z }"

-----------------------------------------------------------------------------------------------------------------------------------------------------
| x                                                            | y                                            | z                                   |
=====================================================================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
| <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
| <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
| <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
| <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
| <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
| <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Same result for the model built with tdbloader3:

./tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x ?y ?z }"

-----------------------------------------------------------------------------------------------------------------------------------------------------
| x                                                            | y                                            | z                                   |
=====================================================================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
| <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
| <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
| <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
| <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
| <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
| <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
-----------------------------------------------------------------------------------------------------------------------------------------------------

A different query, run on the model built with tdbloader, that matches on the predicate:

./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label> ?z }"

----------------------------------------------------------------------------------------------------------
| x                                                            | y | z                                   |
==========================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing>            |   | "AccessibleComputing"@en            |
| <http://dbpedia.org/resource/AfghanistanGeography>           |   | "AfghanistanGeography"@en           |
| <http://dbpedia.org/resource/AfghanistanHistory>             |   | "AfghanistanHistory"@en             |
| <http://dbpedia.org/resource/AfghanistanPeople>              |   | "AfghanistanPeople"@en              |
| <http://dbpedia.org/resource/AfghanistanCommunications>      |   | "AfghanistanCommunications"@en      |
| <http://dbpedia.org/resource/AfghanistanTransportations>     |   | "AfghanistanTransportations"@en     |
| <http://dbpedia.org/resource/AfghanistanMilitary>            |   | "AfghanistanMilitary"@en            |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> |   | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AmoeboidTaxa>                   |   | "AmoeboidTaxa"@en                   |
----------------------------------------------------------------------------------------------------------

I expected the data loaded with tdbloader3 to return the same result, but the query returned an empty result:

tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label> ?z }"

-------------
| x | y | z |
=============
-------------

Any help would be much appreciated.

--
This message is automatically generated by JIRA.