[ https://issues.apache.org/jira/browse/JENA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne closed JENA-275.
------------------------------

> different query results for tdbloader and tdbloader3
> ----------------------------------------------------
>
>                 Key: JENA-275
>                 URL: https://issues.apache.org/jira/browse/JENA-275
>             Project: Apache Jena
>          Issue Type: Question
>          Components: TDB
>    Affects Versions: TDB 0.9.2
>            Reporter: Jon Phillips
>            Assignee: Andy Seaborne
>
> I had intended to use tdbloader3 rather than tdbloader for loading some
> large data sets (> 100 million triples) because I was seeing higher sustained
> triples-per-second load rates. However, I am running into some immediate
> issues running basic queries on the resulting models, even on small toy test
> sets. In one simple case, a SPARQL query with a fixed predicate but unbound
> subject and object (excuse my novice grasp of terminology) fails to return
> any results for the model loaded with tdbloader3.
>
> Here is the sequence of steps that I ran:
>
> cat dbpedia.nt (list of 10 triples from dbpedia)
>
> <http://dbpedia.org/resource/AccessibleComputing> <http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .
> <http://dbpedia.org/resource/AfghanistanGeography> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanGeography"@en .
> <http://dbpedia.org/resource/AfghanistanHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanHistory"@en .
> <http://dbpedia.org/resource/AfghanistanPeople> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanPeople"@en .
> <http://dbpedia.org/resource/AfghanistanCommunications> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanCommunications"@en .
> <http://dbpedia.org/resource/AfghanistanTransportations> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransportations"@en .
> <http://dbpedia.org/resource/AfghanistanMilitary> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanMilitary"@en .
> <http://dbpedia.org/resource/AfghanistanTransnationalIssues> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransnationalIssues"@en .
> <http://dbpedia.org/resource/AmoeboidTaxa> <http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .
>
> build the model with tdbloader
>
> tdbloader --loc=dbpedia_tdbl1 dbpedia.nt
> 23:18:29 INFO loader :: -- Start triples data phase
> 23:18:29 INFO loader :: ** Load empty triples table
> 23:18:29 INFO loader :: Load: dbpedia.nt -- 2012/07/11 23:18:29 EDT
> 23:18:29 INFO loader :: -- Finish triples data phase
> 23:18:29 INFO loader :: 9 triples loaded in 0.04 seconds [Rate: 214.29 per second]
> 23:18:29 INFO loader :: -- Start triples index phase
> 23:18:29 INFO loader :: ** Index SPO->POS: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
> 23:18:29 INFO loader :: ** Index SPO->OSP: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
> 23:18:29 INFO loader :: -- Finish triples index phase
> 23:18:29 INFO loader :: ** 9 triples indexed in 0.00 seconds [Rate: 1,800.00 per second]
> 23:18:29 INFO loader :: -- Finish triples load
> 23:18:29 INFO loader :: ** Completed: 9 triples loaded in 0.05 seconds [Rate: 163.64 per second]
>
> now build the same model with tdbloader3
>
> tdbloader3 --loc=dbpedia_tdbl3 dbpedia.nt
> 23:18:38 INFO tdbloader3 :: Load: dbpedia.nt -- 2012/07/11 23:18:38 EDT
> 23:18:38 INFO tdbloader3 :: Node Table (1/3): building nodes.dat and sorting hash|id ...
> 23:18:38 INFO tdbloader3 :: Total: 27 tuples : 0.01 seconds : 1,928.57 tuples/sec [2012/07/11 23:18:38 EDT]
> 23:18:38 INFO tdbloader3 :: Node Table (2/3): generating input data using node ids...
> 23:18:38 INFO tdbloader3 :: Total: 8 tuples : 0.03 seconds : 275.86 tuples/sec [2012/07/11 23:18:38 EDT]
> 23:18:38 INFO tdbloader3 :: Node Table (3/3): building node table B+Tree index (i.e. node2id.dat and node2id.idn files)...
> 23:18:39 INFO tdbloader3 :: Total: 19 tuples : 0.08 seconds : 234.57 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating SPO index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.01 seconds : 1,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating GSPO index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for POS index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 4,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating POS index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.01 seconds : 1,125.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for OSP index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating OSP index...
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.00 seconds : 1,800.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for GPOS index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating GPOS index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for GOSP index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating GOSP index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for POSG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating POSG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for OSPG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating OSPG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: sorting data for SPOG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Index: creating SPOG index...
> 23:18:39 INFO tdbloader3 :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
> 23:18:39 INFO tdbloader3 :: Total: 9 tuples : 0.45 seconds : 20.18 tuples/sec [2012/07/11 23:18:39 EDT]
>
> Two simple queries that return the entire result set return the same set of
> triples:
>
> ./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x ?y ?z }"
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | x                                                            | y                                            | z                                   |
> =====================================================================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> The same result for the model built with tdbloader3:
>
> ./tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x ?y ?z }"
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | x                                                            | y                                            | z                                   |
> =====================================================================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> A different query, run on the model built with tdbloader, that matches on
> the predicate:
>
> ./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label> ?z }"
> ----------------------------------------------------------------------------------------------------------
> | x                                                            | y | z                                   |
> ==========================================================================================================
> | <http://dbpedia.org/resource/AccessibleComputing>            |   | "AccessibleComputing"@en            |
> | <http://dbpedia.org/resource/AfghanistanGeography>           |   | "AfghanistanGeography"@en           |
> | <http://dbpedia.org/resource/AfghanistanHistory>             |   | "AfghanistanHistory"@en             |
> | <http://dbpedia.org/resource/AfghanistanPeople>              |   | "AfghanistanPeople"@en              |
> | <http://dbpedia.org/resource/AfghanistanCommunications>      |   | "AfghanistanCommunications"@en      |
> | <http://dbpedia.org/resource/AfghanistanTransportations>     |   | "AfghanistanTransportations"@en     |
> | <http://dbpedia.org/resource/AfghanistanMilitary>            |   | "AfghanistanMilitary"@en            |
> | <http://dbpedia.org/resource/AfghanistanTransnationalIssues> |   | "AfghanistanTransnationalIssues"@en |
> | <http://dbpedia.org/resource/AmoeboidTaxa>                   |   | "AmoeboidTaxa"@en                   |
> ----------------------------------------------------------------------------------------------------------
>
> I expected the data loaded with tdbloader3 to return the same result, but
> the query returned an empty result:
>
> tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label> ?z }"
> -------------
> | x | y | z |
> =============
> -------------
>
> Any help would be much appreciated.