Jon Phillips created JENA-275:
---------------------------------

             Summary: different query results for tdbloader and tdbloader3
                 Key: JENA-275
                 URL: https://issues.apache.org/jira/browse/JENA-275
             Project: Apache Jena
          Issue Type: Question
          Components: TDB
    Affects Versions: TDB 0.9.2
            Reporter: Jon Phillips
            Priority: Minor


I had intended to use tdbloader3 instead of tdbloader for loading some large 
data sets (> 100 million triples) because I was seeing higher sustained 
triples-per-second load rates.  However, I am running into immediate issues 
running basic queries on the resulting models, even on small toy test sets.  
In one simple case, a SPARQL query with a fixed predicate but unbound subject 
and object (excuse my novice grasp of terminology) fails to return any results 
for the model loaded with tdbloader3. 

Here is the sequence of steps that I ran:

cat dbpedia.nt  (list of 9 triples from DBpedia)

<http://dbpedia.org/resource/AccessibleComputing> <http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .
<http://dbpedia.org/resource/AfghanistanGeography> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanGeography"@en .
<http://dbpedia.org/resource/AfghanistanHistory> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanHistory"@en .
<http://dbpedia.org/resource/AfghanistanPeople> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanPeople"@en .
<http://dbpedia.org/resource/AfghanistanCommunications> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanCommunications"@en .
<http://dbpedia.org/resource/AfghanistanTransportations> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransportations"@en .
<http://dbpedia.org/resource/AfghanistanMilitary> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanMilitary"@en .
<http://dbpedia.org/resource/AfghanistanTransnationalIssues> <http://www.w3.org/2000/01/rdf-schema#label> "AfghanistanTransnationalIssues"@en .
<http://dbpedia.org/resource/AmoeboidTaxa> <http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .
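
Before comparing loaders, it can help to confirm how many triples the input file really contains, since both loaders below report 9. The following is a minimal sketch, assuming one triple per line and a deliberately simplified pattern (IRIs for subject and predicate, an IRI or literal for the object, no blank nodes). The name `count_triples` is hypothetical, and this is not a full N-Triples parser.

```python
import re

# Simplified N-Triples line pattern (an assumption, not the full grammar):
# <subject> <predicate> (<object> | "literal" with optional @lang or ^^<datatype>) .
TRIPLE_RE = re.compile(
    r'^<[^>]*>\s+<[^>]*>\s+'
    r'(?:<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.$'
)

def count_triples(text):
    """Count lines that look like a complete triple; skip blanks and comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if TRIPLE_RE.match(stripped):
            count += 1
    return count

sample = (
    '<http://dbpedia.org/resource/AccessibleComputing> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "AccessibleComputing"@en .\n'
    '<http://dbpedia.org/resource/AmoeboidTaxa> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "AmoeboidTaxa"@en .\n'
)
print(count_triples(sample))
```

Running it over the real file would look like `count_triples(open("dbpedia.nt", encoding="utf-8").read())`.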

build the model with tdbloader

tdbloader --loc=dbpedia_tdbl1 dbpedia.nt 
23:18:29 INFO  loader               :: -- Start triples data phase
23:18:29 INFO  loader               :: ** Load empty triples table
23:18:29 INFO  loader               :: Load: dbpedia.nt -- 2012/07/11 23:18:29 EDT
23:18:29 INFO  loader               :: -- Finish triples data phase
23:18:29 INFO  loader               :: 9 triples loaded in 0.04 seconds [Rate: 214.29 per second]
23:18:29 INFO  loader               :: -- Start triples index phase
23:18:29 INFO  loader               :: ** Index SPO->POS: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
23:18:29 INFO  loader               :: ** Index SPO->OSP: 9 slots indexed in 0.00 seconds [Rate: 9,000.00 per second]
23:18:29 INFO  loader               :: -- Finish triples index phase
23:18:29 INFO  loader               :: ** 9 triples indexed in 0.00 seconds [Rate: 1,800.00 per second]
23:18:29 INFO  loader               :: -- Finish triples load
23:18:29 INFO  loader               :: ** Completed: 9 triples loaded in 0.05 seconds [Rate: 163.64 per second]

now build the same model with tdbloader3

tdbloader3 --loc=dbpedia_tdbl3 dbpedia.nt 
23:18:38 INFO  tdbloader3           :: Load: dbpedia.nt -- 2012/07/11 23:18:38 EDT
23:18:38 INFO  tdbloader3           :: Node Table (1/3): building nodes.dat and sorting hash|id ...
23:18:38 INFO  tdbloader3           :: Total: 27 tuples : 0.01 seconds : 1,928.57 tuples/sec [2012/07/11 23:18:38 EDT]
23:18:38 INFO  tdbloader3           :: Node Table (2/3): generating input data using node ids...
23:18:38 INFO  tdbloader3           :: Total: 8 tuples : 0.03 seconds : 275.86 tuples/sec [2012/07/11 23:18:38 EDT]
23:18:38 INFO  tdbloader3           :: Node Table (3/3): building node table B+Tree index (i.e. node2id.dat and node2id.idn files)...
23:18:39 INFO  tdbloader3           :: Total: 19 tuples : 0.08 seconds : 234.57 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating SPO index...
23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.01 seconds : 1,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating GSPO index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: sorting data for POS index...
23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 4,500.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating POS index...
23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.01 seconds : 1,125.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: sorting data for OSP index...
23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating OSP index...
23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.00 seconds : 1,800.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: sorting data for GPOS index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating GPOS index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: sorting data for GOSP index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating GOSP index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: sorting data for POSG index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating POSG index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: sorting data for OSPG index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating OSPG index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: sorting data for SPOG index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Index: creating SPOG index...
23:18:39 INFO  tdbloader3           :: Total: 0 tuples : 0.00 seconds : 0.00 tuples/sec [2012/07/11 23:18:39 EDT]
23:18:39 INFO  tdbloader3           :: Total: 9 tuples : 0.45 seconds : 20.18 tuples/sec [2012/07/11 23:18:39 EDT]


a simple query that returns the entire data set gives the same set of triples 
for both models; first the model built with tdbloader:

./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x ?y  ?z }"
-----------------------------------------------------------------------------------------------------------------------------------------------------
| x                                                            | y                                            | z                                   |
=====================================================================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
| <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
| <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
| <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
| <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
| <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
| <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
-----------------------------------------------------------------------------------------------------------------------------------------------------

same set of triples (in a different row order) for the model built with tdbloader3

./tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x ?y  ?z }"
-----------------------------------------------------------------------------------------------------------------------------------------------------
| x                                                            | y                                            | z                                   |
=====================================================================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AccessibleComputing"@en            |
| <http://dbpedia.org/resource/AfghanistanCommunications>      | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanCommunications"@en      |
| <http://dbpedia.org/resource/AfghanistanGeography>           | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanGeography"@en           |
| <http://dbpedia.org/resource/AfghanistanHistory>             | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanHistory"@en             |
| <http://dbpedia.org/resource/AfghanistanMilitary>            | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanMilitary"@en            |
| <http://dbpedia.org/resource/AfghanistanPeople>              | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanPeople"@en              |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AfghanistanTransportations>     | <http://www.w3.org/2000/01/rdf-schema#label> | "AfghanistanTransportations"@en     |
| <http://dbpedia.org/resource/AmoeboidTaxa>                   | <http://www.w3.org/2000/01/rdf-schema#label> | "AmoeboidTaxa"@en                   |
-----------------------------------------------------------------------------------------------------------------------------------------------------
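
The two listings above contain the same rows in a different order, so a line-by-line diff would report a mismatch. A minimal sketch of an order-insensitive comparison follows, assuming the tdbquery text output has been captured unwrapped (one table row per line) and that no cell contains a `|` character. The helper name `parse_table` and the tiny sample tables are hypothetical.

```python
def parse_table(text):
    """Return the set of data rows from a tdbquery-style text table."""
    # Keep only lines that are table rows; border lines start with '-' or '='.
    row_lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    rows = set()
    for ln in row_lines[1:]:  # the first row line is the column header
        cells = tuple(cell.strip() for cell in ln.strip("|").split("|"))
        rows.add(cells)
    return rows

out_a = """\
-------------------
| x   | z        |
===================
| <a> | "A"@en   |
| <b> | "B"@en   |
-------------------
"""

out_b = """\
-------------------
| x   | z        |
===================
| <b> | "B"@en   |
| <a> | "A"@en   |
-------------------
"""

# Same rows, different order: the sets compare equal.
print(parse_table(out_a) == parse_table(out_b))
```

Comparing sets rather than lists is a deliberate choice here, since SPARQL does not guarantee row order without ORDER BY.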


a different query, which fixes the predicate, run on the model built with 
tdbloader:

./tdbquery --loc=dbpedia_tdbl1 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label>  ?z }"
----------------------------------------------------------------------------------------------------------
| x                                                            | y | z                                   |
==========================================================================================================
| <http://dbpedia.org/resource/AccessibleComputing>            |   | "AccessibleComputing"@en            |
| <http://dbpedia.org/resource/AfghanistanGeography>           |   | "AfghanistanGeography"@en           |
| <http://dbpedia.org/resource/AfghanistanHistory>             |   | "AfghanistanHistory"@en             |
| <http://dbpedia.org/resource/AfghanistanPeople>              |   | "AfghanistanPeople"@en              |
| <http://dbpedia.org/resource/AfghanistanCommunications>      |   | "AfghanistanCommunications"@en      |
| <http://dbpedia.org/resource/AfghanistanTransportations>     |   | "AfghanistanTransportations"@en     |
| <http://dbpedia.org/resource/AfghanistanMilitary>            |   | "AfghanistanMilitary"@en            |
| <http://dbpedia.org/resource/AfghanistanTransnationalIssues> |   | "AfghanistanTransnationalIssues"@en |
| <http://dbpedia.org/resource/AmoeboidTaxa>                   |   | "AmoeboidTaxa"@en                   |
----------------------------------------------------------------------------------------------------------

I expected the data loaded with tdbloader3 to return the same result, but the 
query returned an empty result:

tdbquery --loc=dbpedia_tdbl3 "SELECT ?x ?y ?z WHERE { ?x <http://www.w3.org/2000/01/rdf-schema#label>  ?z }"
-------------
| x | y | z |
=============
-------------
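
As background for why a full scan can succeed while a predicate-bound query fails, here is a toy sketch. It assumes a store that keeps a separate predicate-keyed index next to its main triple list and picks one depending on which terms are bound. The class `ToyStore` and its `pos_bug` flag are hypothetical; this is not TDB's implementation and not a diagnosis of this issue.

```python
from collections import defaultdict

LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

class ToyStore:
    """Toy triple store: a main triple list plus a predicate-keyed index."""

    def __init__(self, pos_bug=False):
        self.spo = []                 # scanned when no term is bound
        self.pos = defaultdict(list)  # predicate -> [(subject, object)]
        self.pos_bug = pos_bug        # hypothetical fault switch

    def add(self, s, p, o):
        self.spo.append((s, p, o))
        if self.pos_bug:
            # Hypothetical fault: the entry is filed under the subject,
            # so the index has the right entry count but the wrong keys.
            self.pos[s].append((p, o))
        else:
            self.pos[p].append((s, o))

    def match(self, p=None):
        if p is None:
            return list(self.spo)     # full scan, ignores the predicate index
        return [(s, p, o) for s, o in self.pos.get(p, [])]

data = [
    ("http://dbpedia.org/resource/AccessibleComputing", '"AccessibleComputing"@en'),
    ("http://dbpedia.org/resource/AmoeboidTaxa", '"AmoeboidTaxa"@en'),
]
good, bad = ToyStore(), ToyStore(pos_bug=True)
for s, o in data:
    good.add(s, LABEL, o)
    bad.add(s, LABEL, o)

print(len(good.match()), len(bad.match()))                # 2 2
print(len(good.match(p=LABEL)), len(bad.match(p=LABEL)))  # 2 0
```

In this toy, both stores agree on `{ ?x ?y ?z }` but only the consistent one answers the fixed-predicate pattern, which mirrors the shape of the symptom reported above without establishing its cause.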

Any help would be much appreciated.



