[ 
https://issues.apache.org/jira/browse/JENA-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221758#comment-13221758
 ] 

Sarven Capadisli commented on JENA-117:
---------------------------------------

> I have the following in my tdb2.worldbank.ttl: 
> tdb:location "/usr/lib/fuseki/DB/WorldBank" ; 

What else do you have in your config file? What about the 
unionDefaultGraph setting? 

Yes, I do have it. Here are the contents of tdb2.worldbank.ttl:
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .

[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
#[] ja:loadClass "org.openjena.fuseki.BackwardForwardDescribeFactory" .

tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

 <#dataset> rdf:type      tdb:DatasetTDB ;
     tdb:location "/usr/lib/fuseki/DB/WorldBank" ;
     tdb:unionDefaultGraph true ;
     .

Here is what I've found. The main difference between your query and mine was 
that I used --desc and you used --loc. --desc doesn't give me any results. 
--loc works! And when I check -h, lo and behold: --desc is not advertised :) 
That explains it, I suppose.

BTW, loading all the files in the directory works as well:

java -cp target/jena-tdbloader3-0.1-incubating-SNAPSHOT-jar-with-dependencies.jar \
    -server -d64 -Xmx2048M cmd.tdbloader3 --no-stats --compression \
    --spill-size 1500000 --loc /usr/lib/fuseki/DB/WorldBank /tmp/*.nt

Thanks for helping me through this.

I suppose now I have to decide whether to take the N-Quads route or use 
tdbloader2.

Good stuff.
                
> A pure Java version of tdbloader2, a.k.a. tdbloader3
> ----------------------------------------------------
>
>                 Key: JENA-117
>                 URL: https://issues.apache.org/jira/browse/JENA-117
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB
>            Reporter: Paolo Castagna
>            Assignee: Paolo Castagna
>            Priority: Minor
>              Labels: performance, tdbloader2
>         Attachments: TDB_JENA-117_r1171714.patch
>
>
> There is probably a significant performance improvement for tdbloader2 in 
> replacing the UNIX sort over text files with an external sorting pure Java 
> implementation.
> Since JENA-99 we now have a SortedDataBag which does exactly that.
>     ThresholdPolicyCount<Tuple<Long>> policy =
>         new ThresholdPolicyCount<Tuple<Long>>(1000000);
>     SerializationFactory<Tuple<Long>> serializerFactory =
>         new TupleSerializationFactory();
>     Comparator<Tuple<Long>> comparator = new TupleComparator();
>     SortedDataBag<Tuple<Long>> sortedDataBag =
>         new SortedDataBag<Tuple<Long>>(policy, serializerFactory, comparator);
> TupleSerializationFactory creates TupleInputStream|TupleOutputStream, which 
> are wrappers around DataInputStream|DataOutputStream. TupleComparator is 
> trivial.
> Preliminary results seem promising and show that the Java implementation can 
> be faster than UNIX sort, since it uses smaller binary files (instead of text 
> files) and compares long values rather than strings.
> An example, ExternalSort, which compares SortedDataBag with UNIX sort, is 
> available here:
> https://github.com/castagna/tdbloader3/blob/hadoop-0.20.203.0/src/main/java/com/talis/labs/tdb/tdbloader3/dev/ExternalSort.java
> A further advantage of doing the sorting in Java rather than with UNIX sort 
> is that we could stream results directly into the BPlusTreeRewriter, rather 
> than writing them to disk and then reading them back from disk into the 
> BPlusTreeRewriter.
> I've not done an experiment yet to see if this is actually a significant 
> improvement.
> Using compression for intermediate files might help, but more experiments are 
> necessary to establish if it is worthwhile or not.
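
For readers who have not seen the approach described in the issue, here is a 
minimal sketch of an external sort over tuples of long values that spills 
sorted runs to binary files and merges them. It is not the SortedDataBag code 
from JENA-99 and uses only java.io and java.util. The class name 
LongTupleExternalSort, the fixed tuple width of 3, and the method names are 
assumptions made for illustration only.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical illustration; not part of Jena or tdbloader3.
public class LongTupleExternalSort {
    static final int WIDTH = 3; // assumed tuple width, e.g. three node ids

    // Compare tuples slot by slot as long values rather than as strings.
    static final Comparator<long[]> BY_SLOTS = (a, b) -> {
        for (int i = 0; i < WIDTH; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    };

    // The current head tuple of one run, plus the stream it was read from.
    private static final class Head {
        final long[] tuple;
        final DataInputStream in;
        Head(long[] tuple, DataInputStream in) { this.tuple = tuple; this.in = in; }
    }

    // Sort one in-memory run and write it to a temporary binary file.
    static Path spill(List<long[]> run) throws IOException {
        run.sort(BY_SLOTS);
        Path file = Files.createTempFile("run", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (long[] t : run)
                for (long v : t) out.writeLong(v);
        }
        return file;
    }

    // Read the next tuple from a run, or return null at end of file.
    static long[] readTuple(DataInputStream in) throws IOException {
        long[] t = new long[WIDTH];
        try {
            t[0] = in.readLong();
        } catch (EOFException e) {
            return null;
        }
        for (int i = 1; i < WIDTH; i++) t[i] = in.readLong();
        return t;
    }

    // Spill a run whenever `threshold` tuples are buffered, then k-way merge
    // the runs and stream the sorted tuples into `sink`.
    static void sort(Iterable<long[]> input, int threshold, Consumer<long[]> sink)
            throws IOException {
        List<Path> runs = new ArrayList<>();
        List<long[]> buffer = new ArrayList<>();
        for (long[] t : input) {
            buffer.add(t);
            if (buffer.size() >= threshold) {
                runs.add(spill(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) runs.add(spill(buffer));

        PriorityQueue<Head> heap =
                new PriorityQueue<>((x, y) -> BY_SLOTS.compare(x.tuple, y.tuple));
        try {
            for (Path run : runs) {
                DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(run)));
                long[] first = readTuple(in);
                if (first != null) heap.add(new Head(first, in)); else in.close();
            }
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                sink.accept(h.tuple);
                long[] next = readTuple(h.in);
                if (next != null) heap.add(new Head(next, h.in)); else h.in.close();
            }
        } finally {
            for (Head h : heap) h.in.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    public static void main(String[] args) throws IOException {
        List<long[]> input = List.of(
                new long[]{3, 1, 2}, new long[]{1, 9, 9},
                new long[]{2, 0, 0}, new long[]{1, 2, 3});
        // A threshold of 2 forces two spill files for this tiny input.
        sort(input, 2, t -> System.out.println(Arrays.toString(t)));
    }
}
```

The sink parameter stands in for whatever consumes the sorted tuples. In the 
issue's terms, that is where results could be streamed onward instead of being 
written to disk a second time.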

        
