Hi All,

I did the bulk loading through the command-line utility (with timing) after doing some tuning of MySQL (buffer pool and key buffer size), and got a final loading time of roughly 5 and a half hours for ~24 million triples, which seems okay to me. Indexing this dataset took a further 4 hours.
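For anyone following along, the two MySQL settings mentioned above are set in my.cnf. A minimal sketch, with illustrative values only (not recommendations for any particular machine):

```ini
# my.cnf (illustrative values; tune to your own RAM and workload)
[mysqld]
# InnoDB: main cache for table data and indexes; the usual advice is a
# large fraction of available RAM on a dedicated database box
innodb_buffer_pool_size = 4G
# MyISAM only: key_buffer_size matters only if the SDB tables are MyISAM
key_buffer_size = 512M
```

Restart mysqld after changing these for the new sizes to take effect.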
Any comments here? I am now querying this dataset through the command-line utility again, where the resulting tuples are printed along with the execution time. I would like to know whether this execution time includes the printing time as well (which I would *not* prefer); kindly let me know.

Thanks to all of you for your advice, it was very helpful to me :)

BR,
Shri

On Fri, Sep 30, 2011 at 2:54 AM, Shri :) <[email protected]> wrote:
> Hello, sorry, my dataset is in .NT format.
>
>
> On Fri, Sep 30, 2011 at 2:52 AM, Shri :) <[email protected]> wrote:
>
>> Hi All,
>>
>> @Damian: thanks for the link, I will now try increasing the
>> buffer_pool_size and carry out the loading. Will let you know how it goes.
>>
>> @Andy: "Are you using the sdb bulk loader or loading via your own code?
>> What format is the data in?" ... "But why not use the sdbload tool? Take
>> the source code and add whatever extra timing you need (it already can
>> print some timing info)."
>>
>> I am using the following code, which I don't think is very different
>> from the one that you suggested; *my data is in .TTL format*.
>> Here is the snippet of my code:
>>
>> StoreDesc storeDesc = StoreDesc.read("sdb2.ttl");
>> IDBConnection conn = new DBConnection(DB_URL, DB_USER, DB_PASSWD, DB);
>> conn.getConnection();
>> SDBConnection sdbconn = SDBFactory.createConnection(conn.getConnection());
>> Store store = SDBFactory.connectStore(sdbconn, storeDesc);
>> Model model = SDBFactory.connectDefaultModel(store);
>> // read data into the database
>> InputStream inn = new FileInputStream("dataset_70000.nt");
>> long start = System.currentTimeMillis();
>> model.read(inn, "localhost", "TTL");
>> loadtime = ext.elapsedTime(start);
>> // Close the database connection
>> store.close();
>> System.out.println("Loading time: " + loadtime);
>>
>> @Dave: I think I followed the pattern suggested in the link that you gave
>> me (http://openjena.org/wiki/SDB/Loading_data); the above is the snippet
>> of my source code.
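[Editor's note] On the question above of whether the reported execution time includes printing: one way to be certain is to time the query work separately from the printing. A stdlib-only Java sketch of the pattern (the `QueryTimer` class and its `Supplier`-based helper are illustrative, not part of the SDB or ARQ API):

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative helper: time only the work itself, then print afterwards,
// so the printing cost never leaks into the measured execution time.
public class QueryTimer {
    public static long lastElapsedMillis;

    // Run the supplied work and record how long it took (and nothing else).
    public static <T> T timed(Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        lastElapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for materialising a result set; with ARQ one would copy
        // the ResultSet out inside the timer, then format/print it outside.
        List<String> rows = timed(() -> List.of("row1", "row2"));
        long queryMillis = lastElapsedMillis; // printing below is not counted
        for (String r : rows) {
            System.out.println(r);
        }
        System.out.println("Query time (ms): " + queryMillis);
    }
}
```

If the command-line tool prints results while the clock is still running, its reported time will include printing; materialising results first and printing after stopping the clock avoids that.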
>> And one more thing: I didn't get the idea of "Are you wrapping the load
>> in a transaction to avoid auto-commit costs?" Can you please elaborate a
>> bit on this? Sorry, I am relatively a novice.
>>
>> Any thoughts on this? Thank you very much! :)
>>
>> BR,
>> shri
>>
>> On Thu, Sep 29, 2011 at 12:00 AM, Shri :) <[email protected]> wrote:
>>
>>> Hi Again,
>>>
>>> I am supposed to evaluate the performance of a few triple stores as part
>>> of my thesis work (a specification which I unfortunately cannot change);
>>> one among them is Jena SDB with MySQL. I am using my own Java code to
>>> load the data, not the command-line tool, as I wanted to take note of
>>> the loading time. I am using the .NT format of data for loading.
>>>
>>> I have 8 GB of RAM.
>>>
>>> Any thoughts/suggestions on this? Thanks for your help.
>>>
>>> On Wed, Sep 28, 2011 at 4:09 PM, Shri :) <[email protected]> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I am currently doing my master's thesis, wherein I have to work with
>>>> Jena SDB using MySQL as a backend store. I have around 25 million
>>>> triples to load, which has taken more than 5 days to load on the
>>>> Windows platform, whereas according to the Berlin Benchmark it took
>>>> only 4 hours to load the same number of triples on Linux. This has left
>>>> me confused: is the enormous difference because of the difference in
>>>> platform, or should I do some performance tuning/optimization to
>>>> improve the load time?
>>>>
>>>> Kindly give your suggestions/comments.
>>>>
>>>> P.S. I am using WAMP
>>>>
>>>> Thanks,
>>>>
>>>> Shridevika
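[Editor's note] On "wrapping the load in a transaction to avoid auto-commit costs": by default a JDBC connection to MySQL commits after every statement, so a bulk load pays a commit per INSERT. Turning auto-commit off for the duration of the load and committing once can be much faster. A hedged sketch of the generic JDBC pattern (the `TxLoader` class is illustrative; with SDB you would apply this to the store's underlying java.sql.Connection rather than managing your own):

```java
import java.sql.Connection;
import java.sql.SQLException;

// Illustrative pattern: run a bulk load inside one transaction instead of
// paying an auto-commit after every INSERT statement.
public class TxLoader {
    public static void loadInTransaction(Connection conn, Runnable load) {
        try {
            boolean previous = conn.getAutoCommit();
            conn.setAutoCommit(false);   // stop committing per statement
            try {
                load.run();              // e.g. model.read(in, null, "N-TRIPLE")
                conn.commit();           // one commit for the whole load
            } catch (RuntimeException e) {
                conn.rollback();         // discard the partial load on failure
                throw e;
            } finally {
                conn.setAutoCommit(previous); // restore the old setting
            }
        } catch (SQLException e) {
            throw new RuntimeException("transaction control failed", e);
        }
    }
}
```

Note this only helps if the SDB tables use a transactional engine (InnoDB); MyISAM tables ignore commit boundaries.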
