Hello Marek, Stefano,

There is some information here about how to load large amounts of data (the problems being that the Sesame workbench/browser will time out if loading takes too long, and OWLIM uses a lot of memory if the transaction size is too big):

https://confluence.ontotext.com/display/OWLIMv53/OWLIM+FAQ#OWLIMFAQ-HowdoIloadlargeamountsofdataintoOWLIMSEorOWLIMEnterprise%3F

There is also some information here about using the demonstrator program that comes with OWLIM to do this:

https://confluence.ontotext.com/display/OWLIMv53/OWLIM-SE+Configuration#OWLIM-SEConfiguration-Bulkdataloading

The latter would be my preferred approach, because it gives you control over how parsing errors in your data are handled, e.g. skip errors or stop, validate literals, etc.

I hope this helps,
barry

Barry Bishop
OWLIM Product Manager
Ontotext AD
Tel: +43 650 2000 237
email: barry.bis...@ontotext.com
skype: bazbishop
www.ontotext.com

On 03/28/2013 10:51 PM, Marek Šurek wrote:
Hi,
if you want to see progress while loading, there is an option to use the standard "curl" command instead of the openrdf-workbench. It gives you some information about what has already been loaded. To load a file into OWLIM (from a .trig file), run this command in your Linux shell:

curl -X POST -H "Content-Type:application/x-trig" -T /path/to/data/datafile.trig localhost:8080/openrdf-sesame/repositories/repository-name/statements

If you have RDF/XML data, change the content type to application/rdf+xml.
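If the data is split into several files, a small loop keeps each POST (and therefore each server-side transaction) reasonably sized and prints the HTTP status after each upload. This is only a sketch; the host, repository name, and file paths are placeholders to adapt:

```shell
#!/bin/sh
# Post each TriG chunk as a separate request; host and repository name are placeholders.
REPO_URL="http://localhost:8080/openrdf-sesame/repositories/repository-name/statements"

for f in /path/to/data/chunk-*.trig; do
    echo "Loading $f ..."
    # -s -S: quiet output except errors; -w prints the HTTP status code per upload
    curl -s -S -X POST \
         -H "Content-Type: application/x-trig" \
         -T "$f" \
         -w "HTTP %{http_code}\n" \
         "$REPO_URL"
done
```

How you split the source file into chunks is up to you; the point is just that smaller requests give you intermediate feedback and smaller transactions.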


If you load a large amount of data, I recommend using the configuration.xls spreadsheet that is part of OWLIM-SE.zip. It can help you set the datastore parameters properly.

Hope this will help.

Best regards,
Marek

------------------------------------------------------------------------
*From:* Joshua Greben <jgre...@stanford.edu>
*To:* owlim-discussion@ontotext.com
*Sent:* Thursday, 28 March 2013, 22:30
*Subject:* [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

Hello all,

I am new to this list and to OWLIM-SE and was wondering if anyone could offer advice for loading a large triple store. I am trying to load 670M triples into a repository using the openrdf-sesame workbench under tomcat6 on a single linux VM with 64-bit hardware and 64GB of memory.

My JVM has the following: -Xms32g -Xmx32g -XX:MaxPermSize=256m
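(For reference, one common place to set such flags for Tomcat is CATALINA_OPTS in a setenv.sh script. The sketch below just restates the flags above and adds a heap-dump option that makes an OutOfMemoryError easier to diagnose afterwards; the sizes themselves are the ones already quoted, not a recommendation:)

```shell
# e.g. in $CATALINA_HOME/bin/setenv.sh (created if it does not exist).
# Same heap settings as above, plus a heap dump on OOM for post-mortem analysis.
export CATALINA_OPTS="-Xms32g -Xmx32g -XX:MaxPermSize=256m \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/tomcat6"
```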

Here is the log info for my repository configuration:

...
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'entity-id-size' to '32'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'enable-context-index' to 'false'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'entity-index-size' to '100000000'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'tuple-index-memory' to '1600m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 'cache-memory' to '3200m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for tuples: 83886
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for predicates: 0
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 'storage-folder' to 'storage'
[INFO ] 2013-03-27 13:57:00,741 [repositories/BFWorks_STF] Configured parameter 'in-memory-literal-properties' to 'false'
[INFO ] 2013-03-27 13:57:00,742 [repositories/BFWorks_STF] Configured parameter 'repository-type' to 'file-repository'

The loading came to a standstill after 19 hours and tomcat threw an OutOfMemoryError: GC overhead limit exceeded.

My question is what the application is doing with all this memory and whether I configured my instance correctly for this load to finish. I also see a lot of entries in the main log such as this:

[WARN ] 2013-03-28 08:50:59,114 [repositories/BFWorks_STF] [Rio error] Unescaped backslash in: L\'ambassadrice (314764886, -1)

Could these "Rio errors" be contributing to my troubles? I was also wondering if there was a way to configure logging to be able to track the application's progress. Right now these warnings are the only way I can tell how far the loading has progressed.
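(A lightweight way to watch progress from outside the JVM, assuming the server speaks the standard Sesame 2 HTTP protocol, is to poll the repository's size endpoint, which returns the current statement count. Host and repository name below are placeholders:)

```shell
# Print a timestamped statement count every minute while the load runs.
while true; do
    printf '%s  ' "$(date '+%H:%M:%S')"
    curl -s "http://localhost:8080/openrdf-sesame/repositories/repository-name/size"
    echo
    sleep 60
done
```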

Advice from anyone who has experience successfully loading a large triplestore is much appreciated! Thanks in advance!

- Josh


Joshua Greben
Library Systems Programmer & Analyst
Stanford University Libraries
(650) 714-1937
jgre...@stanford.edu



_______________________________________________
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion



