Hello Marek, Stefano,

There is some information here about how to load large amounts of data (the problems being that the Sesame workbench/browser will time out if loading takes too long, and OWLIM uses a lot of memory if the transaction size is too big):

https://confluence.ontotext.com/display/OWLIMv53/OWLIM+FAQ#OWLIMFAQ-HowdoIloadlargeamountsofdataintoOWLIMSEorOWLIMEnterprise%3F

There is also some information here about using the demonstrator program that comes with OWLIM to do this:

https://confluence.ontotext.com/display/OWLIMv53/OWLIM-SE+Configuration#OWLIM-SEConfiguration-Bulkdataloading

The latter would be my preferred approach, because it gives you control over how parsing errors in your data are handled, e.g. skip errors or stop, validate literals, etc.

I hope this helps,
barry

Barry Bishop
OWLIM Product Manager
Ontotext AD
Tel: +43 650 2000 237
email: barry.bis...@ontotext.com
skype: bazbishop
www.ontotext.com

On 03/28/2013 10:51 PM, Marek Šurek wrote:
Hi,
if you want to see progress while loading, there is an option to use the standard "curl" command instead of the openrdf-workbench. It gives you some information about what has already been loaded. To load a file into OWLIM (from a .trig file), run this command in your Linux shell:

curl -X POST -H "Content-Type:application/x-trig" -T /path/to/data/datafile.trig localhost:8080/openrdf-sesame/repositories/repository-name/statements

If you have RDF/XML data, change the content type to application/rdf+xml.
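If the data is split into several files, a small loop keeps each POST (and therefore each server-side transaction) reasonably sized and prints the HTTP status after each upload. This is only a sketch; the host, repository name, and file paths are placeholders to adapt:

```shell
#!/bin/sh
# Post each TriG chunk as a separate request; host and repository name are placeholders.
REPO_URL="http://localhost:8080/openrdf-sesame/repositories/repository-name/statements"

for f in /path/to/data/chunk-*.trig; do
    echo "Loading $f ..."
    # -s -S: quiet output except errors; -w prints the HTTP status code per upload
    curl -s -S -X POST \
         -H "Content-Type: application/x-trig" \
         -T "$f" \
         -w "HTTP %{http_code}\n" \
         "$REPO_URL"
done
```

How you split the source file into chunks is up to you; the point is just that smaller requests give you intermediate feedback and smaller transactions.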


If you load a large amount of data, I recommend using the configuration.xls spreadsheet that is part of OWLIM-SE.zip. It can help you set the datastore parameters properly.

Hope this will help.

Best regards,
Marek

------------------------------------------------------------------------
*From:* Joshua Greben <jgre...@stanford.edu>
*To:* owlim-discussion@ontotext.com
*Sent:* Thursday, 28 March 2013, 22:30
*Subject:* [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

Hello all,

I am new to this list and to OWLIM-SE and was wondering if anyone could offer advice for loading a large triple store. I am trying to load 670M triples into a repository using the openrdf-sesame workbench under tomcat6 on a single linux VM with 64-bit hardware and 64GB of memory.

My JVM has the following: -Xms32g -Xmx32g -XX:MaxPermSize=256m
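(For reference, one common place to set such flags for Tomcat is CATALINA_OPTS in a setenv.sh script. The sketch below just restates the flags above and adds a heap-dump option that makes an OutOfMemoryError easier to diagnose afterwards; the sizes themselves are the ones already quoted, not a recommendation:)

```shell
# e.g. in $CATALINA_HOME/bin/setenv.sh (created if it does not exist).
# Same heap settings as above, plus a heap dump on OOM for post-mortem analysis.
export CATALINA_OPTS="-Xms32g -Xmx32g -XX:MaxPermSize=256m \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/tomcat6"
```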

Here is the log info for my repository configuration:

...
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'entity-id-size' to '32'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'enable-context-index' to 'false'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'entity-index-size' to '100000000'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'tuple-index-memory' to '1600m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 'cache-memory' to '3200m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for tuples: 83886
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for predicates: 0
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 'storage-folder' to 'storage'
[INFO ] 2013-03-27 13:57:00,741 [repositories/BFWorks_STF] Configured parameter 'in-memory-literal-properties' to 'false'
[INFO ] 2013-03-27 13:57:00,742 [repositories/BFWorks_STF] Configured parameter 'repository-type' to 'file-repository'

The loading came to a standstill after 19 hours and tomcat threw an OutOfMemoryError: GC overhead limit exceeded.

My question is what the application is doing with all this memory and whether I configured my instance correctly for this load to finish. I also see a lot of entries in the main log such as this:

[WARN ] 2013-03-28 08:50:59,114 [repositories/BFWorks_STF] [Rio error] Unescaped backslash in: L\'ambassadrice (314764886, -1)

Could these "Rio errors" be contributing to my troubles? I was also wondering if there was a way to configure logging to be able to track the application's progress. Right now these warnings are the only way I can tell how far the loading has progressed.
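(A lightweight way to watch progress from outside the JVM, assuming the server speaks the standard Sesame 2 HTTP protocol, is to poll the repository's size endpoint, which returns the current statement count. Host and repository name below are placeholders:)

```shell
# Print a timestamped statement count every minute while the load runs.
while true; do
    printf '%s  ' "$(date '+%H:%M:%S')"
    curl -s "http://localhost:8080/openrdf-sesame/repositories/repository-name/size"
    echo
    sleep 60
done
```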

Advice from anyone who has experience successfully loading a large triplestore is much appreciated! Thanks in advance!

- Josh


Joshua Greben
Library Systems Programmer & Analyst
Stanford University Libraries
(650) 714-1937
jgre...@stanford.edu



_______________________________________________
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion



