Hello Marek, Stefano,
There is some information here about how to load large amounts of
data (the main problems being that the Sesame workbench/browser will
time out if the upload takes too long, and that OWLIM uses a lot of
memory if the transaction size is too big):
https://confluence.ontotext.com/display/OWLIMv53/OWLIM+FAQ#OWLIMFAQ-HowdoIloadlargeamountsofdataintoOWLIMSEorOWLIMEnterprise%3F
There is also some information here about using the demonstrator program
that comes with OWLIM to do this:
https://confluence.ontotext.com/display/OWLIMv53/OWLIM-SE+Configuration#OWLIM-SEConfiguration-Bulkdataloading
The latter would be my preferred approach, because it allows you to
control how parsing errors in your data are handled, e.g. skip errors
or stop, validate literals, etc.
I hope this helps,
barry
Barry Bishop
OWLIM Product Manager
Ontotext AD
Tel: +43 650 2000 237
email: barry.bis...@ontotext.com
skype: bazbishop
www.ontotext.com
On 03/28/2013 10:51 PM, Marek Šurek wrote:
Hi,
if you want to see progress during loading, there is an option to use
the standard "curl" command instead of the openrdf-workbench. It gives you
some feedback about what has already been loaded.
To load a .trig file into OWLIM, run this command in your
Linux shell:
curl -X POST -H "Content-Type: application/x-trig" \
  -T /path/to/data/datafile.trig \
  localhost:8080/openrdf-sesame/repositories/repository-name/statements
If you have RDF/XML data, change the content type to application/rdf+xml.
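The curl command above sends the whole file in a single transaction. As a hedged sketch of the same idea in chunks (the endpoint URL, repository name, and chunk size below are placeholders, not part of OWLIM): when the dump is N-Triples, which puts one complete statement per line, the file can safely be split on line boundaries and posted in batches, so each transaction stays small. This would not work for TriG or Turtle, where prefixes and statements span multiple lines.

```python
# Sketch: load a large N-Triples dump in fixed-size batches so each
# HTTP transaction stays small. Endpoint URL is a placeholder.
import urllib.request

ENDPOINT = ("http://localhost:8080/openrdf-sesame/"
            "repositories/repository-name/statements")

def chunks(lines, size):
    """Yield successive batches of at most `size` lines."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def load(path, chunk_size=100_000):
    """POST each batch of N-Triples lines as its own transaction."""
    with open(path, "rb") as f:
        for i, batch in enumerate(chunks(f, chunk_size)):
            req = urllib.request.Request(
                ENDPOINT,
                data=b"".join(batch),
                headers={"Content-Type": "text/plain"},  # N-Triples
                method="POST",
            )
            urllib.request.urlopen(req)
            print(f"batch {i} loaded ({len(batch)} statements)")
```

Printing after each batch also gives you the loading progress that the workbench hides.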
If you are loading a big amount of data, I recommend using configuration.xls,
which is part of OWLIM-SE.zip. It can help you set up the datastore properly.
Hope this will help.
Best regards,
Marek
------------------------------------------------------------------------
*From:* Joshua Greben <jgre...@stanford.edu>
*To:* owlim-discussion@ontotext.com
*Sent:* Thursday, 28 March 2013, 22:30
*Subject:* [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Hello all,
I am new to this list and to OWLIM-SE and was wondering if anyone
could offer advice for loading a large triple store. I am trying to
load 670M triples into a repository using the openrdf-sesame workbench
under tomcat6 on a single linux VM with 64-bit hardware and 64GB of
memory.
My JVM has the following: -Xms32g -Xmx32g -XX:MaxPermSize=256m
Here is the log info for my repository configuration:
...
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured
parameter 'entity-id-size' to '32'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured
parameter 'enable-context-index' to 'false'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured
parameter 'entity-index-size' to '100000000'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured
parameter 'tuple-index-memory' to '1600m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured
parameter 'cache-memory' to '3200m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages
for tuples: 83886
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages
for predicates: 0
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured
parameter 'storage-folder' to 'storage'
[INFO ] 2013-03-27 13:57:00,741 [repositories/BFWorks_STF] Configured
parameter 'in-memory-literal-properties' to 'false'
[INFO ] 2013-03-27 13:57:00,742 [repositories/BFWorks_STF] Configured
parameter 'repository-type' to 'file-repository'
The loading came to a standstill after 19 hours and tomcat threw an
OutOfMemoryError: GC overhead limit exceeded.
My question is: what is the application doing with all this memory,
and did I configure my instance correctly for this load to
finish? I also see a lot of entries in the main log such as this:
[WARN ] 2013-03-28 08:50:59,114 [repositories/BFWorks_STF] [Rio error]
Unescaped backslash in: L\'ambassadrice (314764886, -1)
Could these "Rio errors" be contributing to my troubles? I was also
wondering whether there is a way to configure logging so that I can track
the application's progress. Right now these warnings are the only way
I can tell how far the loading has progressed.
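As an illustrative sketch only (the function name and simplified escape set below are assumptions, not part of OWLIM or Rio): if the warnings come from backslashes that do not start a legal escape in the classic N-Triples escape set (\t, \n, \r, \", \\, \u, \U), a pre-cleaning pass over the dump could double those backslashes before loading, turning literals like L\'ambassadrice into parseable ones.

```python
# Sketch: double any backslash that is not followed by a character
# from the (assumed) classic N-Triples escape set, so the parser
# accepts the literal instead of warning about an unescaped backslash.
import re

# Backslash NOT followed by a legal N-Triples escape character.
INVALID_ESCAPE = re.compile(r'\\(?![tnr"\\uU])')

def fix_backslashes(line):
    """Escape backslashes that would otherwise be parse errors."""
    return INVALID_ESCAPE.sub(r'\\\\', line)
```

Whether the surviving statements with such literals are actually being stored, or silently dropped, is worth checking either way.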
Advice from anyone who has experience successfully loading a large
triplestore is much appreciated! Thanks in advance!
- Josh
Joshua Greben
Library Systems Programmer & Analyst
Stanford University Libraries
(650) 714-1937
jgre...@stanford.edu <mailto:jgre...@stanford.edu>
_______________________________________________
Owlim-discussion mailing list
Owlim-discussion@ontotext.com <mailto:Owlim-discussion@ontotext.com>
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion