[Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-03-28 Thread Joshua Greben
Hello all,

I am new to this list and to OWLIM-SE and was wondering if anyone could offer 
advice for loading a large triple store. I am trying to load 670M triples into 
a repository using the openrdf-sesame workbench under tomcat6 on a single linux 
VM with 64-bit hardware and 64GB of memory.  

My JVM has the following: -Xms32g -Xmx32g -XX:MaxPermSize=256m

Here is the log info for my repository configuration:

...
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 
'entity-id-size' to '32'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 
'enable-context-index' to 'false'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 
'entity-index-size' to '1'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 
'tuple-index-memory' to '1600m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 
'cache-memory' to '3200m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for 
tuples: 83886
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for 
predicates: 0
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 
'storage-folder' to 'storage'
[INFO ] 2013-03-27 13:57:00,741 [repositories/BFWorks_STF] Configured parameter 
'in-memory-literal-properties' to 'false'
[INFO ] 2013-03-27 13:57:00,742 [repositories/BFWorks_STF] Configured parameter 
'repository-type' to 'file-repository'

The loading came to a standstill after 19 hours and tomcat threw an 
OutOfMemoryError: GC overhead limit exceeded. 

My question is what the application is doing with all this memory and whether I 
configured my instance correctly for this load to finish.  I also see a lot of 
entries in the main log such as this:

[WARN ] 2013-03-28 08:50:59,114 [repositories/BFWorks_STF] [Rio error] 
Unescaped backslash in: L\'ambassadrice (314764886, -1)

Could these "Rio errors" be contributing to my troubles? I was also wondering 
if there was a way to configure logging to be able to track the application's 
progress. Right now these warnings are the only way I can tell how far the 
loading has progressed.
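(For reference: that Rio warning means the data contains a raw backslash that the parser tries to read as the start of an escape sequence. In formats such as N-Triples, a literal backslash must itself be escaped as \\ before loading. A minimal sketch of the escaping, in Python; the function name is illustrative:)

```python
# Minimal sketch of the literal escaping a strict N-Triples parser expects.
# The warning "Unescaped backslash in: L\'ambassadrice" indicates a raw
# backslash in the source data; it must be doubled before loading.

def escape_literal(s: str) -> str:
    # Escape the backslash first, so later replacements are not re-escaped.
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'),
                     ("\n", "\\n"), ("\r", "\\r"), ("\t", "\\t")):
        s = s.replace(raw, esc)
    return s

# The offending value from the log: a raw backslash before the apostrophe.
print(escape_literal("L\\'ambassadrice"))  # prints: L\\'ambassadrice
```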

Advice from anyone who has experience successfully loading a large triplestore 
is much appreciated! Thanks in advance!

- Josh


Joshua Greben
Library Systems Programmer & Analyst
Stanford University Libraries
(650) 714-1937
jgre...@stanford.edu


___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion


Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-03-28 Thread Marek Šurek
Hi,
If you want to see the loading progress, one option is to use the standard 
"curl" command instead of the openrdf-workbench. It gives you some feedback 
about what has already been loaded.
To load a .trig file into OWLIM, run this command in your linux shell:

curl -X POST -H "Content-Type:application/x-trig" -T 
/path/to/data/datafile.trig 
localhost:8080/openrdf-sesame/repositories/repository-name/statements

If you have RDF/XML data, change the content type to application/rdf+xml.
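For instance, the equivalent upload for RDF/XML data might look like this (the path and repository name are placeholders):

```shell
curl -X POST -H "Content-Type:application/rdf+xml" \
     -T /path/to/data/datafile.rdf \
     http://localhost:8080/openrdf-sesame/repositories/repository-name/statements
```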



If you are loading a large amount of data, I recommend using the 
configuration.xls spreadsheet included in OWLIM-SE.zip. It can help you size 
the datastore properly.

Hope this helps.

Best regards,
Marek




___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion


Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-03-29 Thread Barry Bishop

Hello Marek, Stefano,

There is a little bit of information here about how to load a lot of 
data (the problems being that the Sesame workbench/browser will time out 
if it takes too long and OWLIM uses a lot of memory if the transaction 
size is too big):


https://confluence.ontotext.com/display/OWLIMv53/OWLIM+FAQ#OWLIMFAQ-HowdoIloadlargeamountsofdataintoOWLIMSEorOWLIMEnterprise%3F

There is also some information here about using the demonstrator program 
that comes with OWLIM to do this:


https://confluence.ontotext.com/display/OWLIMv53/OWLIM-SE+Configuration#OWLIM-SEConfiguration-Bulkdataloading

The latter would be my preferred approach, because it allows you to 
control how parsing errors in your data are handled, e.g. skip errors or 
stop, validate literals, etc.


I hope this helps,
barry

Barry Bishop
OWLIM Product Manager
Ontotext AD
Tel: +43 650 2000 237
email: barry.bis...@ontotext.com
skype: bazbishop
www.ontotext.com

___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion


Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-03-29 Thread Joshua Greben
Thanks for the advice! 

I used the spreadsheet and was able to size the application correctly. 17 hours 
later my RDF/XML triple file is 80% loaded. It looks like it might still take 
up to another 11 hours to finish, but again, this estimate is based on my 
reading of the "unescaped backslash" errors that are logged and timestamped 
with the file line number.

I am still running this under tomcat using the workbench, because the curl 
command threw the following error: MALFORMED DATA: Element type "http:" must be 
followed by either attribute specifications, ">" or "/>". I might try it again 
later using curl --data-urlencode -T /path/to/data/data.nt ... to see if that 
helps, but I just wanted to get something running overnight.
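(For reference: that MALFORMED DATA message is what an XML parser reports when fed non-XML input, so a likely cause is posting the .nt file with an RDF/XML content type. The Sesame HTTP API of that era accepted N-Triples as text/plain; a sketch, with placeholder paths and repository name:)

```shell
curl -X POST -H "Content-Type:text/plain" \
     -T /path/to/data/data.nt \
     http://localhost:8080/openrdf-sesame/repositories/repository-name/statements
```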

Thanks again!

- Josh

It seems that the workbench application is better able to handle these errors.
On Mar 28, 2013, at 2:51 PM, Marek Šurek wrote:


___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion


Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-04-09 Thread Joshua Greben
Hi Barry,

Following your advice I ran the load using the example.sh script, pointing to 
my repository on localhost:8080. The load ran fine for 7 hours, but then it 
gave up with the following error in the main log:

[ERROR] 2013-04-08 20:13:18,019 [repositories/BFWorks_STF] Error while 
handling request (500): java.net.SocketTimeoutException: Read timed out

I noticed that tomcat's connectionTimeout param was at the default (20 sec.), 
so I considered increasing it to 10 minutes. Any advice on this?

Also, once this error happened I am unable to do anything with the repository 
except view the Contexts in Repository (via the workbench). When I try to clear 
the contexts to start over from scratch it takes a very long time and then I 
end up getting:

javax.servlet.ServletException: 
org.openrdf.repository.RepositoryException: java.io.EOFException

At this point I am forced to kill the tomcat process and delete the repository 
forcibly.


I then tried creating a repository using the sesame_owlim console, but I keep 
getting

ERROR: No template called BFWorks.ttl found in 
/storage/openrdf-sesame-console/templates

even though I have a BFWorks.ttl file in that directory.

Any help/advice is appreciated.

 -Josh

On Mar 29, 2013, at 8:46 AM, Barry Bishop wrote:


Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-04-09 Thread Barry Bishop

Hi Joshua,

Sorry to hear that you are still having problems loading data. Looking 
more closely, I think you have a less than optimal memory configuration:


Java heap: 32G
'tuple-index-memory': 1600m
'cache-memory': 3200m

I suggest you increase the last two parameters to something more like 
10G, or possibly even 15G, for loading.
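For reference, these parameters are normally set in the repository's Turtle configuration template; a sketch of the relevant fragment follows (the owlim prefix and exact syntax are assumptions to verify against the template shipped in OWLIM-SE.zip):

```turtle
@prefix owlim: <http://www.ontotext.com/trree/owlim#> .

# ... inside the repository configuration ...
owlim:tuple-index-memory "10g" ;
owlim:cache-memory "10g" ;
```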


More comments inline:

On 04/09/2013 09:47 PM, Joshua Greben wrote:

> Hi Barry,
>
> Following your advice I ran the load using the example.sh script, 
> pointing to my repository on localhost:8080. The load ran fine for 7 
> hours, but then it gave up with the following error in the main log:
>
> [ERROR] 2013-04-08 20:13:18,019 [repositories/BFWorks_STF] Error while 
> handling request (500): java.net.SocketTimeoutException: Read timed out


I don't have the full stack trace, but I guess this is because 
successive commit operations are taking longer and longer (not much 
memory for the cache) and eventually one takes too long and this error 
occurs.




> I noticed that tomcat's connectionTimeout param was at the default 
> (20 sec.), so I considered increasing it to 10 minutes. Any advice on this?


I don't think this will hurt, so I agree that increasing this would be a 
good idea.
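For reference, that timeout lives on the HTTP connector in Tomcat's conf/server.xml; a sketch with an illustrative 10-minute value (milliseconds):

```xml
<!-- conf/server.xml: raise the HTTP connector read timeout to 10 minutes -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="600000"
           redirectPort="8443" />
```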




> Also, once this error happened I am unable to do anything with the 
> repository except view the Contexts in Repository (via the workbench). 
> When I try to clear the contexts to start over from scratch it takes a 
> very long time and then I end up getting:
>
> javax.servlet.ServletException: 
> org.openrdf.repository.RepositoryException: java.io.EOFException


A full stack trace would be really useful here.



> At this point I am forced to kill the tomcat process and delete the 
> repository forcibly.




It could be that OWLIM is still busily trying to commit a large 
transaction with materialisation of inferences (lots of random index 
lookups), so killing tomcat would quite possibly leave the storage files 
in an inconsistent state.




> I then tried creating a repository using the sesame_owlim console, but 
> I keep getting
>
> ERROR: No template called BFWorks.ttl found in 
> /storage/openrdf-sesame-console/templates
>
> even though I have a BFWorks.ttl file in that directory.


Not sure about this one. I believe it is the client (not the server) 
that needs to be able to load this template file. Is there a permissions 
problem? Are you overriding the default location for loading template files?




> Any help/advice is appreciated.
>
>  -Josh


All the best,
barry




Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-04-16 Thread Joshua Greben

___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion


Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-04-16 Thread Barry Bishop
___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion