[Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Hello all,

I am new to this list and to OWLIM-SE and was wondering if anyone could offer advice for loading a large triple store. I am trying to load 670M triples into a repository using the openrdf-sesame workbench under tomcat6 on a single linux VM with 64-bit hardware and 64GB of memory.

My JVM has the following: -Xms32g -Xmx32g -XX:MaxPermSize=256m

Here is the log info for my repository configuration:

...
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'entity-id-size' to '32'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'enable-context-index' to 'false'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'entity-index-size' to '1'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'tuple-index-memory' to '1600m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 'cache-memory' to '3200m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for tuples: 83886
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for predicates: 0
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 'storage-folder' to 'storage'
[INFO ] 2013-03-27 13:57:00,741 [repositories/BFWorks_STF] Configured parameter 'in-memory-literal-properties' to 'false'
[INFO ] 2013-03-27 13:57:00,742 [repositories/BFWorks_STF] Configured parameter 'repository-type' to 'file-repository'

The loading came to a standstill after 19 hours and tomcat threw an OutOfMemoryError: GC overhead limit exceeded.

My question is what the application is doing with all this memory and whether I configured my instance correctly for this load to finish. I also see a lot of entries in the main log such as this:

[WARN ] 2013-03-28 08:50:59,114 [repositories/BFWorks_STF] [Rio error] Unescaped backslash in: L\'ambassadrice (314764886, -1)

Could these "Rio errors" be contributing to my troubles?
I was also wondering if there is a way to configure logging so that I can track the application's progress. Right now these warnings are the only way I can tell how far the loading has progressed.

Advice from anyone who has experience successfully loading a large triplestore is much appreciated! Thanks in advance!

- Josh

Joshua Greben
Library Systems Programmer & Analyst
Stanford University Libraries
(650) 714-1937
jgre...@stanford.edu

___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion
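For reference, a minimal sketch of how JVM options like these are usually passed to Tomcat 6 via a setenv.sh file (the file name and the CATALINA_OPTS variable are the stock catalina.sh convention; the option values are the ones from the post above):

```shell
# $CATALINA_HOME/bin/setenv.sh -- sourced by the stock catalina.sh on startup.
# These are the exact options from the post; MaxPermSize sizes the
# pre-Java-8 permanent generation.
CATALINA_OPTS="-Xms32g -Xmx32g -XX:MaxPermSize=256m"
export CATALINA_OPTS
```

Leaving roughly half of the 64 GB to the OS file cache, as this configuration does, is generally sensible for a disk-backed store.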
Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Hi,

if you want to see progress while loading, there is an option to use the standard "curl" command instead of the openrdf-workbench. It gives you some information about what has already been loaded.

To load files into OWLIM (from a .trig file), run this command in your linux shell:

curl -X POST -H "Content-Type:application/x-trig" -T /path/to/data/datafile.trig localhost:8080/openrdf-sesame/repositories/repository-name/statements

If you have XML-style data, change the content type to application/rdf+xml.

If you load a big amount of data, I recommend using the configuration.xls which is part of OWLIM-SE.zip. It can help you set up the datastore properly.

Hope this will help.

Best regards,
Marek
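Marek's one-liner can be wrapped in a small loop when the data is split over several files. This is only a sketch: the data directory, the repository name, and the extension-to-MIME-type mapping beyond the two types Marek mentions are assumptions.

```shell
#!/bin/sh
# Pick the Content-Type header from the file extension (.trig and .rdf
# are the two cases from Marek's mail; the rest are assumed extras).
content_type_for() {
  case "$1" in
    *.trig)      echo "application/x-trig" ;;
    *.rdf|*.xml) echo "application/rdf+xml" ;;
    *.ttl)       echo "application/x-turtle" ;;
    *.nt)        echo "text/plain" ;;
    *)           echo "application/octet-stream" ;;
  esac
}

REPO_URL="http://localhost:8080/openrdf-sesame/repositories/repository-name/statements"
DATA_DIR="/path/to/data"   # hypothetical location

if [ -d "$DATA_DIR" ]; then
  for f in "$DATA_DIR"/*; do
    # -w '%{http_code}' surfaces the server's answer; Sesame replies
    # 204 No Content when the statements were accepted.
    status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
      -H "Content-Type: $(content_type_for "$f")" -T "$f" "$REPO_URL")
    echo "$(date '+%H:%M:%S') $f -> HTTP $status"
  done
fi
```

The per-file timestamps give a crude progress log without relying on the "unescaped backslash" warnings.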
Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Hello Marek, Stefano,

There is a little bit of information here about how to load a lot of data (the problems being that the Sesame workbench/browser will time out if the load takes too long, and that OWLIM uses a lot of memory if the transaction size is too big):

https://confluence.ontotext.com/display/OWLIMv53/OWLIM+FAQ#OWLIMFAQ-HowdoIloadlargeamountsofdataintoOWLIMSEorOWLIMEnterprise%3F

There is also some information here about using the demonstrator program that comes with OWLIM to do this:

https://confluence.ontotext.com/display/OWLIMv53/OWLIM-SE+Configuration#OWLIM-SEConfiguration-Bulkdataloading

The latter would be my preferred approach, because it allows you to control parsing errors in your data, e.g. skip errors or stop, validate literals, etc.

I hope this helps,
barry

Barry Bishop
OWLIM Product Manager
Ontotext AD
Tel: +43 650 2000 237
email: barry.bis...@ontotext.com
skype: bazbishop
www.ontotext.com
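The FAQ's point about transaction size suggests loading line-oriented formats in chunks, with one commit per POST. A sketch, assuming N-Triples input (the file names and the one-million-line chunk size are made up; split(1) is safe here because N-Triples puts exactly one statement per line):

```shell
#!/bin/sh
# Break a large N-Triples file into fixed-size chunks and POST each one
# separately, so no single transaction has to hold all 670M statements.
split_chunks() {
  # $1 = input file, $2 = lines per chunk, $3 = output prefix
  split -l "$2" "$1" "$3"
}

REPO_URL="http://localhost:8080/openrdf-sesame/repositories/repository-name/statements"
if [ -f /path/to/data/data.nt ]; then
  split_chunks /path/to/data/data.nt 1000000 chunk_
  for c in chunk_*; do
    # -f makes curl exit non-zero on an HTTP error so we can stop early
    # and resume from the failed chunk later.
    if curl -s -f -X POST -H "Content-Type: text/plain" -T "$c" "$REPO_URL"; then
      echo "loaded $c"
    else
      echo "failed on $c" >&2
      break
    fi
  done
fi
```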
Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Thanks for the advice! I used the spreadsheet and was able to size the application correctly. 17 hours later my rdf+xml triple file is 80% loaded. It looks like it might still take up to another 11 hours to finish, but again, this is based on my reading of the "unescaped backslash" errors that are logged and timestamped with the file line number.

I am still running this under tomcat using the workbench because the curl command threw the following error:

MALFORMED DATA: Element type "http:" must be followed by either attribute specifications, ">" or "/>".

I might try it again later using curl --data-urlencode -T /path/to/data/data.nt ... to see if that helps, but I just wanted to get something running overnight. It seems that the workbench application is better able to handle these errors.

Thanks again!

- Josh
Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Hi Barry,

Following your advice I ran the load using the example.sh script, pointing to my repository on localhost:8080. The load ran fine for 7 hours, but then it gave up with the following error in the main log:

[ERROR] 2013-04-08 20:13:18,019 [repositories/BFWorks_STF] Error while handling request (500): java.net.SocketTimeoutException: Read timed out

I noticed that tomcat's connectionTimeout param was at the default (20 sec.), so I considered increasing it to 10 minutes. Any advice on this?

Also, once this error happened I am unable to do anything with the repository except view the Contexts in Repository (via the workbench). When I try to clear the contexts to start over from scratch, it takes a very long time and then I end up getting:

javax.servlet.ServletException: org.openrdf.repository.RepositoryException: java.io.EOFException

At this point I am forced to kill the tomcat process and delete the repository forcibly.

I then tried creating a repository using the sesame_owlim console, but I keep getting:

ERROR: No template called BFWorks.ttl found in /storage/openrdf-sesame-console/templates

even though I have a BFWorks.ttl file in that directory.

Any help/advice is appreciated.

-Josh
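For the record, the connectionTimeout attribute discussed above lives on the HTTP connector in Tomcat's conf/server.xml. A sketch with only that attribute changed (the value is in milliseconds, so 10 minutes is 600000; the other attributes shown are the Tomcat 6 defaults):

```xml
<!-- conf/server.xml: raise the connector timeout from the 20 s default
     so long-running commits do not drop the socket mid-request. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="600000"
           redirectPort="8443" />
```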
Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Hi Joshua,

Sorry to hear that you are still having problems loading data. Looking more closely, I think you have a less than optimal memory configuration:

Java heap 32G
'tuple-index-memory' set to '1600m'
'cache-memory' set to '3200m'

I suggest you increase the last two parameters to something more like 10G, or possibly even 15G, for loading. More comments inline:

On 04/09/2013 09:47 PM, Joshua Greben wrote:
> Hi Barry, Following your advice I ran the load using the example.sh script and pointing to my repository on localhost:8080. The load ran fine for 7 hours, but then it gave up with the following error in the main log:
>
> [ERROR] 2013-04-08 20:13:18,019 [repositories/BFWorks_STF] Error while handling request (500): java.net.SocketTimeoutException: Read timed out

I don't have the full stack trace, but I guess this is because successive commit operations are taking longer and longer (not much memory for the cache) and eventually one takes too long and this error occurs.

> I noticed that tomcat's connectionTimeout param was at the default (20sec.) so I considered increasing it to 10 minutes. Any advice on this?

I don't think this will hurt, so I agree that increasing it would be a good idea.

> Also, once this error happened I am unable to do anything with the repository except view the Contexts in Repository (via the workbench). When I try to clear the contexts to start over from scratch it takes a very long time and then I end up getting:
>
> javax.servlet.ServletException: org.openrdf.repository.RepositoryException: java.io.EOFException

A full stack trace would be really useful here.

> At this point I am forced to kill the tomcat process and delete the repository forcibly.

It could be that OWLIM is still busily trying to commit a large transaction with materialisation of inferences (lots of random index lookups), so killing tomcat would quite possibly leave the storage files in an inconsistent state.

> I then tried creating a repository using the sesame_owlim console, but I keep getting ERROR: No template called BFWorks.ttl found in /storage/openrdf-sesame-console/templates even though I have a BFWorks.ttl file in that directory.

Not sure about this one. I believe it is the client (not the server) that needs to be able to load this template file. Is there a permissions problem? Are you overriding the default location for loading template files?

> Any help/advice is appreciated.
>
> -Josh

All the best,
barry
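Barry's suggestion translates to two values in the repository configuration. A sketch only: the parameter names match the log output quoted earlier in the thread, but the exact surrounding Turtle and prefix should be taken from the template shipped in OWLIM-SE.zip.

```turtle
# Memory-related lines of a repository template (.ttl), adjusted per
# Barry's advice; everything else stays as in the shipped template.
owlim:tuple-index-memory "10g" ;
owlim:cache-memory       "12g" ;
```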