Re: [Owlim-discussion] Owlim-SE not responding with high CPU load
We experienced very similar behaviour with nearly the same use case. I reported one bug which in our case was temporarily solved by turning off the context index. Recently I found another very similar issue, which is not reported yet as I can't figure out the cause. Please look into catalina.out for the "error in predicate statistics" log message that appears in both mentioned issues. Maybe we hit the same problem.

Best regards,
Marek

-----Original Message-----
From: Stefano Parmesan parme...@spaziodati.eu
Sent: 28.3.2013 12:09
To: owlim-discussion@ontotext.com
Subject: [Owlim-discussion] Owlim-SE not responding with high CPU load

Hi everybody,

We are evaluating Owlim-SE 5.3.5849 but we are encountering some issues. Our test repository contains around 1 million triples (around 100 of which are owl:sameAs), and we have concurrent applications both inserting and querying (through sesame-console and the SPARQL endpoint provided by the sesame-workbench). The machine is a 12-core, 64GB-RAM Debian machine.

Everything worked fine until today, when something happened while we were submitting a high load to the SPARQL endpoint. Since then, tomcat7 uses from 200% to 650% of CPU, and the SPARQL endpoint does not respond even to simple queries. We tried restarting tomcat7 multiple times, but as soon as it comes back the CPU usage increases again and there's no way to do anything (through either sesame-workbench or sesame-console).

Could this be due to some misconfiguration? Is this a known issue? How can we find out what's really happening (apart from checking .aduna/openrdf-sesame/logs)? We could clear the repository and start from scratch, but as we are evaluating Owlim for production usage we need to find the root cause to understand whether it fits our needs.

--
Dott. Stefano Parmesan
Web Developer ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy

___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion
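[Editor's sketch] Marek's workaround (turning off the context index) corresponds to a repository configuration parameter. As a rough illustration only: in an OWLIM-SE repository configuration (Turtle), it would look something like the fragment below. The `owlim:` prefix and the `enable-context-index` parameter name are taken from the configuration logs later in this thread, but the exact namespace and the name of the index-compression parameter should be verified against your OWLIM-SE documentation before use:

```turtle
@prefix owlim: <http://www.ontotext.com/trree/owlim#> .

# inside the repository's SAIL parameters:
# disable the context index, per Marek's temporary workaround
# (parameter name as shown in the configuration log in this thread)
[] owlim:enable-context-index "false" .
```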
Re: [Owlim-discussion] Owlim-SE not responding with high CPU load
Hi,

As far as I understand the error and the behaviour it causes (I have experienced it a few times before, so I'm familiar with it), it can end in two scenarios, both of which should be considered blocker/critical bugs:

1. You don't receive results that you should receive (all data indicate the query is correct, yet you don't get all the results you should get).
2. Because the statistics are broken, a query that should normally take 1 second now runs for e.g. 20 minutes.

I think the second scenario fits your case. When the query is executed it runs normally and will eventually return results, but since it needs much more time, it blocks the database: instead of using database resources for 1 second it uses them for 20 minutes, which is why you see such high CPU usage. That you noticed this behaviour only this morning is just coincidence; sooner or later you would certainly have run into trouble. I think the broken statistics are always related to specific predicates: as long as you didn't query a predicate with broken statistics, you didn't notice the problem.

From my previous experience, disabling the context index and also setting index compression to -1 can resolve some of these issues (though you'll probably have to reload the database). It is certainly not a cure, but it may let you keep working with your application until the bug is fixed.

Hope this explains it a bit, and hope the fix will come soon.

Best regards,
Marek

From: Stefano Parmesan parme...@spaziodati.eu
To: Marek marek_su...@yahoo.co.uk
Cc: owlim-discussion@ontotext.com
Sent: Thursday, 28 March 2013, 14:10
Subject: Re: [Owlim-discussion] Owlim-SE not responding with high CPU load

$ grep "ERROR IN PREDICATE STATISTICS" catalina.out | wc -l
32368

(since the 25th) Apparently the last error is from yesterday afternoon, but we experienced such problems this morning as well; I can't say whether they are related.
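[Editor's sketch] To see whether the predicate-statistics errors cluster around a particular day (and thus around a particular batch of updates), Stefano's grep can be extended to tally occurrences per date. A minimal sketch, assuming each catalina.out line begins with an ISO `YYYY-MM-DD` timestamp; log layouts vary, so adjust the date pattern to yours:

```shell
# count "ERROR IN PREDICATE STATISTICS" occurrences per day;
# assumes each matching log line starts with a YYYY-MM-DD timestamp
grep 'ERROR IN PREDICATE STATISTICS' catalina.out \
  | grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}' \
  | sort | uniq -c
```

A sudden spike on one date would point at the update batch that corrupted the statistics.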
Thanks and regards

--
Dott. Stefano Parmesan
Web Developer ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy
Re: [Owlim-discussion] Owlim-SE not responding with high CPU load
Thank you Marek, I'll give it a try. I cleared the repository without updating the conf a couple of hours ago and the issue hasn't appeared yet, but as you say it may resurface, so why not.

Thanks
--
Dott. Stefano Parmesan
Web Developer ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy
Re: [Owlim-discussion] Owlim-SE not responding with high CPU load
Hi Stefano,

We are sorry to hear that you are experiencing problems! We tried to reproduce this issue with synthetic data consisting of 1M statements and 100 owl:sameAs links between random entities, performing thousands of small updates in the background of heavy long-running queries, and killing Owlim's process from time to time before restarting, but we couldn't get corrupted predicate statistics.

If you still have the corrupted image, can you please send us your Owlim config file and the file 'predicates' from your storage folder (as defined in the config; the storage folder contains files like 'entities', 'pso.index', etc.; 'predicates' is a binary file which contains entity IDs plus counters)?

Regards,
Ruslan
[Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Hello all,

I am new to this list and to OWLIM-SE, and was wondering if anyone could offer advice for loading a large triple store. I am trying to load 670M triples into a repository using the openrdf-sesame workbench under tomcat6, on a single 64-bit Linux VM with 64GB of memory. My JVM has the following options: -Xms32g -Xmx32g -XX:MaxPermSize=256m

Here is the log info for my repository configuration:

...
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'entity-id-size' to '32'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'enable-context-index' to 'false'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'entity-index-size' to '1'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 'tuple-index-memory' to '1600m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 'cache-memory' to '3200m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for tuples: 83886
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for predicates: 0
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 'storage-folder' to 'storage'
[INFO ] 2013-03-27 13:57:00,741 [repositories/BFWorks_STF] Configured parameter 'in-memory-literal-properties' to 'false'
[INFO ] 2013-03-27 13:57:00,742 [repositories/BFWorks_STF] Configured parameter 'repository-type' to 'file-repository'

The loading came to a standstill after 19 hours and tomcat threw an OutOfMemoryError: GC overhead limit exceeded. My question is what the application is doing with all this memory, and whether I configured my instance correctly for this load to finish.

I also see a lot of entries in the main log such as this:

[WARN ] 2013-03-28 08:50:59,114 [repositories/BFWorks_STF] [Rio error] Unescaped backslash in: L\'ambassadrice (314764886, -1)

Could these Rio errors be contributing to my troubles?
I was also wondering whether there is a way to configure logging so that I can track the application's progress; right now these warnings are the only way I can tell how far the loading has progressed.

Advice from anyone who has experience successfully loading a large triple store is much appreciated! Thanks in advance!

- Josh

Joshua Greben
Library Systems Programmer Analyst
Stanford University Libraries
(650) 714-1937
jgre...@stanford.edu
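[Editor's sketch] The Rio warnings indicate literals containing a backslash that is not a legal N-Triples/Turtle escape sequence. One way to gauge how widespread the problem is before loading is to pre-scan the input file. This is a rough sketch, not part of the original thread: the filename is a placeholder, the accepted escape characters are the N-Triples set (t, b, n, r, f, the quote characters, backslash, u, U), and it requires GNU grep with PCRE support (`-P`):

```shell
# count lines containing a backslash NOT followed by a legal
# N-Triples escape character; these are the lines Rio warns about
grep -cP '\\(?![tbnrfuU"\x27\\])' datafile.nt
```

If the count is large, cleaning or re-exporting the source data may be faster than letting the parser skip statements one warning at a time.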
Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE
Hi,

If you want to see progress during loading, one option is to use a standard curl command instead of the openrdf-workbench; it gives you some information about what has already been loaded. To load files into Owlim (from a .trig file), run this command in your Linux shell:

curl -X POST -H "Content-Type: application/x-trig" -T /path/to/data/datafile.trig localhost:8080/openrdf-sesame/repositories/repository-name/statements

If you have XML-style data, change the content type to application/rdf+xml.

If you load a big amount of data, I recommend using the configuration.xls that is part of the OWLIM-SE .zip; it can help you set up the datastore properly.

Hope this helps.

Best regards,
Marek
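[Editor's sketch] Marek's curl-based loading can be paired with a second way to watch progress: the Sesame HTTP protocol exposes a `/size` endpoint on each repository that returns the current number of statements. A minimal polling loop follows; the host, port and repository name are placeholders for your deployment:

```shell
# poll the Sesame REST API for the repository's statement count
# every 60 seconds (host, port and repository-name are placeholders)
while true; do
  printf '%s  ' "$(date '+%H:%M:%S')"
  curl -s "http://localhost:8080/openrdf-sesame/repositories/repository-name/size"
  echo
  sleep 60
done
```

Watching the count climb (or stall) gives a rough loading rate without relying on parser warnings in the log.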