Re: [Owlim-discussion] Owlim-SE not responding with high CPU load

2013-03-28 Thread Marek
We experienced very similar behaviour with nearly the same use case. I reported 
one bug which, in our case, was temporarily solved by turning off the context index. 
Recently I found another very similar issue, which is not reported yet as I can't 
figure out the cause. Please look into catalina.out for the log message "error 
in predicate statistics", which appears in both of the issues I mentioned. Maybe 
we hit the same problem.
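For example, a quick check could look like this (a sketch; adjust the path to wherever your Tomcat writes catalina.out, and note the message text is what we saw in our logs and may differ in your version):

```shell
# Count occurrences of the predicate-statistics error in Tomcat's log.
# The log path is a placeholder for your Tomcat installation.
grep -c "ERROR IN PREDICATE STATISTICS" /var/log/tomcat7/catalina.out
```

A non-zero count would suggest you are hitting the same statistics corruption.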
best regards,
marek

-Original Message-
From: Stefano Parmesan parme...@spaziodati.eu
Sent: 28.3.2013 12:09
To: owlim-discussion@ontotext.com
Subject: [Owlim-discussion] Owlim-SE not responding with high CPU load

Hi everybody,


We are evaluating Owlim-SE 5.3.5849 but we are encountering some issues:


Our test repository contains around 1 million triples (around 100 of which are 
owl:sameAs), and we have concurrent applications both inserting and querying 
(through sesame-console and the SPARQL endpoint provided by the 
sesame-workbench). The machine is a 12-core, 64 GB RAM Debian machine. Everything 
worked fine until today, when something happened while we were submitting a 
high load to the SPARQL endpoint. Since then, tomcat7 uses from 200% to 650% of 
CPU, and the SPARQL endpoint does not respond even to simple queries.


We tried restarting tomcat7 multiple times, but as soon as it comes back up the 
CPU usage increases again and there's no way to do anything (through either 
sesame-workbench or sesame-console).


Could this be due to some misconfiguration? Is this a known issue? How can we 
know what's really happening (apart from checking .aduna/openrdf-sesame/logs)?


We could clear the repository and start from scratch, but as we are evaluating 
OWLIM for production usage we need to find out what the issue is, to better 
understand whether it fits our needs.


-- 

Dott. Stefano Parmesan
Web Developer ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy
___
Owlim-discussion mailing list
Owlim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion


Re: [Owlim-discussion] Owlim-SE not responding with high CPU load

2013-03-28 Thread Marek Šurek
Hi,
As far as I understand the error and the behaviour it causes (I have run into it 
a few times before, so I'm familiar with it), it can end in two scenarios, but 
both are what I would consider blocker/critical bugs:
1. You don't receive results that you should receive (everything indicates the 
query is correct, but you still don't get all the results you should get).
2. Because the statistics are broken, a query which should normally take 1 second 
now runs for e.g. 20 minutes.

I think the second scenario fits your case. When a query is executed it does run 
and will eventually return results, but because it needs much more time to do so 
it blocks the database (instead of taking database resources for 1 second it uses 
them for 20 minutes, and therefore you see such high CPU usage). That you only 
noticed this behaviour this morning is just lucky coincidence; sooner or later you 
would certainly have run into trouble.
I think the broken statistics are always related to specific predicates: as long 
as you didn't use a predicate with broken statistics, you didn't notice the problem.

From my previous experience, disabling the context index and also setting index 
compression to -1 can work around some of these issues (though you will probably 
have to reload the database). It is certainly not a cure, but it can keep the 
application usable until the bug is fixed.
I hope this explains it a bit. Hope the fix comes soon.
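For reference, a sketch of how those two settings might look in a Sesame/OWLIM repository configuration. The parameter names here are my best recollection, and `owlim:index-compression-ratio` in particular is an assumption; please verify against the template config shipped with your OWLIM-SE version:

```turtle
# Fragment of an OWLIM-SE repository config (a sketch, not a full file).
@prefix owlim: <http://www.ontotext.com/trree/owlim#> .

[] owlim:enable-context-index "false" ;    # turn off the context index
   owlim:index-compression-ratio "-1" .    # disable index compression
```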

Best regards,
Marek





 From: Stefano Parmesan parme...@spaziodati.eu
To: Marek marek_su...@yahoo.co.uk 
Cc: owlim-discussion@ontotext.com owlim-discussion@ontotext.com 
Sent: Thursday, 28 March 2013, 14:10
Subject: Re: [Owlim-discussion] Owlim-SE not responding with high CPU load
 

$ grep "ERROR IN PREDICATE STATISTICS" catalina.out | wc -l
32368
(since the 25th)

Apparently the last such error is from yesterday afternoon, but we experienced 
these problems this morning as well; I can't say whether they are related.


Thanks and regards





-- 

Dott. Stefano Parmesan
Web Developer ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy


Re: [Owlim-discussion] Owlim-SE not responding with high CPU load

2013-03-28 Thread Stefano Parmesan
Thank you Marek,

I'll give it a try. I cleaned the repository without updating the conf a
couple of hours ago and the issue hasn't appeared yet, but as you say it
may lead to issues in the future, so why not.

Thanks







-- 
Dott. Stefano Parmesan
Web Developer ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy


Re: [Owlim-discussion] Owlim-SE not responding with high CPU load

2013-03-28 Thread Ruslan Velkov

Hi Stefano,

We are sorry to hear that you are experiencing problems!
We tried to reproduce this issue with synthetic data consisting of 
1M statements and 100 owl:sameAs links between random entities, 
performing thousands of small updates in the background of heavy 
long-running queries and killing the OWLIM process from time to time 
and then restarting it, but we couldn't get corrupted predicate statistics.


If you still have the corrupted image, can you please send us your OWLIM 
config file and the file 'predicates' from your storage folder (as defined 
in the config; the storage folder contains files like 'entities', 
'pso.index', etc.; 'predicates' is a binary file which contains entity 
IDs plus counters)?



Regards,
Ruslan


On 03/28/2013 04:01 PM, Stefano Parmesan wrote:


[Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-03-28 Thread Joshua Greben
Hello all,

I am new to this list and to OWLIM-SE, and was wondering if anyone could offer 
advice on loading a large triple store. I am trying to load 670M triples into 
a repository using the openrdf-sesame workbench under tomcat6 on a single 
64-bit Linux VM with 64 GB of memory.

My JVM has the following: -Xms32g -Xmx32g -XX:MaxPermSize=256m

Here is the log info for my repository configuration:

...
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 
'entity-id-size' to '32'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 
'enable-context-index' to 'false'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 
'entity-index-size' to '1'
[INFO ] 2013-03-27 13:57:00,720 [repositories/BFWorks_STF] Configured parameter 
'tuple-index-memory' to '1600m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 
'cache-memory' to '3200m'
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for 
tuples: 83886
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Cache pages for 
predicates: 0
[INFO ] 2013-03-27 13:57:00,721 [repositories/BFWorks_STF] Configured parameter 
'storage-folder' to 'storage'
[INFO ] 2013-03-27 13:57:00,741 [repositories/BFWorks_STF] Configured parameter 
'in-memory-literal-properties' to 'false'
[INFO ] 2013-03-27 13:57:00,742 [repositories/BFWorks_STF] Configured parameter 
'repository-type' to 'file-repository'

The loading came to a standstill after 19 hours and tomcat threw an 
OutOfMemoryError: GC overhead limit exceeded. 

My question is what the application is doing with all this memory, and whether I 
configured my instance correctly for this load to finish. I also see a lot of 
entries in the main log such as this:

[WARN ] 2013-03-28 08:50:59,114 [repositories/BFWorks_STF] [Rio error] 
Unescaped backslash in: L\'ambassadrice (314764886, -1)

Could these Rio errors be contributing to my troubles? I was also wondering 
if there is a way to configure logging so that I can track the application's 
progress; right now these warnings are the only way I can tell how far the 
loading has progressed.
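One way to inspect those warnings before (re)loading could be to scan the input file for backslashes up front, since Rio flags the ones that are not escaped. A sketch, where `data.nt` stands in for whatever file you are actually loading:

```shell
# Show the first few input lines containing a backslash, with line numbers,
# so the offending literals can be inspected and escaped before loading.
# "data.nt" is a placeholder for the actual input file.
grep -n '\\' data.nt | head -5
```

This won't distinguish correctly escaped backslashes from bad ones, but it narrows down which lines to look at.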

Advice from anyone who has experience successfully loading a large triplestore 
is much appreciated! Thanks in advance!

- Josh


Joshua Greben
Library Systems Programmer & Analyst
Stanford University Libraries
(650) 714-1937
jgre...@stanford.edu




Re: [Owlim-discussion] Loading a Large Triple Store using OWLIM-SE

2013-03-28 Thread Marek Šurek
Hi,
If you want to see progress during loading, one option is to use a standard 
curl command instead of the openrdf-workbench; it gives you some feedback on 
what has already been loaded.
To load a .trig file into OWLIM, run this command in your Linux shell:

curl -X POST -H "Content-Type: application/x-trig" \
     -T /path/to/data/datafile.trig \
     http://localhost:8080/openrdf-sesame/repositories/repository-name/statements

If you have XML-style data, change the content type to application/rdf+xml.
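To track progress during a long load, you could also poll the repository size from another shell. A sketch, assuming the `/size` endpoint of the Sesame HTTP protocol; host, port, and repository name are placeholders for your deployment:

```shell
# Ask the repository how many statements it currently holds.
# The Sesame /size endpoint returns a bare integer count.
curl -s http://localhost:8080/openrdf-sesame/repositories/repository-name/size
```

Wrapping this in `watch -n 60 ...` would give you a rough loading-rate readout.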



If you are loading a large amount of data, I recommend using configuration.xls, 
which is part of the OWLIM-SE .zip; it can help you set up the datastore properly.

Hope this helps.

Best regards,
Marek



