Hello everybody,

I'm having a problem when Stanbol tries to enhance a large number of fairly
"large" documents (40,000 to 60,000 characters each).

Depending on the enhancement chain I use, I get timeouts earlier or later.
The timeout is left at its default value.
(langdetect + token + pos + sentence + dbPedia) = timeout after roughly the
10th enhancement request.
(langdetect + token + dbPedia) = timeout after about 10 minutes.

I profiled Stanbol in the first case (langdetect + token + pos + sentence
+ dbPedia) with the YourKit Java Profiler.

CPU-wise, the hotspots are:
  - opennlp.tools.util.BeamSearch.bestSequences(int, Object[], Object[],
double), with 11% of the time spent
  - opennlp.tools.util.Sequence.<init>(Sequence, String, double), with 2%

Memory-wise, the hotspot is:
  - opennlp.tools.util.BeamSearch.bestSequences(int, Object[], Object[],
double), with 12% of the space taken

I modified the following parameters in the
{stanbol-working-dir}\stanbol\config\org\apache\felix\eventadmin\impl\EventAdmin.config
file:
org.apache.felix.eventadmin.ThreadPoolSize="100"
org.apache.felix.eventadmin.CacheSize="2048"

This seemed to delay the timeouts somewhat, but did not prevent them.

Anyway, I noticed that a lot of threads were being created, immediately
entering the "waiting" state, and then dying after exactly 60 seconds, which
matches the "stanbol.maxEnhancementJobWaitTime" parameter.
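To check whether the jobs are merely slow rather than stuck, one thing I could try is raising that limit. A hedged sketch, assuming the parameter can be passed as a JVM system property in milliseconds when starting the launcher (the jar name and the chosen value are placeholders, not something confirmed from the Stanbol docs):

```shell
# Raise the enhancement-job wait time from the 60 s default to 5 minutes.
# The property name matches the observed 60-second thread lifetime; the
# launcher jar name is a placeholder for whatever launcher is in use.
java -Xmx2g \
  -Dstanbol.maxEnhancementJobWaitTime=300000 \
  -jar org.apache.stanbol.launchers.full-VERSION.jar
```

If the requests then complete instead of timing out, that would point at throughput (the OpenNLP hotspots above) rather than a deadlock.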

What other information can I provide?
