It's a bit hard to diagnose the problem, but my best guess is that, for some reason, either the sample object stream is endless or the feature generation is very slow.

Can you add a counter to the code that provides the sample objects? The count should not exceed your number of sentences; if the stream is endless, you should see it go past that number after an hour or two.
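A minimal sketch of such a counter follows. It assumes the samples reach the trainer through an OpenNLP-style object stream whose read() returns null at the end. To keep it compiling on its own, it uses a hypothetical SampleStream interface instead of opennlp.tools.util.ObjectStream. In real code you would implement ObjectStream itself and also forward reset() and close() to the wrapped stream. Throwing once the limit is passed is a design choice: an endless stream would never let training finish anyway.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CountingStreamDemo {
    public static void main(String[] args) throws IOException {
        // Pretend the training data has 3 sentences.
        CountingSampleStream<String> samples = new CountingSampleStream<>(
            new ListSampleStream<>(Arrays.asList("s1", "s2", "s3")), 3);
        while (samples.read() != null) {
            // A trainer would consume the samples here.
        }
        System.out.println("samples read: " + samples.getCount());
    }
}

// Hypothetical stand-in for opennlp.tools.util.ObjectStream:
// read() returns the next sample, or null at the end of the stream.
interface SampleStream<T> {
    T read() throws IOException;
}

// Counts every sample passed through and fails fast once the count
// exceeds the expected maximum (e.g. the number of training sentences).
class CountingSampleStream<T> implements SampleStream<T> {
    private final SampleStream<T> delegate;
    private final long expectedMax;
    private long count = 0;

    CountingSampleStream(SampleStream<T> delegate, long expectedMax) {
        this.delegate = delegate;
        this.expectedMax = expectedMax;
    }

    @Override
    public T read() throws IOException {
        T sample = delegate.read();
        if (sample != null) {
            count++;
            if (count > expectedMax) {
                throw new IllegalStateException("Read " + count
                    + " samples, expected at most " + expectedMax
                    + "; the sample stream may be endless");
            }
        }
        return sample;
    }

    long getCount() {
        return count;
    }
}

// Finite, list-backed stream used only for the demo.
class ListSampleStream<T> implements SampleStream<T> {
    private final Iterator<T> it;

    ListSampleStream(List<T> items) {
        this.it = items.iterator();
    }

    @Override
    public T read() {
        return it.hasNext() ? it.next() : null;
    }
}
```

If the trainer resets and re-reads the stream, the count accumulates across passes, so the limit would then need to be the number of sentences times the number of passes.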

Can you also measure how many samples are processed per second? It should be more than 1k samples per second; if the throughput is much lower, the training might simply need a lot of time.
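One way to get that number is sketched below, again using a hypothetical SampleStream stand-in rather than OpenNLP's actual ObjectStream. The wrapper records the time it was created and reports the average rate since then. Because the trainer generates features between two read() calls, the measured rate includes the feature generation cost, which is the part suspected of being slow. The reportEvery interval is an illustrative parameter, not an OpenNLP setting.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ThroughputDemo {
    public static void main(String[] args) throws IOException {
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            data.add("sentence " + i);
        }
        ThroughputSampleStream<String> samples =
            new ThroughputSampleStream<>(new ListSampleStream<>(data), 1000);
        while (samples.read() != null) {
            // Feature generation would happen here, between two read() calls,
            // so its cost is included in the measured rate.
        }
        System.out.printf("%d samples, %.0f samples/s%n",
            samples.getCount(), samples.samplesPerSecond());
    }
}

// Hypothetical stand-in for opennlp.tools.util.ObjectStream (null = end).
interface SampleStream<T> {
    T read() throws IOException;
}

// Tracks the average number of samples consumed per second since creation
// and prints a progress line to stderr every `reportEvery` samples.
class ThroughputSampleStream<T> implements SampleStream<T> {
    private final SampleStream<T> delegate;
    private final long reportEvery;
    private final long startNanos = System.nanoTime();
    private long count = 0;

    ThroughputSampleStream(SampleStream<T> delegate, long reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        this.delegate = delegate;
        this.reportEvery = reportEvery;
    }

    @Override
    public T read() throws IOException {
        T sample = delegate.read();
        if (sample != null) {
            count++;
            if (count % reportEvery == 0) {
                System.err.printf("%d samples, %.0f samples/s%n",
                    count, samplesPerSecond());
            }
        }
        return sample;
    }

    long getCount() {
        return count;
    }

    double samplesPerSecond() {
        // Guard against a zero interval on very fast runs.
        long elapsedNanos = Math.max(1L, System.nanoTime() - startNanos);
        return count * 1e9 / elapsedNanos;
    }
}

// Finite, list-backed stream used only for the demo.
class ListSampleStream<T> implements SampleStream<T> {
    private final Iterator<T> it;

    ListSampleStream(List<T> items) {
        this.it = items.iterator();
    }

    @Override
    public T read() {
        return it.hasNext() ? it.next() : null;
    }
}
```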

Jörn

On 04/29/2013 10:55 AM, Svetoslav Marinov wrote:
Yes, the process is at 100% CPU utilization, and this is the only thing I get from jstack, no matter how many times I repeat it:

2013-04-29 10:47:17
Full thread dump OpenJDK 64-Bit Server VM (20.0-b12 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f31a8001000 nid=0xf42b waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"Low Memory Detector" daemon prio=10 tid=0x00007f31d009d800 nid=0xe272 runnable [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f31d009b000 nid=0xe271 waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f31d0098800 nid=0xe270 waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f31d008a000 nid=0xe26f runnable [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f31d0078000 nid=0xe26e in Object.wait() [0x00007f31ca3db000]
    java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000400b8f660> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
        - locked <0x0000000400b8f660> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

"Reference Handler" daemon prio=10 tid=0x00007f31d0076000 nid=0xe26d in Object.wait() [0x00007f31ca4dc000]
    java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000400b8f5f8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x0000000400b8f5f8> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f31d0007800 nid=0xe267 waiting on condition [0x00007f31d8923000]
    java.lang.Thread.State: RUNNABLE
        at java.util.Arrays.copyOfRange(Arrays.java:3221)
        at java.lang.String.<init>(String.java:233)
        at java.lang.StringBuilder.toString(StringBuilder.java:447)
        at opennlp.tools.util.featuregen.TokenClassFeatureGenerator.createFeatures(TokenClassFeatureGenerator.java:46)
        at opennlp.tools.util.featuregen.WindowFeatureGenerator.createFeatures(WindowFeatureGenerator.java:109)
        at opennlp.tools.util.featuregen.AggregatedFeatureGenerator.createFeatures(AggregatedFeatureGenerator.java:79)
        at opennlp.tools.util.featuregen.CachedFeatureGenerator.createFeatures(CachedFeatureGenerator.java:69)
        at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:118)
        at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:37)
        at opennlp.tools.namefind.NameFinderEventStream.generateEvents(NameFinderEventStream.java:103)
        at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:126)
        at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:37)
        at opennlp.tools.util.AbstractEventStream.hasNext(AbstractEventStream.java:71)
        at opennlp.model.HashSumEventStream.hasNext(HashSumEventStream.java:47)
        at opennlp.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:126)
        at opennlp.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:81)
        at opennlp.model.TrainUtil.train(TrainUtil.java:173)
        at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:366)
        at opennlptrainer.OpenNLPTrainer.main(OpenNLPTrainer.java:53)

"VM Thread" prio=10 tid=0x00007f31d0071000 nid=0xe26c runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f31d0012800 nid=0xe268 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f31d0014800 nid=0xe269 runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f31d0016000 nid=0xe26a runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f31d0018000 nid=0xe26b runnable

"VM Periodic Task Thread" prio=10 tid=0x00007f31d00a0000 nid=0xe273 waiting on condition

JNI global references: 1139



On 2013-04-29 10:26, "Jörn Kottmann" wrote:

On 04/29/2013 09:59 AM, Svetoslav Marinov wrote:
Below is a jstack output. It is now the third day it has been running, and it seems like the process has hung somewhere. I still haven't changed the indexer to one-pass, so it is still two-pass.

I just wonder: how long should I wait?
It looks like it's still fetching the events from the source. The methods we can see in the stack dump are calculating the hash sum of the events, but I doubt that this part is broken.

Is the process at 100% CPU utilization? Is it still in the hash sum code
if you repeat the jstack command a few times?
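The same check (repeat jstack and see whether the stack stays in the hash sum code) can also be done from inside the JVM. The sketch below is a swapped-in alternative, not something proposed in this thread: it calls Thread.getStackTrace() repeatedly on a target thread and counts how many snapshots contain a frame whose class name includes a given fragment. For the trainer you might run it on a separate daemon thread against the main thread with a fragment such as "HashSumEventStream"; the BusyWork class and the sampling parameters are hypothetical and only serve the demo.

```java
class StackSamplerDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(BusyWork::spin, "worker");
        worker.setDaemon(true);
        worker.start();
        int hits = StackSampler.countSamplesContaining(worker, "BusyWork", 20, 50);
        BusyWork.running = false;
        worker.join();
        System.out.println("BusyWork seen in " + hits + " of 20 samples");
    }
}

// Repeatedly snapshots one thread's stack, similar to running jstack several times.
class StackSampler {
    // Returns how many of `samples` snapshots, taken `intervalMillis` apart,
    // contain a frame whose class name includes `classNameFragment`.
    static int countSamplesContaining(Thread target, String classNameFragment,
                                      int samples, long intervalMillis)
            throws InterruptedException {
        int hits = 0;
        for (int i = 0; i < samples; i++) {
            for (StackTraceElement frame : target.getStackTrace()) {
                if (frame.getClassName().contains(classNameFragment)) {
                    hits++;
                    break;
                }
            }
            Thread.sleep(intervalMillis);
        }
        return hits;
    }
}

// Hypothetical busy worker used only for the demo.
class BusyWork {
    static volatile boolean running = true;

    static void spin() {
        long iterations = 0;
        while (running) {
            iterations++;
        }
    }
}
```

If nearly every sample contains the fragment over several minutes, the thread is spending its time there, which is what repeated jstack output would also show.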

Jörn


