I built a custom recaser model that tokenizes each sentence into 
characters. It works well, but it is slower than other models. One 
batch contains an unexpectedly long source sentence: line 1017 (of 2400 
lines) has 3,854 characters, which become 3,854 tokens in the Moses input.
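
A length scan is a quick way to spot outliers like this before the batch 
goes to the decoder. The following is only a minimal sketch in Python. The 
function name find_long_lines and the 1000-character threshold are 
illustrative assumptions, not part of Moses.

```python
def find_long_lines(lines, max_chars=1000):
    """Return (line_number, length) pairs for lines longer than max_chars.

    `lines` is any iterable of strings, e.g. an open text file.
    Line numbers are 1-based; the trailing newline is not counted.
    """
    long_lines = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n"))
        if length > max_chars:
            long_lines.append((number, length))
    return long_lines


# Small in-memory batch standing in for the real input file.
batch = ["short line\n", "x" * 3854 + "\n", "another short line\n"]
print(find_long_lines(batch))  # [(2, 3854)]
```

With a real batch, an open file object can be passed in place of the list. 
Since the model is character-level, the character count of a line is also 
its token count.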

When it gets to this long line, all stdout output pauses. top shows that 
Moses, configured to run on 6 cores, is running at 598% CPU. That's great. 
Processing continues at just under 600% CPU load for about 5-6 minutes. 
Then it throttles back to "only" 100% load.

Is it possible that when Moses drops to 100% load, it has finished 
processing the rest of the 2400 lines and a single remaining thread is 
still working on that extremely long line?