I built a custom recaser model that tokenizes each sentence into characters. It works well, but it is slower than other models. In one batch, there is an unexpectedly long source sentence: line 1017 (of 2400 lines) has 3,854 characters, which becomes 3,854 tokens in the Moses input.
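Below is a minimal Python sketch of two ideas from this setup. It is a toy model, not Moses code.

char_tokenize shows one way to turn a line into character tokens, so a 3,854-character line becomes 3,854 tokens. The "_" placeholder for spaces is a hypothetical choice.

simulate_pool models a fixed pool of worker threads, 6 here as an assumption. Each thread takes the next input line as soon as it is free. This is not Moses's actual scheduler, and the per-line durations are invented for illustration.

If results must be written in input order (in_order_write_times models that assumption), one slow line holds back the output of every later line, even while other threads keep working. Once the shorter lines run out, only one thread stays busy.

```python
import heapq


def char_tokenize(sentence, space_symbol="_"):
    """Turn a sentence into space-separated character tokens.

    Mapping whitespace to space_symbol is a hypothetical choice; how the
    recaser actually represents spaces is not stated.
    """
    return " ".join(space_symbol if ch.isspace() else ch for ch in sentence)


def simulate_pool(durations, workers=6):
    """Assign tasks, in input order, to whichever worker frees up first.

    Returns a (start, end) pair for every task. Toy model only.
    """
    free_at = [0] * workers  # min-heap: time at which each worker is next free
    spans = []
    for d in durations:
        start = heapq.heappop(free_at)
        spans.append((start, start + d))
        heapq.heappush(free_at, start + d)
    return spans


def busy_workers(spans, t):
    """Number of tasks running at time t (roughly CPU load divided by 100%)."""
    return sum(1 for start, end in spans if start <= t < end)


def in_order_write_times(spans):
    """Earliest time each result can be written if output must keep input order."""
    times, latest = [], 0
    for _, end in spans:
        latest = max(latest, end)
        times.append(latest)
    return times


# Hypothetical timings: 2400 lines of 1 s each, except line 1017 (index 1016) at 1000 s.
durations = [1] * 2400
durations[1016] = 1000
spans = simulate_pool(durations)

print(len(char_tokenize("x" * 3854).split()))  # -> 3854
print(busy_workers(spans, 300))                # -> 6 (all threads busy)
print(busy_workers(spans, 500))                # -> 1 (only the long line is left)
print(in_order_write_times(spans)[1017])       # -> 1169 (line 1018 waits for line 1017)
```

With these made-up numbers, all the short lines finish by t = 446 s, but line 1017 runs until t = 1169 s. For the rest of the run, the pool has only one busy thread.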
When decoding reaches this long line, all stdout output pauses. top shows Moses, configured to run on 6 cores, running at 598%. That's great. CPU load stays just under 600% for about 5-6 minutes, then drops back to "only" 100%. Is it possible that when Moses drops to 100% load, it has finished processing the rest of the 2400 lines, and the one remaining thread is still working on that extremely long line?

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
