Hi dev@,

I am facing some issues with Thrax on the Joshua master branch. I invoke Joshua as follows:

    /usr/local/incubator-joshua/bin/pipeline.pl \
      --rundir . \
      --type hiero \
      --corpus /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en \
      --tune /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune \
      --test /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test \
      --source en \
      --target ru \
      --readme "Experiment 1 Run 1 of ru --> en model training" \
      --aligner berkeley \
      --tmp /usr/local/hadoop-2.5.2/hadoop_tmp_dir \
      --first-step thrax \
      --no-prepare \
      --alignment alignments/training.align \
      --hadoop-mem 10g

I set the first step to thrax because I have already computed the alignment, as the --no-prepare and --alignment arguments indicate.

My Thrax log is available at https://www.dropbox.com/s/pxld70ki656fn13/thrax.log?dl=0. In the log you will see the following exception:

    16/10/19 22:56:59 WARN mapred.LocalJobRunner: job_local1314413872_0002
    java.lang.Exception: java.lang.RuntimeException: Word id 2146928632 out of range 0 1727042
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
    Caused by: java.lang.RuntimeException: Word id 2146928632 out of range 0 1727042
        at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Partition.getPartition(WordLexicalProbabilityCalculator.java:133)
        at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Partition.getPartition(WordLexicalProbabilityCalculator.java:121)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:82)
        at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:28)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I see no other issues until the end of the Thrax log, where the job statuses are reported:

    class edu.jhu.thrax.hadoop.jobs.TargetWordGivenSourceWordProbabilityJob FAILED
    class edu.jhu.thrax.hadoop.jobs.OutputJob PREREQ_FAILED
    class edu.jhu.thrax.hadoop.features.annotation.AnnotationFeatureJob PREREQ_FAILED
    class edu.jhu.thrax.hadoop.features.mapred.TargetPhraseGivenSourceFeature SUCCESS
    class edu.jhu.thrax.hadoop.jobs.ExtractionJob SUCCESS
    class edu.jhu.thrax.hadoop.features.mapred.SourcePhraseGivenTargetFeature SUCCESS
    class edu.jhu.thrax.hadoop.jobs.VocabularyJob SUCCESS
    class edu.jhu.thrax.hadoop.jobs.SourceWordGivenTargetWordProbabilityJob FAILED

This issue was previously reported by Matt at https://github.com/joshua-decoder/thrax/issues/10

I am debugging it right now, folks.

Lewis

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney