That said, factoring the pipeline would be a good first step toward replacing its guts. It's worth noting that much of this is already done:
- alignment is handled by $JOSHUA/scripts/training/paralign.pl
- tuning is handled by $JOSHUA/scripts/training/run_tuner.py
- there is a script for running Thrax ($JOSHUA/scripts/training/run_thrax.py), but it is not pulled into the decoder yet

However, Lewis' basic point stands: the pipeline is a mess, and it would be good to have good interfaces to each of the subtasks, as an intermediate step to replacing the logic of the pipeline with a more versatile (and readable) tool like ducttape.

matt

> On May 24, 2016, at 7:27 PM, Matt Post (JIRA) <j...@apache.org> wrote:
>
>
> [ https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299141#comment-15299141 ]
>
> Matt Post commented on JOSHUA-270:
> ----------------------------------
>
> The pipeline is a huge mess, probably not worth salvaging. I'm hoping (maybe
> this year?) to rewrite it, perhaps using this:
> https://github.com/jhclark/ducttape/
>
>> pipeline.pl needs major refactoring
>> -----------------------------------
>>
>> Key: JOSHUA-270
>> URL: https://issues.apache.org/jira/browse/JOSHUA-270
>> Project: Joshua
>> Issue Type: Bug
>> Components: pipeline
>> Affects Versions: 6.0.5
>> Reporter: Lewis John McGibbney
>> Fix For: 6.1
>>
>>
>> Right now
>> [pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl]
>> is well over 2000 lines long and extremely difficult to navigate.
>> I propose the following
>> * All ENV is refactored into a pipeline_environment file
>> * All command-line parsing and definitions are refactored into a pipeline_cli file
>> * Sanity checking is refactored into a pipeline_sanity_check file
>> * Dependent variable checking is refactored into a pipeline_dependent_variable_setting file
>> * Filtering and preprocessing of corpora is refactored into pipeline_filter_preprocess_corpora
>> * pipeline_subsampling becomes a file
>> * pipeline_alignment becomes a file
>> * pipeline_parsing becomes a file
>> * pipeline_thrax becomes a file
>> * pipeline_tuning becomes a file
>> * pipeline_testing becomes a file
>> * pipeline_subroutines becomes a file