Having said that, refactoring the pipeline would be a good first step toward 
replacing its guts. It's worth noting that several of these pieces are 
already done:

- alignment is handled by $JOSHUA/scripts/training/paralign.pl
- tuning is handled by $JOSHUA/scripts/training/run_tuner.py
- there is a script for running Thrax ($JOSHUA/scripts/training/run_thrax.py), 
but it is not pulled into the decoder yet

However, Lewis' basic point stands: the pipeline is a mess, and it would be 
good to have clean interfaces to each of the subtasks, as an intermediate step 
toward replacing the pipeline's logic with a more versatile (and readable) 
tool like ducttape.

matt


> On May 24, 2016, at 7:27 PM, Matt Post (JIRA) <j...@apache.org> wrote:
> 
> 
>   [ 
> https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299141#comment-15299141
>  ] 
> 
> Matt Post commented on JOSHUA-270:
> ----------------------------------
> 
> The pipeline is a huge mess, probably not worth salvaging. I'm hoping (maybe 
> this year?) to rewrite it, perhaps using this: 
> https://github.com/jhclark/ducttape/
> 
>> pipeline.pl needs major refactoring
>> -----------------------------------
>> 
>>               Key: JOSHUA-270
>>               URL: https://issues.apache.org/jira/browse/JOSHUA-270
>>           Project: Joshua
>>        Issue Type: Bug
>>        Components: pipeline
>>  Affects Versions: 6.0.5
>>          Reporter: Lewis John McGibbney
>>           Fix For: 6.1
>> 
>> 
>> Right now [pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl]
>> is well over 2000 lines long and extremely difficult to navigate. 
>> I propose the following:
>> * All ENV handling is refactored into a pipeline_environment file
>> * All command-line parsing and definitions are refactored into a 
>> pipeline_cli file
>> * Sanity checking is refactored into a pipeline_sanity_check file
>> * Dependent variable setting is refactored into a 
>> pipeline_dependent_variable_setting file
>> * Corpus filtering and preprocessing is refactored into a 
>> pipeline_filter_preprocess_corpora file
>> * pipeline_subsampling becomes a file
>> * pipeline_alignment becomes a file
>> * pipeline_parsing becomes a file
>> * pipeline_thrax becomes a file
>> * pipeline_tuning becomes a file
>> * pipeline_testing becomes a file
>> * pipeline_subroutines becomes a file
> 
