[ https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298820#comment-15298820 ]
Thamme Gowda commented on JOSHUA-270:
-------------------------------------

Hi [~lewismc],
I made a script that sets up the environment for the pipeline.pl script without modifying pipeline.pl itself. It may be helpful for testing and refactoring.
{code}
#!/usr/bin/env bash
# Stop at the first failing step instead of continuing with a half-built environment.
set -e
# Every path below is relative to the Joshua checkout, so JOSHUA must be set.
: "${JOSHUA:?JOSHUA must point to the Joshua checkout}"

echo "STEP: Going to get berkeleyaligner jar"
wget https://github.com/apache/incubator-joshua/raw/e70677d2eab23daa7082173e6fe337d68aa12230/lib/berkeleyaligner.jar \
    -O "$JOSHUA/lib/berkeleyaligner.jar"

echo "STEP: Going to build GIZA"
cd "$JOSHUA/ext/giza-pp/"
make all
make install

echo "STEP: Going to build symal"
cd "$JOSHUA/ext/symal/"
make
cd "$JOSHUA"

echo "STEP: Going to get Hadoop distribution"
wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz \
    -O "$JOSHUA/lib/hadoop-2.5.2.tar.gz"

echo "STEP: Getting thrax"
# Pinned Thrax revision, kept in one variable to avoid repeating the hash.
THRAX_REV=e6195e4a1f60edc58448e8922991fe6938c6daba
mkdir -p "$JOSHUA/thrax"
wget -O "/tmp/thrax-$THRAX_REV.zip" \
    "https://github.com/joshua-decoder/thrax/archive/$THRAX_REV.zip"
unzip -o "/tmp/thrax-$THRAX_REV.zip" -d /tmp
# Copy the archive contents into $JOSHUA/thrax itself. A plain mv into the
# already existing directory would nest them one level down, and ant would
# then find no build.xml.
cp -R "/tmp/thrax-$THRAX_REV/." "$JOSHUA/thrax/"

echo "STEP: Building Thrax"
cd "$JOSHUA/thrax"
ant
cd "$JOSHUA"
{code}

> pipeline.pl needs major refactoring
> -----------------------------------
>
>                 Key: JOSHUA-270
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-270
>             Project: Joshua
>          Issue Type: Bug
>          Components: pipeline
>    Affects Versions: 6.0.5
>            Reporter: Lewis John McGibbney
>             Fix For: 6.1
>
>
> Right now
> [pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl]
> is well over 2000 lines long and extremely difficult to navigate.
> I propose the following:
> * All ENV handling is refactored into a pipeline_environment file
> * All command-line parsing and definitions are refactored into a pipeline_cli file
> * Sanity checking is refactored into a pipeline_sanity_check file
> * Dependent variable checking is refactored into a pipeline_dependent_variable_setting file
> * Filtering and preprocessing of corpora is refactored into pipeline_filter_preprocess_corpora
> * pipeline_subsampling becomes a file
> * pipeline_alignment becomes a file
> * pipeline_parsing becomes a file
> * pipeline_thrax becomes a file
> * pipeline_tuning becomes a file
> * pipeline_testing becomes a file
> * pipeline_subroutines becomes a file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)