[jira] [Commented] (JOSHUA-270) pipeline.pl needs major refactoring
    [ https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298820#comment-15298820 ]

Thamme Gowda commented on JOSHUA-270:
-------------------------------------

Hi [~lewismc],

I made a script that sets up the environment for the pipeline.pl script without touching pipeline.pl itself. It may be helpful for testing and refactoring.

{code}
#!/usr/bin/env bash
# Sets up the dependencies that pipeline.pl expects, without modifying pipeline.pl.
# Requires $JOSHUA to point at the Joshua checkout.
set -e
: "${JOSHUA:?JOSHUA must be set to the Joshua checkout}"

echo "STEP: Going to get berkeleyaligner jar"
wget https://github.com/apache/incubator-joshua/raw/e70677d2eab23daa7082173e6fe337d68aa12230/lib/berkeleyaligner.jar \
    -O "$JOSHUA/lib/berkeleyaligner.jar"

echo "STEP: Going to build GIZA"
cd "$JOSHUA/ext/giza-pp/"
make all
make install

echo "STEP: Going to build symal"
cd "$JOSHUA/ext/symal/"
make
cd "$JOSHUA"

echo "STEP: Going to get Hadoop distribution"
wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz \
    -O "$JOSHUA/lib/hadoop-2.5.2.tar.gz"

echo "STEP: Getting thrax"
THRAX_REV=e6195e4a1f60edc58448e8922991fe6938c6daba
wget -O "/tmp/thrax-$THRAX_REV.zip" "https://github.com/joshua-decoder/thrax/archive/$THRAX_REV.zip"
# Unpack under /tmp, then copy the contents (not the wrapper directory) into
# $JOSHUA/thrax, so that ant runs in the Thrax source root rather than one level above it.
unzip -q -o "/tmp/thrax-$THRAX_REV.zip" -d /tmp
mkdir -p "$JOSHUA/thrax"
cp -r "/tmp/thrax-$THRAX_REV/." "$JOSHUA/thrax/"

echo "STEP: Building Thrax"
cd "$JOSHUA/thrax"
ant
cd "$JOSHUA"
{code}

> pipeline.pl needs major refactoring
> -----------------------------------
>
>                 Key: JOSHUA-270
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-270
>             Project: Joshua
>          Issue Type: Bug
>          Components: pipeline
>    Affects Versions: 6.0.5
>            Reporter: Lewis John McGibbney
>             Fix For: 6.1
>
> Right now [pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl] is well over 2000 lines long and extremely difficult to navigate.
> I propose the following:
> * All ENV handling is refactored into a pipeline_environment file
> * All command line parsing and definitions are refactored into a pipeline_cli file
> * Sanity checking is refactored into a pipeline_sanity_check file
> * Dependent variable checking is refactored into a pipeline_dependent_variable_setting file
> * Filtering and preprocessing of corpora is refactored into a pipeline_filter_preprocess_corpora file
> * pipeline_subsampling becomes a file
> * pipeline_alignment becomes a file
> * pipeline_parsing becomes a file
> * pipeline_thrax becomes a file
> * pipeline_tuning becomes a file
> * pipeline_testing becomes a file
> * pipeline_subroutines becomes a file

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (JOSHUA-271) Thrax invocation should not rely upon $HADOOP being set
Lewis John McGibbney created JOSHUA-271:
-------------------------------------------

             Summary: Thrax invocation should not rely upon $HADOOP being set
                 Key: JOSHUA-271
                 URL: https://issues.apache.org/jira/browse/JOSHUA-271
             Project: Joshua
          Issue Type: Bug
          Components: pipeline, thrax
    Affects Versions: 6.0.5
            Reporter: Lewis John McGibbney
             Fix For: 6.1

Right now one cannot run Thrax unless the $HADOOP environment variable is defined. Every invocation of the hadoop script hard-codes the path as $HADOOP/bin/hadoop. But what happens if you are using a VM (Vagrant) to connect to a cluster for which no $HADOOP environment variable is defined?

The hadoop script should be on the PATH and usable from there. The only check that should be made is whether hadoop is available on the PATH; if it is not, the start_hadoop_cluster subroutine can be called. This reduces code and makes more sense.
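The PATH check described above could look roughly like the following shell sketch. This is only an illustration, not the pipeline's actual code: pipeline.pl is written in Perl, find_hadoop is a hypothetical helper name, and start_hadoop_cluster is stubbed here as a stand-in for the existing pipeline subroutine of that name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the location of `hadoop` if it can be found
# on PATH, and return non-zero otherwise. It never consults $HADOOP.
find_hadoop() {
    if command -v hadoop >/dev/null 2>&1; then
        command -v hadoop
        return 0
    fi
    return 1
}

# Stand-in stub for the pipeline's start_hadoop_cluster subroutine.
# The real subroutine lives in pipeline.pl; this one only logs a message.
start_hadoop_cluster() {
    echo "hadoop not found on PATH; a local cluster would be started here" >&2
}

# Single decision point: use hadoop from PATH, otherwise fall back.
if hadoop_bin=$(find_hadoop); then
    echo "Using hadoop at: $hadoop_bin"
else
    start_hadoop_cluster
fi
```

With this shape, a Vagrant VM that only has a hadoop client on its PATH would take the first branch, and no $HADOOP/bin/hadoop path needs to be assembled anywhere.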
[jira] [Created] (JOSHUA-270) pipeline.pl needs major refactoring
Lewis John McGibbney created JOSHUA-270:
-------------------------------------------

             Summary: pipeline.pl needs major refactoring
                 Key: JOSHUA-270
                 URL: https://issues.apache.org/jira/browse/JOSHUA-270
             Project: Joshua
          Issue Type: Bug
          Components: pipeline
    Affects Versions: 6.0.5
            Reporter: Lewis John McGibbney
             Fix For: 6.1

Right now [pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl] is well over 2000 lines long and extremely difficult to navigate.

I propose the following:
* All ENV handling is refactored into a pipeline_environment file
* All command line parsing and definitions are refactored into a pipeline_cli file
* Sanity checking is refactored into a pipeline_sanity_check file
* Dependent variable checking is refactored into a pipeline_dependent_variable_setting file
* Filtering and preprocessing of corpora is refactored into a pipeline_filter_preprocess_corpora file
* pipeline_subsampling becomes a file
* pipeline_alignment becomes a file
* pipeline_parsing becomes a file
* pipeline_thrax becomes a file
* pipeline_tuning becomes a file
* pipeline_testing becomes a file
* pipeline_subroutines becomes a file