[GitHub] incubator-joshua pull request: Removed a few unneeded allocations

2016-05-24 Thread mjpost
Github user mjpost commented on the pull request:

https://github.com/apache/incubator-joshua/pull/11#issuecomment-221328502
  
This was merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (JOSHUA-270) pipeline.pl needs major refactoring

2016-05-24 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created JOSHUA-270:
---

 Summary: pipeline.pl needs major refactoring
 Key: JOSHUA-270
 URL: https://issues.apache.org/jira/browse/JOSHUA-270
 Project: Joshua
  Issue Type: Bug
  Components: pipeline
Affects Versions: 6.0.5
Reporter: Lewis John McGibbney
 Fix For: 6.1


Right now 
[pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl]
 is well over 2000 lines long and extremely difficult to navigate. 
I propose the following:
 * All ENV handling is refactored into a pipeline_environment file
 * All command-line parsing and definitions are refactored into a
   pipeline_cli file
 * Sanity checking is refactored into a pipeline_sanity_check file
 * Dependent variable checking is refactored into a
   pipeline_dependent_variable_setting file
 * Filtering and preprocessing of corpora is refactored into a
   pipeline_filter_preprocess_corpora file
 * pipeline_subsampling becomes a file
 * pipeline_alignment becomes a file
 * pipeline_parsing becomes a file
 * pipeline_thrax becomes a file
 * pipeline_tuning becomes a file
 * pipeline_testing becomes a file
 * pipeline_subroutines becomes a file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (JOSHUA-271) Thrax invocation should not rely upon $HADOOP being set

2016-05-24 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created JOSHUA-271:
---

 Summary: Thrax invocation should not rely upon $HADOOP being set
 Key: JOSHUA-271
 URL: https://issues.apache.org/jira/browse/JOSHUA-271
 Project: Joshua
  Issue Type: Bug
  Components: pipeline, thrax
Affects Versions: 6.0.5
Reporter: Lewis John McGibbney
 Fix For: 6.1


Right now one cannot run Thrax unless the $HADOOP env variable is defined.
Every invocation of the hadoop script hard-codes the path as
$HADOOP/bin/hadoop. But what happens if you are using a VM (Vagrant) to
connect to a cluster for which no $HADOOP env variable is defined?
The hadoop script should be on the path and usable from there. The only
check that should be made is whether it is available on the path; if it is
not, the start_hadoop_cluster subroutine can be called. This reduces code
and makes more sense.
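
A minimal sketch of that check, written in shell for illustration only:
pipeline.pl itself is Perl, and both `ensure_hadoop` and the body of
`start_hadoop_cluster` below are hypothetical stand-ins, not code taken from
the pipeline. The only test made is whether `hadoop` can be found on the PATH;
$HADOOP is never consulted.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the pipeline's start_hadoop_cluster subroutine.
# The real subroutine lives in pipeline.pl and is not reproduced here.
start_hadoop_cluster() {
  echo "starting local hadoop cluster"
}

# Assumed helper name: succeed quietly if hadoop is on the PATH,
# otherwise fall back to starting a cluster. $HADOOP is never read.
ensure_hadoop() {
  if command -v hadoop >/dev/null 2>&1; then
    return 0
  fi
  start_hadoop_cluster
}
```

When hadoop is found, later steps could then invoke plain `hadoop` instead of
`$HADOOP/bin/hadoop`.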



[jira] [Commented] (JOSHUA-270) pipeline.pl needs major refactoring

2016-05-24 Thread Thamme Gowda (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298820#comment-15298820
 ] 

Thamme Gowda commented on JOSHUA-270:
-

Hi [~lewismc], I made a script that sets up the environment for pipeline.pl
without touching the script itself.
It may be helpful for testing and refactoring.

{code}
#!/usr/bin/env bash
# Fetches and builds the external tools that pipeline.pl expects under $JOSHUA.

# Stop early if JOSHUA is not set, since every path below depends on it.
: "${JOSHUA:?JOSHUA must point to the Joshua checkout}"

echo "STEP: Going to get berkeleyaligner jar"
wget https://github.com/apache/incubator-joshua/raw/e70677d2eab23daa7082173e6fe337d68aa12230/lib/berkeleyaligner.jar \
  -O "$JOSHUA/lib/berkeleyaligner.jar"

echo "STEP: Going to build GIZA"
cd "$JOSHUA/ext/giza-pp/"
make all
make install

echo "STEP: Going to build symal"
cd "$JOSHUA/ext/symal/"
make

cd "$JOSHUA"
echo "STEP: Going to get Hadoop distribution"
wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz \
  -O "$JOSHUA/lib/hadoop-2.5.2.tar.gz"

echo "STEP: Getting thrax"
mkdir -p "$JOSHUA/thrax"
wget -O /tmp/thrax-e6195e4a1f60edc58448e8922991fe6938c6daba.zip \
  https://github.com/joshua-decoder/thrax/archive/e6195e4a1f60edc58448e8922991fe6938c6daba.zip
# The archive unpacks into the current directory ($JOSHUA).
unzip /tmp/thrax-e6195e4a1f60edc58448e8922991fe6938c6daba.zip
# Move the archive's contents, not the directory itself, so the sources sit
# directly in $JOSHUA/thrax, where ant is run below.
mv thrax-e6195e4a1f60edc58448e8922991fe6938c6daba/* "$JOSHUA/thrax/"

echo "STEP: Building Thrax"
cd "$JOSHUA/thrax"
ant

cd "$JOSHUA"

{code}


[jira] [Commented] (JOSHUA-270) pipeline.pl needs major refactoring

2016-05-24 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298836#comment-15298836
 ] 

Lewis John McGibbney commented on JOSHUA-270:
-

Thanks. Have you looked at pipeline.pl? It does not build any software; it
runs a Joshua processing pipeline.


[jira] [Commented] (JOSHUA-270) pipeline.pl needs major refactoring

2016-05-24 Thread Thamme Gowda (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298856#comment-15298856
 ] 

Thamme Gowda commented on JOSHUA-270:
-

Yes, I tried to make it work with the Maven build. I saw that pipeline.pl
requires many external libraries, so I wrote the script above to fetch them
and put them in place.

I followed http://joshua.incubator.apache.org/6.0/quick-start.html, but it
failed after many steps. I couldn't completely fix it because of my limited
Perl knowledge.


[GitHub] incubator-joshua pull request: Removed a few unneeded allocations

2016-05-24 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/incubator-joshua/pull/11#issuecomment-221427785
  
You can automatically close a PR by adding `This closes #` when 
committing to ASF Git.


[jira] [Commented] (JOSHUA-270) pipeline.pl needs major refactoring

2016-05-24 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299141#comment-15299141
 ] 

Matt Post commented on JOSHUA-270:
--

The pipeline is a huge mess, probably not worth salvaging. I'm hoping (maybe 
this year?) to rewrite it, perhaps using this: 
https://github.com/jhclark/ducttape/


[jira] [Commented] (JOSHUA-252) Make it possible to use Maven to build Joshua

2016-05-24 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299142#comment-15299142
 ] 

Matt Post commented on JOSHUA-252:
--

Are there any updates on this? I'd love to get this pulled into master. We have
one other change for master, and then we might call that the first Apache
release, version 6.1. We have a lot of other ideas for the version 7 roadmap;
I'll create a ticket tomorrow.

> Make it possible to use Maven to build Joshua
> -
>
> Key: JOSHUA-252
> URL: https://issues.apache.org/jira/browse/JOSHUA-252
> Project: Joshua
>  Issue Type: Improvement
>  Components: build
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 6.1
>
>
> As per the discussion on the dev@ list, Ant is for now the official build
> tool for Joshua. However, we would like to switch to Maven if and when
> someone is able to do so.
> Assigning to me for now, as I may be able to look into this.



[jira] [Commented] (JOSHUA-271) Thrax invocation should not rely upon $HADOOP being set

2016-05-24 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299181#comment-15299181
 ] 

Matt Post commented on JOSHUA-271:
--

It looks like there are just two lines where this occurs. I will remove the 
"$HADOOP/bin/" portions of the invocation and push to master soon.


[GitHub] incubator-joshua pull request:

2016-05-24 Thread mjpost
Github user mjpost commented on the pull request:


https://github.com/apache/incubator-joshua/commit/d0f7b5308264e746741301cd3c5981041a2c4d2d#commitcomment-17608002
  
This closed #11.


[GitHub] incubator-joshua pull request: Removed a few unneeded allocations

2016-05-24 Thread mjpost
Github user mjpost commented on the pull request:

https://github.com/apache/incubator-joshua/pull/11#issuecomment-221487292
  
@hsaputra, I'm familiar with that notation for issues, but pull requests 
should close automatically when git notices that the commit ID has become part 
of the history for master...
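
The condition described here, a commit ID being part of the history for
master, can be checked locally with `git merge-base --is-ancestor`. The sketch
below only illustrates that ancestry test; `commit_in_history` is a
hypothetical helper name, and this is not a description of how GitHub itself
detects merged pull requests.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exit status 0 if commit $1 is reachable from
# branch $2 (default: master), non-zero otherwise.
commit_in_history() {
  git merge-base --is-ancestor "$1" "${2:-master}"
}
```

Run inside a checkout, `commit_in_history d0f7b5308264e746741301cd3c5981041a2c4d2d master`
succeeds only if that commit is already in master's history.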

