[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner

Lewis John McGibbney (JIRA) Wed, 24 Aug 2016 08:44:47 -0700

    [ 
https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435133#comment-15435133
 ]


Lewis John McGibbney commented on JOSHUA-304:
---------------------------------------------

It may help to post the options available in the current Berkeley aligner jar, 
which was built when I installed Joshua:
{code}
lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ java -jar ./lib/berkeleyaligner.jar -help
Usage:
  log.maxIndLevel                <  int> : Maximum indent level. [10]
  log.msPerLine                  <  int> : Maximum number of milliseconds between consecutive lines of output. [1000]
  log.file                       <  str> : File to write log. []
  log.stdout                     < bool> : Whether to output to the console. [true]
  log.note                       <  str> : Dummy placeholder for a comment []
  log.forcePrint                 < bool> : Force printing from logs* [false]
  log.maxPrintErrors             <  int> : Maximum number of errors (via error()) to print [10000]
  EMWordAligner.nullProb         <  dbl> : How to assign null-word probabilities (=1 means 1/n) [1.0E-6]
  EMWordAligner.usePosteriorDecoding < bool> : Use posterior decoding (recommended for best performance). [true]
  EMWordAligner.posteriorDecodingThreshold <  dbl> : Threshold in [0,1] for deciding whether an alignment should exist. [0.5]
  EMWordAligner.mergeConsiderNull < bool> : When merging expected sufficient statistics, take into account the NULL (fix). [false]
  EMWordAligner.handleUnknownWords < bool> : Don't crash with unknown words (better to train on test set). [false]
  EMWordAligner.priorFraction    <  dbl> : Fraction of a count to add for links in dictionary prior (1 works well). [0.0]
  EMWordAligner.numThreads       <  int> : Number of concurrent threads to use during E-step (set to number of processors). [1]
  EMWordAligner.safeConcurrency  < bool> : Safe concurrency (gets rid of concurrency warnings at the expense of speed) [false]
  EMWordAligner.evaluateDuringTraining < bool> : Whether to evaluate the model after each training iteration (slower, more memory). [false]
  TreeWalkModel.usePushProbabilities < bool> : Separate parameters for moving and pushing. [true]
  TreeWalkModel.conditionOnTag   < bool> : Whether to condition distortion on the tag types. [true]
  TreeWalkModel.cacheTreePaths   < bool> : Whether to cache paths through trees (uses lots of memory; faster). [false]
  Evaluator.searchForThreshold   < bool> : Evaluate using line search [false]
  Evaluator.thresholdIntervals   <  int> : Sets the number of intervals for posterior threshold line search [20]
  Evaluator.saveAlignmentObjects < bool> : Save object files for proposed alignments (large files) [false]
  Main.trainSources              < str*> : Directories or files containing training files. [example/train]
  Main.testSources               < str*> : Directory or file containing testing files. [example/test]
  Main.sentences                 <  int> : Maximum number of the training sentences to use [2147483647]
  Main.offsetTrainingSentences   <  int> : Skip this number of the first training sentences [0]
  Main.maxTestSentences          <  int> : Maximum number of the test sentences to use [2147483647]
  Main.offsetTestSentences       <  int> : Skip this number of the first test sentences [0]
  Main.foreignSuffix             <  str> : Foreign language file suffix [f]
  Main.englishSuffix             <  str> : English language file suffix [e]
  Main.itgTrainTestSplitPoint    <  int> : When writing test (ITG) posteriors, where to divide train/test data? [0]
  Main.itgInputDir               <  str> : What directory should we dump ITG test data to? []
  Main.reverseAlignments         < bool> : Reverse test set alignments (i.e., foreign to english) [false]
  Main.oneIndexed                < bool> : Are alignments one-indexed (default == no, 0-indexed) [false]
  Main.lowercaseWords            < bool> : Convert all words to lowercase [false]
  Main.leaveTrainingOnDisk       < bool> : Don't load and store the training set upfront (slower, but less memory) [false]
  Main.saveRejects               < bool> : Save rejected sentence pairs [false]
  Main.forwardModels             <enum*> : Which word alignment model to use in the forward direction. [MODEL1 HMM]
  Main.reverseModels             <enum*> : Which word alignment model to use in the backward direction. [MODEL1 HMM]
  Main.iters                     < int*> : Number of iterations to run the model. [5 5]
  Main.mode                      <enum*> : Whether to train the two models jointly or independently. [JOINT JOINT]
  Main.trainingCacheMaxSize      <  int> : Max sentence length for caching the HMM trellis (efficiency only). [100]
  Main.loadParamsDir             <  str> : Directory to load parameters from. []
  Main.loadLexicalModelOnly      < bool> : When true, the lexical model is loaded, but the distortion model is not. [true]
  Main.saveParams                < bool> : Whether to save parameters. [true]
  Main.saveAlignOutput           < bool> : Whether to save test alignments produced by the system. [true]
  Main.alignTraining             < bool> : Produce two GIZA files and a Pharaoh file for translation [false]
  Main.writePosteriors           < bool> : Produce posterior alignment weight file when aligning training (lots of disk space) [false]
  Main.writePosteriorsThreshold  <  dbl> : In outputting posteriors, where do we threshold them (0.0 == all posteriors) [0.0]
  Main.saveLexicalWeights        < bool> : Produce two lexical translation tables for lexical weighting (unsupported) [false]
  Main.competitiveThresholding   < bool> : Use competitive thresholding to eliminate distributed many-to-one alignments [false]
  Main.evaluateDirectionalModels < bool> : Evaluate directional models alone [false]
  Main.evaluateHardCombination   < bool> : Evaluate hard alignment combinations [false]
  Main.evaluateSoftCombination   < bool> : Evaluate soft alignment combinations [false]
  Main.dictionary                <  str> : Bilingual dictionary file (e.g., en-ch.dict) [example/en-ch.dict]
  Main.splitDefinitions          < bool> : Breaks up multi-word definitions and enters each word into the dictionary map [false]
  Main.rantOutput                < bool> : Output a lot of junk (largely unsupported) [false]
  exec.create                    < bool> : Whether to create a directory for this run; if not, don't generate output files [false]
  exec.monitor                   < bool> : Whether to create a thread to monitor the status. [false]
  exec.execDir                   <  str> : Directory to put all output files; if blank, use execPoolDir. []
  exec.execPoolDir               <  str> : Directory which contains all the executions (or symlinks). []
  exec.actualExecPoolDir         <  str> : Directory which actually holds the executions. []
  exec.overwriteExecDir          < bool> : Overwrite the contents of the execDir if it doesn't exist (e.g., when running a thunk). [false]
  exec.useStandardExecPoolDirStrategy < bool> : Assume in the run directory, automatically set execPoolDir and actualExecPoolDir [false]
  exec.printOptionsAndExit       < bool> : Simply print options and exit. [false]
  exec.miscOptions               < str*> : Miscellaneous options (written to options.map and output.map, displayed in servlet); example: a=3 b=4 []
  exec.addToView                 < str*> : Name of the view to add this execution to in the servlet []
  exec.recordPath                <  str> : Record file to write to []
  exec.charEncoding              <  str> : Character encoding []
  exec.jarFiles                  < str*> : Name of jar files to load prior to execution []
  exec.dontInitializeJars        < bool> : Skip initialization of jars [false]
  exec.initializeJarsAfterDirCreation < bool> : Initialize from jars after copying them to a newly created execDir [false]
  exec.makeThunk                 < bool> : Make a thunk (a delayed computation). [false]
  exec.thunkAutoQueue            < bool> : A note to the servlet to automatically run the thunk when it sees it [false]
  exec.thunkPriority             <  int> : Priority of the thunk. [0]
  exec.thunkMainClassName        <  str> : Launch this class []
  exec.thunkJavaOpts             <  str> : Java options to pass to Java when later running the thunk []
  exec.thunkUseScala             < bool> : Use Scala to run rather than Java [false]
  exec.thunkReqMemory            <  int> : Use Scala to run rather than Java (in MB) [1024]
  exec.dontCatchExceptions       < bool> : Whether to catch exceptions (ignored when making a thunk) [false]
{code}
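
The help output above lists Main.forwardModels, Main.reverseModels, Main.iters 
and Main.mode as list-typed options (<enum*>, <int*>) with two-element 
defaults, while the errors quoted in the issue description below show each 
whole value ('MODEL1 HMM', 'JOINT JOINT', "5 5") being rejected as a single 
token. The following is a minimal Python sketch of that mismatch, not the 
aligner's actual parser. It assumes the conf file holds one option per line as 
a bare option name, whitespace, then the value, and that lines starting with # 
are comments; that layout, the function names and the sample text are all 
hypothetical. Only the valid enum choices are taken from the error messages.

```python
# Valid choices copied from the "Invalid enum" messages in the issue description.
MODEL_CHOICES = {"MODEL1", "MODEL2", "HMM", "SYNTACTIC", "NONE"}
MODE_CHOICES = {"FORWARD", "REVERSE", "BOTH_INDEP", "JOINT"}


def is_int(text):
    """True if the whole string parses as one integer ("5 5" does not)."""
    try:
        int(text)
        return True
    except ValueError:
        return False


# Hypothetical mapping: option name -> check applied to the WHOLE value,
# mirroring how the errors suggest the value is being interpreted.
CHECKS = {
    "forwardModels": lambda v: v in MODEL_CHOICES,
    "reverseModels": lambda v: v in MODEL_CHOICES,
    "mode": lambda v: v in MODE_CHOICES,
    "iters": is_int,
}


def find_rejected(conf_text):
    """Return (option, value) pairs whose value fails the single-token check."""
    rejected = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        parts = line.split(None, 1)  # assumed layout: name, whitespace, value
        if len(parts) != 2:
            continue
        name, value = parts[0], parts[1].strip()
        check = CHECKS.get(name)
        if check is not None and not check(value):
            rejected.append((name, value))
    return rejected


# Hypothetical sample using the values that appear in the error messages.
sample = """\
forwardModels MODEL1 HMM
reverseModels MODEL1 HMM
mode JOINT JOINT
iters 5 5
"""

for name, value in find_rejected(sample):
    print(f"{name}: {value!r} is not accepted as a single value")
```

Running a check like this against a generated word-align.conf before starting 
the pipeline would surface all four offending options at once, rather than one 
per aligner invocation as in the transcript below.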

> word-align.conf alignment template file not compatible with berkeley aligner
> ----------------------------------------------------------------------------
>
>                 Key: JOSHUA-304
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-304
>             Project: Joshua
>          Issue Type: Bug
>          Components: alignment, berkeley, templates
>    Affects Versions: 6.0.5
>            Reporter: Lewis John McGibbney
>            Priority: Blocker
>             Fix For: 6.1
>
>
> It took me quite some time to debug why pipelines were failing when using 
> the Berkeley aligner.
> It turns out that the word-align.conf template provided at
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf
> is not compatible with the Berkeley aligner.
> In particular, the following lines are incompatible:
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15
> Evidence of this is provided below:
> {code}
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'JOINT JOINT'; valid choices: FORWARD|REVERSE|BOTH_INDEP|JOINT
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Exception in thread "main" java.lang.NumberFormatException: For input string: "5 5"
>       at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>       at java.lang.Integer.parseInt(Integer.java:580)
>       at java.lang.Integer.parseInt(Integer.java:615)
>       at edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:143)
>       at edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:240)
>       at edu.berkeley.nlp.fig.basic.OptInfo.set(OptionsParser.java:294)
>       at edu.berkeley.nlp.fig.basic.OptionsParser.readOptionsFile(OptionsParser.java:555)
>       at edu.berkeley.nlp.fig.basic.OptionsParser.doParse(OptionsParser.java:604)
>       at edu.berkeley.nlp.fig.exec.Execution.init(Execution.java:293)
>       at edu.berkeley.nlp.wordAlignment.Main.main(Main.java:149)
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Cannot create directory: alignments/0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
