One more observation/suggestion about the scripts and then I'll give it a break.
Several scripts, including:

$SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
$SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
$SCRIPTS_ROOTDIR/tokenizer/detokenizer.perl

reference the common ./nonbreaking_prefixes folder. The
split-sentences.perl script uses FindBin, but it would fail because
there's no $SCRIPTS_ROOTDIR/ems/support/nonbreaking_prefixes/ subfolder.

Since the $SCRIPTS_ROOTDIR hierarchy is being formalized, would it be a
good idea to create a new $SCRIPTS_ROOTDIR/resources/ or possibly
$SCRIPTS_ROOTDIR/share/ folder where scripts would find shared
resources? In this case, it could be
$SCRIPTS_ROOTDIR/share/nonbreaking_prefixes/

On Mon, 25 Jun 2012 11:42:38 +0700, Tom Hoar
<tah...@precisiontranslationtools.com> wrote:
> I found the following scripts in $SCRIPTS_ROOTDIR with
> "use FindBin qw($Bin);":
>
> $SCRIPTS_ROOTDIR/training/wrappers/parse-de-berkeley.perl
> $SCRIPTS_ROOTDIR/training/wrappers/parse-de-bitpar.perl
> $SCRIPTS_ROOTDIR/training/train-model.perl
> $SCRIPTS_ROOTDIR/training/filter-model-given-input.pl
> $SCRIPTS_ROOTDIR/training/mert-moses.pl
> $SCRIPTS_ROOTDIR/training/mert-moses-multi.pl
> $SCRIPTS_ROOTDIR/training/zmert-moses.pl
> $SCRIPTS_ROOTDIR/analysis/weight-scan.pl
> $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
> $SCRIPTS_ROOTDIR/ems/experiment.perl
> $SCRIPTS_ROOTDIR/generic/trainlm-irst.perl
> $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
>
> I tested the following with "use FindBin qw($RealBin);" and
> associated updates:
>
> tokenizer.perl
> mert-moses.pl
> train-model.perl
>
> When a user calls the scripts directly in a terminal/console, $Bin
> and $RealBin versions function identically. If a user calls the
> $RealBin version via a symlink, the script resolves the scripts' real
> path, finds the relative paths to dependencies and runs fine. The $Bin
> version resolves to the symlink's path, can't find the relative paths
> to dependencies and fails.
>
> I'd like to propose that, with the planned changes to eliminate
> references to $SCRIPTS_ROOTDIR, the Moses Code Guide and/or the Style
> Guide include an update for scripts to reference their $RealBin vice
> the $Bin to support the use of symlinks for all users.
>
> On Fri, 22 Jun 2012 08:39:56 +0700, Tom Hoar
> <tah...@precisiontranslationtools.com> wrote:
>> Ok, I see now. train-model.perl uses:
>>
>> use FindBin qw($Bin);
>> my $SCRIPTS_ROOTDIR = $Bin;
>>
>> We use symlinks to flatten the scripts in $SCRIPTS_ROOTDIR into the
>> $prefix/bin folder. By default $prefix resolves to /usr/local.
>> Therefore, $prefix/bin is in $PATH. Our approach confuses the
>> relative path references in the script that rely on $Bin (without a
>> separate $SCRIPTS_ROOTDIR). In this case, the $PHRASE_EXTRACT
>> concatenation on line 1439 (step 5) caused my system to break
>> because it resolved to
>> $PHRASE_EXTRACT = "$Bin/generic/extract-parallel.perl", which wasn't
>> there.
>>
>> In last night's troubleshooting, I re-referenced $SCRIPTS_ROOTDIR,
>> and placed symlinks to all $SCRIPTS_ROOTDIR subfolders in
>> $prefix/bin. It was the latter, not the former, that enabled the
>> script to run. My bad in my email. There are similar challenges with
>> other scripts, such as tokenizer.perl and detokenizer.perl, which
>> reference subfolders relative to their location.
>>
>> Also, thanks for sharing the goal. Effectively, you'll have $prefixA
>> and $prefixB. Our goals are a little different. We're trying to
>> install moses & all components into a hierarchy compliant with the
>> Linux Foundation's Filesystem Hierarchy Standard (FHS), in such a
>> way that it's usable across most other POSIX systems. Here's what
>> we've come up with:
>>
>> $IRSTLM, $RANDLM & $SRILM all point to $prefix. This way, their
>> resources, as well as the moses and MGIZA++ "make install" paths,
>> share the same $prefix/bin, $prefix/lib, $prefix/include (etc)
>> subfolders. We had to play some tricks with $SRILM to support
>> $prefix/sbin and the $MACHINE_TYPE references, but that support has
>> been there a long time and works well. Amazingly, we've been lucky
>> that there are no filename conflicts except for mkcls in GIZA++ and
>> MGIZA++.
>>
>> We place each component's original scripts under
>> $prefix/lib/<component>, as laid out by the component's authors. For
>> example, we move MGIZA++ $prefix/scripts to $prefix/lib/mgizapp and
>> configure moses so that $SCRIPTS_ROOTDIR becomes
>> $prefix/lib/mosesdecoder, etc. We use symlinks in $prefix/bin to
>> reference various scripts in $PATH (as above).
>>
>> According to http://perldoc.perl.org/FindBin.html, it looks like
>> changing to the script's "real" location vice command line reference
>> is possible with:
>>
>> use FindBin qw($RealBin);
>>
>> This change will eliminate our need to symlink subfolders in
>> $prefix/bin, and still allow other Moses users to move the $prefixA
>> tree anywhere they like. However, it might require a bit more
>> editing in each script to verify/resolve any relative references.
>>
>> For now, I'll continue using folder symlinks. I'll give you access
>> to a preview copy of our new binary install program via FTP, with
>> some instructions on how to make $prefix a private user folder
>> instead of a system-level one. Our approaches might give you some
>> ideas about how moses can support "linguistic programs over time".
>> For example, in addition to the "standard" Moses components
>> (giza-pp, mgizapp, irstlm, randlm, srilm), we currently install
>> DoMY CE (corpus preparation and translation workflow),
>> BerkeleyAligner (phrase alignment), Champollion Toolkit (sentence
>> aligner), Stanford Aligner (Chinese/Arabic word seg), MeCab
>> (Japanese word seg), SWATH (open source Thai word seg), and
>> Langmatch (language ID) add-ons with this approach. We've also
>> mapped out support for m4loc, Okapi Framework, and the moses team's
>> sentence aligner.
>>
>> Regards,
>> Tom
>>
>> On Thu, 21 Jun 2012 23:49:35 +0100, Hieu Hoang
>> <fishandfrol...@gmail.com> wrote:
>>> On 21/06/2012 16:46, Tom Hoar wrote:
>>>> Hieu,
>>>>
>>>> We're implementing these changes into DoMY. Some of these broke
>>>> our layout, but that's okay. We'll adapt to your changes.
>>> thanks, much appreciated.
>>>> Updating -external-bin-dir was easy. Then, we scrapped our
>>>> references to $SCRIPTS_ROOTDIR based on your comments in
>>>> train-model.perl. This, however, caused step 5 to break. On
>>>> closer inspection, a reference to $SCRIPTS_ROOTDIR is still
>>>> necessary at this point.
>>> that's odd. The variable $SCRIPTS_ROOTDIR is still there but it's
>>> set in lines 16-20. I didn't change these lines, I just removed the
>>> ability to override it with some other value from the command line.
>>>
>>> are you sure step 5 breaks?
>>>>
>>>> How do you see the layout evolving without the $SCRIPTS_ROOTDIR
>>>> value? Since all of the scripts are in subfolders from
>>>> $SCRIPTS_ROOTDIR, do you think it's possible or feasible to set
>>>> $SCRIPTS_ROOTDIR == $_EXTERNAL_BINDIR? That's possible today by
>>>> manually configuring bjam for a build. However, if you have
>>>> another layout in mind, would this cause conflicts?
>>> I'm aiming for everyone to set up moses like so
>>>    [directory A]/scripts
>>>    [directory A]/bin
>>>    [directory B] = external bin directory
>>> the external bin directory has giza/mgiza (and hopefully linguistic
>>> programs over time).
>>>
>>> when you update moses, just replace scripts/ and bin/ . The
>>> external bin directory can stay constant
>>>
>>>>
>>>> Tom
>>>>
>>>> On Thu, 31 May 2012 20:42:56 +0100, Hieu Hoang
>>>> <fishandfrol...@gmail.com> wrote:
>>>>> Hi all
>>>>>
>>>>> if you're checking out the latest github code, there are some
>>>>> changes you should be aware of:
>>>>>    1. There is a new argument to train-model.perl
>>>>>          -external-bin-dir [path]
>>>>>       This points to the directory where Giza++/mgiza lives.
>>>>>       Setting this is MANDATORY if you're using train-model.perl
>>>>>       to do the word alignment. It used to be hardcoded in the
>>>>>       perl code itself.
>>>>>    2. All the training programs have been moved into the
>>>>>       directory
>>>>>          [MOSES-ROOT]/bin
>>>>>       They should be run from there, not from wherever the
>>>>>       source code is.
>>>>>    3. To roll out, simply copy the 2 directories
>>>>>          [MOSES-ROOT]/bin
>>>>>          [MOSES-ROOT]/scripts
>>>>>       to wherever you want, eg.
>>>>>          /home/hieu/moses/bin
>>>>>          /home/hieu/moses/scripts
>>>>>    4. If you don't want to move it anywhere, you can run it from
>>>>>       where you downloaded.
>>>>>    5. The EMS and example files have been updated.
>>>>>
>>>>> Hope this is ok for everyone. It may break some people's setup.
>>>>> If possible, please change your setup. It's gonna help us all in
>>>>> the long run. If not, flame me & i'll see what I can do
>>>>>
>>>>> HH
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
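
For anyone who wants to reproduce the $Bin versus $RealBin behaviour
described in this thread, here is a minimal sketch. It is not taken from
the Moses scripts: the temporary scripts/tokenizer and bin folders, the
show-bin.perl file name and the compare_bin_and_realbin helper are all
made up for illustration. It assumes a system where Perl's symlink() is
available.

```perl
#!/usr/bin/env perl
# Sketch: compare FindBin's $Bin and $RealBin when a script runs via a symlink.
# All directory and file names below are hypothetical, not Moses paths.
use strict;
use warnings;
use Cwd qw(realpath);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway layout, run a tiny script through a symlink, and
# return what the script saw in $Bin and $RealBin.
sub compare_bin_and_realbin {
    my $root    = realpath(tempdir(CLEANUP => 1));
    my $scripts = File::Spec->catdir($root, 'scripts', 'tokenizer');
    my $bindir  = File::Spec->catdir($root, 'bin');
    make_path($scripts, $bindir);

    # The real script prints both FindBin variables, one per line.
    my $script = File::Spec->catfile($scripts, 'show-bin.perl');
    open(my $fh, '>', $script) or die "Cannot write $script: $!";
    print {$fh} <<'EOS';
use strict;
use warnings;
use FindBin qw($Bin $RealBin);
print "$Bin\n$RealBin\n";
EOS
    close($fh) or die "Cannot close $script: $!";

    # Flatten the script into bin/ with a symlink, as described in the thread.
    my $link = File::Spec->catfile($bindir, 'show-bin.perl');
    symlink($script, $link) or die "Cannot symlink $link: $!";

    # $^X is the perl interpreter that is running this sketch.
    open(my $out, '-|', $^X, $link) or die "Cannot run $link: $!";
    chomp(my ($bin, $realbin) = <$out>);
    close($out);

    return {
        bin     => $bin,      # what the symlinked script saw as $Bin
        realbin => $realbin,  # what it saw as $RealBin
        bindir  => $bindir,   # folder holding the symlink
        scripts => $scripts,  # folder holding the real script
    };
}

my $r = compare_bin_and_realbin();
print "\$Bin     = $r->{bin}\n";      # folder holding the symlink
print "\$RealBin = $r->{realbin}\n";  # folder holding the real script
```

In this layout, any path built from $Bin is looked up next to the
symlink, while a path built from $RealBin is looked up next to the real
script, which is where relative dependencies such as
nonbreaking_prefixes would normally sit.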