One more observation/suggestion about the scripts, and then I'll give it
a break.

Several scripts, including:
  $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
  $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
  $SCRIPTS_ROOTDIR/tokenizer/detokenizer.perl

reference the common ./nonbreaking_prefixes folder relative to their own
location. split-sentences.perl uses FindBin, but it would fail because
there is no $SCRIPTS_ROOTDIR/ems/support/nonbreaking_prefixes/ subfolder.
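
Whether such FindBin-relative lookups succeed also depends on which
directory FindBin reports. The following is a minimal, Unix-only Perl
sketch (all file and directory names are made up for the demo): it
writes a tiny script, links it into a separate bin/ folder, runs it
through the symlink and prints both $Bin and $RealBin.

```perl
#!/usr/bin/env perl
# Sketch: how FindBin's $Bin and $RealBin differ when a script is
# invoked through a symlink (e.g. a link in $prefix/bin). Unix-only.
# The directory and script names are hypothetical, for the demo only.
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

my $root     = tempdir(CLEANUP => 1);
my $real_dir = File::Spec->catdir($root, 'scripts', 'ems', 'support');
my $link_dir = File::Spec->catdir($root, 'bin');
make_path($real_dir, $link_dir);

# A tiny script that prints both FindBin variables, one per line.
my $script = File::Spec->catfile($real_dir, 'show-bin.perl');
open my $fh, '>', $script or die "Cannot write $script: $!";
print {$fh} 'use FindBin qw($Bin $RealBin); print "$Bin\n$RealBin\n";';
close $fh;

# "Flatten" it into bin/ with a symlink.
my $link = File::Spec->catfile($link_dir, 'show-bin.perl');
symlink($script, $link) or die "Cannot create symlink: $!";

# Run the script through the symlink and capture both lines.
my ($bin, $realbin) = split /\n/, qx{"$^X" "$link"};
print "\$Bin     = $bin\n";      # the symlink's directory (bin/)
print "\$RealBin = $realbin\n";  # the real directory (scripts/ems/support)
```

In this setup $Bin points at bin/, where no nonbreaking_prefixes or
other sibling folders exist, while $RealBin points at the script's real
directory, so relative lookups built on $RealBin survive symlinking.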

Since the $SCRIPTS_ROOTDIR hierarchy is being formalized, would it be a
good idea to create a new $SCRIPTS_ROOTDIR/resources/ or possibly
$SCRIPTS_ROOTDIR/share/ folder where scripts would find shared
resources? In this case, the prefixes would live in
$SCRIPTS_ROOTDIR/share/nonbreaking_prefixes/.
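
To make the proposal concrete, here is a minimal Perl sketch of how a
script at any depth under $SCRIPTS_ROOTDIR could locate such a folder.
It assumes the proposed share/ directory exists (it does not in the
current tree), and find_shared_dir is a hypothetical helper name, not an
existing Moses function. It starts from $RealBin rather than $Bin so the
lookup still works when the script is called through a symlink.

```perl
#!/usr/bin/env perl
# Sketch only: assumes the *proposed* $SCRIPTS_ROOTDIR/share/ layout.
# find_shared_dir() is a hypothetical helper, not part of Moses.
use strict;
use warnings;
use FindBin qw($RealBin);
use File::Basename qw(dirname);
use File::Spec;

# Walk up from $start until a directory containing share/$name is found.
# Returns that share/$name path, or undef once the filesystem root is
# reached without a match.
sub find_shared_dir {
    my ($start, $name) = @_;
    my $dir = File::Spec->rel2abs($start);
    while (1) {
        my $candidate = File::Spec->catdir($dir, 'share', $name);
        return $candidate if -d $candidate;
        my $parent = dirname($dir);
        return if $parent eq $dir;    # reached the filesystem root
        $dir = $parent;
    }
}

# ems/support/split-sentences.perl (two levels deep) and
# tokenizer/tokenizer.perl (one level deep) would both resolve
# $SCRIPTS_ROOTDIR/share/nonbreaking_prefixes this way.
my $prefix_dir = find_shared_dir($RealBin, 'nonbreaking_prefixes');
warn "share/nonbreaking_prefixes not found above $RealBin\n"
    unless defined $prefix_dir;
```

A fixed relative path such as "$RealBin/../share/nonbreaking_prefixes"
would be simpler, but it hardcodes each script's depth in the tree.
Walking upward avoids that, at the cost of possibly matching an
unrelated share/ folder higher up, so a real implementation might stop
at a known root marker instead.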


On Mon, 25 Jun 2012 11:42:38 +0700, Tom Hoar
<tah...@precisiontranslationtools.com> wrote:
> I found the following scripts in $SCRIPTS_ROOTDIR with "use FindBin
> qw($Bin);":
>   $SCRIPTS_ROOTDIR/training/wrappers/parse-de-berkeley.perl
>   $SCRIPTS_ROOTDIR/training/wrappers/parse-de-bitpar.perl
>   $SCRIPTS_ROOTDIR/training/train-model.perl
>   $SCRIPTS_ROOTDIR/training/filter-model-given-input.pl
>   $SCRIPTS_ROOTDIR/training/mert-moses.pl
>   $SCRIPTS_ROOTDIR/training/mert-moses-multi.pl
>   $SCRIPTS_ROOTDIR/training/zmert-moses.pl
>   $SCRIPTS_ROOTDIR/analysis/weight-scan.pl
>   $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
>   $SCRIPTS_ROOTDIR/ems/experiment.perl
>   $SCRIPTS_ROOTDIR/generic/trainlm-irst.perl
>   $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
>
> I tested the following with "use FindBin qw($RealBin);" and
> associated updates:
>   tokenizer.perl
>   mert-moses.pl
>   train-model.perl
>
> When a user calls the scripts directly in a terminal/console, the $Bin
> and $RealBin versions behave identically. If a user calls the $RealBin
> version via a symlink, the script resolves its real path, finds its
> dependencies by their relative paths and runs fine. The $Bin version
> resolves to the symlink's path, can't find its dependencies by their
> relative paths and fails.
>
> I'd like to propose that, along with the planned changes to eliminate
> references to $SCRIPTS_ROOTDIR, the Moses Code Guide and/or the Style
> Guide be updated so that scripts reference $RealBin instead of $Bin,
> supporting the use of symlinks for all users.
>
>
>
> On Fri, 22 Jun 2012 08:39:56 +0700, Tom Hoar
> <tah...@precisiontranslationtools.com> wrote:
>> Ok, I see now. train-model.perl uses:
>>
>>    use FindBin qw($Bin);
>>    my $SCRIPTS_ROOTDIR = $Bin;
>>
>> We use symlinks to flatten the scripts in $SCRIPTS_ROOTDIR into the
>> $prefix/bin folder. By default, $prefix resolves to /usr/local, so
>> $prefix/bin is in $PATH. Our approach confuses the relative path
>> references in scripts that rely on $Bin (without a separate
>> $SCRIPTS_ROOTDIR). In this case, the $PHRASE_EXTRACT concatenation on
>> line 1439 (step 5) broke my system because it resolved to
>> $PHRASE_EXTRACT = "$Bin/generic/extract-parallel.perl", which wasn't
>> there.
>>
>> In last night's troubleshooting, I re-referenced $SCRIPTS_ROOTDIR and
>> placed symlinks to all $SCRIPTS_ROOTDIR subfolders in $prefix/bin. It
>> was the latter, not the former, that enabled the script to run; my
>> earlier email got that wrong. There are similar challenges with other
>> scripts, such as tokenizer.perl and detokenizer.perl, which reference
>> subfolders relative to their location.
>>
>> Also, thanks for sharing the goal. Effectively, you'll have $prefixA
>> and $prefixB. Our goals are a little different. We're trying to
>> install moses & all components into a hierarchy compliant with the
>> Linux Foundation's Filesystem Hierarchy Standard (FHS), in such a way
>> that it's usable across most other POSIX systems. Here's what we've
>> come up with:
>>
>> $IRSTLM, $RANDLM & $SRILM all point to $prefix. This way, their
>> resources, as well as the moses and MGIZA++ "make install" paths,
>> share the same $prefix/bin, $prefix/lib, $prefix/include (etc.)
>> subfolders. We had to play some tricks with $SRILM to support
>> $prefix/sbin and the $MACHINE_TYPE references, but that support has
>> been there a long time and works well. We've been lucky: the only
>> filename conflict is mkcls, which both GIZA++ and MGIZA++ provide.
>>
>> We place each component's original scripts under
>> $prefix/lib/<component>, as laid out by the component's authors. For
>> example, we move MGIZA++'s $prefix/scripts to $prefix/lib/mgizapp,
>> configure moses' $SCRIPTS_ROOTDIR as $prefix/lib/mosesdecoder, etc.
>> We use symlinks in $prefix/bin to put various scripts in $PATH (as
>> above).
>>
>> According to http://perldoc.perl.org/FindBin.html, it looks like
>> resolving the script's "real" location, instead of the path used on
>> the command line, is possible with:
>>
>>    use FindBin qw($RealBin);
>>
>> This change would eliminate our need to symlink subfolders in
>> $prefix/bin and still allow other Moses users to move the $prefixA
>> tree anywhere they like. However, it might require a bit more editing
>> in each script to verify/resolve any relative references.
>>
>> For now, I'll continue using folder symlinks. I'll give you access to
>> a preview copy of our new binary install program via FTP, with some
>> instructions on how to make $prefix a private user folder instead of
>> a system-level one. Our approach might give you some ideas about how
>> moses can support "linguistic programs over time". For example, in
>> addition to the "standard" Moses components (giza-pp, mgizapp, irstlm,
>> randlm, srilm), we currently install DoMY CE (corpus preparation and
>> translation workflow), BerkeleyAligner (phrase alignment), Champollion
>> Toolkit (sentence aligner), Stanford Aligner (Chinese/Arabic word
>> seg), MeCab (Japanese word seg), SWATH (open source Thai word seg),
>> and Langmatch (language ID) add-ons with this approach. We've also
>> mapped out support for m4loc, Okapi Framework, and the moses team's
>> sentence aligner.
>>
>> Regards,
>> Tom
>>
>>
>> On Thu, 21 Jun 2012 23:49:35 +0100, Hieu Hoang
>> <fishandfrol...@gmail.com> wrote:
>>> On 21/06/2012 16:46, Tom Hoar wrote:
>>>> Hieu,
>>>>
>>>> We're implementing these changes into DoMY. Some of these broke our
>>>> layout, but that's okay. We'll adapt to your changes.
>>> thanks, much appreciated.
>>>> Updating -external-bin-dir was easy. Then, we scrapped our references
>>>> to $SCRIPTS_ROOTDIR based on your comments in train-model.perl. This,
>>>> however, caused step 5 to break. On closer inspection, a reference to
>>>> $SCRIPTS_ROOTDIR is still necessary at this point.
>>> that's odd. The variable $SCRIPTS_ROOTDIR is still there, but it's set
>>> in lines 16-20. I didn't change these lines; I just removed the ability
>>> to override it with some other value from the command line.
>>>
>>> are you sure step 5 breaks?
>>>>
>>>> How do you see the layout evolving without the $SCRIPTS_ROOTDIR value?
>>>> Since all of the scripts are in subfolders of $SCRIPTS_ROOTDIR, do you
>>>> think it's possible or feasible to set $SCRIPTS_ROOTDIR ==
>>>> $_EXTERNAL_BINDIR? That's possible today by manually configuring bjam
>>>> for a build. However, if you have another layout in mind, would this
>>>> cause conflicts?
>>> I'm aiming for everyone to set up moses like so
>>>     [directory A]/scripts
>>>     [directory A]/bin
>>>     [directory B]   = external bin directory
>>> the external bin directory has giza/mgiza (and hopefully linguistic
>>> programs over time).
>>>
>>> when you update moses, just replace scripts/ and bin/. The external bin
>>> directory can stay constant.
>>>
>>>>
>>>> Tom
>>>>
>>>>
>>>> On Thu, 31 May 2012 20:42:56 +0100, Hieu Hoang
>>>> <fishandfrol...@gmail.com> wrote:
>>>>> Hi all
>>>>>
>>>>> if you're checking out the latest github code, there are some changes
>>>>> you should be aware of:
>>>>>     1. There is a new argument to train-model.perl
>>>>>             -external-bin-dir [path]
>>>>>          This points to the directory where Giza++/mgiza lives. Setting
>>>>>          this is MANDATORY if you're using train-model.perl to do the
>>>>>          word alignment. It used to be hardcoded in the perl code itself.
>>>>>     2. All the training programs have been moved into the directory
>>>>>            [MOSES-ROOT]/bin
>>>>>         They should be run from there, not from wherever the source
>>>>>         code is.
>>>>>     3. To roll out, simply copy the 2 directories
>>>>>            [MOSES-ROOT]/bin
>>>>>            [MOSES-ROOT]/scripts
>>>>>         to wherever you want, e.g.
>>>>>            /home/hieu/moses/bin
>>>>>            /home/hieu/moses/scripts
>>>>>     4. If you don't want to move it anywhere, you can run it from
>>>>>         where you downloaded it.
>>>>>     5. The EMS and example files have been updated.
>>>>>
>>>>> Hope this is ok for everyone. It may break some people's setups. If
>>>>> possible, please change your setup. It's gonna help us all in the
>>>>> long run. If not, flame me & i'll see what I can do
>>>>>
>>>>> HH
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>