Tuning now; working fine so far.

By the way, on SMB there was also an issue with the split command during extraction.



On 29/10/2015 21:44, Vincent Nguyen wrote:
> I'll mount NFS instead and will confirm if working.
> thanks
>
> On 29/10/2015 21:31, Kenneth Heafield wrote:
>> Hi,
>>
>>      The way I do temporary files is mkstemp, unlink, and then use them.
>> That way the kernel will still clean up if the process meets an untimely
>> death.
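>>
>>      In code, the pattern is roughly this (a sketch, function name made
>> up; not the actual lmplz source):
>>
>>          #include <stdlib.h>    /* mkstemp */
>>          #include <unistd.h>    /* unlink, close */
>>
>>          /* Open an anonymous scratch file.  After unlink() the file has
>>             no name, so the kernel reclaims it when the last descriptor
>>             closes -- even if the process dies. */
>>          int open_scratch(void) {
>>              char name[] = "/tmp/scratchXXXXXX";
>>              int fd = mkstemp(name);  /* create and open a unique file */
>>              if (fd == -1) return -1;
>>              unlink(name);            /* drop the directory entry now */
>>              return fd;               /* keep reading/writing through fd */
>>          }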
>>
>>      Given that this issue appears on a SAMBA filesystem (aka SMB) but not
>> on a POSIX filesystem, I'm guessing it has to do with SAMBA
>> infelicities.  Like this old bug:
>> https://bugzilla.samba.org/show_bug.cgi?id=998 .
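>>
>>      My guess at the kind of usage involved -- growing a temporary file
>> and memory-mapping it -- as a sketch (not the actual lmplz code):
>>
>>          #include <sys/mman.h>   /* mmap, MAP_FAILED */
>>          #include <sys/types.h>  /* off_t, size_t */
>>          #include <unistd.h>     /* ftruncate */
>>
>>          /* Extend a scratch file and map it read/write.  POSIX
>>             filesystems handle this; some SMB mounts historically
>>             have not. */
>>          void *grow_and_map(int fd, size_t size) {
>>              if (ftruncate(fd, (off_t)size) == -1) return MAP_FAILED;
>>              return mmap(NULL, size, PROT_READ | PROT_WRITE,
>>                          MAP_SHARED, fd, 0);  /* MAP_FAILED on error */
>>          }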
>>
>>      I'd like to make it work, but temporary files on SAMBA are a pretty low
>> priority.  However, if you can provide a backtrace (after compiling with
>> "debug" added to the command) I can try to turn that segfault into an
>> error message.
>>
>> Kenneth
>>
>> On 10/29/2015 08:15 PM, Vincent Nguyen wrote:
>>> It's the same machine in my last test...
>>>
>>> Let me explain:
>>>
>>> Master = Ubuntu 14.04, my original machine with Moses and all my
>>> other language tools in /home/moses.
>>>
>>> I shared /home/moses as an SMB share named "mosesshare".
>>>
>>> Then, on the 2 new nodes, I mounted smb://master/mosesshare/ at /netshr,
>>>
>>> and did the same on the master.
>>>
>>> So cd /netshr shows the content of /home/moses perfectly on all
>>> 3 machines (the master and the 2 nodes).
>>>
>>> I think you should be able to replicate this on 1 machine, without
>>> having to deal with SGE or the nodes.
>>>
>>>
>>> On 29/10/2015 20:59, Kenneth Heafield wrote:
>>>> Yes.
>>>>
>>>> Also, this is all very odd.  What file system is /netshr?
>>>>
>>>> On 10/29/2015 07:56 PM, Vincent Nguyen wrote:
>>>>> Hi,
>>>>> Do you think that in the meantime I can just use -T with a local
>>>>> temporary directory (e.g. -T /tmp)?
>>>>>
>>>>> -------- Forwarded Message --------
>>>>> Subject:     Re: [Moses-support] Moses on SGE clarification
>>>>> Date:     Thu, 29 Oct 2015 17:45:01 +0100
>>>>> From:     Vincent Nguyen <vngu...@neuf.fr>
>>>>> To:     moses-support@mit.edu
>>>>>
>>>>>
>>>>>
>>>>> Ken,
>>>>>
>>>>> I just did some further testing on the master node, which HAS everything
>>>>> installed.
>>>>> The same error as before.
>>>>>
>>>>> /netshr/mosesdecoder/bin/lmplz --text
>>>>> /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>>> /netshr/working-en-fr/lm -S 20%
>>>>>
>>>>> /netshr is a mount point of /home/moses/
>>>>>
>>>>> So what I did: I replaced /netshr/ with /home/moses/ in the
>>>>> first 2 instances => same error.
>>>>>
>>>>> If I replace /netshr with /home/moses in the -T option only,
>>>>> it works.
>>>>>
>>>>> So obviously there is an issue here.
>>>>>
>>>>>
>>>>>
>>>>> On 29/10/2015 17:31, Kenneth Heafield wrote:
>>>>>> So we're clear, it runs correctly on the local machine but not when you
>>>>>> run it through SGE?  In that case, I suspect it's library version
>>>>>> differences.
>>>>>>
>>>>>> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>>>>>>> I get this error:
>>>>>>>
>>>>>>> moses@sgenode1:/netshr/working-en-fr$ /netshr/mosesdecoder/bin/lmplz
>>>>>>> --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>>>>> /netshr/working-en-fr/lm -S 20%
>>>>>>> === 1/5 Counting and sorting n-grams ===
>>>>>>> Reading /netshr/working-en-fr/lm/europarl.truecased.7
>>>>>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>>>>
>>>>>>> tcmalloc: large alloc 2755821568 bytes == 0x25d28000 @
>>>>>>> ****************************************************************************************************
>>>>>>>
>>>>>>> Segmentation fault (core dumped)
>>>>>>> moses@sgenode1:/netshr/working-en-fr$
>>>>>>>
>>>>>>> I installed libgoogle-perftools-dev but get the same error.
>>>>>>> Just to be clear: all the packages below are only necessary to build
>>>>>>> Moses; do I need specific packages to run one binary or another?
>>>>>>> Confused...
>>>>>>>
>>>>>>>
>>>>>>>           Ubuntu
>>>>>>>
>>>>>>> Install the following packages using the command
>>>>>>>
>>>>>>>        sudo apt-get install [package name]
>>>>>>>
>>>>>>> Packages:
>>>>>>>
>>>>>>>        g++
>>>>>>>        git
>>>>>>>        subversion
>>>>>>>        automake
>>>>>>>        libtool
>>>>>>>        zlib1g-dev
>>>>>>>        libboost-all-dev
>>>>>>>        libbz2-dev
>>>>>>>        liblzma-dev
>>>>>>>        python-dev
>>>>>>>        graphviz
>>>>>>>        imagemagick
>>>>>>>        libgoogle-perftools-dev (for tcmalloc)
>>>>>>>
>>>>>>> On 29/10/2015 15:18, Philipp Koehn wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> make sure that all the paths are valid on all the nodes --- so
>>>>>>>> definitely no relative paths.
>>>>>>>> And of course, the binaries need to be executable on all nodes as
>>>>>>>> well.
>>>>>>>>
>>>>>>>> -phi
>>>>>>>>
>>>>>>>> On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
>>>>>>>>         OK guys, not easy stuff...
>>>>>>>>         I fought to get the prerequisites working, but now at least
>>>>>>>>         jobs start...
>>>>>>>>
>>>>>>>>         and crash.
>>>>>>>>
>>>>>>>>         I'll post the details of the preliminary steps later; they
>>>>>>>>         could be useful.
>>>>>>>>
>>>>>>>>         My crash is when lmplz starts.
>>>>>>>>
>>>>>>>>         I have a share mounted on my nodes, and all the binaries are
>>>>>>>>         visible from the nodes, including the lmplz program.
>>>>>>>>
>>>>>>>>         But I was thinking: do I need to actually install some
>>>>>>>>         packages on the nodes themselves?  I mean packages that do
>>>>>>>>         not fall under the /mosesdecoder/ folder?
>>>>>>>>
>>>>>>>>
>>>>>>>>         thanks,
>>>>>>>>
>>>>>>>>         V
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>         On 29/10/2015 13:26, Philipp Koehn wrote:
>>>>>>>>>         Hi,
>>>>>>>>>
>>>>>>>>>         these machine names are just there for convenience.
>>>>>>>>>
>>>>>>>>>         If you want experiment.perl to submit jobs via qsub,
>>>>>>>>>         all you have to do is run experiment.perl with the
>>>>>>>>>         additional switch "-cluster".
>>>>>>>>>
>>>>>>>>>         You can also put the head node's name into the
>>>>>>>>>         experiment.machines file, then you do not need to
>>>>>>>>>         use the switch anymore.
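>>>>>>>>>
>>>>>>>>>         For example, with your machine names, something like:
>>>>>>>>>
>>>>>>>>>         cluster: master
>>>>>>>>>         multicore-8: master node1 node2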
>>>>>>>>>
>>>>>>>>>         -phi
>>>>>>>>>
>>>>>>>>>         On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen
>>>>>>>>>         <vngu...@neuf.fr> wrote:
>>>>>>>>>
>>>>>>>>>             Hi there,
>>>>>>>>>
>>>>>>>>>             I need some clarification before screwing up some files.
>>>>>>>>>             I just set up an SGE cluster with a master + 2 nodes.
>>>>>>>>>
>>>>>>>>>             To make it clear, let's say my cluster name is "default",
>>>>>>>>>             my master head node is "master", and my 2 other nodes
>>>>>>>>>             are "node1" and "node2".
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             For EMS:
>>>>>>>>>
>>>>>>>>>             I opened the default experiment.machines file and I see:
>>>>>>>>>
>>>>>>>>>             cluster: townhill seville hermes lion seville sannox lutzow frontend
>>>>>>>>>             multicore-4: freddie
>>>>>>>>>             multicore-8: tyr thor odin crom
>>>>>>>>>             multicore-16: saxnot vali vili freyja bragi hoenir
>>>>>>>>>             multicore-24: syn hel skaol saga buri loki sif magni
>>>>>>>>>             multicore-32: gna snotra lofn thrud
>>>>>>>>>
>>>>>>>>>             What are townhill and the others? Names of machines /
>>>>>>>>>             nodes? Names of several clusters?
>>>>>>>>>             Should I just put "default" or "master node1 node2"?
>>>>>>>>>
>>>>>>>>>             multicore-X: should I put machine names here?
>>>>>>>>>             If my 3 machines have 8 cores each:
>>>>>>>>>             multicore-8: master node1 node2
>>>>>>>>>             right?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             Then in the config file for EMS:
>>>>>>>>>
>>>>>>>>>             #generic-parallelizer =
>>>>>>>>>             $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>>>>>>             #generic-parallelizer =
>>>>>>>>>             $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>>>>>
>>>>>>>>>             Which one should I take if my nodes are multicore?
>>>>>>>>>             Still the first one?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             ### cluster settings (if run on a cluster machine)
>>>>>>>>>             # number of jobs to be submitted in parallel
>>>>>>>>>             #
>>>>>>>>>             #jobs = 10
>>>>>>>>>             Should I count approximately 1 job per core over the
>>>>>>>>>             total cores of my 3 machines?
>>>>>>>>>
>>>>>>>>>             # arguments to qsub when scheduling a job
>>>>>>>>>             #qsub-settings = ""
>>>>>>>>>             Can this stay empty?
>>>>>>>>>
>>>>>>>>>             # project for privileges and usage accounting
>>>>>>>>>             #qsub-project = iccs_smt
>>>>>>>>>             Is this a standard value?
>>>>>>>>>
>>>>>>>>>             # memory and time
>>>>>>>>>             #qsub-memory = 4
>>>>>>>>>             #qsub-hours = 48
>>>>>>>>>             4 what? GB?
>>>>>>>>>
>>>>>>>>>             ### multi-core settings
>>>>>>>>>             # when the generic parallelizer is used, the number of
>>>>>>>>> cores
>>>>>>>>>             # specified here
>>>>>>>>>             cores = 4
>>>>>>>>>             Is this ignored if generic-parallelizer.perl is chosen?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             Is there a way to put more load on one specific node?
>>>>>>>>>
>>>>>>>>>             Many thanks,
>>>>>>>>>             V.
>>>>>>>>>
>>>>>>>>>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
