Tuning is running now, so it's working fine so far. By the way, on SMB there was another issue, with the split command during extraction.
On 29/10/2015 21:44, Vincent Nguyen wrote:
> I'll mount NFS instead and will confirm whether it works.
> Thanks
>
> On 29/10/2015 21:31, Kenneth Heafield wrote:
>> Hi,
>>
>> The way I do temporary files is mkstemp, unlink, and then use them.
>> That way the kernel will still clean up if the process meets an untimely death.
>>
>> Given that this issue appears on a SAMBA filesystem (aka SMB) but not on a POSIX filesystem,
>> I'm guessing it has to do with SAMBA infelicities. Like this old bug:
>> https://bugzilla.samba.org/show_bug.cgi?id=998 .
>>
>> I'd like to make it work, but temporary files on SAMBA are a pretty low priority.
>> However, if you can provide a backtrace (after compiling with "debug" added to the command)
>> I can try to turn that segfault into an error message.
>>
>> Kenneth
>>
>> On 10/29/2015 08:15 PM, Vincent Nguyen wrote:
>>> It's the same machine in my last test ...
>>>
>>> Let me explain:
>>>
>>> Master = Ubuntu 14.04, which is my original machine with Moses and all my other language tools in /home/moses.
>>>
>>> I shared /home/moses over SMB as "mosesshare".
>>>
>>> Then on the 2 new nodes, I mounted smb://master/mosesshare/ at /netshr,
>>> and did the same on the master,
>>>
>>> so cd /netshr shows the content of /home/moses absolutely perfectly on all 3 machines (master and 2 nodes).
>>>
>>> I think you should be able to replicate this without having to handle SGE or nodes, just on 1 machine.
>>>
>>> On 29/10/2015 20:59, Kenneth Heafield wrote:
>>>> Yes.
>>>>
>>>> Also, this is all very odd. What file system is /netshr?
>>>>
>>>> On 10/29/2015 07:56 PM, Vincent Nguyen wrote:
>>>>> Hi,
>>>>> Do you think in the meantime I can just use -T with a local temporary directory?
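The mkstemp-then-unlink pattern Kenneth describes can be sketched as follows. This is illustrative Python, not lmplz's actual C++ code; the idea is identical in both: once the name is unlinked, nothing is left to clean up, yet the open descriptor keeps working until it is closed.

```python
import os
import tempfile

# Create the temp file, unlink its name immediately, then keep using
# the open descriptor. The kernel reclaims the space when the fd is
# closed, even if the process meets an untimely death.
fd, path = tempfile.mkstemp(prefix="lmplz-demo-")
os.unlink(path)                      # name is gone from the directory...
assert not os.path.exists(path)      # ...so nothing is left behind on a crash

os.write(fd, b"still usable")        # ...but the open fd still works
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 32)               # reads back what was just written
os.close(fd)                         # disk space is actually freed here
```

A filesystem that mishandles unlinked-but-open files (as some SAMBA setups historically did) breaks this idiom, which is consistent with the crash appearing only when -T points under /netshr.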
>>>>>
>>>>> -------- Forwarded Message --------
>>>>> Subject: Re: [Moses-support] Moses on SGE clarification
>>>>> Date: Thu, 29 Oct 2015 17:45:01 +0100
>>>>> From: Vincent Nguyen <vngu...@neuf.fr>
>>>>> To: moses-support@mit.edu
>>>>>
>>>>> Ken,
>>>>>
>>>>> I just did some further testing on the master node, which HAS everything installed.
>>>>> Same error as is:
>>>>>
>>>>> /netshr/mosesdecoder/bin/lmplz --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T /netshr/working-en-fr/lm -S 20%
>>>>>
>>>>> /netshr is a mount point of /home/moses/,
>>>>> so what I did is replace /netshr/ with /home/moses/:
>>>>> in the first 2 instances => same error;
>>>>> if I replace /netshr with /home/moses in the -T option, it works.
>>>>>
>>>>> So obviously there is an issue here.
>>>>>
>>>>> On 29/10/2015 17:31, Kenneth Heafield wrote:
>>>>>> So we're clear, it runs correctly on the local machine but not when you run it through SGE?
>>>>>> In that case, I suspect it's library version differences.
>>>>>>
>>>>>> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>>>>>>> I get this error:
>>>>>>>
>>>>>>> moses@sgenode1:/netshr/working-en-fr$ /netshr/mosesdecoder/bin/lmplz --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T /netshr/working-en-fr/lm -S 20%
>>>>>>> === 1/5 Counting and sorting n-grams ===
>>>>>>> Reading /netshr/working-en-fr/lm/europarl.truecased.7
>>>>>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>>>> tcmalloc: large alloc 2755821568 bytes == 0x25d28000 @
>>>>>>> ****************************************************************************************************
>>>>>>> Segmentation fault (core dumped)
>>>>>>> moses@sgenode1:/netshr/working-en-fr$
>>>>>>>
>>>>>>> I installed libgoogle-perftools-dev but get the same error.
>>>>>>> Just to be clear: are all the packages below only necessary to build Moses,
>>>>>>> or do I need specific packages to run one binary or another?
>>>>>>> Confused ...
>>>>>>>
>>>>>>> Ubuntu
>>>>>>>
>>>>>>> Install the following packages using the command
>>>>>>>
>>>>>>> sudo apt-get install [package name]
>>>>>>>
>>>>>>> Packages:
>>>>>>>
>>>>>>> g++
>>>>>>> git
>>>>>>> subversion
>>>>>>> automake
>>>>>>> libtool
>>>>>>> zlib1g-dev
>>>>>>> libboost-all-dev
>>>>>>> libbz2-dev
>>>>>>> liblzma-dev
>>>>>>> python-dev
>>>>>>> graphviz
>>>>>>> imagemagick
>>>>>>> libgoogle-perftools-dev (for tcmalloc)
>>>>>>>
>>>>>>> On 29/10/2015 15:18, Philipp Koehn wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> make sure that all the paths are valid on all the nodes --- so definitely no relative paths.
>>>>>>>> And of course, the binaries need to be executable on all nodes as well.
>>>>>>>>
>>>>>>>> -phi
>>>>>>>>
>>>>>>>> On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
>>>>>>>> OK guys, not easy stuff ...
>>>>>>>> I fought to get the prerequisites working, but now at least jobs start .....
>>>>>>>>
>>>>>>>> and crash.
>>>>>>>>
>>>>>>>> I'll post the details of the preliminary steps later; they could be useful.
>>>>>>>>
>>>>>>>> My crash is when lmplz starts.
>>>>>>>>
>>>>>>>> I have a share mounted on my nodes, and all binaries are visible from the nodes, including the lmplz program.
>>>>>>>>
>>>>>>>> But I was thinking: do I need to actually install some packages on the nodes themselves?
>>>>>>>> I mean packages that do not fall under the /mosesdecoder/ folder?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> V
>>>>>>>>
>>>>>>>> On 29/10/2015 13:26, Philipp Koehn wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> these machine names are just there for convenience.
>>>>>>>>>
>>>>>>>>> If you want experiment.perl to submit jobs per qsub,
>>>>>>>>> all you have to do is run experiment.perl with the additional switch "-cluster".
>>>>>>>>>
>>>>>>>>> You can also put the head node's name into the experiment.machines file;
>>>>>>>>> then you do not need to use the switch anymore.
>>>>>>>>>
>>>>>>>>> -phi
>>>>>>>>>
>>>>>>>>> On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
>>>>>>>>>
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> I need some clarification before screwing up some files.
>>>>>>>>> I just set up an SGE cluster with a master + 2 nodes.
>>>>>>>>>
>>>>>>>>> To make it clear, let's say my cluster name is "default", my master head node is "master", and my 2 other nodes are "node1" and "node2".
>>>>>>>>>
>>>>>>>>> For EMS:
>>>>>>>>>
>>>>>>>>> I opened the default experiment.machines file and I see:
>>>>>>>>>
>>>>>>>>> cluster: townhill seville hermes lion seville sannox lutzow frontend
>>>>>>>>> multicore-4: freddie
>>>>>>>>> multicore-8: tyr thor odin crom
>>>>>>>>> multicore-16: saxnot vali vili freyja bragi hoenir
>>>>>>>>> multicore-24: syn hel skaol saga buri loki sif magni
>>>>>>>>> multicore-32: gna snotra lofn thrud
>>>>>>>>>
>>>>>>>>> What are townhill and the others? Machine/node names? Names of several clusters?
>>>>>>>>> Should I just put "default", or "master node1 node2"?
>>>>>>>>>
>>>>>>>>> multicore-X: should I put machine names here?
>>>>>>>>> If my 3 machines are 8 cores each:
>>>>>>>>> multicore-8: master node1 node2
>>>>>>>>> right?
>>>>>>>>>
>>>>>>>>> Then in the config file for EMS:
>>>>>>>>>
>>>>>>>>> #generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>>>>>> #generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>>>>>
>>>>>>>>> Which one should I take if my nodes are multicore? Still the first one?
>>>>>>>>>
>>>>>>>>> ### cluster settings (if run on a cluster machine)
>>>>>>>>> # number of jobs to be submitted in parallel
>>>>>>>>> #
>>>>>>>>> #jobs = 10
>>>>>>>>> Should I count approximately 1 job per core over the total cores of my 3 machines?
>>>>>>>>>
>>>>>>>>> # arguments to qsub when scheduling a job
>>>>>>>>> #qsub-settings = ""
>>>>>>>>> Can this stay empty?
>>>>>>>>>
>>>>>>>>> # project for privileges and usage accounting
>>>>>>>>> #qsub-project = iccs_smt
>>>>>>>>> Standard value?
>>>>>>>>>
>>>>>>>>> # memory and time
>>>>>>>>> #qsub-memory = 4
>>>>>>>>> #qsub-hours = 48
>>>>>>>>> 4 what? GB?
>>>>>>>>>
>>>>>>>>> ### multi-core settings
>>>>>>>>> # when the generic parallelizer is used, the number of cores
>>>>>>>>> # specified here
>>>>>>>>> cores = 4
>>>>>>>>> Is this ignored if generic-parallelizer.perl is chosen?
>>>>>>>>>
>>>>>>>>> Is there a way to put more load on one specific node?
>>>>>>>>>
>>>>>>>>> Many thanks,
>>>>>>>>> V.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Moses-support mailing list
>>>>>>>>> Moses-support@mit.edu
>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
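Putting Philipp's answers together, a plausible experiment.machines for Vincent's 3-machine setup might look like the fragment below. This is only a sketch: it assumes, per the replies above, that the entries are plain machine names (so the SGE cluster name "default" does not appear), and that listing the head node under "cluster:" lets EMS submit via qsub without the "-cluster" switch.

```
# hypothetical experiment.machines for head node "master" and two 8-core nodes
cluster: master
multicore-8: master node1 node2
```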