Hi,

Why is your system using 7 GB of swap out of 9 GB? Moses is only taking 147 GB out of the 252 GB of physical RAM. I smell other processes taking up RAM, possibly those 5 stopped and 1 zombie.

Kenneth
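One way to see which processes are actually holding the swap, assuming a Linux /proc filesystem (an illustrative command, not one from this thread):

    # per-process swap usage, largest consumers first
    grep VmSwap /proc/[0-9]*/status | sort -t: -k3 -rn | head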
On 04/12/2016 12:45 PM, Jorg Tiedemann wrote:
>
>> Did you remove all "lazyken" arguments from moses.ini?
>
> Yes, I did.
>
>> Is the network filesystem Lustre? If so, mmap will perform terribly and
>> you should use load=read or (better) load=parallel_read since reading
>> from Lustre is CPU-bound.
>
> Yes, I think so. The parallel_read option sounds interesting. Can it
> hurt in some setups, or could I use it as my default?
>
>> Does the cluster management software/job scheduler/sysadmin impose a
>> resident memory limit?
>
> I don’t really know. I don’t think so, but I need to find out.
>
>> Can you copy-paste `top' when it's running slow and the stderr at that
>> time?
>
> Here is the top of my top output when running on my test node:
>
> top - 14:39:03 up 50 days, 5:47, 0 users, load average: 1.97, 2.09, 3.85
> Tasks: 814 total, 3 running, 805 sleeping, 5 stopped, 1 zombie
> Cpu(s): 6.9%us, 6.2%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 264493500k total, 263614188k used, 879312k free, 68680k buffers
> Swap: 9775548k total, 7198920k used, 2576628k free, 69531796k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 42528 tiedeman  20   0  147g  147g  800 R 100.0 58.4  31:25.01 moses
>
> stderr doesn’t say anything new besides the message from the start of
> feature function loading:
>
> FeatureFunction: LM0 start: 16 end: 16
> line=KENLM load=parallel_read name=LM1 factor=0
> path=/homeappl/home/tiedeman/research/SMT/wmt16/fi-en/data/monolingual/cc.tok.3.en.trie.kenlm
> order=3
>
> I am trying /tmp/ now as well (it takes time to shuffle the big files
> around, though).
>
> Jörg
>
>> On 04/12/2016 08:26 AM, Jorg Tiedemann wrote:
>>>
>>> No, it’s definitely not waiting for input … the same setup works for
>>> smaller models.
>>>
>>> I have the models on a work partition on our cluster. This is probably
>>> not good enough, so I will try to move the data to local tmp on the
>>> individual nodes before executing. Hopefully this helps. How would you
>>> do this if you want to distribute tuning?
>>>
>>> Thanks!
>>> Jörg
>>>
>>>> On 12 Apr 2016, at 09:34, Ondrej Bojar <bo...@ufal.mff.cuni.cz> wrote:
>>>>
>>>> Random suggestion: isn't it waiting for stdin for some strange
>>>> reason? ;-)
>>>>
>>>> O.
>>>>
>>>> On April 12, 2016 8:20:46 AM CEST, Hieu Hoang <hieuho...@gmail.com> wrote:
>>>>>
>>>>> I assume that it's on local disk rather than a network drive.
>>>>>
>>>>> Are you sure it's still in the loading stage, and that it's loading
>>>>> kenlm rather than the phrase table or the lexicalized reordering
>>>>> model, etc.?
>>>>>
>>>>> If there's a way to make the model files available for download, or
>>>>> to give me access to your machine, I might be able to debug it.
>>>>>
>>>>> Hieu Hoang
>>>>> http://www.hoang.co.uk/hieu
>>>>>
>>>>> On 12 Apr 2016 08:41, "Jorg Tiedemann" <tiede...@gmail.com> wrote:
>>>>>>
>>>>>> Unfortunately, load=read didn’t help. It has been loading for 7
>>>>>> hours now, with no sign of starting to decode. The disk is not
>>>>>> terribly slow; cat worked without problems. I don’t know what else
>>>>>> to do, so I think I have to give up for now. Am I the only one
>>>>>> experiencing such slow loading times?
>>>>>>
>>>>>> Thanks again for your help!
>>>>>>
>>>>>> Jörg
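For reference, the load strategy is selected per model in the [feature] section of moses.ini. A line like the following chooses the read strategy (the name, order, and path here are illustrative placeholders, not values from this thread):

    KENLM name=LM0 factor=0 order=3 load=read path=/path/to/model.trie.kenlm

Substituting load=parallel_read, load=populate, or load=lazy selects the alternative strategies that Kenneth describes below.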
>>>>>> On 10 Apr 2016, at 22:27, Kenneth Heafield <mo...@kheafield.com> wrote:
>>>>>>
>>>>>> With load=read:
>>>>>>
>>>>>> The model acts like normal RAM that is part of the Moses process.
>>>>>>
>>>>>> It supports huge pages via transparent huge pages, so it's slightly
>>>>>> faster.
>>>>>>
>>>>>> Before loading, cat file >/dev/null will just put into the cache data
>>>>>> that was going to be read anyway; read() streams through the file
>>>>>> more or less like cat does.
>>>>>>
>>>>>> After loading, cat file >/dev/null will hurt, since there's the
>>>>>> potential to load the file into RAM twice and swap out bits of Moses.
>>>>>>
>>>>>> Memory is shared between threads, just not with the disk cache (ok,
>>>>>> maybe, but only if they get huge pages support to work well) or with
>>>>>> other processes that independently read the file.
>>>>>>
>>>>>> With load=populate:
>>>>>>
>>>>>> It loads upfront and maps the file into the process; the kernel seems
>>>>>> to evict it first.
>>>>>>
>>>>>> Before loading, cat file >/dev/null might help, but in theory
>>>>>> MAP_POPULATE should be doing much the same thing.
>>>>>>
>>>>>> After loading, or during slow loading, cat file >/dev/null can help
>>>>>> because it forces the data back into RAM. This is particularly useful
>>>>>> if the Moses process came under memory pressure after loading, which
>>>>>> can include heavy disk activity even if RAM isn't full.
>>>>>>
>>>>>> Memory is shared with all other processes that mmap the file.
>>>>>>
>>>>>> With load=lazy:
>>>>>>
>>>>>> It maps the file into the process with lazy loading (i.e. mmap without
>>>>>> MAP_POPULATE). Not recommended for decoding, but useful if you've got
>>>>>> a 6 TB file and want to send it a few thousand queries.
>>>>>>
>>>>>> cat will definitely help here, at any time.
>>>>>>
>>>>>> Memory is shared with all other processes that mmap the file.
>>>>>>
>>>>>> On 04/10/2016 06:50 PM, Jorg Tiedemann wrote:
>>>>>>
>>>>>> Thanks for the quick reply. I will try the load option.
>>>>>>
>>>>>> Quick question: you said that the memory will not be shared across
>>>>>> processes with that option. Does that mean that it will load the LM
>>>>>> for each thread? That would mean a lot of memory in my setup.
>>>>>>
>>>>>> By the way, I also did the cat >/dev/null thing, but I didn’t have
>>>>>> the impression that it changed a lot. Does it really help, and how
>>>>>> much would you usually gain? Thanks again!
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>> On 10 Apr 2016, at 12:55, Kenneth Heafield <mo...@kheafield.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm assuming you have enough RAM to fit everything. The kernel seems
>>>>>> to preferentially evict mmapped pages as memory usage approaches full
>>>>>> (it doesn't have to be full). To work around this, use
>>>>>>
>>>>>> load=read
>>>>>>
>>>>>> in your moses.ini line for the models. REMOVE any "lazyken" argument;
>>>>>> it is deprecated and might override the load= argument.
>>>>>>
>>>>>> The effect of load=read is to malloc (ok, an anonymous mmap, which is
>>>>>> how malloc is implemented anyway) at a 1 GB aligned address (to
>>>>>> optimize for huge pages) and read() the file into that memory. It
>>>>>> will no longer be shared across processes, but the memory will have
>>>>>> the same swappiness as the rest of the Moses process.
>>>>>>
>>>>>> Lazy loading will only make things worse here.
>>>>>>
>>>>>> Kenneth
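A minimal C++ sketch of the technique Kenneth describes, assuming Linux and POSIX I/O. This is an illustration of the idea, not the actual KenLM code; in particular, the 1 GB alignment for huge pages is skipped here:

    // Sketch of load=read: allocate anonymous memory (as malloc does
    // internally) and read() the model file into it.  Anonymous pages get
    // the same swappiness as the rest of the process; file-backed mmap
    // pages tend to be evicted first under memory pressure.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>

    int main(int argc, char *argv[]) {
      if (argc != 2) {
        std::fprintf(stderr, "Usage: %s model.kenlm\n", argv[0]);
        return 1;
      }
      int fd = open(argv[1], O_RDONLY);
      if (fd < 0) { std::perror("open"); return 1; }
      struct stat sb;
      if (fstat(fd, &sb) < 0) { std::perror("fstat"); return 1; }

      // Anonymous mapping: backed by RAM/swap, not by the file on disk.
      void *mem = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

      // Stream the file in; read() may return short counts on big files.
      char *p = static_cast<char *>(mem);
      for (off_t remaining = sb.st_size; remaining > 0;) {
        ssize_t got = read(fd, p, remaining);
        if (got <= 0) { std::perror("read"); return 1; }
        p += got;
        remaining -= got;
      }
      close(fd);

      // ... hand mem to the decoder here ...

      munmap(mem, sb.st_size);
      return 0;
    }

Because the pages are anonymous rather than file-backed, the kernel treats them like the rest of the process heap instead of preferentially evicting them, which is exactly the behavior load=read is meant to get.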
>>>>>> On 04/10/2016 07:29 AM, Jorg Tiedemann wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a large language model built from the Common Crawl data set,
>>>>>> and it takes forever to load when running Moses. My model is a
>>>>>> trigram kenlm, binarized with quantization, trie structures, and
>>>>>> pointer compression (-a 22 -q 8 -b 8). The model is about 140 GB,
>>>>>> and it takes hours to load (I’m still waiting). I run on a machine
>>>>>> with 256 GB of RAM ...
>>>>>>
>>>>>> I also tried lazy loading, without success. Is this normal, or am I
>>>>>> doing something wrong? Thanks for your help!
>>>>>>
>>>>>> Jörg
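For context, a trie model with those settings would be produced by a build_binary invocation along these lines (the file names are illustrative, not from this thread):

    build_binary -a 22 -q 8 -b 8 trie cc.tok.3.en.arpa cc.tok.3.en.trie.kenlm

Here -q 8 and -b 8 quantize probabilities and backoffs to 8 bits each, and -a 22 caps array pointer compression at 22 bits.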