Hi,

Why is your system using 7 GB of swap out of 9 GB? Moses is only taking 147 GB out of the 252 GB of physical RAM. I smell other processes taking up RAM, possibly those 5 stopped and 1 zombie.

Kenneth
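One way to see which processes are actually holding the swap, assuming a Linux /proc filesystem (an illustrative command, not one from this thread):

    # per-process swap usage, largest consumers first
    grep VmSwap /proc/[0-9]*/status | sort -t: -k3 -rn | head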
On 04/12/2016 12:45 PM, Jorg Tiedemann wrote:
>
>> Did you remove all "lazyken" arguments from moses.ini?
>
> Yes, I did.
>
>> Is the network filesystem Lustre? If so, mmap will perform terribly and
>> you should use load=read or (better) load=parallel_read since reading
>> from Lustre is CPU-bound.
>
> Yes, I think so. The parallel_read option sounds interesting. Can it
> hurt in some setups, or could I use it as my default?
>
>> Does the cluster management software/job scheduler/sysadmin impose a
>> resident memory limit?
>
> I don’t really know. I don’t think so, but I need to find out.
>
>> Can you copy-paste `top' when it's running slow and the stderr at that
>> time?
>
> Here is the top of my top output when running on my test node:
>
> top - 14:39:03 up 50 days, 5:47, 0 users, load average: 1.97, 2.09, 3.85
> Tasks: 814 total, 3 running, 805 sleeping, 5 stopped, 1 zombie
> Cpu(s): 6.9%us, 6.2%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 264493500k total, 263614188k used, 879312k free, 68680k buffers
> Swap: 9775548k total, 7198920k used, 2576628k free, 69531796k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 42528 tiedeman  20   0  147g  147g  800 R 100.0 58.4  31:25.01 moses
>
> stderr doesn’t say anything new besides the message from the start of
> feature function loading:
>
> FeatureFunction: LM0 start: 16 end: 16
> line=KENLM load=parallel_read name=LM1 factor=0
> path=/homeappl/home/tiedeman/research/SMT/wmt16/fi-en/data/monolingual/cc.tok.3.en.trie.kenlm
> order=3
>
> I am trying /tmp/ now as well (it takes time to shuffle the big files
> around, though).
>
> Jörg
>
>> On 04/12/2016 08:26 AM, Jorg Tiedemann wrote:
>>>
>>> No, it’s definitely not waiting for input … the same setup works for
>>> smaller models.
>>>
>>> I have the models on a work partition on our cluster. This is probably
>>> not good enough, so I will try to move the data to local tmp on the
>>> individual nodes before executing. Hopefully this helps. How would you
>>> do this if you want to distribute tuning?
>>>
>>> Thanks!
>>> Jörg
>>>
>>>> On 12 Apr 2016, at 09:34, Ondrej Bojar <bo...@ufal.mff.cuni.cz> wrote:
>>>>
>>>> Random suggestion: isn't it waiting for stdin for some strange
>>>> reason? ;-)
>>>>
>>>> O.
>>>>
>>>> On April 12, 2016 8:20:46 AM CEST, Hieu Hoang <hieuho...@gmail.com> wrote:
>>>>>
>>>>> I assume that it's on local disk rather than a network drive.
>>>>>
>>>>> Are you sure it's still in the loading stage, and that it's loading
>>>>> kenlm rather than the phrase table or the lexicalized reordering
>>>>> model, etc.?
>>>>>
>>>>> If there's a way to make the model files available for download, or
>>>>> to give me access to your machine, I might be able to debug it.
>>>>>
>>>>> Hieu Hoang
>>>>> http://www.hoang.co.uk/hieu
>>>>>
>>>>> On 12 Apr 2016 08:41, "Jorg Tiedemann" <tiede...@gmail.com> wrote:
>>>>>>
>>>>>> Unfortunately, load=read didn’t help. It has been loading for 7
>>>>>> hours now, with no sign of starting to decode. The disk is not
>>>>>> terribly slow; cat worked without problems. I don’t know what else
>>>>>> to do, so I think I have to give up for now. Am I the only one
>>>>>> experiencing such slow loading times?
>>>>>>
>>>>>> Thanks again for your help!
>>>>>>
>>>>>> Jörg
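For reference, the load strategy is selected per model in the [feature] section of moses.ini. A line like the following chooses the read strategy (the name, order, and path here are illustrative placeholders, not values from this thread):

    KENLM name=LM0 factor=0 order=3 load=read path=/path/to/model.trie.kenlm

Substituting load=parallel_read, load=populate, or load=lazy selects the alternative strategies that Kenneth describes below.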
>>>>>> On 10 Apr 2016, at 22:27, Kenneth Heafield <mo...@kheafield.com> wrote:
>>>>>>
>>>>>> With load=read:
>>>>>>
>>>>>> The model acts like normal RAM that is part of the Moses process.
>>>>>>
>>>>>> It supports huge pages via transparent huge pages, so it's slightly
>>>>>> faster.
>>>>>>
>>>>>> Before loading, cat file >/dev/null will just put into the cache data
>>>>>> that was going to be read anyway; read() streams through the file
>>>>>> more or less like cat does.
>>>>>>
>>>>>> After loading, cat file >/dev/null will hurt, since there's the
>>>>>> potential to load the file into RAM twice and swap out bits of Moses.
>>>>>>
>>>>>> Memory is shared between threads, just not with the disk cache (ok,
>>>>>> maybe, but only if they get huge pages support to work well) or with
>>>>>> other processes that independently read the file.
>>>>>>
>>>>>> With load=populate:
>>>>>>
>>>>>> It loads upfront and maps the file into the process; the kernel seems
>>>>>> to evict it first.
>>>>>>
>>>>>> Before loading, cat file >/dev/null might help, but in theory
>>>>>> MAP_POPULATE should be doing much the same thing.
>>>>>>
>>>>>> After loading, or during slow loading, cat file >/dev/null can help
>>>>>> because it forces the data back into RAM. This is particularly useful
>>>>>> if the Moses process came under memory pressure after loading, which
>>>>>> can include heavy disk activity even if RAM isn't full.
>>>>>>
>>>>>> Memory is shared with all other processes that mmap the file.
>>>>>>
>>>>>> With load=lazy:
>>>>>>
>>>>>> It maps the file into the process with lazy loading (i.e. mmap without
>>>>>> MAP_POPULATE). Not recommended for decoding, but useful if you've got
>>>>>> a 6 TB file and want to send it a few thousand queries.
>>>>>>
>>>>>> cat will definitely help here, at any time.
>>>>>>
>>>>>> Memory is shared with all other processes that mmap the file.
>>>>>>
>>>>>> On 04/10/2016 06:50 PM, Jorg Tiedemann wrote:
>>>>>>
>>>>>> Thanks for the quick reply. I will try the load option.
>>>>>>
>>>>>> Quick question: you said that the memory will not be shared across
>>>>>> processes with that option. Does that mean that it will load the LM
>>>>>> for each thread? That would mean a lot of memory in my setup.
>>>>>>
>>>>>> By the way, I also did the cat >/dev/null thing, but I didn’t have
>>>>>> the impression that it changed a lot. Does it really help, and how
>>>>>> much would you usually gain? Thanks again!
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>> On 10 Apr 2016, at 12:55, Kenneth Heafield <mo...@kheafield.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm assuming you have enough RAM to fit everything. The kernel seems
>>>>>> to preferentially evict mmapped pages as memory usage approaches full
>>>>>> (it doesn't have to be full). To work around this, use
>>>>>>
>>>>>> load=read
>>>>>>
>>>>>> in your moses.ini line for the models. REMOVE any "lazyken" argument;
>>>>>> it is deprecated and might override the load= argument.
>>>>>>
>>>>>> The effect of load=read is to malloc (ok, an anonymous mmap, which is
>>>>>> how malloc is implemented anyway) at a 1 GB aligned address (to
>>>>>> optimize for huge pages) and read() the file into that memory. It
>>>>>> will no longer be shared across processes, but the memory will have
>>>>>> the same swappiness as the rest of the Moses process.
>>>>>>
>>>>>> Lazy loading will only make things worse here.
>>>>>>
>>>>>> Kenneth
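A minimal C++ sketch of the technique Kenneth describes, assuming Linux and POSIX I/O. This is an illustration of the idea, not the actual KenLM code; in particular, the 1 GB alignment for huge pages is skipped here:

    // Sketch of load=read: allocate anonymous memory (as malloc does
    // internally) and read() the model file into it.  Anonymous pages get
    // the same swappiness as the rest of the process; file-backed mmap
    // pages tend to be evicted first under memory pressure.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>

    int main(int argc, char *argv[]) {
      if (argc != 2) {
        std::fprintf(stderr, "Usage: %s model.kenlm\n", argv[0]);
        return 1;
      }
      int fd = open(argv[1], O_RDONLY);
      if (fd < 0) { std::perror("open"); return 1; }
      struct stat sb;
      if (fstat(fd, &sb) < 0) { std::perror("fstat"); return 1; }

      // Anonymous mapping: backed by RAM/swap, not by the file on disk.
      void *mem = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

      // Stream the file in; read() may return short counts on big files.
      char *p = static_cast<char *>(mem);
      for (off_t remaining = sb.st_size; remaining > 0;) {
        ssize_t got = read(fd, p, remaining);
        if (got <= 0) { std::perror("read"); return 1; }
        p += got;
        remaining -= got;
      }
      close(fd);

      // ... hand mem to the decoder here ...

      munmap(mem, sb.st_size);
      return 0;
    }

Because the pages are anonymous rather than file-backed, the kernel treats them like the rest of the process heap instead of preferentially evicting them, which is exactly the behavior load=read is meant to get.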
>>>>>> On 04/10/2016 07:29 AM, Jorg Tiedemann wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a large language model built from the Common Crawl data set,
>>>>>> and it takes forever to load when running Moses. My model is a
>>>>>> trigram kenlm, binarized with quantization, trie structures, and
>>>>>> pointer compression (-a 22 -q 8 -b 8). The model is about 140 GB,
>>>>>> and it takes hours to load (I’m still waiting). I run on a machine
>>>>>> with 256 GB of RAM ...
>>>>>>
>>>>>> I also tried lazy loading, without success. Is this normal, or am I
>>>>>> doing something wrong? Thanks for your help!
>>>>>>
>>>>>> Jörg
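For context, a trie model with those settings would be produced by a build_binary invocation along these lines (the file names are illustrative, not from this thread):

    build_binary -a 22 -q 8 -b 8 trie cc.tok.3.en.arpa cc.tok.3.en.trie.kenlm

Here -q 8 and -b 8 quantize probabilities and backoffs to 8 bits each, and -a 22 caps array pointer compression at 22 bits.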