On 2010-06-10, at 08:48, Cory Spitz wrote:
> Slightly off-topic, but did anyone else notice that readahead is triggering 
> the shrinking and page writeout?  ll_read_ahead_page() clears __GFP_WAIT but 
> it seems sane to me that it should also drop __GFP_IO.  In my opinion, Lustre
> shouldn't speculatively force other pages out.  Only when there is an actual,
> demonstrated need, should it force out the other pages.

We used to carry a kernel patch (more recently I reimplemented it using only 
functions the kernel already exports via EXPORT_SYMBOL()) that adds 
grab_cache_page_nowait_gfp(), which lets the caller specify the GFP mask when 
allocating pages for readahead.  Without it, the kernel uses the GFP mask from 
the address space, which we have no control over.
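To make the mask handling concrete, here is a rough userspace sketch of 
deriving a readahead mask by clearing bits from the mapping's mask.  It is not 
the actual patch: the SKETCH_GFP_* values and the helper names are made up for 
illustration and should not be read as the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits. The values are assumptions for this sketch,
 * not copied from any particular kernel version. */
typedef unsigned int gfp_mask_t;

#define SKETCH_GFP_WAIT 0x10u  /* allocator may sleep and run reclaim */
#define SKETCH_GFP_IO   0x40u  /* reclaim may start physical I/O */
#define SKETCH_GFP_FS   0x80u  /* reclaim may call into the filesystem */

/* Hypothetical helper: start from the mapping's mask and clear every
 * flag listed in 'drop' to get the mask used for a readahead page. */
static gfp_mask_t readahead_gfp(gfp_mask_t mapping_gfp, gfp_mask_t drop)
{
    gfp_mask_t mask = mapping_gfp & ~drop;

    assert((mask & drop) == 0);  /* none of the dropped flags survive */
    return mask;
}

/* Hypothetical predicates used to inspect the resulting mask. */
static bool gfp_may_sleep(gfp_mask_t mask)
{
    return (mask & SKETCH_GFP_WAIT) != 0;
}

static bool gfp_may_do_io(gfp_mask_t mask)
{
    return (mask & SKETCH_GFP_IO) != 0;
}
```

Calling readahead_gfp(mask, SKETCH_GFP_WAIT) models what ll_read_ahead_page() 
is described as doing today; passing SKETCH_GFP_WAIT | SKETCH_GFP_IO models 
Cory's suggestion of also dropping __GFP_IO.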

That said, keeping readahead from generating memory pressure also has a 
negative side effect.  When client memory is full (i.e. all the time), NO 
readahead is generated because the readahead grab_cache_page_nowait_gfp() 
calls always fail.  This degrades performance significantly, since reads 
become a single synchronous stream instead of being pipelined.
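A toy model of that failure mode, assuming a non-blocking grab that never 
reclaims and a readahead loop that stops at the first failed allocation (the 
toy_* names and the stop-on-failure policy are assumptions for illustration, 
not the Lustre code):

```c
#include <stddef.h>

/* Toy page pool: 'free_pages' is all a non-blocking grab can draw from. */
struct toy_pool {
    size_t free_pages;
};

/* Non-blocking grab: succeeds only if a page is free right now.
 * It never sleeps and never reclaims. Returns 1 on success, 0 on failure. */
static int toy_grab_page_nowait(struct toy_pool *pool)
{
    if (pool->free_pages == 0)
        return 0;
    pool->free_pages--;
    return 1;
}

/* Try to queue up to 'window' readahead pages, stopping at the first
 * failed allocation. Returns the number of pages actually queued. */
static size_t toy_readahead(struct toy_pool *pool, size_t window)
{
    size_t queued = 0;

    while (queued < window && toy_grab_page_nowait(pool))
        queued++;
    return queued;
}
```

With free_pages at 0, toy_readahead() queues nothing regardless of the window 
size, so every page has to be fetched by the read path itself, one at a time.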

While it is true that some speculative readahead may result in evicting other 
useful pages from cache, it is more likely to be prefetching useful pages that 
the current process wants to use immediately and evicting old/useless pages.  

The readahead algorithm definitely needs some improvement, and it may be 
over-zealous here, but there is not enough information to say whether that is 
what happened in this case.  

I'd say the core problem is that no reclaim is being triggered and/or the 
reclaim is deadlocked on the cache cleaning, and that is the first issue to 
focus on here.


> Jason Rappleye wrote:
> [...]
>> When we first saw this problem a few weeks ago it appeared that client  
>> processes were stuck in uninterruptible sleep in blk_congestion_wait,  
>> but upon further examination we saw they were still issuing 1-2 I/Os  
>> per second. The kernel stack trace looked like this:
>> 
>> <ffffffff8013d6ec>{internal_add_timer+21}
>> <ffffffff8030fbc4>{schedule_timeout+138}
>> <ffffffff8013def0>{process_timeout+0}
>> <ffffffff8030f3ec>{io_schedule_timeout+88}
>> <ffffffff801f1d74>{blk_congestion_wait+102}
>> <ffffffff80148d46>{autoremove_wake_function+0}
>> <ffffffff8016851d>{throttle_vm_writeout+33}
>> <ffffffff8016aa0e>{remove_mapping+133}
>> <ffffffff8016b8e8>{shrink_zone+3367}
>> <ffffffff80218799>{find_next_bit+96}
>> <ffffffff8016c435>{zone_reclaim+430}
>> <ffffffff8843c3ba>{:ptlrpc:ldlm_lock_decref+154}
>> <ffffffff8852df5a>{:osc:cache_add_extent+1178}
>> <ffffffff8860f838>{:lustre:ll_removepage+488}
>> <ffffffff8852152a>{:osc:osc_prep_async_page+426}
>> <ffffffff8860c953>{:lustre:llap_shrink_cache+1715}
>> <ffffffff88524224>{:osc:osc_queue_group_io+644}
>> <ffffffff801671a2>{get_page_from_freelist+222}
>> <ffffffff8016756d>{__alloc_pages+113}
>> <ffffffff80162416>{add_to_page_cache+57}
>> <ffffffff80162c49>{grab_cache_page_nowait+53}
>> <ffffffff8860e368>{:lustre:ll_readahead+2584}
>> <ffffffff8851db55>{:osc:osc_check_rpcs+773}
>> <ffffffff8012c52c>{__wake_up+56}
>> <ffffffff88515db1>{:osc:loi_list_maint+225}
>> <ffffffff88330288>{:libcfs:cfs_alloc+40}
>> <ffffffff88615557>{:lustre:ll_readpage+4775}
>> <ffffffff885b3109>{:lov:lov_fini_enqueue_set+585}
>> <ffffffff88438cc7>{:ptlrpc:ldlm_lock_add_to_lru+119}
>> <ffffffff8843719e>{:ptlrpc:lock_res_and_lock+190}
>> <ffffffff883d792f>{:obdclass:class_handle_unhash_nolock+207}
>> <ffffffff8843bb1c>{:ptlrpc:ldlm_lock_decref_internal+1356}
>> <ffffffff885b235f>{:lov:lov_finish_set+1695}
>> <ffffffff801629bd>{do_generic_mapping_read+525}
>> <ffffffff8016476e>{file_read_actor+0}
>> <ffffffff8016328b>{__generic_file_aio_read+324}
>> <ffffffff80164576>{generic_file_readv+143}
>> <ffffffff885b07c9>{:lov:lov_merge_lvb+281}
>> <ffffffff80148d46>{autoremove_wake_function+0}
>> <ffffffff8019f156>{__touch_atime+118}
>> <ffffffff885ef821>{:lustre:ll_file_readv+6385}
>> <ffffffff80216f4f>{__up_read+16}
>> <ffffffff885efada>{:lustre:ll_file_read+26}
>> <ffffffff801878f0>{vfs_read+212}
>> <ffffffff80187cd0>{sys_read+69}
>> <ffffffff8010ae5e>{system_call+126}
>> 
> [...]


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
