Hello all,

As a follow-up to this thread 
https://www.spinics.net/lists/dri-devel/msg410740.html, I looked further into 
the idea of a shared LRU list for both ttm/bo and svm (to achieve mutual 
eviction between them). I came up with a rough design, which I would like to 
align with you on before I move too far.

As illustrated in the diagram below:


  1.  There will be a global drm_lru_manager to maintain the shared LRU lists. 
Each memory type will have a list, i.e., system memory has a list and gpu 
memory has a list. On a system which has multiple gpu memory regions, we can 
have multiple GPU LRU lists.
  2.  Move the LRU operation functions (such as the bulk_move related ones) 
from ttm_resource_manager to drm_lru_manager.
  3.  drm_lru_manager should be initialized during device initialization. The 
ttm layer or svm layer can hold a weak reference to it for convenience.
  4.  Abstract a drm_lru_entity: this is supposed to be embedded in the 
ttm_resource and svm_resource structs, as illustrated. Since ttm_resource and 
svm_resource are quite different in nature (ttm_resource is coupled with a bo 
and svm_resource is struct page/pfn based), we can't provide a unified eviction 
function for them. So an evict_func pointer is introduced in drm_lru_entity 
[Note 1].
  5.  lru_lock: Currently the lru_lock is in the ttm_device structure. Ideally 
this can be moved to drm_lru_manager. But besides the lru list, lru_lock also 
protects other ttm-specific things such as ttm_device's pinned list. The 
current plan is to move lru_lock to xe_device/amdgpu_device, and ttm_device or 
svm can hold a weak reference to it for convenience.
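
To make items 1 and 4 a bit more concrete, here is a rough user-space sketch of 
how the pieces could fit together. It is only an illustration, not the proposed 
kernel code: the list helpers stand in for the kernel's list_head, the two 
resource structs are heavily simplified stand-ins, and the memory type enum, 
the mem_type and evicted fields, and the drm_lru_add_tail/drm_lru_evict_first 
helpers are hypothetical names. Locking (item 5) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_node { struct list_node *prev, *next; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical memory types: one LRU list per type (item 1). */
enum { MEM_SYSTEM, MEM_VRAM0, MEM_TYPE_COUNT };

struct drm_lru_entity;
typedef int (*drm_lru_evict_func_t)(struct drm_lru_entity *e);

/* Item 4: embedded in both resource types; carries the evict callback. */
struct drm_lru_entity {
	struct list_node lru;
	int mem_type;			/* hypothetical field */
	drm_lru_evict_func_t evict_func;
};

struct drm_lru_manager {
	struct list_node lru[MEM_TYPE_COUNT];
};

/* Heavily simplified stand-ins, not the real structs. */
struct ttm_resource {
	int bo_id;
	int evicted;
	struct drm_lru_entity lru_entity;
};

struct svm_resource {
	unsigned long pfn;
	int evicted;
	struct drm_lru_entity lru_entity;
};

static int ttm_evict(struct drm_lru_entity *e)
{
	struct ttm_resource *res = container_of(e, struct ttm_resource, lru_entity);

	res->evicted = 1;	/* a real version would evict the bo */
	return 0;
}

static int svm_evict(struct drm_lru_entity *e)
{
	struct svm_resource *res = container_of(e, struct svm_resource, lru_entity);

	res->evicted = 1;	/* a real version would migrate the pages */
	return 0;
}

static void drm_lru_manager_init(struct drm_lru_manager *m)
{
	for (int i = 0; i < MEM_TYPE_COUNT; i++)
		list_init(&m->lru[i]);
}

/* Caller is assumed to hold lru_lock (item 5); locking is omitted here. */
static void drm_lru_add_tail(struct drm_lru_manager *m, struct drm_lru_entity *e)
{
	assert(e->mem_type >= 0 && e->mem_type < MEM_TYPE_COUNT);
	list_add_tail(&e->lru, &m->lru[e->mem_type]);
}

/* Evict the least recently used entity of a memory type; -1 if the list is empty. */
static int drm_lru_evict_first(struct drm_lru_manager *m, int mem_type)
{
	struct list_node *head = &m->lru[mem_type];
	struct drm_lru_entity *e;

	if (head->next == head)
		return -1;
	e = container_of(head->next, struct drm_lru_entity, lru);
	list_del(&e->lru);
	return e->evict_func(e);
}
```

The point of the sketch is that the eviction path only ever touches 
drm_lru_entity; whether the victim is a bo or an svm range is decided by the 
evict_func the owner installed, so ttm and svm entries can sit on the same list.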

[cid:image001.png@01DA0285.844FA910]


Note 1: I have been considering a structure like the one below. Each hmm/svm 
resource page is backed by a struct page, and struct page already has an lru 
member. So theoretically the LRU list can be as below. This way we don't need 
to introduce the drm_lru_entity struct. The difficulty is that, without 
modifying the linux struct page, we can't cast an lru node to struct page or 
struct ttm_resource, since we don't know whether the node is used by ttm or 
svm. This is why I had to introduce drm_lru_entity to hold an evict_func above. 
But let me know if you have a better idea.
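
A small user-space sketch of the casting problem in Note 1. The two structs are 
invented stand-ins (their field layouts are made up for illustration and do not 
match the real struct page or ttm_resource), and owner_kind and lru_node_owner 
are hypothetical names. It only shows that a bare list node can be mapped back 
to its owner when something else tells us which kind of owner it is.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_node { struct list_node *prev, *next; };

/* Stand-ins only; field layouts are invented for illustration. */
struct fake_page {
	unsigned long flags;
	struct list_node lru;
};

struct fake_ttm_resource {
	unsigned long start;
	unsigned long size;
	struct list_node lru;
};

/* The extra information a bare list node does not carry by itself. */
enum owner_kind { OWNER_UNKNOWN, OWNER_PAGE, OWNER_TTM };

/*
 * Map an lru node back to its owner. The offset to subtract depends on
 * the owner type, so without 'kind' there is nothing safe to return.
 */
static void *lru_node_owner(struct list_node *n, enum owner_kind kind)
{
	switch (kind) {
	case OWNER_PAGE:
		return (char *)n - offsetof(struct fake_page, lru);
	case OWNER_TTM:
		return (char *)n - offsetof(struct fake_ttm_resource, lru);
	default:
		return NULL;
	}
}
```

In the drm_lru_entity design the evict_func pointer plays the role of this 
kind tag, which is why the extra struct is needed if struct page stays 
unmodified.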

[cid:image002.png@01DA0289.9AD5D110]

Thanks,
Oak
