On 17 Feb 2019, at 3:29, Matthew Wilcox wrote:

On Fri, Feb 15, 2019 at 02:08:26PM -0800, Zi Yan wrote:
+struct page_flags {
+       unsigned int page_error :1;
+       unsigned int page_referenced:1;
+       unsigned int page_uptodate:1;
+       unsigned int page_active:1;
+       unsigned int page_unevictable:1;
+       unsigned int page_checked:1;
+       unsigned int page_mappedtodisk:1;
+       unsigned int page_dirty:1;
+       unsigned int page_is_young:1;
+       unsigned int page_is_idle:1;
+       unsigned int page_swapcache:1;
+       unsigned int page_writeback:1;
+       unsigned int page_private:1;
+       unsigned int __pad:3;
+};

I'm not sure how to feel about this. It's a bit fragile versus somebody adding new page flags. I don't know whether it's needed or whether you can just
copy page->flags directly because you're holding PageLock.

I agree with you that the current way of copying page flags individually could miss new page flags. I will try to come up with something better. Simply copying page->flags as a whole might not work, since the upper part of page->flags holds the page node information, which should not be changed. I think I need to add a helper function that copies/exchanges
all page flags, like calling migrate_page_states() twice.
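
To make the idea concrete, here is a minimal userspace sketch, not the kernel implementation, of exchanging only the state bits of two flag words while leaving the upper bits alone. The names FLAG_BITS, FLAG_MASK and exchange_flag_bits() are hypothetical, and the 16-bit split is an assumption for illustration only; the real layout of page->flags depends on the kernel configuration.

```c
/* Assumption: the low 16 bits hold state flags; the upper bits stand in
 * for the node/zone information that must not move between pages. */
#define FLAG_BITS 16
#define FLAG_MASK ((1UL << FLAG_BITS) - 1)

/* Hypothetical helper: swap only the masked flag bits of *a and *b.
 * XOR-ing both words with their masked difference exchanges exactly the
 * bits under FLAG_MASK and leaves every other bit untouched. */
static void exchange_flag_bits(unsigned long *a, unsigned long *b)
{
	unsigned long diff = (*a ^ *b) & FLAG_MASK;

	*a ^= diff;
	*b ^= diff;
}
```

A mask-based exchange like this does not need to know about individual flags, so a newly added flag inside the mask would be carried along automatically.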

+static void exchange_page(char *to, char *from)
+{
+       u64 tmp;
+       int i;
+
+       for (i = 0; i < PAGE_SIZE; i += sizeof(tmp)) {
+               tmp = *((u64 *)(from + i));
+               *((u64 *)(from + i)) = *((u64 *)(to + i));
+               *((u64 *)(to + i)) = tmp;
+       }
+}

I have a suspicion you'd be better off allocating a temporary page and
using copy_page().  Some architectures have put a lot of effort into
making copy_page() run faster.

When I am doing exchange_pages() between two NUMA nodes on an x86_64 machine, I can actually saturate the QPI bandwidth with this operation. I think cache
prefetching was doing its job.

The purpose of proposing exchange_pages() is to avoid allocating any new page, so that we would not trigger any potential page reclaim or memory compaction.
Allocating a temporary page defeats the purpose.
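
As an illustration of the no-allocation point, the following userspace sketch swaps two page-sized buffers using only two 64-bit temporaries on the stack. It is a sketch under stated assumptions, not the patch's code: swap_buffers_inplace() is a hypothetical name, BUF_SIZE is assumed to stand in for PAGE_SIZE, and memcpy() is used for the word accesses so that the example stays free of aliasing problems in plain C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 4096 /* assumption: stands in for PAGE_SIZE */

/* Hypothetical helper: exchange the contents of two BUF_SIZE-byte buffers
 * one 64-bit word at a time, without allocating a temporary buffer. */
static void swap_buffers_inplace(unsigned char *to, unsigned char *from)
{
	uint64_t a, b;
	size_t i;

	for (i = 0; i < BUF_SIZE; i += sizeof(a)) {
		memcpy(&a, to + i, sizeof(a));   /* load word from 'to' */
		memcpy(&b, from + i, sizeof(b)); /* load word from 'from' */
		memcpy(to + i, &b, sizeof(b));   /* store 'from' word into 'to' */
		memcpy(from + i, &a, sizeof(a)); /* store 'to' word into 'from' */
	}
}
```

The alternative with a temporary page would need three full copies and one extra page of memory, which is exactly the allocation the proposal tries to avoid.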


+               xa_lock_irq(&to_mapping->i_pages);
+
+               to_pslot = radix_tree_lookup_slot(&to_mapping->i_pages,
+                       page_index(to_page));

This needs to be converted to the XArray.  radix_tree_lookup_slot() is
going away soon.  You probably need:

        XA_STATE(to_xas, &to_mapping->i_pages, page_index(to_page));

Thank you for pointing this out. I will do the change.


This is a lot of code and I'm still trying to get my head around it all.
Thanks for putting in this work; it's good to see this approach being
explored.

Thank you for taking a look at the code.

--
Best Regards,
Yan Zi
