Hi Will,

On 2017/10/17 17:23, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
>> I'm not sure if this is the problem on arm64 numa. What do you think ?
>> By the way, this testcase can be successful in any case on x86.
>
> To be honest, this isn't a particularly helpful bug report. I appreciate
> that a test is reporting failure, but it doesn't look like you've spent
> very much effort to understand what the test is trying to do and why it
> thinks it's failed to do it. All I can sensibly do with your bug report
> is run the test myself, and it passes on the systems I have available.
>
> So, you need to:
>
>  1. Understand what the test is doing.
>  2. Figure out which bit isn't doing what it's supposed to
>  3. See if that part can be isolated to trigger the problem
>
> At that point, it should be possible to describe the unexpected behaviour
> at a level which we can actually investigate if necessary.

This test case checks whether migration is rejected when the user calls
SYSC_migrate_pages with an invalid node. E.g., on a system with 4 nodes (0-3),
it tries to migrate to node 4, and this should return -EINVAL.
However, the kernel migrates the memory to node 0 and returns OK (i.e. 0).

The root cause is that nodes_subset(*new, node_states[N_MEMORY]) returns true
when new = 0x10, node_states[N_MEMORY] = 0xf and MAX_NUMNODES = 4.

This is a common issue, and I can also reproduce it with certain configs on
x86-64, e.g. CONFIG_NODES_SHIFT=3 with 8 nodes in the system.

IMO, if nbits = 4, then 0x0, 0x10 or 0xFF..F0 should not be a subset of
anything, so the following patch may fix this problem:

From: Yisheng Xie <xieyishe...@huawei.com>
Date: Tue, 17 Oct 2017 20:53:55 +0800
Subject: [PATCH] bitmap: fix corner case of bitmap_subset

As Xiaojun reported, the LTP test migrate_pages01 fails on a system which
has 4 nodes with CONFIG_NODES_SHIFT=2:

migrate_pages01    0  TINFO  :  test_invalid_nodes
migrate_pages01   14  TFAIL  :  migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
migrate_pages01   15  TFAIL  :  migrate_pages_common.c:55: call succeeded unexpectedly

The root cause is that nodes_subset(*new, node_states[N_MEMORY]) returns
true in a case like: new = 0x10, node_states[N_MEMORY] = 0xf and
MAX_NUMNODES = 4.

Fix it by correcting the corner case of bitmap_subset, which makes 0x0,
0x10 or 0xFF..F0 not a subset of any bitmap when the bitmap length is 4.

Reported-by: Tan Xiaojun <tanxiao...@huawei.com>
Signed-off-by: Yisheng Xie <xieyishe...@huawei.com>
---
 include/linux/bitmap.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 700cf5f..bc66978 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -283,6 +283,8 @@ static inline int bitmap_intersects(const unsigned long *src1,
 static inline int bitmap_subset(const unsigned long *src1,
 			const unsigned long *src2, unsigned int nbits)
 {
+	if(!(*src1 & BITMAP_LAST_WORD_MASK(nbits)))
+		return false;
 	if (small_const_nbits(nbits))
 		return ! ((*src1 & ~(*src2)) & BITMAP_LAST_WORD_MASK(nbits));
 	else
--
1.7.12.4

Thanks
Yisheng Xie

>
> Will
>
>> On 2017/10/16 19:42, Tan Xiaojun wrote:
>>> Hi all,
>>>
>>> I test ltp in Hisilicon D05 board and get a failed result about the
>>> testcase "migrate_pages01".
>>>
>>> In fact, The sub testcase "test_invalid_nodes" failed. The testcase is to
>>> find a invalid numa node and migrate memory pages to this node via syscall
>>> of "migrate_pages".
>>> The expected result of this case is returning "-1", but it actually return
>>> "0".
>>>
>>> --------------------------------------------------------
>>> # ./migrate_pages01
>>> migrate_pages01    0  TINFO  :  test_empty_mask
>>> migrate_pages01    1  TPASS  :  expected ret success: returned value = 0
>>> migrate_pages01    0  TINFO  :  test_invalid_pid -1
>>> migrate_pages01    2  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01    3  TPASS  :  expected failure: TEST_ERRNO=ESRCH(3): No such process
>>> migrate_pages01    0  TINFO  :  test_invalid_pid unused pid
>>> migrate_pages01    4  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01    5  TPASS  :  expected failure: TEST_ERRNO=ESRCH(3): No such process
>>> migrate_pages01    0  TINFO  :  test_invalid_masksize
>>> migrate_pages01    6  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01    7  TPASS  :  expected failure: TEST_ERRNO=EINVAL(22): Invalid argument
>>> migrate_pages01    0  TINFO  :  test_invalid_mem -1
>>> migrate_pages01    8  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01    9  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
>>> migrate_pages01    0  TINFO  :  test_invalid_mem invalid prot
>>> migrate_pages01   10  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01   11  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
>>> migrate_pages01    0  TINFO  :  test_invalid_mem unmmaped
>>> migrate_pages01   12  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01   13  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
>>> migrate_pages01    0  TINFO  :  test_invalid_nodes
>>> migrate_pages01   14  TFAIL  :  migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
>>> migrate_pages01   15  TFAIL  :  migrate_pages_common.c:55: call succeeded unexpectedly
>>> migrate_pages01    0  TINFO  :  test_invalid_perm
>>> migrate_pages01   16  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01   17  TPASS  :  expected failure: TEST_ERRNO=EPERM(1): Operation not permitted
>>> --------------------------------------------------------
>>>
>>> I debug and find a interesting thing, this case does not always fail.
>>>
>>> 1) If one or several numa nodes have no memory, this case will run
>>> successfully like below:
>>>
>>> --------------------
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>>> node 0 size: 65309 MB
>>> node 0 free: 61650 MB
>>> node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
>>> node 1 size: 65404 MB
>>> node 1 free: 61377 MB
>>> node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
>>> node 2 size: 65401 MB
>>> node 2 free: 62316 MB
>>> node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
>>> node 3 size: 0 MB
>>> node 3 free: 0 MB
>>> node distances:
>>> node   0   1   2   3
>>>   0:  10  15  20  20
>>>   1:  15  10  20  20
>>>   2:  20  20  10  15
>>>   3:  20  20  15  10
>>> ---------------------
>>>
>>> This testcase will find node number 3 and migrate pages to node 3. And
>>> syscall of "migrate_pages" return -1, test succeeded.
>>>
>>> 2) In most cases, all nodes have memory, and the testcase will get
>>> non-existent node like node number 4. The syscall of "migrate_pages" should
>>> also return -1, but return 0 actually.
>>> So the testcase failed.
>>>
>>> I think it is a problem in arm64. But I am not familiar with numa, so I ask
>>> for help from you.
>>>
>>> Thanks.
>>> Xiaojun.
>>>
>>>
>>> .
>>>
>>
>>
>
> .
>
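
For reference, the corner case described in the reply (new = 0x10,
node_states[N_MEMORY] = 0xf, nbits = 4) can be reproduced outside the kernel.
The following is only a minimal userspace sketch of the single-word path of
bitmap_subset. It assumes the mask is computed the way BITMAP_LAST_WORD_MASK
computes it, and last_word_mask(), subset_current() and subset_proposed() are
hypothetical names used for illustration, not kernel functions.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Mask selecting the low nbits of one word (1 <= nbits <= BITS_PER_LONG).
 * Written to mirror BITMAP_LAST_WORD_MASK; that correspondence is an
 * assumption of this sketch.
 */
static inline unsigned long last_word_mask(unsigned int nbits)
{
	assert(nbits >= 1 && nbits <= BITS_PER_LONG);
	return ~0UL >> (-nbits & (BITS_PER_LONG - 1));
}

/* Single-word subset check without the patch: bits >= nbits are ignored. */
static inline int subset_current(unsigned long src1, unsigned long src2,
				 unsigned int nbits)
{
	return !((src1 & ~src2) & last_word_mask(nbits));
}

/*
 * Same check with the extra test from the patch: a src1 that has no bit
 * set within the low nbits is rejected before the comparison.
 */
static inline int subset_proposed(unsigned long src1, unsigned long src2,
				  unsigned int nbits)
{
	if (!(src1 & last_word_mask(nbits)))
		return 0;
	return subset_current(src1, src2, nbits);
}
```

With these definitions, subset_current(0x10, 0xf, 4) evaluates to 1 because
bit 4 is masked off before the comparison, while subset_proposed(0x10, 0xf, 4)
evaluates to 0. Note that subset_proposed(0x0, 0xf, 4) also evaluates to 0,
so with the extra test an empty mask is no longer a subset of anything.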