Hi Martin and Marc,

On 12/14/2016 10:58 AM, Martin Steigerwald wrote:
I didn't post to LKML about the bug I reported in the upstream kernel bugtracker
yet.

I ponder whether to report the second issue, the one about netlink, that
Gerlof pointed out to the Debian bugtracker, since for Gerlof it only
happens with the Debian kernel and not with the CentOS kernel he tried, and I
also didn't see this issue with my self-compiled kernel.


Yesterday I did some further investigation into the second issue in kernel 4.8.4:

NETLINK command TASKSTATS_CMD_GET with nla_type TASKSTATS_CMD_ATTR_REGISTER_CPUMASK results in error EINVAL (-22).

Conclusion: it is a kernel bug.
Find my analysis below (I hope it is not too detailed :-).

Best regards,
Gerlof


Since there are many possible reasons for the error EINVAL in the kernel code involved, I added many printk's to find out exactly where the error is generated.

My analysis:

1) The NETLINK command mentioned above is meant to specify the range of CPUs for which taskstats has to be activated. In the system call, this is specified via a string which is "0-3" for all four CPUs in my system. In the kernel function 'cmd_attr_register_cpumask' (source file kernel/taskstats.c), this string is converted into a bitmap (via a call to function 'parse'). In this case, the bitmap contains 0x000000000000000f. This bitmap is one unsigned long because it provides enough bits for four CPUs. However, if needed a bitmap might consist of more unsigned longs.
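To illustrate the conversion described in point 1), here is a minimal userspace sketch of turning a CPU list string into a bitmap. It is not the kernel's parser; the function name 'parse_cpulist' and its return convention are assumptions made for this example only.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper (not kernel code): converts a list such as "0-3" or
 * "0,2-3" into a bitmap of nbits bits. The caller must provide enough
 * unsigned longs to hold nbits bits.
 * Returns 0 on success, -1 on malformed input or an out-of-range CPU.
 */
int parse_cpulist(const char *s, unsigned long *bits, unsigned int nbits)
{
        size_t nlongs = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;

        assert(s != NULL && bits != NULL && nbits > 0);

        /* Clear every word of the bitmap, not only the first one. */
        memset(bits, 0, nlongs * sizeof(unsigned long));

        while (*s) {
                char *end;
                unsigned long lo, hi, cpu;

                if (*s < '0' || *s > '9')
                        return -1;
                lo = strtoul(s, &end, 10);
                hi = lo;

                if (*end == '-') {              /* range "lo-hi" */
                        s = end + 1;
                        if (*s < '0' || *s > '9')
                                return -1;
                        hi = strtoul(s, &end, 10);
                }
                if (lo > hi || hi >= nbits)
                        return -1;

                for (cpu = lo; cpu <= hi; cpu++)
                        bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);

                if (*end == ',')
                        end++;
                else if (*end != '\0')
                        return -1;
                s = end;
        }
        return 0;
}
```

Called with the string "0-3", this sketch sets the lowest four bits of the first word (0xf) and leaves all other words zero.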

2) After creating the bitmap, the function 'cmd_attr_register_cpumask' calls function 'add_del_listener' to register the bitmap.
The first call issued by this function is:

        if (!cpumask_subset(mask, cpu_possible_mask))
                return -EINVAL;

and this is the call that causes error EINVAL.


3) The function 'cpumask_subset' (inline function in source file include/linux/cpumask.h) contains:

static inline int cpumask_subset(const struct cpumask *src1p,
                                 const struct cpumask *src2p)
{
        return bitmap_subset(cpumask_bits(src1p), cpumask_bits(src2p),
                             nr_cpumask_bits);
}

4) The function 'bitmap_subset' (inline function in source file include/linux/bitmap.h) verifies whether the bits that are set in the first bitmap are a subset of the bits set in the second bitmap. In this case the second bitmap is the variable 'cpu_possible_mask', which contains a bit *for every possible CPU* that might be present. The number of bits in this bitmap is defined by the configuration parameter CONFIG_NR_CPUS (see .config), which is set to 512 on my system (the default).

So for this specific call to 'bitmap_subset', the first pointer refers to the bitmask of one unsigned long describing the four CPUs that I specified in my NETLINK call while the second pointer refers to 'cpu_possible_mask' consisting of 8 unsigned longs for 512 CPUs. The third argument specifies the number of bits to be checked which is 'nr_cpumask_bits' that contains 512! IMHO, this is not correct. It means that the bits in the two bitmaps are compared over a length of 8 unsigned longs, while the first bitmap only contains one unsigned long.

5) I verified the conclusion of point 4) by adding printk's in front of the call mentioned in point 2):


static int add_del_listener(pid_t pid, const struct cpumask *mask, int isadd)
{
        struct listener_list *listeners;
        struct listener *s, *tmp, *s2;
        unsigned int cpu, i;
        int ret = 0;

        printk(KERN_INFO "GELA: NR_CPUS=%d\n", NR_CPUS);
        printk(KERN_INFO "GELA: nr_cpumask_bits=%d", nr_cpumask_bits);
        printk(KERN_INFO "GELA: nr_cpu_ids=%d", nr_cpu_ids);

        for (i=0; i < BITS_TO_LONGS(nr_cpumask_bits); i++) {
                printk(KERN_INFO "GELA: %016lx %016lx",
                        *(cpumask_bits(mask)+i),
                        *(cpumask_bits(cpu_possible_mask)+i) );
        }

        if (!cpumask_subset(mask, cpu_possible_mask))
                return -EINVAL;



When I run my test program, in dmesg I find the following messages:

[   36.016248] GELA: NR_CPUS=512
[   36.016249] GELA: nr_cpumask_bits=512
[   36.016250] GELA: nr_cpu_ids=4
[   36.016252] GELA: 000000000000000f 000000000000000f
[   36.016253] GELA: ffff88005a3cae14 0000000000000000 <-- from here bits in the first bitmap are not part of that bitmap!
[   36.016254] GELA: ffff88005fdd7d40 0000000000000000
[   36.016255] GELA: ffffffff81612630 0000000000000000
[   36.016256] GELA: 000000000000000c 0000000000000000
[   36.016257] GELA: ffffffff8131c193 0000000000000000
[   36.016258] GELA: ffffffff816125c0 0000000000000000
[   36.016259] GELA: ffffffff818e57c0 0000000000000000

This output also shows the variable 'nr_cpu_ids' that defines how many CPUs are currently present in the system. When I use this variable as the third parameter in the call to 'bitmap_subset' (see code in point 3), the problem does not occur any more because only 4 bits will be checked in both bitmaps.

6) With the original code, I modified the configuration parameter CONFIG_NR_CPUS to 8 instead of 512 to see if I can bypass the problem. This means that the bitmap 'cpu_possible_mask' only consists of one unsigned long now. No problem occurs for the application now and taskstats works fine.
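The word counts mentioned in point 6) follow from rounding the number of bits up to whole unsigned longs. A small userspace sketch of that arithmetic (the helper name 'longs_for_bits' is hypothetical and only mirrors the idea behind the kernel's BITS_TO_LONGS macro):

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper (not kernel code): number of unsigned longs needed
 * to store nbits bits, rounded up to a whole word.
 */
unsigned int longs_for_bits(unsigned int nbits)
{
        const unsigned int bits_per_long = sizeof(unsigned long) * CHAR_BIT;

        assert(nbits > 0);
        return (nbits + bits_per_long - 1) / bits_per_long;
}
```

With 64-bit unsigned longs this gives 8 words for 512 bits and 1 word for 8 bits, so with CONFIG_NR_CPUS=8 the comparison never looks beyond the first word.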

7) I do not know exactly why the problem did not occur with older kernel versions. I suspect configuration parameter CONFIG_CPUMASK_OFFSTACK which really influences the code related to the cpumask handling. In all kernels that had NO problems, this parameter was set to 'y'. In the kernel WITH problems, this configuration parameter is not present at all (so effectively 'n').
