On Fri, Dec 25, 2015 at 5:56 PM, Hal Rosenstock <h...@dev.mellanox.co.il> wrote: > On 12/25/2015 9:50 AM, Hal Rosenstock wrote: >> On 12/24/2015 11:09 AM, Matan Barak wrote: >>> On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak <mat...@dev.mellanox.co.il> >>> wrote: >>>> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerl...@mellanox.com> wrote: >>>>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote: >>>>>> >>>>>> >>>>>>>> This patch seems to generate a list corruption [1] when I test >>>>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this? >>>>>>> >>>>>>> >>>>>>> This patch is part from a series that was introduced in 4.3-rc1 [1], >>>>>> >>>>>> >>>>>> Then something else broke it. Can people check their patches on doug's >>>>>> tree? At the moment it's unusable... >>>>> >>>> >>>> Leon and I have checked Doug's tree with mlx4_ib disabled and we >>>> didn't encounter any error. >>>> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly. >>>> >>>>> >>>>> Yes, I checked the branch up to commit 882f3b3 "Merge branches >>>>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping, >>>>> ibv_rc_pingpong over top of mlx4 VPI) >>>>> >>> >>> Regarding mlx4, Eran and I analyzed it. We didn't test that, but it >>> seems like the bug is introduced in the 64bit counters test. Here's a >>> proposal: >>> >>> diff --git a/drivers/infiniband/core/sysfs.c >>> b/drivers/infiniband/core/sysfs.c >>> index 539040f..8da3c83 100644 >>> --- a/drivers/infiniband/core/sysfs.c >>> +++ b/drivers/infiniband/core/sysfs.c >>> @@ -714,11 +714,12 @@ err: >>> * Figure out which counter table to use depending on >>> * the device capabilities. >>> */ >>> -static struct attribute_group *get_counter_table(struct ib_device *dev) >>> +static struct attribute_group *get_counter_table(struct ib_device *dev, >>> + int port_num) >>> { >>> struct ib_class_port_info cpi; >>> >>> - if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO, >>> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO, >>> &cpi, 40, sizeof(cpi)) >= 0) { >> >> Your proposal is similar to earlier version of Christoph's patch but was >> changed since ClassPortInfo attribute does not have PortSelect field >> like other PerfMgt attributes which is where this port num would be >> placed. In ClassPortInfo attribute, that location would be the >> ClassVersion field that would be set to port number in PerfMgt Get query. > > In actuality, I don't think it really matters as this is a Get not a Set > and the PMA would do the right thing even if some field in the CPI were > stepped on. >
Well, it does matter as it calls the vendor driver with port_num = 0. Since the kernel is trusted, the vendor driver expects a valid port number. Giving it an invalid number might result in memory corruptions, as demonstrated in this case. >> -- Hal Matan >> >>> >>> if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) >>> @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int >>> port_num, >>> goto err_put; >>> } >>> >>> - p->pma_table = get_counter_table(device); >>> + p->pma_table = get_counter_table(device, port_num); >>> ret = sysfs_create_group(&p->kobj, p->pma_table); >>> if (ret) >>> goto err_put_gid_attrs; >>> >>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>>> the body of a message to majord...@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html