Re: [PATCH] clocksource: Add heuristics to avoid switching away from TSC due to timer delay
> /* > * Proper multiline comments look like this not like > * the above. > */ Got it, will fix next time around. > That aside. Why are you trying to do heuristics on the delta? > > We have way better information than that. The watchdog timer expiry time is > known and we can determine the exact delay of the timer. > > The watchdog clocksource provides the maximum 'idle' time, i.e. the time > between two reads, in clocksource::max_idle_ns. That value is filled in > when the clocksource is configured. > > So without doing speculation we can make an informed decision: > > elapsed = jiffies_to_nsec(jiffies - watchdog_timer->expires) + > WATCHDOG_INTERVAL_NS; > > if (elapsed > wdcs->max_idle_ns) { > Skip .. > } Yes, that makes more sense than what I was doing, although I'm not sure on the details. Just missed that idea. Why are you adding the watchdog interval to the calculated elapsed time? It seems we have an issue exactly if jiffies - watchdog_timer->expires is too big, without adding the interval we tried to wait in on top. Also I think we might want to be careful that jiffies is >= the expires time - or is it not possible that a timer fires one jiffy early? Also for full generality it seems we should check against the clocksource max_idle_ns as well - for x86 TSC is wider than HPET but there may be other architectures that could hit the same problem, just with the clocksource being checked wrapping around instead of the watchdog clocksource. Right? Thanks! Roland
[tip:x86/timers] x86/hpet: Remove unused FSEC_PER_NSEC define
Commit-ID: d999c0ec2498e54b9328db6b2c1037710025add1 Gitweb: https://git.kernel.org/tip/d999c0ec2498e54b9328db6b2c1037710025add1 Author: Roland Dreier AuthorDate: Fri, 30 Nov 2018 13:14:50 -0800 Committer: Borislav Petkov CommitDate: Tue, 4 Dec 2018 12:17:21 +0100 x86/hpet: Remove unused FSEC_PER_NSEC define The FSEC_PER_NSEC macro has had zero users since commit ab0e08f15d23 ("x86: hpet: Cleanup the clockevents init and register code"). Remove it. Signed-off-by: Roland Dreier Signed-off-by: Borislav Petkov Acked-by: Thomas Gleixner Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: x86-ml Link: https://lkml.kernel.org/r/20181130211450.5200-1-rol...@purestorage.com --- arch/x86/kernel/hpet.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index b0acb22e5a46..dfd3aca82c61 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -21,10 +21,6 @@ #define HPET_MASK CLOCKSOURCE_MASK(32) -/* FSEC = 10^-15 - NSEC = 10^-9 */ -#define FSEC_PER_NSEC 1000000L - #define HPET_DEV_USED_BIT 2 #define HPET_DEV_USED (1 << HPET_DEV_USED_BIT) #define HPET_DEV_VALID 0x8
[PATCH] x86/hpet: Remove unused FSEC_PER_NSEC define
The FSEC_PER_NSEC macro has had zero users since commit ab0e08f15d23 ("x86: hpet: Cleanup the clockevents init and register code"). Signed-off-by: Roland Dreier --- arch/x86/kernel/hpet.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index b0acb22e5a46..dfd3aca82c61 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -21,10 +21,6 @@ #define HPET_MASK CLOCKSOURCE_MASK(32) -/* FSEC = 10^-15 - NSEC = 10^-9 */ -#define FSEC_PER_NSEC 1000000L - #define HPET_DEV_USED_BIT 2 #define HPET_DEV_USED (1 << HPET_DEV_USED_BIT) #define HPET_DEV_VALID 0x8 -- 2.19.1
[PATCH] clocksource: Add heuristics to avoid switching away from TSC due to timer delay
On a modern x86 system, the TSC is used as a clocksource, with HPET used in the clocksource watchdog to make sure that the TSC is stable. If the clocksource watchdog_timer is delayed for an extremely long time (for example if softirqs are being serviced in ksoftirqd, and realtime threads are starving ksoftirqd), then the 32-bit HPET counter may wrap around. For example, with an HPET running at 24 MHz, 2^32 cycles is about 179 seconds - a long time for timers to be starved, but possible with a poorly behaved realtime thread. If this happens, since the TSC is a 64-bit counter and won't wrap, the watchdog will detect skew - the TSC interval will be 179 seconds longer than the HPET interval - and will mark the TSC as unstable. This causes the system to switch to the HPET as a clocksource, which has a huge negative performance impact. In this case, switching to the HPET just makes a bad situation (timers starved) that the system might recover from turn permanently even worse (more expensive clock_gettime() calls), due to a spurious false positive detection of TSC instability. To improve this, add some heuristics to detect cases where the watchdog is delayed long enough for the instability detection to be likely to be wrong: - If the clocksource being tested (eg TSC) has counted so many cycles that converting to nsecs will overflow multiplication, *AND* the watchdog clocksource (eg HPET) shows that the watchdog timer has missed its interval by at least a factor of 3, skip marking the clocksource as unstable for a timer iteration. This is not perfect - for example it is possible for the watchdog clocksource to wrap around and show a small interval - but at least in the specific x86 case it is unlikely, since the watchdog interval is a small fraction of the wraparound interval. 
- If there is a skew between the clocksource being tested and the watchdog clocksource that is at least as big as the wraparound interval for the watchdog clocksource, then don't mark the clocksource as unstable. Again, this might fail to mark a clocksource as unstable for one iteration, but it is unlikely that the instability is bad enough that we will see a larger skew than the wraparound interval for many iterations. These heuristics are imperfect but are chosen to make false detection of instability much less likely, while leaving detection of true instability very likely within a few clocksource watchdog iterations. Signed-off-by: Roland Dreier --- kernel/time/clocksource.c | 35 +++ 1 file changed, 35 insertions(+) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index ffe081623aec..f1b3d8ff2437 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -243,12 +243,47 @@ static void clocksource_watchdog(struct timer_list *unused) watchdog->shift); delta = clocksource_delta(csnow, cs->cs_last, cs->mask); + + /* If the cycle delta is beyond what we can safely +* convert to nsecs, and the watchdog clocksource +* suggests that we've overslept, skip checking this +* iteration to avoid marking a clocksource as +* unstable because of a severely delayed timer. 
*/ + if (delta > cs->max_cycles && + wd_nsec > 3 * jiffies_to_nsecs(WATCHDOG_INTERVAL)) { + pr_warn("timekeeping watchdog: Clocksource '%s' not checked due to apparent long timer delay:\n", + cs->name); + pr_warn(" Delta %llx > max_cycles %llx, wd_nsec %lld\n", + delta, cs->max_cycles, wd_nsec); + continue; + } + cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift); wdlast = cs->wd_last; /* save these in case we print them */ cslast = cs->cs_last; cs->cs_last = csnow; cs->wd_last = wdnow; + /* If the clocksource interval is far off from the +* watchdog clocksource interval but the interval is +* big enough that the watchdog may have wrapped +* around (again due to a severely delayed timer), +* skip this iteration. For example, this saves us +* from marking the TSC as unstable just because the +* 32-bit HPET wrapped around on x86. */ + if (abs(cs_nsec - wd_nsec) > + clocksource_cyc2ns(watchdog->max_cycles, watchdog->mult, + watchdog->shift) - WATCHDOG_THRESHOLD) { + pr_warn("timekeeping watchdog: Clocksource '%s
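To make the arithmetic behind the commit message concrete, here is a rough userspace sketch (not kernel code, and not the patch itself). It computes how long a 32-bit counter takes to wrap and applies a simplified form of the second heuristic. The helper names are made up for illustration, the full 2^32 wrap is used where the patch uses watchdog->max_cycles, and the tolerance is passed in as a parameter rather than taken from WATCHDOG_THRESHOLD.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/*
 * Time for a free-running counter that is `bits` wide to wrap once when
 * clocked at freq_hz. Only valid for bits <= 32, so that the
 * multiplication below stays within 64 bits.
 */
static uint64_t sketch_wrap_ns(unsigned int bits, uint64_t freq_hz)
{
	uint64_t cycles = 1ULL << bits;

	return cycles * SKETCH_NSEC_PER_SEC / freq_hz;
}

/*
 * Simplified second heuristic: a skew that is nearly as large as one
 * full watchdog wrap (less a tolerance) is treated as "the watchdog
 * probably wrapped" rather than "the clocksource is unstable".
 */
static bool sketch_skew_looks_like_wrap(int64_t cs_nsec, int64_t wd_nsec,
					uint64_t wd_wrap_ns,
					uint64_t threshold_ns)
{
	uint64_t skew = cs_nsec > wd_nsec ? (uint64_t)(cs_nsec - wd_nsec)
					  : (uint64_t)(wd_nsec - cs_nsec);

	return skew > wd_wrap_ns - threshold_ns;
}
```

For a 24 MHz HPET, sketch_wrap_ns(32, 24000000) comes out just under 179 seconds, which is the figure quoted in the commit message. A TSC interval of 180 seconds paired with a wrapped HPET reading of about 1 second would be flagged by the second helper, while an ordinary skew of a second or so would not.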
Re: [PATCH 0/3] Provide more fine grained control over multipathing
> The sensible thing to do in nvme is to use different paths for > different queues. That is e.g. in the RDMA case use the HCA closer > to a given CPU by default. We might allow to override this for > cases where the is a good reason, but what I really don't want is > configurability for configurabilities sake. That makes sense but I'm not sure it covers everything. Probably the most common way to do NVMe/RDMA will be with a single HCA that has multiple ports, so there's no sensible CPU locality. On the other hand we want to keep both ports to the fabric busy. Setting different paths for different queues makes sense, but there may be single-threaded applications that want a different policy. I'm not saying anything very profound, but we have to find the right balance between too many and too few knobs. - R.
Re: [PATCH 0/3] Provide more fine grained control over multipathing
> Moreover, I also wanted to point out that fabrics array vendors are > building products that rely on standard nvme multipathing (and probably > multipathing over dispersed namespaces as well), and keeping a knob that > will keep nvme users with dm-multipath will probably not help them > educate their customers as well... So there is another angle to this. As a vendor who is building an NVMe-oF storage array, I can say that clarity around how Linux wants to handle NVMe multipath would definitely be appreciated. It would be great if we could all converge around the upstream native driver but right now it doesn't look adequate - having only a single active path is not the best way to use a multi-controller storage system. Unfortunately it looks like we're headed to a world where people have to write separate "best practices" documents to cover RHEL, SLES and other vendors. We plan to implement all the fancy NVMe standards like ANA, but it seems that there is still a requirement to let the host side choose policies about how to use paths (round-robin vs least queue depth for example). Even in the modern SCSI world with VPD pages and ALUA, there are still knobs that are needed. Maybe NVMe will be different and we can find defaults that work in all cases but I have to admit I'm skeptical... - R.
Re: KASAN: use-after-free Read in __list_add_valid (5)
> Still reproducible on Linus' tree (commit 66e1c94db3cd4e) and on linux-next > (next-20180511). Here's a simplified reproducer: Thanks! That's a fantastic test case. The issue is a race where rdma_listen() sees invalid state in the middle of an rdma_bind_addr() call that will ultimately fail. I'll send a proposed patch shortly. - R.
Re: [Patch v2 00/19] CIFS: Implement SMBDirect
> Starting with SMB2 dialect 3.0, Microsoft introduced SMBDirect transport > protocol for transferring upper layer (SMB2) payload over RDMA via > Infiniband, RoCE or iWARP. The prococol is published in [MS-SMBD] > (https://msdn.microsoft.com/en-us/library/hh536346.aspx). This is great to see. Is there a Linux implementation of the server side (in Samba?) so that the client can be tested without needing a Windows server? - R.
Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation
On Fri, Jul 8, 2016 at 9:51 AM, Jason Gunthorpe wrote: > So, it appears, the dst and neigh can be used for all performances cases. > > For the non performance dst == null case, can we just burn cycles and > stuff the daddr in front of the packet at hardheader time, even if we > have to copy? OK, sounds interesting. Unfortunately the scope of this work has gotten to the point where I can't take it on right now. My system is running 4.4.y for now (before struct skb_gso_cb grew) so I think shrinking struct skb_gso_cb to 8 bytes plus changing SKB_SGO_CB_OFFSET to 20 will work for now. Hope someone is able to come up with a real fix before I need to upgrade to 4.10.y... - R.
Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation
On Thu, Jul 7, 2016 at 4:14 PM, Jason Gunthorpe wrote: > We have neighbour_priv, and ndo_neigh_construct/destruct now .. > > At first blush that would seem to be enough to let ipoib store the AH > and other path information in the neigh and avoid the cb? At least the > example in clip sure looks like what ipoib needs to do. Do you think those new facilities let us go back to using the neigh and still avoid the issues that led to commit b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path")? - R.
Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation
>> struct skb_gso_cb { >> int mac_offset; >> int encap_level; >> __u16 csum_start; >> }; > This is based on an out-dated version of this struct. The 4.7 RC > kernel has a few more fields that were added to support local checksum > offload for encapsulated frames. Thanks for pointing that out. I hit the perf regression on 4.4.y (stable) and looked at the struct there. I see that latest upstream has changed, and I agree that this struct really can't shrink below 10 bytes. Since IP needs 20 bytes, GSO needs 10 bytes and IPoIB needs 20 bytes, we're 2 bytes over the 48 that are available in cb[]. So this is harder to fix than just changing skb_gso_cb and SKB_SGO_CB_OFFSET unfortunately. >> What is the best way to keep the crash fix but not kill IPoIB performance? > > It seems like what would probably need to happen is to move where the > IPoIB address is stored. I'm not sure the control buffer is really > the best place for it since the cb gets overwritten at various levels, > and storing 20 bytes makes it hard to avoid bumping up against the > size restrictions of the buffer. Seeing as how the IPoIB hwaddr is > generated around the same time we generate the L2 header for the > frame, I wonder if you couldn't get away with using a bit of extra skb > headroom to store it and then use a offset from the MAC header to > access it. An added bonus would be that with a few tricks with > SKB_GSO_CB(skb)->mac_offset you might even be able to set things up so > that you copy the hwaddr when you copy the header for each fragment > instead of having to go and copy the hwaddr out of the cb and clone it > for each frame. Can we assume there are 20 bytes of skb headroom available? What if we're forwarding an skb received on an Ethernet device? The reason we moved to the cb storage is that in the past, trying to hide some data in the actual skb buffer that we don't actually send led to some awkward-at-best code. 
(As I recall GRO was difficult to handle before commit 936d7de3d736 "IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses") But maybe there's a third way to handle this other than the old way and the skb->cb way. - R.
Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation
On Thu, Jan 7, 2016 at 3:00 AM, Konstantin Khlebnikov wrote: > Or just shift GSO CB and add couple checks like > BUILD_BUG_ON(sizeof(SKB_GSO_CB(skb)->room) < sizeof(*IPCB(skb))); Resurrecting this old thread, because the patch that ultimately went upstream (commit 9207f9d45b0a / net: preserve IP control block during GSO segmentation) causes a huge IPoIB performance regression (to the point of being unusable): https://bugzilla.kernel.org/show_bug.cgi?id=111921 I don't think anyone has explained what goes wrong or why IPoIB works the way it does. The underlying difference that IPoIB has from other drivers is that there are two levels of address resolution. First, normal ARP / ND resolves an IP address to a "hardware" address. The difference is that in IPoIB, the hardware address is an IB GID (plus a QPN, but we can ignore that). To actually send data to that GID, the IPoIB driver has to do a second lookup - it needs to ask the IB subnet manager for a path record that tells it how to reach that GID. In particular this means that "destination address" (as the IP / ARP layer understands it) actually isn't in the packet anywhere - there's nothing like an ethernet header as there is for "normal" network drivers. Instead, the driver stashes the address in skb->cb during hard_header_ops->create() and then looks at it in the xmit routine - this was designed way back around when commit a0417fa3a18a / net: Make qdisc_skb_cb upper size bound explicit. was merged. The expectation was that the part of the cb after sizeof (struct qdisc_skb_cb) would be preserved. The problem with commit 9207f9d45b0a is that GSO operations now access cb after SKB_SGO_CB_OFFSET==32, which lands right in the middle of where IPoIB stashes its hwaddr. It seems that the intent of the commit is to preserve the IP control block - struct inet_skb_parm (and presumably struct inet6_skb_parm) - even when using SKB_GSO_CB(). Seems like both inet_skb_parm and inet6_skb_parm are 20 bytes. 
IPoIB uses the part of cb after 28 bytes, so if we could squeeze struct skb_gso_cb down to 8 bytes and set SKB_SGO_CB_OFFSET to 20, then everything would work. The struct is struct skb_gso_cb { int mac_offset; int encap_level; __u16 csum_start; }; is it feasible to make encap_level a __u16 (which would make the overall struct exactly 8 bytes)? If I understand this correctly, 64K nested encapsulations seems like quite a bit for a packet... Or, earlier in this thread, having the GSO in ip_output and other gso paths save and restore the IP/IP6 control block was suggested as an alternate approach. I don't know if there are performance implications to that. What is the best way to keep the crash fix but not kill IPoIB performance? Thanks! - R.
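The byte budget discussed above can be checked with a small userspace sketch. This is only an illustration: the struct names are made up, the 20-byte sizes for the IP control block and the IPoIB hwaddr are taken from the message rather than from kernel headers, and the struct sizes assume a typical ABI with 4-byte int.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CB_SIZE      48	/* bytes available in skb->cb */
#define SKETCH_IPCB_SIZE    20	/* inet_skb_parm / inet6_skb_parm, per the message */
#define SKETCH_IPOIB_HWADDR 20	/* IPoIB hardware address stashed at the end of cb */

/* Layout quoted in the message. */
struct sketch_gso_cb_quoted {
	int mac_offset;
	int encap_level;
	uint16_t csum_start;
};

/* Proposed variant with encap_level narrowed to 16 bits. */
struct sketch_gso_cb_shrunk {
	int mac_offset;
	uint16_t encap_level;
	uint16_t csum_start;
};

/* True if IP cb + GSO cb + IPoIB hwaddr can coexist in cb[] without overlap. */
static bool sketch_cb_layout_fits(size_t gso_cb_size)
{
	return SKETCH_IPCB_SIZE + gso_cb_size + SKETCH_IPOIB_HWADDR <= SKETCH_CB_SIZE;
}
```

With 4-byte int the narrowed struct is exactly 8 bytes, so 20 + 8 + 20 = 48 fits. The quoted layout needs at least 10 bytes before padding, which already pushes the total past 48.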
[PATCH] iommu/vt-d: Don't reject NTB devices due to scope mismatch
From: Roland Dreier On a system with an Intel PCIe port configured as an NTB device, iommu initialization fails with DMAR: Device scope type does not match for 0000:80:03.0 This is because the DMAR table reports this device as having scope 2 (ACPI_DMAR_SCOPE_TYPE_BRIDGE): [0A0h 0160 1] Device Scope Entry Type : 02 [0A1h 0161 1] Entry Length : 08 [0A2h 0162 2] Reserved : 0000 [0A4h 0164 1] Enumeration ID : 00 [0A5h 0165 1] PCI Bus Number : 80 [0A6h 0166 2] PCI Path : 03,00 but the device has a type 0 PCI header: 80:03.0 Bridge [0680]: Intel Corporation Device [8086:2f0d] (rev 02) 00: 86 80 0d 2f 00 00 10 00 02 00 80 06 10 00 80 00 10: 0c 00 c0 00 c0 38 00 00 0c 00 00 00 80 38 00 00 20: 00 00 00 c8 00 00 10 c8 00 00 00 00 86 80 00 00 30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 01 00 00 VT-d works perfectly on this system, so there's no reason to bail out on initialization due to this apparent scope mismatch. Use the class 0x0680 ("Other bridge device") as a heuristic for allowing DMAR initialization for non-bridge PCI devices listed with scope bridge. Signed-off-by: Roland Dreier --- drivers/iommu/dmar.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 6a86b5d1defa..2eff7b6c6c98 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -241,8 +241,20 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info, if (!dmar_match_pci_path(info, scope->bus, path, level)) continue; - if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT) ^ - (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL)) { + /* +* We expect devices with endpoint scope to have normal PCI +* headers, and devices with bridge scope to have bridge PCI +* headers. However PCI NTB devices may be listed in the +* DMAR table with bridge scope, even though they have a +* normal PCI header. NTB devices are identified by class +* "BRIDGE_OTHER" (0680h) - we don't declare a scope mismatch +* for this special case. 
+*/ + if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT && +info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) || + (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE && +(info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL && + info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) { pr_warn("Device scope type does not match for %s\n", pci_name(info->dev)); return -EINVAL; -- 2.7.4
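The new condition is easier to read when pulled out into a standalone predicate. The following is a userspace sketch of the same logic, not the kernel function: the SK_* constants are local stand-ins that mirror the values of the kernel's ACPI_DMAR_SCOPE_TYPE_*, PCI_HEADER_TYPE_* and PCI_CLASS_BRIDGE_OTHER definitions, and the function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel constants used by the patch. */
#define SK_SCOPE_ENDPOINT     1
#define SK_SCOPE_BRIDGE       2
#define SK_HDR_NORMAL         0
#define SK_HDR_BRIDGE         1
#define SK_CLASS_BRIDGE_OTHER 0x0680

/*
 * dev_class is the 24-bit class code (class, subclass, prog-if), so
 * shifting right by 8 leaves class and subclass, as in the patch.
 */
static bool sketch_scope_mismatch(uint8_t entry_type, uint8_t hdr_type,
				  uint32_t dev_class)
{
	bool normal_hdr = (hdr_type == SK_HDR_NORMAL);

	if (entry_type == SK_SCOPE_ENDPOINT)
		return !normal_hdr;	/* endpoint scope wants a type 0 header */

	if (entry_type == SK_SCOPE_BRIDGE)
		/* bridge scope with a type 0 header is tolerated only for NTB-style devices */
		return normal_hdr && (dev_class >> 8) != SK_CLASS_BRIDGE_OTHER;

	return false;		/* other scope types are not checked here */
}
```

For the device in the report (bridge scope, type 0 header, class code 0x068000 from the config space dump) the predicate returns false, so initialization proceeds. A type 0 device of any other class listed with bridge scope is still rejected.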
Re: Regression in IO resource allocation
On Tue, May 31, 2016 at 3:31 PM, Rafael J. Wysocki wrote: > It may not be called at all if _PTC is used on that system, for example. Yes, that's exactly the case on my system. So from my POV: Tested-by: Roland Dreier Thanks!
Re: Regression in IO resource allocation
On Tue, May 31, 2016 at 2:11 PM, Rafael J. Wysocki wrote: > Can you please try the appended patch (untested)? Thanks for the quick reply. Patch looks OK on my system... it boots (which is very good :) and I see system 00:01: [io 0x0400-0x047f] has been reserved however I don't see the "ACPI CPU throttle" region reserved in /proc/ioports... haven't debugged why acpi_processor_get_throttling() isn't getting called or what is happening yet. Will dig a bit deeper and let you know. - R.
Regression in IO resource allocation
Hi, I recently updated one of my systems from 3.10.y to 4.4.11, and discovered a regression that stops it from booting. It's actually very similar to https://bugzilla.kernel.org/show_bug.cgi?id=99831 (which I reported about the same system last year). The problem is that commit ac212b6980d8 ("ACPI / processor: Use common hotplug infrastructure") changes the order that the ACPI processor and PnP initialization run. pnp_system_init() is run at fs_initcall time, while acpi_processor_init() is run from acpi_scan_init(), earlier at subsys_initcall time. Pre-ac212b6980d8, the ACPI processor initialization all ran from acpi_processor_init() at module_init time. So the processor driver initialization has flipped from after to before pnp_system_init(). Just as before, the failure is that the resource allocation code puts some AHCI IO BARs around 0x400, and reservation fails because some other ACPI stuff is also there. The problem is that when acpi_processor_init() runs, it reserves a range 0x410 - 0x415 for "ACPI CPU throttle", and if that happens before pnp_system_init(), then I get system 00:01: [io 0x0400-0x047f] could not be reserved because that overlaps the already-reserved range. Then the PCI resource allocation code is free to put PCI resources into that range and tons of things go south after that. For now I've worked around it by commenting out the request_region() in acpi_processor.c but that doesn't seem like a very good long-term solution. Does it make sense to resurrect the patches you had to let ACPI and PnP coexist in resource reservation? Or could we move the request_region() for CPU throttle into the still-modular initialization done from acpi_processor_driver_init()? Thanks! Roland
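The ordering dependence described above can be reproduced with a toy model of region reservation. This is a deliberately simplified userspace sketch, not the kernel's resource tree: all names are made up, and the rule that a region may nest only inside a non-busy enclosing region is an assumption chosen to mimic how the PnP "system" reservation and the ACPI CPU throttle region interact in this report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_region {
	uint32_t start, end;	/* inclusive I/O port range */
	bool busy;		/* busy regions accept no children */
};

struct toy_table {
	struct toy_region r[8];
	size_t n;
};

static bool toy_overlaps(const struct toy_region *a, const struct toy_region *b)
{
	return a->start <= b->end && b->start <= a->end;
}

static bool toy_contains(const struct toy_region *outer,
			 const struct toy_region *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

/*
 * A request succeeds if it is disjoint from every existing region, or if
 * every region it overlaps is non-busy and fully encloses it.
 */
static bool toy_request(struct toy_table *t, struct toy_region want)
{
	for (size_t i = 0; i < t->n; i++) {
		const struct toy_region *have = &t->r[i];

		if (!toy_overlaps(have, &want))
			continue;
		if (have->busy || !toy_contains(have, &want))
			return false;
	}
	if (t->n == sizeof(t->r) / sizeof(t->r[0]))
		return false;	/* table full */
	t->r[t->n++] = want;
	return true;
}
```

Requesting 0x410-0x415 (busy) first and then 0x400-0x47f fails on the second call, which corresponds to the "could not be reserved" message. In the opposite order both calls succeed, because the throttle range nests inside the enclosing PnP range.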
Re: Running out of IO space because of innocuous-looking DSDT change
On Mon, Oct 19, 2015 at 10:00 AM, Yinghai Lu wrote: > I would suggest to expand standard_io_resources[] to include all > possible conflict that we should avoid, like the io port for serial and > cf8/cf9. > > Then we could just set PCIBIOS_MIN_IO to 0 for x86. That would work on my system, which is a well-behaved standard server. But I thought the issue was weird vendor-specific stuff (Sony laptops?) where there are undocumented nonstandard IO resources that also aren't reserved in ACPI? - R. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Running out of IO space because of innocuous-looking DSDT change
I recently ran into an interesting issue with IO space allocation, and I'm looking for opinions on whether this is a BIOS issue, a kernel issue, both, or neither ;) What happened is that a BIOS update for my system changed the DSDT from having three ranges in PCI0: WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange, 0x0000, // Granularity 0x0000, // Range Minimum 0x03AF, // Range Maximum 0x0000, // Translation Offset 0x03B0, // Length ,, , TypeStatic) WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange, 0x0000, // Granularity 0x03E0, // Range Minimum 0x0CF7, // Range Maximum 0x0000, // Translation Offset 0x0918, // Length ,, , TypeStatic) WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange, 0x0000, // Granularity 0x03B0, // Range Minimum 0x03DF, // Range Maximum 0x0000, // Translation Offset 0x0030, // Length ,, , TypeStatic) to a single range: WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange, 0x0000, // Granularity 0x0000, // Range Minimum 0x0CF7, // Range Maximum 0x0000, // Translation Offset 0x0CF8, // Length ,, , TypeStatic) Naively it seems like this shouldn't make a difference, since in the end we've covered the space 0...0xCF7. However because of the code min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM; /* First, try exact prefetching match.. */ ret = pci_bus_alloc_resource(bus, res, size, align, min, IORESOURCE_PREFETCH, pcibios_align_resource, dev); in pci_bus_alloc_resource(), the single range ultimately means we end up running out of IO space for our devices (we have various devices asking for IO space as well as quite a few downstream PCI switch ports that get allocated IO space). What happens is that PCIBIOS_MIN_IO is 0x1000, so that code means with the new BIOS we can't allocate any IO in the range 0...0xCF7; with the old BIOS we only ruled out the range 0...0x3AF and happily put small IO resources (for SMBus controller devices etc) at places like 0x480 etc. 
Looking at the code and history, I see that the code with PCIBIOS_MIN_IO is there to deal with systems where not all resources are declared and the kernel might accidentally allocate something that clashes with strange hardware. However in my case I'm pretty confident there isn't anything in the range we used to use (since my system didn't blow up, and I know there isn't any weird proprietary stuff anyway). Would it make sense to change the kernel to reduce PCIBIOS_MIN_IO in my case? I could make it generic and send it upstream, or just hack it locally. Or (given my ignorance of ACPI in the real world) is this a broken BIOS change that I should ask my BIOS vendor to revert? Or... ? Thanks! Roland
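Here is a small userspace sketch of why merging the three windows into one changes the outcome. It rests on an assumption that is not spelled out in the message: that the allocator applies the architecture minimum only to a window that starts at 0, and otherwise uses the window's own start as the floor. That rule matches the behaviour described above (only 0...0x3AF ruled out with the old BIOS), but treat it as my reading rather than a statement of what the kernel does. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_io_window {
	uint32_t start, end;	/* inclusive host bridge I/O window */
};

/*
 * Lowest address the allocator may hand out from window w.
 * Assumed rule: a window starting at 0 is clamped to arch_min, any other
 * window is usable from its own start. Returns false if nothing in the
 * window is usable.
 */
static bool sketch_alloc_floor(struct sketch_io_window w, uint32_t arch_min,
			       uint32_t *floor)
{
	uint32_t lo = w.start ? w.start : arch_min;

	if (lo > w.end)
		return false;
	*floor = lo;
	return true;
}
```

With arch_min = 0x1000, the old window 0x03E0-0x0CF7 is usable from 0x03E0 (so 0x480 is available), while the merged window 0x0000-0x0CF7 is not usable at all.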
Re: [PATCH] target/iscsi: fix digest computation for chained SGs
On Tue, Jul 21, 2015 at 1:57 AM, Sagi Grimberg wrote: > How were you able to get a chained SG list in the target code? Local hack. So this bug can't be hit in current mainline code, but patch improves the code and removes a hidden booby-trap, so I think it makes sense to apply.
Re: Regression in 3.10.80 vs. 3.10.79
On Sat, Jun 13, 2015 at 9:56 AM, Roland Dreier wrote: > Below is a more sophisticated, so to speak, version of it with a changelog and > all. It works for me, but more testing would be much appreciated. Yes, the patch works as expected: Tested-by: Roland Dreier It does change /proc/ioports hierarchy to 0400-0403 : ACPI PM1a_EVT_BLK 0404-0405 : ACPI PM1a_CNT_BLK 0406-0407 : pnp 00:06 0408-040b : ACPI PM_TMR 040c-041f : pnp 00:06 0410-0415 : ACPI CPU throttle 0420-042f : ACPI GPE0_BLK 0430-044f : pnp 00:06 0430-0433 : iTCO_wdt 0430-0433 : iTCO_wdt 0450-0450 : ACPI PM2_CNT_BLK 0451-047f : pnp 00:06 0460-047f : iTCO_wdt 0460-047f : iTCO_wdt where the old kernel had 0400-047f : pnp 00:06 0400-0403 : ACPI PM1a_EVT_BLK 0404-0405 : ACPI PM1a_CNT_BLK 0408-040b : ACPI PM_TMR 0410-0415 : ACPI CPU throttle 0420-042f : ACPI GPE0_BLK 0430-0433 : iTCO_wdt 0430-0433 : iTCO_wdt 0450-0450 : ACPI PM2_CNT_BLK 0460-047f : iTCO_wdt 0460-047f : iTCO_wdt but I don't think that matters. Thanks, - R.
Re: Regression in 3.10.80 vs. 3.10.79
On Fri, Jun 12, 2015 at 7:52 PM, Rafael J. Wysocki wrote: > Below is a more sophisticated, so to speak, version of it with a changelog and > all. It works for me, but more testing would be much appreciated. Great, I'm convinced by your reasoning that this makes sense. I'm building 3.10.80 patched with this (needed a tiny bit of context adjustment because acpi_dev_filter_resource_type() hadn't been added to 3.10 yet), and will confirm that it fixes the issue I saw. Thanks! Roland
Re: Regression in 3.10.80 vs. 3.10.79
On Thu, Jun 11, 2015 at 1:50 PM, Rafael J. Wysocki wrote: > Changing the ordering between those two routines would work around that > problem, > but in my view that wouldn't be a proper fix. In fact, the role of > reserve_range() > is to reserve the resources so as to prevent them from being used going > forward, > so they need not be reserved each in one piece. Instead, we can just check > if they > overlap with the ones reserved by acpi_reserve_resources() and only request > the > non-overlapping parts of them to avoid conflicts. > > So I wonder if the patch below makes any difference? I will give this a try and make sure it fixes my system, although I'm pretty sure it will. However I'm not sure I agree that this is a better fix than just having pnp reserve ranges before acpi. It already creates a special relationship between pnp and acpi, and acpi_reserve_region is a bunch of extra code. Could we really have a system where the hierarchy of acpi being a subset of a pnp bus doesn't work? I looked at a few other systems I have, and things like the following seem quite common: supermicro: 03e0-0cf7 : PCI Bus 0000:00 03f8-03ff : serial 0400-0453 : pnp 00:0c 0400-0403 : ACPI PM1a_EVT_BLK 0404-0405 : ACPI PM1a_CNT_BLK 0408-040b : ACPI PM_TMR 0410-0415 : ACPI CPU throttle 0420-042f : ACPI GPE0_BLK 0430-0433 : iTCO_wdt 0450-0450 : ACPI PM2_CNT_BLK dell: 03e0-0cf7 : PCI Bus 0000:00 03f8-03ff : serial 0800-087f : pnp 00:06 0800-0803 : ACPI PM1a_EVT_BLK 0804-0805 : ACPI PM1a_CNT_BLK 0808-080b : ACPI PM_TMR 0810-0815 : ACPI CPU throttle 0820-082f : ACPI GPE0_BLK 0830-0833 : iTCO_wdt 0830-0833 : iTCO_wdt 0850-0850 : ACPI PM2_CNT_BLK 0860-087f : iTCO_wdt 0860-087f : iTCO_wdt but I wasn't able to find anything that required more generality...
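The "only request the non-overlapping parts" idea from the quoted text amounts to subtracting a set of already-reserved ranges from the range being requested. Below is a userspace sketch of that subtraction, not the proposed kernel patch. The names are hypothetical, and the input list is assumed to be sorted by start address and free of overlaps.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_range {
	uint32_t start, end;	/* inclusive */
};

/*
 * Write the parts of `want` that are not covered by busy[] into out[].
 * busy[] must be sorted by start and non-overlapping. At most max_out
 * pieces are stored; any further pieces are dropped. Returns the number
 * of pieces stored.
 */
static size_t sketch_free_parts(struct sketch_range want,
				const struct sketch_range *busy, size_t nbusy,
				struct sketch_range *out, size_t max_out)
{
	size_t n = 0;
	uint32_t cur = want.start;	/* first port not yet accounted for */

	for (size_t i = 0; i < nbusy; i++) {
		if (busy[i].end < cur)
			continue;	/* entirely below the cursor */
		if (busy[i].start > want.end)
			break;		/* entirely above the request */
		if (busy[i].start > cur && n < max_out) {
			out[n].start = cur;
			out[n].end = busy[i].start - 1;
			n++;
		}
		if (busy[i].end >= want.end)
			return n;	/* request is covered to its end */
		cur = busy[i].end + 1;
	}
	if (n < max_out) {
		out[n].start = cur;
		out[n].end = want.end;
		n++;
	}
	return n;
}
```

Feeding it 0x400-0x47f with busy ranges 0x400-0x403, 0x404-0x405, 0x408-0x40b, 0x420-0x42f and 0x450-0x450 yields four pieces: 0x406-0x407, 0x40c-0x41f, 0x430-0x44f and 0x451-0x47f.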
Re: Regression in 3.10.80 vs. 3.10.79
On Wed, Jun 10, 2015 at 4:23 PM, Rafael J. Wysocki wrote: > Can you please file a bug at bugzilla.kernel.org to track this and attach > the output of acpidump from the affected system in there? Done: https://bugzilla.kernel.org/show_bug.cgi?id=99831 Thanks!
Re: Regression in 3.10.80 vs. 3.10.79
On Tue, Jun 9, 2015 at 4:43 PM, Roland Dreier wrote:
> I understand that the change here fixed another regression, but I'm
> wondering if there's a way to make everyone happy here? I can provide
> debugging info from my system as required...

Maybe I sent my mail too quickly, as I have some thoughts after looking at
the code.

From the link order, drivers/acpi init will be called before drivers/pnp
init, right? In my case, the acpi resources ("ACPI PM1a_EVT_BLK" etc.) are
under a pnp bus. But if acpi requests the resources first, then pnp
can't request the enclosing range.

Is the right fix to make sure the pnp init happens before acpi
requests resources?
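The ordering question raised here can be illustrated with a toy model in userspace C. It is a deliberate simplification and not the kernel's resource code: `request_region_toy()` and its types are hypothetical, and the only rule modelled is that a new range may sit beside an existing one or nest inside it, but may not enclose or partially overlap one that is already present.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 16

/* Hypothetical inclusive range; not a kernel structure. */
struct region {
    unsigned int start;
    unsigned int end;
};

struct region_table {
    struct region r[MAX_REGIONS];
    size_t n;
};

/* True if `inner` lies entirely within `outer`. */
static bool contains(struct region outer, struct region inner)
{
    return outer.start <= inner.start && inner.end <= outer.end;
}

/* True if the two ranges share no address. */
static bool disjoint(struct region a, struct region b)
{
    return a.end < b.start || b.end < a.start;
}

/*
 * Toy request: accept `req` only if, for every region already in the
 * table, it is either disjoint from it or nested inside it.
 */
static bool request_region_toy(struct region_table *t, struct region req)
{
    size_t i;

    if (t->n == MAX_REGIONS)
        return false;
    for (i = 0; i < t->n; i++) {
        if (!disjoint(t->r[i], req) && !contains(t->r[i], req))
            return false;   /* would enclose or partially overlap */
    }
    t->r[t->n++] = req;
    return true;
}
```

In this model, requesting the enclosing range 0x0400-0x047f first and the nested range 0x0410-0x0415 second succeeds both times, while the reverse order makes the second request fail, which is the behaviour the message above describes.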
Regression in 3.10.80 vs. 3.10.79
Hi,

I recently updated from 3.10.79 to 3.10.80, and my system wouldn't boot
any more. I tracked this down to commit 92c934b10ec3 ("ACPI / init: Fix
the ordering of acpi_reserve_resources()"). With that commit reverted,
my system is OK again.

What happens is that ahci fails to initialize because
pcim_iomap_regions_request_all() fails with EBUSY, due to a resource
conflict on the first IO region of the ahci device. Since my root device
is on ahci, that's the end of that.

I'm sure this is due to a BIOS / ACPI table bug on my particular platform,
but that's scant comfort when the system won't boot :)

I patched 3.10.80 so that ahci continues to initialize after the EBUSY,
and relevant parts of the kernel log seem to be:

[3.836643,26] system 00:06: [io 0x0400-0x047f] could not be reserved
...
[3.844112,26] pci :00:1f.2: BAR 0: assigned [io 0x0410-0x0417]
...
[6.020040,00] ahci :00:1f.2: BAR 0: can't reserve [io 0x0410-0x0417]

and /proc/ioports shows

0410-0415 : ACPI CPU throttle

So if I'm understanding properly, for some reason we discover but fail to
reserve the region with the ACPI resources, then PCI decides to assign
ahci IO ports into that range, then ACPI loads and reserves 0x0410-0x0415,
and then ahci fails to load.

If I fully revert the patch, then I see

[3.853857,08] system 00:06: [io 0x0400-0x047f] has been reserved
...
[3.861806,08] pci :00:1f.2: BAR 0: assigned [io 0x0820-0x0827]

We're able to reserve the range, and then PCI assigns ahci into a
non-conflicting range.

I understand that the change here fixed another regression, but I'm
wondering if there's a way to make everyone happy here? I can provide
debugging info from my system as required...

Thanks,
  Roland
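The conflict described in this report comes down to an overlap test between the range assigned to BAR 0 and the ranges already claimed. A minimal sketch, assuming inclusive ranges and hypothetical helper names (`ranges_overlap()`, `bar_conflicts()`); it models only the comparison, not the kernel's PCI or ACPI code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inclusive I/O range; not a kernel structure. */
struct io_range {
    unsigned int start;
    unsigned int end;
};

/* Two inclusive ranges overlap if each starts no later than the other ends. */
static bool ranges_overlap(struct io_range a, struct io_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Return true if `bar` collides with any entry of `busy`. */
static bool bar_conflicts(struct io_range bar,
                          const struct io_range *busy, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (ranges_overlap(bar, busy[i]))
            return true;
    }
    return false;
}
```

Using the numbers from the log, a BAR at 0x0410-0x0417 collides with the "ACPI CPU throttle" range 0x0410-0x0415, whereas a BAR at 0x0820-0x0827 does not.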
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus InfiniBand/RDMA updates for 4.1: - IPoIB fixes from Doug Ledford and Erez Shitrit - iSER updates from Sagi Grimberg - mlx4 GUID handling changes from Yishai Hadas - other misc fixes Bart Van Assche (1): IB/srp: Use P_Key cache for P_Key lookups Doug Ledford (11): IB/ipoib: factor out ah flushing IB/ipoib: change init sequence ordering IB/ipoib: Consolidate rtnl_lock tasks in workqueue IB/ipoib: Make the carrier_on_task race aware IB/ipoib: Use dedicated workqueues per interface IB/ipoib: No longer use flush as a parameter IB/ipoib: fix MCAST_FLAG_BUSY usage IB/ipoib: deserialize multicast joins IB/ipoib: drop mcast_mutex usage ib_srpt: convert printk's to pr_* functions Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1 Erez Shitrit (6): IB/ipoib: Use one linear skb in RX flow IB/ipoib: Update broadcast record values after each successful join request IB/ipoib: Handle QP in SQE state IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's IB/ipoib: Remove IPOIB_MCAST_RUN bit IB/mlx4: Fix WQE LSO segment calculation Honggang LI (1): mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures Sagi Grimberg (18): IB/iser: Fix unload during ep_poll wrong dereference IB/iser: Handle fastreg/local_inv completion errors IB/iser: Fix wrong calculation of protection buffer length IB/iser: Remove redundant cmd_data_len calculation IB/iser: Remove a redundant struct iser_data_buf IB/iser: Don't pass ib_device to fall_to_bounce_buff routine IB/iser: Move memory reg/dereg routines to iser_memory.c IB/iser: Remove redundant assignments in iser_reg_page_vec IB/iser: Get rid of struct iser_rdma_regd IB/iser: Merge build page-vec into register page-vec IB/iser: Move fastreg descriptor pool get/put to helper functions IB/iser: Move PI context alloc/free to routines IB/iser: Make fastreg pool cache friendly 
IB/iser: Modify struct iser_mem_reg members IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr IB/iser: Remove code duplication for a single DMA entry IB/iser: Bump version to 1.6 IB/iser: Rewrite bounce buffer code path Sebastian Ott (1): infiniband/mlx4: check for mapping error Selvin Xavier (1): MAINTAINERS: Adding list of maintainers for ocrdma Stephen Hemminger (1): rdma: replace deprecated ifconfig in doc Sébastien Dugué (1): ib_uverbs: Fix pages leak when using XRC SRQs Yann Droneaud (2): IB/core: disallow registering 0-sized memory region IB/core: don't disallow registering region starting at 0x0 Yishai Hadas (9): IB/mlx4: Alias GUID adding persistency support net/mlx4_core: Manage alias GUID per VF net/mlx4_core: Set initial admin GUIDs for VFs IB/mlx4: Manage admin alias GUID upon admin request IB/mlx4: Change init flow to request alias GUIDs for active VFs IB/mlx4: Request alias GUID on demand net/mlx4_core: Raise slave shutdown event upon FLR net/mlx4_core: Return the admin alias GUID upon host view request IB/mlx4: Change alias guids default to be host assigned Documentation/filesystems/nfs/nfs-rdma.txt | 9 +- MAINTAINERS| 9 + drivers/infiniband/core/umem.c | 7 +- drivers/infiniband/core/uverbs_main.c | 22 +- drivers/infiniband/hw/mlx4/alias_GUID.c| 457 +- drivers/infiniband/hw/mlx4/mad.c | 9 + drivers/infiniband/hw/mlx4/main.c | 26 +- drivers/infiniband/hw/mlx4/mlx4_ib.h | 14 +- drivers/infiniband/hw/mlx4/qp.c| 7 +- drivers/infiniband/hw/mlx4/sysfs.c | 44 +- drivers/infiniband/ulp/ipoib/ipoib.h | 31 +- drivers/infiniband/ulp/ipoib/ipoib_cm.c| 18 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c| 195 drivers/infiniband/ulp/ipoib/ipoib_main.c | 73 ++- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 520 ++-- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 44 +- drivers/infiniband/ulp/iser/iscsi_iser.h | 66 +-- drivers/infiniband/ulp/iser/iser_initiator.c | 66 ++- drivers/infiniband/ulp/iser/iser_memory.c | 523 - 
drivers/infiniband/ulp/iser/iser_verbs.c | 220 +++-- drivers/infiniband/ulp/srp/ib_srp.c| 9 +- drivers/infiniband/ulp/srpt/ib_srpt.c | 188 drive
Re: [PATCH v3 07/28] IB/Verbs: Reform IB-ulp ipoib
On Thu, Apr 16, 2015 at 9:44 AM, Jason Gunthorpe wrote:
>> We can give client->add() callback a return value and make
>> ib_register_device() return -ENOMEM when it failed, just wondering
>> why we don't do this at first, any special reason?

> No idea, but having ib_register_device fail and unwind if a client
> fails to attach makes sense to me.

It seems a bit unfriendly to fail an entire device if one ULP has a
problem. Let's say you have a system whose main network connection is
IPoIB. Would you want that connection to come up even if, say, the
NFS/RDMA server fails to find the memory registration type it likes?

 - R.
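The alternative argued for here (keep the device registered when a single ULP fails to attach) can be sketched in userspace C. Everything below is hypothetical and is not the ib_core interface: `struct ulp_client`, `attach_clients()` and the two example callbacks exist only to show the control flow.

```c
#include <stddef.h>

/* Hypothetical client (ULP) descriptor; not the ib_core structure. */
struct ulp_client {
    const char *name;
    int (*add)(void *device);   /* 0 on success, negative on failure */
};

struct attach_result {
    size_t attached;
    size_t failed;
};

/* Example callbacks for the sketch: one client attaches, one does not. */
static int example_add_ok(void *device)
{
    (void)device;
    return 0;
}

static int example_add_fail(void *device)
{
    (void)device;
    return -1;
}

/*
 * Offer the device to every client. A failing client is counted and
 * skipped; the device stays available to the remaining clients.
 */
static struct attach_result attach_clients(void *device,
                                           const struct ulp_client *clients,
                                           size_t n)
{
    struct attach_result res = {0, 0};
    size_t i;

    for (i = 0; i < n; i++) {
        if (clients[i].add(device) == 0)
            res.attached++;
        else
            res.failed++;
    }
    return res;
}
```

The caller can still log or report `failed`, but one client's failure does not unwind the others. The opposite policy, where registration fails as a whole, would return at the first non-zero result and detach the clients attached so far.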
[GIT PULL] please pull infiniband.git
Hi Linus,

Please pull from

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

One 4.0 RDMA change:

 - Fix for exploitable integer overflow in uverbs interface.

Shachar Raindel (1):
      IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

 drivers/infiniband/core/umem.c | 8
 1 file changed, 8 insertions(+)
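For readers unfamiliar with this class of bug, the kind of check the commit title suggests can be sketched as follows. This is a userspace illustration under stated assumptions, not the patch itself: `umem_range_ok()`, `page_align_up()` and the 4096-byte `SKETCH_PAGE_SIZE` are hypothetical, and 64-bit addresses are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)   /* assumed page size */

/* Round x up to the next page boundary; wraps to a small value on overflow. */
static uint64_t page_align_up(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Return true only if the end of [addr, addr + size) and its page-aligned
 * end can both be computed without wrapping past the top of the address
 * space.
 */
static bool umem_range_ok(uint64_t addr, uint64_t size)
{
    uint64_t end = addr + size;   /* unsigned wrap-around is well defined */

    if (end < addr)
        return false;             /* addr + size wrapped */
    if (page_align_up(end) < end)
        return false;             /* rounding up to a page wrapped */
    return true;
}
```

Without such a check, a caller-supplied address and length near the top of the address space could wrap to a small value and make later page-count arithmetic come out far too small.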
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus InfiniBand/RDMA changes for 3.20 merge window: - Re-enable on-demand paging changes with stable ABI - Fairly large set of ocrdma HW driver fixes - Some qib HW driver fixes - Other miscellaneous changes Andreea-Cristina Bernat (2): IB/qib: Replace rcu_assign_pointer() with RCU_INIT_POINTER() in qib_qp.c IB/qib: Replace rcu_assign_pointer() with RCU_INIT_POINTER() in qib_keys.c Ariel Nahum (1): IB/iser: Release the iscsi endpoint if ep_disconnect wasn't called Bart Van Assche (1): MAINTAINERS: Update SRP initiator entry Dan Carpenter (2): IB/mlx5: Fix error code in get_port_caps() RDMA/ocrdma: Fix off by one in ocrdma_query_gid() Devesh Sharma (4): RDMA/ocrdma: Report correct count of interrupt vectors while registering ocrdma device RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac RDMA/ocrdma: set vlan present bit for user AH Eli Cohen (1): IB/core: Add support for extended query device caps Haggai Eran (3): IB/core: Properly handle registration of on-demand paging MRs after dereg IB/core: Add on demand paging caps to ib_uverbs_ex_query_device IB/mlx5: Enable the ODP capability query verb Hariprasad S (2): RDMA/cxgb4: Serialize CQ event upcalls with CQ destruction RDMA/cxgb4: Don't hang threads forever waiting on WR replies Ilya Nelkenbaum (1): IB/core: When marshaling ucma path from user-space, clear unused fields Jack Morgenstein (1): IB/mlx4: In mlx4_ib_demux_cm, print out GUID in host-endian order Majd Dibbiny (3): IB/mlx4: Fix memory leak in __mlx4_ib_modify_qp IB/mlx4: Bug fixes in mlx4_ib_resize_cq IB/mlx5: Update the dev in reg_create Mike Marciniszyn (3): IB/qib: Fix sizeof checkpatch warnings IB/qib: Fix checkpatch warnings IB/qib: Add blank line after declaration Mitesh Ahuja (7): RDMA/ocrdma: Add support for IB stack compliant stats in sysfs. 
RDMA/ocrdma: Increase the GID table size. RDMA/ocrdma: Move PD resource management to driver. RDMA/ocrdma: Host crash on destroying device resources RDMA/ocrdma: Add support for interrupt moderation RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure RDMA/ocrdma: Update the ocrdma module version string Mitko Haralanov (1): IB/qib: Do not write EEPROM Moshe Lazer (1): IB/core: Fix deadlock on uverbs modify_qp error flow Or Gerlitz (1): IB/mlx4: Fix wrong usage of IPv4 protocol for multicast attach/detach Padmanabh Ratnakar (1): RDMA/ocrdma: Report correct state in ibv_query_qp Rasmus Villemoes (2): RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit RDMA/ocrdma: Use unsigned for bit index Rickard Strandqvist (1): IB/ipath: Remove unused function in ipath_wc_ppc64 Roi Dayan (1): IB/iser: Use correct dma direction when unmapping SGs Roland Dreier (1): Merge branches 'core', 'cxgb4', 'iser', 'mlx4', 'mlx5', 'ocrdma', 'odp', 'qib' and 'srp' into for-next Sagi Grimberg (1): IB/iser: Fix memory regions possible leak Selvin Xavier (2): RDMA/ocrdma: Debugfs enhancments for ocrdma driver RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP Vinit Agnihotri (1): IB/qib: Add support for the new QMH7360 card MAINTAINERS | 2 +- drivers/infiniband/core/ucma.c| 3 + drivers/infiniband/core/umem_odp.c| 3 +- drivers/infiniband/core/uverbs.h | 1 + drivers/infiniband/core/uverbs_cmd.c | 158 + drivers/infiniband/core/uverbs_main.c | 1 + drivers/infiniband/hw/cxgb4/ev.c | 9 +- drivers/infiniband/hw/cxgb4/iw_cxgb4.h| 29 ++- drivers/infiniband/hw/ipath/ipath_kernel.h| 3 - drivers/infiniband/hw/ipath/ipath_wc_ppc64.c | 13 -- drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 15 -- drivers/infiniband/hw/mlx4/cm.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 7 +- drivers/infiniband/hw/mlx4/main.c | 10 +- drivers/infiniband/hw/mlx4/qp.c | 6 +- drivers/infiniband/hw/mlx5/main.c | 4 +- drivers/infiniband/hw/mlx5/mr.c | 1 + 
drivers/infiniband/hw/ocrdma/ocrdma.h | 38 +++- drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 38 +++- drivers/infiniband/hw/ocrdma/ocrdma_ah.h | 6 + drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 312 ++ driver
Re: linux-next: build failure after merge of the infiniband tree
On Tue, Feb 17, 2015 at 6:32 PM, Stephen Rothwell wrote:
> After merging the livepatching tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
>
> In file included from drivers/infiniband/hw/qib/qib_cq.c:41:0:
> drivers/infiniband/hw/qib/qib.h: In function 'qib_flush_wc':
> drivers/infiniband/hw/qib/qib.h:1470:1: error: expected ';' before '}' token
> }
> ^
>
> and it went badly down hill from there :-(

Weird, I could have sworn I fixed that before I pushed the tree out.

Anyway I'll try adding the missing ';' again and push it out again :(
Re: [PATCH 1/1] IB/mthca: remove deprecated use of pci api
On Wed, Feb 4, 2015 at 6:09 AM, Quentin Lambert wrote:
> -       dev->eq_table.icm_dma = pci_map_page(dev->pdev, dev->eq_table.icm_page, 0,
> -                                            PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -       if (pci_dma_mapping_error(dev->pdev, dev->eq_table.icm_dma)) {
> +       dev->eq_table.icm_dma = dma_map_page(&dev->pdev->dev,
> +                                            dev->eq_table.icm_page, 0,
> +                                            PAGE_SIZE,
> +                                            (enum dma_data_direction)PCI_DMA_BIDIRECTIONAL);

Surely this can't be right? Shouldn't the direction just change to
DMA_BIDIRECTIONAL?

Are we really sweeping through the kernel and getting rid of pci_map_ etc.
calls? If so please respin your semantic patch so that it doesn't add
crazy stuff like

    (enum dma_data_direction)PCI_DMA_BIDIRECTIONAL

and resend the change.

Thanks,
  Roland
[GIT PULL] please pull infiniband.git
Hi Linus,

Please pull from

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

One more last-second RDMA change for 3.19:

 - Yann realized that the previous revert of new userspace ABI did not
   go far enough, and we're still exposing a change that we don't want.
   Revert even closer to 3.18 interface to make sure we get things
   right in the long run.

Sorry for sending this at the very end of the release cycle, but we
didn't realize the scope of the required fix until just now.

Yann Droneaud (1):
      Revert "IB/core: Add support for extended query device caps"

 drivers/infiniband/core/uverbs.h     |   1 -
 drivers/infiniband/core/uverbs_cmd.c | 137 +++
 drivers/infiniband/hw/mlx5/main.c    |   2 -
 include/rdma/ib_verbs.h              |   5 +-
 include/uapi/rdma/ib_user_verbs.h    |  27 ---
 5 files changed, 42 insertions(+), 130 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus,

Please pull from

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

Last minute InfiniBand/RDMA changes for 3.19:

 - Revert IPoIB driver back to 3.18 state. We had a number of fixes go
   into 3.19, but they introduced regressions. We tried to get
   everything fixed up but ran out of time, so we'll try again for 3.20.

 - Similarly, turn off the new "extended query port" verb. Late in the
   cycle we realized the ABI is not quite right, and rather than freeze
   something in a rush and make a mistake, we'll take a bit more time
   and get it right in 3.20.

Haggai Eran (1):
      IB/core: Temporarily disable ex_query_device uverb

Roland Dreier (9):
      Revert "IPoIB: No longer use flush as a parameter"
      Revert "IPoIB: Make ipoib_mcast_stop_thread flush the workqueue"
      Revert "IPoIB: Use dedicated workqueues per interface"
      Revert "IPoIB: change init sequence ordering"
      Revert "IPoIB: fix mcast_dev_flush/mcast_restart_task race"
      Revert "IPoIB: fix MCAST_FLAG_BUSY usage"
      Revert "IPoIB: Make the carrier_on_task race aware"
      Revert "IPoIB: Consolidate rtnl_lock tasks in workqueue"
      Merge branches 'ipoib' and 'odp' into for-next

 drivers/infiniband/core/uverbs_main.c          |   1 -
 drivers/infiniband/ulp/ipoib/ipoib.h           |  19 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |  18 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  27 +--
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |  49 ++---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 239 +
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |  22 +--
 7 files changed, 134 insertions(+), 241 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main batch of InfiniBand/RDMA changes for 3.19: - On-demand paging support in core midlayer and mlx5 driver. This lets userspace create non-pinned memory regions and have the adapter HW trigger page faults. - iSER and IPoIB updates and fixes. - Low-level HW driver updates for cxgb4, mlx4 and ocrdma. - Other miscellaneous fixes. Ariel Nahum (2): IB/iser: Collapse cleanup and disconnect handlers IB/iser: Fix possible NULL derefernce ib_conn->device in session_create Devesh Sharma (1): RDMA/ocrdma: Always resolve destination mac from GRH for UD QPs Doug Ledford (8): IPoIB: Consolidate rtnl_lock tasks in workqueue IPoIB: Make the carrier_on_task race aware IPoIB: fix MCAST_FLAG_BUSY usage IPoIB: fix mcast_dev_flush/mcast_restart_task race IPoIB: change init sequence ordering IPoIB: Use dedicated workqueues per interface IPoIB: Make ipoib_mcast_stop_thread flush the workqueue IPoIB: No longer use flush as a parameter Eli Cohen (1): IB/core: Add support for extended query device caps Haggai Eran (14): IB/mlx5: Remove per-MR pas and dma pointers IB/mlx5: Enhance UMR support to allow partial page table update IB/core: Replace ib_umem's offset field with a full address IB/core: Add umem function to read data from user-space IB/mlx5: Add function to read WQE from user-space IB/core: Implement support for MMU notifiers regarding on demand paging regions mlx5_core: Add support for page faults events and low level handling IB/mlx5: Implement the ODP capability query verb IB/mlx5: Changes in memory region creation to support on-demand paging IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation IB/mlx5: Page faults handling infrastructure IB/mlx5: Handle page faults IB/mlx5: Add support for RDMA read/write responder page faults IB/mlx5: Implement on demand paging by adding support for MMU notifiers Hariprasad S (1): RDMA/cxgb4: Handle NET_XMIT return 
codes Hariprasad Shenai (2): RDMA/cxgb4: Fix locking issue in process_mpa_request RDMA/cxgb4: Limit MRs to < 8GB for T4/T5 devices Jack Morgenstein (2): IB/core: Fix mgid key handling in SA agent multicast data-base IB/mlx4: Fix an incorrectly shadowed variable in mlx4_ib_rereg_user_mr Max Gurtovoy (1): IB/iser: Fix possible SQ overflow Minh Tran (1): IB/iser: Re-adjust CQ and QP send ring sizes to HW limits Mitesh Ahuja (1): RDMA/ocrdma: Fix ocrdma_query_qp() to report q_key value for UD QPs Moni Shoua (1): IB/core: Do not resolve VLAN if already resolved Or Gerlitz (1): IB/iser: Bump version to 1.5 Or Kehati (1): IB/addr: Improve address resolution callback scheduling Pramod Kumar (2): RDMA/cxgb4: Increase epd buff size for debug interface RDMA/cxgb4: Configure 0B MRs to match HW implementation Roland Dreier (2): mlx5_core: Re-add MLX5_DEV_CAP_FLAG_ON_DMND_PG flag Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next Sagi Grimberg (13): IB/iser: Fix catastrophic error flow hang IB/iser: Decrement CQ's active QPs accounting when QP creation fails IB/iser: Fix sparse warnings IB/iser: Fix race between iser connection teardown and scsi TMFs IB/iser: Terminate connection before cleaning inflight tasks IB/iser: Centralize memory region invalidation to a function IB/iser: Remove redundant is_mr indicator IB/iser: Use more completion queues IB/iser: Micro-optimize iser logging IB/iser: Micro-optimize iser_handle_wc IB/iser: DIX update IB/core: Add flags for on demand paging support IB/srp: Allow newline separator for connection string Shachar Raindel (1): IB/core: Add support for on demand paging regions Steve Wise (1): RDMA/cxgb4: Wake up waiters after flushing the qp Yuval Shaia (1): mlx4_core: Check for DPDP violation only when DPDP is not supported drivers/infiniband/Kconfig | 11 + drivers/infiniband/core/Makefile | 1 + drivers/infiniband/core/addr.c | 4 +- drivers/infiniband/core/multicast.c| 11 +- 
drivers/infiniband/core/umem.c | 72 ++- drivers/infiniband/core/umem_odp.c | 668 + drivers/infiniband/core/umem_rbtree.c | 94 +++ drivers/infiniband/core/uverbs.h | 1 + drivers/infiniband/core/uverbs_cmd.c | 171 -- drivers/infiniband/core/u
Re: linux-next: build failure after merge of the infiniband tree
On Mon, Dec 15, 2014 at 5:56 PM, Roland Dreier wrote:
> I'll add a partial revert of that patch to my tree to get back the
> now-used enum values.

I rebased my tree on top of the merge-window merge of davem's tree, and
added the missing flag on top of the "remove this flag" commit.

 - R.
Re: linux-next: build failure after merge of the infiniband tree
On Mon, Dec 15, 2014 at 5:47 PM, Stephen Rothwell wrote:
> Hi all,
>
> After merging the infiniband tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/infiniband/hw/mlx5/main.c: In function 'mlx5_ib_query_device':
> drivers/infiniband/hw/mlx5/main.c:248:34: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (dev->mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
>                                   ^
> drivers/net/ethernet/mellanox/mlx5/core/fw.c: In function 'mlx5_query_odp_caps':
> drivers/net/ethernet/mellanox/mlx5/core/fw.c:79:30: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (!(dev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG))
>                               ^
> drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function 'mlx5_start_eqs':
> drivers/net/ethernet/mellanox/mlx5/core/eq.c:459:28: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (dev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
>                             ^
>
> Really? Code added half way though the merge window not even build
> tested?

It's not quite as bad as it seems. The infiniband tree itself builds,
the problem is the merged tree. The Mellanox guys merged the "cleanup"

    commit 0c7aac854f52
    Author: Eli Cohen
    Date:   Tue Dec 2 02:26:14 2014

        net/mlx5_core: Remove unused dev cap enum fields

        These enumerations are not used so remove them.

        Signed-off-by: Eli Cohen
        Signed-off-by: David S. Miller

through davem's tree, and then went ahead and used at least
MLX5_DEV_CAP_FLAG_ON_DMND_PG (which that patch removes) in patches
they merged through my tree.

I'll add a partial revert of that patch to my tree to get back the
now-used enum values.

 - R.
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main set of InfiniBand/RDMA updates for 3.18 merge window: - Large set of iSER initiator improvements - Hardware driver fixes for cxgb4, mlx5 and ocrdma - Small fixes to core midlayer Ariel Nahum (3): IB/iser: Unbind at conn_stop stage IB/iser: Use iser_warn instead of BUG_ON in iser_conn_release IB/iser: Change iscsi_conn_stop log level to info Devesh Sharma (3): RDMA/ocrdma: Add default GID at index 0 RDMA/ocrdma: Convert kernel VA to PA for mmap in user IB/core: Clear AH attr variable to prevent garbage data Eli Cohen (5): IB/mlx5: Clear umr resources after ib_unregister_device IB/mlx5: Improve debug prints in mlx5_ib_reg_user_mr IB/core: Avoid leakage from kernel to user space IB/mlx5: Fix possible array overflow IB/mlx5: Remove duplicate code from mlx5_set_path Hariprasad S (3): RDMA/cxgb4: Take IPv6 into account for best_mtu and set_emss RDMA/cxgb4: Add missing neigh_release in find_route RDMA/cxgb4: Fix ntuple calculation for ipv6 and remove duplicate line Jack Morgenstein (1): IB/core: Fix XRC race condition in ib_uverbs_open_qp Jes Sorensen (3): RDMA/ocrdma: Don't memset() buffers we just allocated with kzalloc() RDMA/ocrdma: The kernel has a perfectly good BIT() macro - use it RDMA/ocrdma: Save the bit environment, spare unncessary parenthesis Li RongQing (1): RDMA/ocrdma: Remove a unused-label warning Or Gerlitz (1): IB/iser: Bump version, add maintainer Roi Dayan (1): IB/iser: Remove unused variables and dead code Roland Dreier (1): Merge branches 'core', 'cxgb4', 'iser', 'mlx5' and 'ocrdma' into for-next Sagi Grimberg (23): IB/iser: Rename ib_conn -> iser_conn IB/iser: Re-introduce ib_conn IB/iser: Extend iser_free_ib_conn_res() IB/iser: Fix DEVICE REMOVAL handling in the absence of iscsi daemon IB/iser: Don't bound release_work completions timeouts IB/iser: Protect tasks cleanup in case IB device was already released IB/iser: Signal iSCSI 
layer that transport is broken in error completions IB/iser: Centralize iser completion contexts IB/iser: Use internal polling budget to avoid possible live-lock IB/iser: Use single CQ for RX and TX IB/iser: Use beacon to indicate all completions were consumed IB/iser: Optimize completion polling IB/iser: Suppress scsi command send completions IB/iser: Nit - add space after __func__ in iser logging IB/iser: Add/Fix kernel doc style descriptions in iscsi_iser.h IB/iser: Fix/add kernel-doc style description in iscsi_iser.c IB/mlx5: Use enumerations for PI copy mask IB/iser: Remove redundant assignment IB/iser: Set IP_CSUM as default guard type IB/mlx5: Use extended internal signature layout IB/iser: Centralize ib_sig_domain settings Target/iser: Centralize ib_sig_domain setting IB/mlx5, iser, isert: Add Signature API additions Selvin Xavier (1): RDMA/ocrdma: Get vlan tag from ib_qp_attrs Steve Wise (1): RDMA/cxgb4: Make c4iw_wr_log_size_order static Yishai Hadas (1): IB/mlx5: Modify to work with arbitrary page size MAINTAINERS | 1 + drivers/infiniband/core/uverbs_cmd.c | 2 + drivers/infiniband/core/uverbs_main.c| 5 + drivers/infiniband/hw/cxgb4/cm.c | 32 +- drivers/infiniband/hw/cxgb4/device.c | 2 +- drivers/infiniband/hw/mlx5/main.c| 8 +- drivers/infiniband/hw/mlx5/mem.c | 18 +- drivers/infiniband/hw/mlx5/mr.c | 6 +- drivers/infiniband/hw/mlx5/qp.c | 149 +++--- drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 25 +- drivers/infiniband/hw/ocrdma/ocrdma_main.c | 12 + drivers/infiniband/hw/ocrdma/ocrdma_sli.h| 238 +- drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 10 +- drivers/infiniband/ulp/iser/iscsi_iser.c | 313 ++--- drivers/infiniband/ulp/iser/iscsi_iser.h | 408 +++- drivers/infiniband/ulp/iser/iser_initiator.c | 198 drivers/infiniband/ulp/iser/iser_memory.c| 99 ++-- drivers/infiniband/ulp/iser/iser_verbs.c | 667 +++ drivers/infiniband/ulp/isert/ib_isert.c | 65 ++- include/linux/mlx5/qp.h | 35 +- include/rdma/ib_verbs.h | 32 +- 21 files changed, 1372 insertions(+), 953 
deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus This is later and bigger than I would like, and the blame is all on me: I got very busy with other stuff for a few weeks during the 3.17 cycle, and didn't prepare this tree as soon as I should have. However I don't think there's anything risky here, and no one really cares if we break InfiniBand in 3.17 anyway... Last late set of InfiniBand/RDMA fixes for 3.17: - Fixes for the new memory region re-registration support - iSER initiator error path fixes - Grab bag of small fixes for the qib and ocrdma hardware drivers - Larger set of fixes for mlx4, especially in RoCE mode Alex Estrin (1): IPoIB: Remove unnecessary port query Devesh Sharma (2): RDMA/ocrdma: Report correct value of max_fast_reg_page_list_len RDMA/ocrdma: Do not skip setting deferred_arm Jack Morgenstein (6): IB/mlx4: Fix lockdep splat for the iboe lock mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header IB/mlx4: Don't update QP1 in native mode IB/mlx4: Do not allow APM under RoCE IB/mlx4: Fix VF mac handling in RoCE Markus Stockhausen (1): IB/mlx4: Disable TSO for Connect-X rev. 
A0 HCAs Matan Barak (2): mlx4: Correct error flows in rereg_mr IB/core: When marshaling uverbs path, clear unused fields Mike Marciniszyn (3): IB/ipath: Change get_user_pages() usage to always NULL vmas IB/qib: Change get_user_pages() usage to always NULL vmas IB/qib: Correct reference counting in debugfs qp_stats Moni Shoua (5): IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs() IB/mlx4: Don't duplicate the default RoCE GID IB/mlx4: Reorder steps in RoCE GID table initialization IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up IB/mlx4: Avoid executing gid task when device is being removed Or Gerlitz (1): IB/iser: Bump version to 1.4.1 Roi Dayan (1): IB/iser: Fix RX/TX CQ resource leak on error flow Roland Dreier (1): Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next Sagi Grimberg (1): IB/iser: Allow bind only when connection state is UP Shawn Bohrer (1): IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get devesh.sha...@emulex.com (2): RDMA/ocrdma: Resolve L2 address when creating user AH RDMA/ocrdma: Use right macro in query AH drivers/infiniband/core/umem.c | 19 ++- drivers/infiniband/core/uverbs_marshall.c | 4 + drivers/infiniband/hw/ipath/ipath_user_pages.c | 6 +- drivers/infiniband/hw/mlx4/main.c | 169 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 1 + drivers/infiniband/hw/mlx4/mr.c| 7 +- drivers/infiniband/hw/mlx4/qp.c| 60 + drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 43 +-- drivers/infiniband/hw/ocrdma/ocrdma_verbs.c| 6 +- drivers/infiniband/hw/qib/qib_debugfs.c| 3 +- drivers/infiniband/hw/qib/qib_qp.c | 8 -- drivers/infiniband/hw/qib/qib_user_pages.c | 6 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 10 +- drivers/infiniband/ulp/iser/iscsi_iser.c | 19 ++- drivers/infiniband/ulp/iser/iscsi_iser.h | 2 +- drivers/infiniband/ulp/iser/iser_verbs.c | 24 ++-- drivers/net/ethernet/mellanox/mlx4/mr.c| 33 +++-- drivers/net/ethernet/mellanox/mlx4/port.c | 11 +- include/rdma/ib_umem.h | 1 + 
19 files changed, 277 insertions(+), 155 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1 for-next 00/16] On demand paging
> I would like to note that we at Los Alamos National Laboratory are very > interested in this functionality and it would be great if it gets accepted. Have you done any review or testing of these changes? If so can you share the results? - R.
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main set of InfiniBand/RDMA updates for 3.17 merge window: - MR reregistration support - MAD support for RMPP in userspace - iSER and SRP initiator updates - ocrdma hardware driver updates - other fixes... Alex Estrin (1): IB/ipoib: Avoid multicast join attempts with invalid P_key Ariel Nahum (3): IB/iser: Seperate iser_conn and iscsi_endpoint storage space IB/iser: Protect iser state machine with a mutex IB/iser: Replace connection waitqueue with completion object Bart Van Assche (3): scsi_transport_srp: Fix fast_io_fail_tmo=dev_loss_tmo=off behavior IB/srp: Fix deadlock between host removal and multipathd IB/srp: Fix residual handling Dan Carpenter (1): RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf() Devesh Sharma (7): RDMA/ocrdma: Avoid posting DPP requests for RDMA READ be2net: Issue shutdown event to ocrdma driver RDMA/ocrdma: Handle shutdown event from be2net driver RDMA/ocrdma: Remove hardcoding of the max DPP QPs supported RDMA/ocrdma: Delete AH table if ocrdma_init_hw fails after AH table creation RDMA/ocrdma: Obtain SL from device structure RDMA/ocrdma: Update sli data structure for endianness Doug Ledford (2): IB/srpt: Handle GID change events RDMA/uapi: Include socket.h in rdma_user_cm.h Erez Shitrit (2): IB/ipoib: Use P_Key change event instead of P_Key polling mechanism IB/ipoib: Avoid flushing the workqueue from worker context Fabian Frederick (3): IPoIB: Remove unnecessary test for NULL before debugfs_remove() IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0] IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0] Ira Weiny (5): IB/umad: Update module to [pr|dev]_* style print messages IB/mad: Update module to [pr|dev]_* style print messages IB/mad: Add dev_notice messages for various umad/mad registration failures IB/mad: add new ioctl to ABI to support new registration options IB/mad: Add user space RMPP support Jack 
Morgenstein (1): mlx4_core: Add support for secure-host and SMP firewall Matan Barak (3): IB/core: Add user MR re-registration support mlx4_core: Add helper functions to support MR re-registration IB/mlx4_ib: Add support for user MR re-registration Mitesh Ahuja (4): RDMA/ocrdma: Allow only SEND opcode in case of UD QPs RDMA/ocrdma: Do proper cleanup even if FW is in error state RDMA/ocrdma: Return proper value for max_mr_size RDMA/ocrdma: report asic-id in query device Or Gerlitz (1): IB/ipath: Add P_Key change event support Roi Dayan (3): IB/iser: Support IPv6 address family IB/iser: Add TIMEWAIT_EXIT event handling IB/iser: Clarify a duplicate counters check Roland Dreier (1): Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', 'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next Sagi Grimberg (2): IB/iser: Fix responder resources advertisement IB/iser: Remove redundant return code in iser_free_ib_conn_res() Selvin Xavier (8): RDMA/ocrdma: Query and initalize the PFC SL RDMA/ocrdma: Add hca_type and fixing fw_version string in device atrributes RDMA/ocrdma: Avoid reporting wrong completions in case of error CQEs RDMA/ocrdma: Add missing adapter mailbox opcodes RDMA/ocrdma: Increase the size of STAG array in dev structure to 16K RDMA/ocrdma: Initialize the GID table while registering the device RDMA/ocrdma: Fix a sparse warning RDMA/ocrdma: Update the ocrdma module version string Steve Wise (2): RDMA/cxgb4: Only call CQ completion handler if it is armed RDMA/iwcm: Use a default listen backlog if needed Wei Yongjun (1): IB/srp: Fix return value check in srp_init_module() Documentation/infiniband/user_mad.txt | 13 +- drivers/infiniband/core/agent.c| 16 +- drivers/infiniband/core/cm.c | 5 +- drivers/infiniband/core/iwcm.c | 27 ++ drivers/infiniband/core/mad.c | 283 +--- drivers/infiniband/core/mad_priv.h | 3 - drivers/infiniband/core/sa_query.c | 2 +- drivers/infiniband/core/user_mad.c | 188 +++-- drivers/infiniband/core/uverbs.h | 1 + 
drivers/infiniband/core/uverbs_cmd.c | 93 +++ drivers/infiniband/core/uverbs_main.c | 1 + drivers/infiniband/hw/amso1100/c2_cq.c | 7 +- drivers/infiniband/hw/cxgb4/ev
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus InfiniBand/RDMA fixes for 3.16 - cxgb4 hardware driver regression fixes - mlx5 hardware driver regression fixes Hariprasad S (2): RDMA/cxgb4: Fix skb_leak in reject_cr() RDMA/cxgb4: Clean up connection on ARP error Or Gerlitz (1): IB/mlx5: Enable "block multicast loopback" for kernel consumers Roland Dreier (1): Merge branches 'cxgb4' and 'mlx5' into for-next Sagi Grimberg (1): mlx5_core: Fix possible race between mr tree insert/delete Steve Wise (2): RDMA/cxgb4: Initialize the device status page RDMA/cxgb4: Call iwpm_init() only once drivers/infiniband/hw/cxgb4/cm.c | 14 +++--- drivers/infiniband/hw/cxgb4/device.c | 18 +++--- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 2 +- drivers/infiniband/hw/mlx5/qp.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/mr.c | 19 +++ 5 files changed, 39 insertions(+), 16 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main batch of InfiniBand/RDMA changes for 3.16: - Add iWARP port mapper to avoid conflicts between RDMA and normal stack TCP connections. - Fixes for i386 / x86-64 structure padding differences (ABI compatibility for 32-on-64) from Yann Droneaud. - A pile of SRP initiator fixes from Bart Van Assche. - Fixes for a writeback / memory allocation deadlock with NFS over IPoIB connected mode from Jiri Kosina. - The usual fixes and cleanups to mlx4, mlx5, cxgb4 and other low-level drivers. Ariel Nahum (2): IB/iser: Simplify connection management IB/iser: Fix a possible race in iser connection states transition Bart Van Assche (11): IB/srp: Fix a sporadic crash triggered by cable pulling IB/srp: Fix kernel-doc warnings IB/srp: Introduce an additional local variable IB/srp: Introduce srp_map_fmr() IB/srp: Introduce srp_finish_mapping() IB/srp: Introduce the 'register_always' kernel module parameter IB/srp: One FMR pool per SRP connection IB/srp: Rename FMR-related variables IB/srp: Add fast registration support IB/umad: Fix error handling IB/umad: Fix use-after-free on close Christoph Jaeger (1): RDMA/cxgb4: Fix memory leaks in c4iw_alloc() error paths Colin Ian King (1): IB/mlx4: fix unitialised variable is_mcast Dan Carpenter (2): RDMA/cxgb3: Fix information leak in send_abort() RDMA/cxgb3: Remove a couple unneeded conditions Dennis Dalessandro (1): IB/ipath: Translate legacy diagpkt into newer extended diagpkt Dotan Barak (1): mlx4_core: Fix memory leaks in SR-IOV error paths Duan Jiong (1): RDMA/ocrdma: Convert to use simple_open() Haggai Eran (7): IB/mlx5: Fix error handling in reg_umr IB/mlx5: Add MR to radix tree in reg_mr_callback mlx5_core: Store MR attributes in mlx5_mr_core during creation and after UMR IB/mlx5: Set QP offsets and parameters for user QPs and not just for kernel QPs IB/core: Remove unneeded kobject_get/put calls IB/core: Fix port 
kobject deletion during error flow IB/core: Fix kobject leak on device register error flow Jack Morgenstein (5): mlx4_core: Fix incorrect FLAGS1 bitmap test in mlx4_QUERY_FUNC_CAP IB/mlx4: SET_PORT called by mlx4_ib_modify_port should be wrapped IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responses mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs IB/mlx4: Add interface for selecting VFs to enable QP0 via MLX proxy QPs Jiri Kosina (2): IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO IB/mlx4: Fix gfp passing in create_qp_common() Joe Perches (1): IB/srp: Avoid problems if a header uses pr_fmt Manuel Schölling (1): IB/ipath: Use time_before()/_after() Mike Marciniszyn (1): IB/qib: Fix port in pkey change event Or Gerlitz (3): IB/iser: Bump version to 1.4 IB: Return error for unsupported QP creation flags IB: Add a QP creation flag to use GFP_NOIO allocations Roi Dayan (1): IB/iser: Add missing newlines to logging messages Roland Dreier (6): IB/mlx5: Fix warning about cast of wr_id back to pointer on 32 bits mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement IB/mad: Fix sparse warning about gfp_t use IB/core: Fix sparse warnings about redeclared functions mlx4_core: Fix GFP flags parameters to be gfp_t Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next Sagi Grimberg (3): mlx5_core: Fix signature handover operation for interleaved buffers mlx5_core: Simplify signature handover wqe for interleaved buffers mlx5_core: Copy DIF fields only when input and output space values match Shachar Raindel (1): IB/mlx5: Refactor UMR to have its own context struct Steve Wise (2): RDMA/cxgb4: Fix vlan support RDMA/cxgb4: Add support for iWARP Port Mapper user space service Tatyana Nikolova (2): RDMA/core: Add support for iWARP Port Mapper user space service RDMA/nes: Add support for iWARP Port Mapper user space service Upinder Malhi (1): 
IB/usnic: Fix source file missing copyright and license Vinit Agnihotri (1): IB/qib: Additional Intel branding changes Yann Droneaud (5): IB/mlx5: add missing padding at end of struct mlx5_ib_create_cq IB/mlx5: add missing padding at end of struct mlx5_ib_create_srq RDMA/cxgb4: Add missing padding at end of struct c
Re: [PATCH v1 for-next 0/3] IB: Use GFP_NOIO calls in IPoIB connected mode TX path
On Sat, May 17, 2014 at 1:52 PM, Or Gerlitz wrote: > Roland, we're soon on -rc6 and there's no reason for this to miss > 3.16, could you please comment whether you want it to go through your > tree or net-next? I will pick it up.
[tip:x86/mm] x86, ioremap: Speed up check for RAM pages
Commit-ID: c81c8a1eeede61e92a15103748c23d100880cc8a Gitweb: http://git.kernel.org/tip/c81c8a1eeede61e92a15103748c23d100880cc8a Author: Roland Dreier AuthorDate: Fri, 2 May 2014 11:18:41 -0700 Committer: H. Peter Anvin CommitDate: Fri, 2 May 2014 11:52:26 -0700 x86, ioremap: Speed up check for RAM pages In __ioremap_caller() (the guts of ioremap), we loop over the range of pfns being remapped and check each one individually with page_is_ram(). For large ioremaps, this can be very slow. For example, we have a device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+ seconds -- sometimes long enough to trigger the soft lockup detector! Internally, page_is_ram() calls walk_system_ram_range() on a single page. Instead, we can make a single call to walk_system_ram_range() from __ioremap_caller(), and do our further checks only for any RAM pages that we find. For the common case of MMIO, this saves an enormous amount of work, since the range being ioremapped doesn't intersect system RAM at all. With this change, ioremap on our 256 GiB BAR takes less than 1 second. Signed-off-by: Roland Dreier Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-rol...@kernel.org Signed-off-by: H. Peter Anvin --- arch/x86/mm/ioremap.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 597ac15..bc7527e 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -50,6 +50,21 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size, return err; } +static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages, + void *arg) +{ + unsigned long i; + + for (i = 0; i < nr_pages; ++i) + if (pfn_valid(start_pfn + i) && + !PageReserved(pfn_to_page(start_pfn + i))) + return 1; + + WARN_ONCE(1, "ioremap on RAM pfn 0x%lx\n", start_pfn); + + return 0; +} + /* * Remap an arbitrary physical address space into the kernel virtual * address space.
Needed when the kernel wants to access high addresses @@ -93,14 +108,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, /* * Don't allow anybody to remap normal RAM that we're using.. */ + pfn = phys_addr >> PAGE_SHIFT; last_pfn = last_addr >> PAGE_SHIFT; - for (pfn = phys_addr >> PAGE_SHIFT; pfn <= last_pfn; pfn++) { - int is_ram = page_is_ram(pfn); - - if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))) - return NULL; - WARN_ON_ONCE(is_ram); - } + if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL, + __ioremap_check_ram) == 1) + return NULL; /* * Mappings have to be page-aligned
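The changelog's point, that one range walk does far less work than a per-page lookup when the range never touches RAM, can be modelled in user space. This is only a sketch under stated assumptions: the RAM map, the `model_` names and the return convention (-1 when nothing overlaps) are made up for illustration and are not the kernel's walk_system_ram_range() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RAM map: half-open pfn ranges [start, end). Values are made up. */
struct ram_range {
	unsigned long start;
	unsigned long end;
};

static const struct ram_range model_ram[] = {
	{ 0x0,    0x100   },
	{ 0x1000, 0x80000 },
};

typedef int (*range_cb)(unsigned long start_pfn, unsigned long nr_pages,
			void *arg);

/*
 * Call cb once per overlap between [start_pfn, start_pfn + nr_pages) and the
 * model RAM map. Stops at the first nonzero callback return and passes it on;
 * returns -1 if no RAM range overlaps at all (an assumption of this model).
 */
static int model_walk_ram_range(unsigned long start_pfn,
				unsigned long nr_pages, void *arg, range_cb cb)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	int ret = -1;
	size_t i;

	assert(cb != NULL);

	for (i = 0; i < sizeof(model_ram) / sizeof(model_ram[0]); i++) {
		unsigned long lo = model_ram[i].start > start_pfn ?
				   model_ram[i].start : start_pfn;
		unsigned long hi = model_ram[i].end < end_pfn ?
				   model_ram[i].end : end_pfn;

		if (lo >= hi)
			continue;	/* no overlap with this RAM range */

		ret = cb(lo, hi - lo, arg);
		if (ret)
			break;
	}

	return ret;
}

/* Callback: remember where the first RAM overlap starts and stop the walk. */
static int model_found_ram(unsigned long start_pfn, unsigned long nr_pages,
			   void *arg)
{
	unsigned long *first = arg;

	(void)nr_pages;
	*first = start_pfn;
	return 1;
}

/* Returns 1 if the pfn range intersects the model RAM map, 0 otherwise. */
static int model_range_has_ram(unsigned long start_pfn,
			       unsigned long nr_pages, unsigned long *first)
{
	return model_walk_ram_range(start_pfn, nr_pages, first,
				    model_found_ram) == 1;
}
```

In this model a 256 GiB MMIO window (0x4000000 pages of 4 KiB) costs one pass over the two map entries instead of 0x4000000 single-page lookups, which is the shape of the saving the changelog describes.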
[PATCH] x86, ioremap: Speed up check for RAM pages
From: Roland Dreier In __ioremap_caller() (the guts of ioremap), we loop over the range of pfns being remapped and check each one individually with page_is_ram(). For large ioremaps, this can be very slow. For example, we have a device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+ seconds -- sometimes long enough to trigger the soft lockup detector! Internally, page_is_ram() calls walk_system_ram_range() on a single page. Instead, we can make a single call to walk_system_ram_range() from __ioremap_caller(), and do our further checks only for any RAM pages that we find. For the common case of MMIO, this saves an enormous amount of work, since the range being ioremapped doesn't intersect system RAM at all. With this change, ioremap on our 256 GiB BAR takes less than 1 second. Signed-off-by: Roland Dreier --- arch/x86/mm/ioremap.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 597ac155c91c..bc7527e109c8 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -50,6 +50,21 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size, return err; } +static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages, + void *arg) +{ + unsigned long i; + + for (i = 0; i < nr_pages; ++i) + if (pfn_valid(start_pfn + i) && + !PageReserved(pfn_to_page(start_pfn + i))) + return 1; + + WARN_ONCE(1, "ioremap on RAM pfn 0x%lx\n", start_pfn); + + return 0; +} + /* * Remap an arbitrary physical address space into the kernel virtual * address space. Needed when the kernel wants to access high addresses @@ -93,14 +108,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, /* * Don't allow anybody to remap normal RAM that we're using..
*/ + pfn = phys_addr >> PAGE_SHIFT; last_pfn = last_addr >> PAGE_SHIFT; - for (pfn = phys_addr >> PAGE_SHIFT; pfn <= last_pfn; pfn++) { - int is_ram = page_is_ram(pfn); - - if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))) - return NULL; - WARN_ON_ONCE(is_ram); - } + if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL, + __ioremap_check_ram) == 1) + return NULL; /* * Mappings have to be page-aligned -- 1.9.1
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus InfiniBand/RDMA updates for 3.15-rc4: - cxgb4 hardware driver fixes Hariprasad S (1): RDMA/cxgb4: Update Kconfig to include Chelsio T5 adapter Steve Wise (3): RDMA/cxgb4: Fix endpoint mutex deadlocks RDMA/cxgb4: Force T5 connections to use TAHOE congestion control RDMA/cxgb4: Only allow kernel db ringing for T4 devs drivers/infiniband/hw/cxgb4/Kconfig | 6 ++--- drivers/infiniband/hw/cxgb4/cm.c | 39 ++- drivers/infiniband/hw/cxgb4/iw_cxgb4.h| 1 + drivers/infiniband/hw/cxgb4/qp.c | 13 +++ drivers/infiniband/hw/cxgb4/t4fw_ri_api.h | 14 +++ 5 files changed, 55 insertions(+), 18 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus InfiniBand/RDMA updates for 3.15-rc2: - Mostly cxgb4 fixes unblocked by the merge of some prerequisites via the net tree. - Drop deprecated MSI-X API use. - A couple other miscellaneous things. Alexander Gordeev (2): IB/qib: Use pci_enable_msix_range() instead of pci_enable_msix() IB/mthca: Use pci_enable_msix_exact() instead of pci_enable_msix() Eli Cohen (1): IB/mlx5: Add block multicast loopback support Hariprasad Shenai (1): RDMA/cxgb4: Use pr_warn_ratelimited Roland Dreier (1): Merge branches 'cxgb4', 'misc', 'mlx5' and 'qib' into for-next Steve Wise (9): RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices RDMA/cxgb4: Endpoint timeout fixes RDMA/cxgb4: rmb() after reading valid gen bit RDMA/cxgb4: SQ flush fix RDMA/cxgb4: Max fastreg depth depends on DSGL support RDMA/cxgb4: Initialize reserved fields in a FW work request RDMA/cxgb4: Add missing debug stats RDMA/cxgb4: Use uninitialized_var() RDMA/cxgb4: Fix over-dereference when terminating drivers/infiniband/hw/cxgb4/cm.c | 89 drivers/infiniband/hw/cxgb4/cq.c | 24 - drivers/infiniband/hw/cxgb4/device.c | 41 --- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 2 + drivers/infiniband/hw/cxgb4/mem.c| 6 ++- drivers/infiniband/hw/cxgb4/provider.c | 2 +- drivers/infiniband/hw/cxgb4/qp.c | 70 +++-- drivers/infiniband/hw/cxgb4/resource.c | 10 ++-- drivers/infiniband/hw/cxgb4/t4.h | 72 -- drivers/infiniband/hw/mlx5/main.c| 2 + drivers/infiniband/hw/mlx5/qp.c | 12 + drivers/infiniband/hw/mthca/mthca_main.c | 8 +-- drivers/infiniband/hw/qib/qib_pcie.c | 55 ++-- include/linux/mlx5/device.h | 1 + include/linux/mlx5/qp.h | 1 + 15 files changed, 270 insertions(+), 125 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main batch of InfiniBand/RDMA changes for 3.15: - The biggest change is core API extensions and mlx5 low-level driver support for handling DIF/DIX-style protection information, and the addition of PI support to the iSER initiator. Target support will be arriving shortly through the SCSI target tree. - A nice simplification to the "umem" memory pinning library now that we have chained sg lists. Kudos to Yishai Hadas for realizing our code didn't have to be so crazy. - Another nice simplification to the sg wrappers used by qib, ipath and ehca to handle their mapping of memory to adapter. - The usual batch of fixes to bugs found by static checkers etc. from intrepid people like Dan Carpenter and Yann Droneaud. - A large batch of cxgb4, ocrdma, qib driver updates. Alex Tabachnik (2): IB/iser: Introduce pi_enable, pi_guard module parameters IB/iser: Initialize T10-PI resources Ariel Nahum (1): IB/iser: Remove struct iscsi_iser_conn Bart Van Assche (7): IB/mlx4: Fix a sparse endianness warning scsi_transport_srp: Fix two kernel-doc warnings IB/srp: Add more logging IB/srp: Avoid duplicate connections IB/srp: Make writing into the "add_target" sysfs attribute interruptible IB/srp: Avoid that writing into "add_target" hangs due to a cable pull IB/srp: Fix a race condition between failing I/O and I/O completion CQ Tang (1): IB/qib: Change SDMA progression mode depending on single- or multi-rail Dan Carpenter (7): IB/qib: Remove duplicate check in get_a_ctxt() RDMA/nes: Clean up a condition RDMA/cxgb4: Fix underflows in c4iw_create_qp() RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq() IB/qib: Cleanup qib_register_observer() mlx4_core: Fix some indenting in mlx4_ib_add() mlx4_core: Make buffer larger to avoid overflow warning Dennis Dalessandro (3): IB/qib: Fix potential buffer overrun in sending diag packet routine IB/ipath: Fix potential buffer overrun 
in sending diag packet routine IB/qib: Fix memory leak of recv context when driver fails to initialize. Devesh Sharma (9): RDMA/ocrdma: EQ full catastrophe avoidance RDMA/ocrdma: SQ and RQ doorbell offset clean up RDMA/ocrdma: Read ASIC_ID register to select asic_gen RDMA/ocrdma: Allow DPP QP creation RDMA/ocrdma: ABI versioning between ocrdma and be2net be2net: Add abi version between be2net and ocrdma RDMA/ocrdma: Update version string RDMA/ocrdma: Increment abi version count RDMA/ocrdma: Code clean-up Fabio Estevam (1): IB/usnic: Remove '0x' when using %pa format Mike Marciniszyn (7): IB/qib: Fix debugfs ordering issue with multiple HCAs IB/qib: Add percpu counter replacing qib_devdata int_counter IB/qib: Modify software pma counters to use percpu variables IB/qib: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads IB/ipath: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads IB/ehca: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads IB/core: Remove overload in ib_sg_dma* Moni Shoua (1): IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler Or Gerlitz (3): IB/iser: Print QP information once connection is established IB/iser: Update Mellanox copyright note IB/iser: Bump driver version to 1.3 Prarit Bhargava (1): RDMA/ocrdma: Fix compiler warning Randy Dunlap (1): IB/iser: Fix sector_t format warning Roi Dayan (1): IB/iser: Drain the tx cq once before looping on the rx cq Roland Dreier (2): RDMA/ocrdma: Fix warnings about pointer <-> integer casts Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next Sagi Grimberg (23): IB/core: Introduce protected memory regions IB/core: Introduce signature verbs API mlx5: Implement create_mr and destroy_mr IB/mlx5: Initialize mlx5_ib_qp signature-related members IB/mlx5: Break up wqe handling into begin & finish routines IB/mlx5: Remove MTT access mode from umr flags helper function IB/mlx5: Keep mlx5 MRs in a 
radix tree under device IB/mlx5: Support IB_WR_REG_SIG_MR IB/mlx5: Collect signature error completion IB/mlx5: Expose support for signature MR feature IB/iser: Suppress completions for fast registration work requests IB/iser: Avoid FRWR notation, use fastreg instead IB/i
Re: linux rdma 3.14 merge plans
Sure, no problem. Do you have a git tree with the latest versions of all the changes you want for 3.15 in a branch? That would be helpful as I catch up on applying things, so that I don't miss anything. If you don't have one, taking a little time to set one up on github or wherever would be nice. You can base your set of changes on Linus's latest tree. Thanks! Roland On Thu, Mar 6, 2014 at 9:07 PM, Devesh Sharma wrote: > Hi Roland, > > Is it okay to send next series of patches even if previous series is not > accepted yet in your tree? Off-course I will cut patches on top of previous > series of patches. > > -Regards > Devesh > > -Original Message- > From: linux-rdma-ow...@vger.kernel.org > [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Nicholas A. Bellinger > Sent: Thursday, March 06, 2014 12:34 AM > To: Roland Dreier > Cc: Or Gerlitz; Hefty Sean; linux-rdma; Martin K. Petersen; target-devel; > Sagi Grimberg; linux-kernel > Subject: Re: linux rdma 3.14 merge plans > > On Wed, 2014-03-05 at 07:18 -0800, Roland Dreier wrote: >> On Wed, Mar 5, 2014 at 1:54 AM, Nicholas A. Bellinger >> wrote: >> > That all said, do you have an objection wrt taking this bits through >> > target-pending..? Given the dependencies involved, that would seem >> > the most logical path to take. >> >> Perhaps not surprisingly, I would prefer to get a chance to review a >> major change to the core RDMA midlayer rather than having you merge it >> through your tree. So yes I do object. Please give me a chance to >> review and merge it. I am working on that this week. >> > > Great. We'll be looking for a response by the end of the week. > > Otherwise if you end up not having time, we'd still like to move forward for > v3.15 given the amount of review the series has already gotten on the list. 
> > Thank you, > > --nab
Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP
On Thu, Feb 27, 2014 at 2:42 AM, Jiri Kosina wrote: > Whatever suits you best. To sum it up: > > - mlx4 is confirmed to have this problem, and we know how that problem > happens -- see the paragraph in the changelog explaining the dependency > between memory reclaim and allocation of TX ring > > - we have a work around which requires human interaction in order > to provide the information whether GFP_NOFS should be used or not > > - I can very well understand why Mellanox would see that as a hack, but if > more comprehensive fix is necessary, I'd expect those who understand > the code the best to come up with a solution/proposal. I'd assume that > you don't want to keep the code with known and easily triggerable > deadlock out there unfixed. > > - where I see the potential for layering violation in any 'general' > solution is that it's the filesystem that has to be "talking" to the > underlying netdevice, i.e. you'll have to make filesystem > netdevice-aware, right? It's quite clear that this is a general problem with IPoIB connected mode on any IB device. In connected mode, a packet send can trigger establishing a new connection, which will allocate a new QP, which in particular will allocate memory for the QP in the low-level IB device driver. Currently I'm positive that every driver will do GFP_KERNEL allocations when allocating a QP (ehca does both a GFP_KERNEL kmem_cache allocation and vmalloc in internal_create_qp(), mlx5 and mthca are similar to mlx4 and qib does vmalloc() in qib_create_qp()). So this patch needs to be extended to the other 4 IB device drivers in the tree. Also, I don't think GFP_NOFS is enough -- it seems we need GFP_NOIO, since we could be swapping to a block device over iSCSI over IPoIB-CM, so even non-FS stuff could deadlock. I don't think it makes any sense to have a "do_not_deadlock" module parameter, especially one that defaults to "false." If this is the right thing to do, then we should just unconditionally do it. 
It does seem that only using GFP_NOIO when we really need to would be a very difficult problem--how can we carry information about whether a particular packet is involved in freeing memory through all the layers of, say, NFS, TCP, IPSEC, bonding, &c?
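The GFP_NOFS versus GFP_NOIO argument above comes down to which kinds of reclaim an allocation is allowed to trigger. Here is a minimal user-space model of that reasoning; the bit values, the `MODEL_`/`model_` names and the two boolean inputs are assumptions made for illustration and do not match the kernel's real gfp definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real gfp bits differ. */
enum {
	MODEL_RECLAIM = 1u << 0,	/* allocation may wait for reclaim */
	MODEL_IO      = 1u << 1,	/* reclaim may start block I/O (e.g. swap) */
	MODEL_FS      = 1u << 2,	/* reclaim may call into a filesystem */
};

#define MODEL_GFP_KERNEL (MODEL_RECLAIM | MODEL_IO | MODEL_FS)
#define MODEL_GFP_NOFS   (MODEL_RECLAIM | MODEL_IO)
#define MODEL_GFP_NOIO   (MODEL_RECLAIM)

/*
 * Model of the dependency loop discussed in the thread: an allocation made on
 * the IPoIB-CM transmit path can deadlock if the reclaim it triggers needs to
 * send packets over that same path.
 *
 * fs_over_ipoib:   a filesystem (e.g. NFS) writes back over this interface.
 * swap_over_ipoib: a block device used for swap sits on this interface
 *                  (e.g. iSCSI over IPoIB-CM).
 */
static bool model_alloc_may_deadlock(unsigned int flags, bool fs_over_ipoib,
				     bool swap_over_ipoib)
{
	assert((flags & ~MODEL_GFP_KERNEL) == 0);

	if ((flags & MODEL_FS) && fs_over_ipoib)
		return true;
	if ((flags & MODEL_IO) && swap_over_ipoib)
		return true;
	return false;
}
```

Under these assumptions, MODEL_GFP_NOFS closes the NFS loop but still reports a possible deadlock when swap_over_ipoib is true, and only MODEL_GFP_NOIO is safe for both inputs, which mirrors the case made above for GFP_NOIO.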
Re: linux rdma 3.14 merge plans
On Wed, Mar 5, 2014 at 1:54 AM, Nicholas A. Bellinger wrote: > That all said, do you have an objection wrt taking this bits through > target-pending..? Given the dependencies involved, that would seem the > most logical path to take. Perhaps not surprisingly, I would prefer to get a chance to review a major change to the core RDMA midlayer rather than having you merge it through your tree. So yes I do object. Please give me a chance to review and merge it. I am working on that this week. - R.
Re: [PATCH] Revert "driver core: synchronize device shutdown"
> Hm, no one seems to have said anything for the past 5 years about this. It definitely is hard to hit -- you have to do "shutdown" or "reboot" right as something schedules async work. In our case we have some systems with a large and slightly flaky SAS fabric, so there's a constant level of re-probing SCSI disks, and we occasionally see reboots hanging due to waiting for never-finishing sd probe async work. AFAICT the synchronization does nothing useful and is just a remnant of a patch series where the real meat didn't get applied. But of course it would be great if Shaohua could confirm my understanding. Thanks, Roland
[PATCH] Revert "driver core: synchronize device shutdown"
From: Roland Dreier This reverts commit 401097ea4b89846d66ac78f7f108d49c2e922d9c. The original changelog said: A patch series to make .shutdown execute asynchronously. Some drivers's shutdown can take a lot of time. The patches can help save some shutdown time. The patches use Arjan's async API. This patch: synchronize all tasks submitted by .shutdown However, I'm not able to find any evidence that any other patches from this series were applied, nor am I able to find any async tasks that are scheduled in a .shutdown context. On the other hand, we see occasional hangs on shutdown that appear to be caused by the async_synchronize_full() in device_shutdown() waiting forever for the async probing in sd if a SCSI disk shows up at just the wrong time -- the system starts the probe, but begins shutting down and tears down too much of the SCSI driver to finish the probe. If we had any async shutdown tasks, I guess the right fix would be to create a "shutdown" async domain and have device_shutdown() only wait for that domain. But since there apparently are no async shutdown tasks, we can just revert the waiting. Signed-off-by: Roland Dreier --- drivers/base/core.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 2b567177ef78..afea3697fa2e 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include @@ -2003,7 +2002,6 @@ void device_shutdown(void) spin_lock(&devices_kset->list_lock); } spin_unlock(&devices_kset->list_lock); - async_synchronize_full(); } /* -- 1.9.0
[PATCH] NTB: Fix typo in setting one translation register
From: Roland Dreier

In the code for Xeon devices in back-to-back mode with
xeon_errata_workaround disabled, the downstream device puts the wrong
value in SNB_B2B_XLAT_OFFSETL (SNB_MBAR01_DSD_ADDR vs.
SNB_MBAR01_USD_ADDR).

This was spotted while reading code, since the typo has no practical
effect, at least for now: the low 32 bits of both constants are
actually identical anyway. However, it's clearer and safer to use the
right name.

Signed-off-by: Roland Dreier
---
 drivers/ntb/ntb_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 170e8e60cdb7..2774d356b689 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -785,7 +785,7 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
 			/* B2B_XLAT_OFFSET is a 64bit register, but can
 			 * only take 32bit writes
 			 */
-			writel(SNB_MBAR01_DSD_ADDR & 0x,
+			writel(SNB_MBAR01_USD_ADDR & 0x,
 			       ndev->reg_base + SNB_B2B_XLAT_OFFSETL);
 			writel(SNB_MBAR01_USD_ADDR >> 32,
 			       ndev->reg_base + SNB_B2B_XLAT_OFFSETU);
-- 
1.9.0
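To illustrate why a typo like this can be harmless today, here is a rough sketch of splitting a 64-bit value into the two 32-bit halves that get written separately. The two EXAMPLE_* constants are made-up values chosen only so that their low words agree; they are an assumption of this sketch and not the real SNB_MBAR01_* definitions. The helper names and the 0xffffffff mask are likewise this sketch's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real SNB_MBAR01_USD_ADDR / _DSD_ADDR. */
#define EXAMPLE_USD_ADDR 0x000000210000000CULL
#define EXAMPLE_DSD_ADDR 0x000000410000000CULL

/* Split a 64-bit register value into the two 32-bit writes. */
void split_u64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
	assert(lo != NULL && hi != NULL);
	*lo = (uint32_t)(v & 0xffffffffULL);
	*hi = (uint32_t)(v >> 32);
}

/* Nonzero if the low-word write would be the same for a and b. */
int low_words_match(uint64_t a, uint64_t b)
{
	uint32_t alo, ahi, blo, bhi;

	split_u64(a, &alo, &ahi);
	split_u64(b, &blo, &bhi);
	return alo == blo;
}

/* Nonzero if the high-word write would be the same for a and b. */
int high_words_match(uint64_t a, uint64_t b)
{
	uint32_t alo, ahi, blo, bhi;

	split_u64(a, &alo, &ahi);
	split_u64(b, &blo, &bhi);
	return ahi == bhi;
}
```

With constants shaped like these, using the wrong name for the low-word write changes nothing, while using it for the high-word write would. That is the sense in which the right name is "clearer and safer": the code stops depending on a coincidence between two definitions.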
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus RDMA/InfiniBand fixes for 3.14-rc3: - Fix some rough edges from the "IP addressing for IBoE" merge - Other misc fixes, mostly to hardware drivers Dan Carpenter (1): IB/iser: Fix use after free in iser_snd_completion() Devesh Sharma (2): RDMA/ocrdma: Fix traffic class shift RDMA/ocrdma: Fix load time panic during GID table init Eli Cohen (4): IB/mlx5: Fix RC transport send queue overhead computation IB/mlx5: Fix binary compatibility with libmlx5 IB/mlx5: Don't set "block multicast loopback" capability IB/mlx5: Remove dependency on X86 Julia Lawall (2): RDMA/nes: Fix error return code RDMA/amso1100: Fix error return code Kumar Sanghvi (1): RDMA/cxgb4: Add missing neigh_release in LE-Workaround path Matan Barak (1): IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device Mike Marciniszyn (1): IB/qib: Add missing serdes init sequence Moni Shoua (6): IB/mlx4: Make sure GID index 0 is always occupied IB/mlx4: Move rtnl locking to the right place IB/mlx4: Do IBoE locking earlier when initializing the GID table IB/mlx4: Do IBoE GID table resets per-port IB/mlx4: Build the port IBoE GID table properly under bonding IB: Report using RoCE IP based gids in port caps Roi Dayan (1): IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection Roland Dreier (2): mlx5: Add include of because of kzalloc()/kfree() use Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'usnic' into for-next Upinder Malhi (1): IB/usnic: Fix smatch endianness error drivers/infiniband/hw/amso1100/c2.c | 4 +- drivers/infiniband/hw/amso1100/c2_rnic.c| 3 +- drivers/infiniband/hw/cxgb4/cm.c| 1 + drivers/infiniband/hw/mlx4/main.c | 185 +--- drivers/infiniband/hw/mlx5/Kconfig | 2 +- drivers/infiniband/hw/mlx5/main.c | 22 ++- drivers/infiniband/hw/mlx5/qp.c | 18 ++- drivers/infiniband/hw/mlx5/user.h | 7 + 
drivers/infiniband/hw/nes/nes.c | 5 +- drivers/infiniband/hw/ocrdma/ocrdma_main.c | 2 +- drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 4 +- drivers/infiniband/hw/qib/qib_iba7322.c | 5 + drivers/infiniband/hw/usnic/usnic_ib_qp_grp.c | 9 +- drivers/infiniband/ulp/iser/iser_initiator.c| 3 +- drivers/infiniband/ulp/iser/iser_verbs.c| 10 +- drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 2 +- include/linux/mlx5/driver.h | 3 + include/rdma/ib_verbs.h | 3 +- 18 files changed, 214 insertions(+), 74 deletions(-)
Re: linux rdma 3.14 merge plans
On Thu, Feb 6, 2014 at 4:02 PM, Nicholas A. Bellinger wrote: > Can you give us an estimate of when you'll have some time to give > feedback on the outstanding patches..? I hope to get to it in the next few weeks. - R.
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main batch of InfiniBand/RDMA changes for 3.14: - Flow steering for InfiniBand UD traffic - IP-based addressing for IBoE aka RoCE - Pass SRP submaintainership from Dave to Bart - SRP transport fixes from Bart - Add the new Cisco usNIC low-level device driver - Various other fixes Bart Van Assche (4): scsi_transport_srp: Block rport upon TL error even with fast_io_fail_tmo = off scsi_transport_srp: Fix a race condition scsi_transport_srp: Add rport state diagram scsi_transport_srp: Fix kernel-doc warnings Dan Carpenter (2): mlx5_core: Remove dead code IB/usnic: Use GFP_ATOMIC under spinlock David Dillow (1): MAINTAINERS: Pass the torch of SRP submaintainership Devesh Sharma (2): RDMA/ocrdma: Fix AV_VALID bit position RDMA/ocrdma: Fix OCRDMA_GEN2_FAMILY macro definition Ding Tianhong (1): RDMA/nes: Slight optimization of Ethernet address compare Eli Cohen (13): IB/mlx5: Remove unused code in mr.c IB/mlx5: Fix micro UAR allocator IB/mlx5: Clear out struct before create QP command mlx5_core: Use mlx5 core style warning IB/mlx5: Make sure doorbell record is visible before doorbell IB/mlx5: Implement modify CQ IB/mlx5: Add support for resize CQ mlx5_core: Improve debugfs readability mlx5_core: Fix PowerPC support IB/mlx5: Allow creation of QPs with zero-length work queues IB/mlx5: Abort driver cleanup if teardown hca fails IB/mlx5: Remove old field for create mkey mailbox IB/mlx5: Verify reserved fields are cleared Haggai Eran (1): mlx5_core: Fix out arg size in access_register command Ira Weiny (1): IB/qib: Fix QP check when looping back to/from QP1 Julia Lawall (1): IB/mlx4: Fix error return code Matan Barak (9): IB/core: Add flow steering support for IPoIB UD traffic IB/core: Add support for IB L2 device-managed steering mlx4_core: Add support for steerable IB UD QPs IB/mlx4: Enable device-managed steering support for IB ports too IB/mlx4: Add mechanism 
to support flow steering over IB links IB/mlx4: Add support for steerable IB UD QPs IB/core: Ethernet L2 attributes in verbs/cm structures IB/core: Make ib_addr a core IB module IB/mlx4: Add dependency INET Michal Schmidt (1): IPoIB: Report operstate consistently when brought up without a link Moni Shoua (5): IB/cma: IBoE (RoCE) IP-based GID addressing IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing RDMA/ocrdma: Handle Ethernet L2 parameters for IP based GID addressing RDMA/ocrdma: Populate GID table with IP based gids Or Gerlitz (2): IB/core: Resolve Ethernet L2 addresses when modifying QP IB/core: Fix unused variable warning Paul Bolle (1): RDMA/cxgb4: Fix gcc warning on 32-bit arch Roland Dreier (6): IB/usnic: Fix typo "Ignorning" -> "Ignoring" RDMA/ocrdma: Move ocrdma_inetaddr_event outside of "#if CONFIG_IPV6" RDMA/ocrdma: Add dependency on INET IB/mlx4: Use IS_ENABLED(CONFIG_IPV6) Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next Merge branch 'ip-roce' into for-next Somnath Kotur (1): RDMA/cma: Handle global/non-linklocal IPv6 addresses in cma_check_linklocal() Svetlana Mavrina (1): RDMA/amso1100: Add check if cache memory was allocated before freeing it Upinder Malhi (22): IB/usnic: Add Cisco VIC low-level hardware driver IB/usnic: Change WARN_ON to lockdep_assert_held IB/usnic: Add struct usnic_transport_spec IB/usnic: Push all forwarding state to usnic_fwd.[hc] IB/usnic: Port over main.c and verbs.c to the usnic_fwd.h IB/usnic: Port over usnic_ib_qp_grp.[hc] to new usnic_fwd.h IB/usnic: Port over sysfs to new usnic_fwd.h IB/usnic: Update ABI and Version file for UDP support IB/usnic: Add UDP support to usnic_fwd.[hc] IB:usnic: Add UDP support to usnic_transport.[hc] IB/usnic: Add UDP support in u*verbs.c, u*main.c and u*util.h IB/usnic: Add UDP support in usnic_ib_qp_grp.[hc] IB/core: Add 
RDMA_TRANSPORT_USNIC_UDP IB/usnic: Remove superflous parentheses IB/usnic: Use for_each_sg instead of a for-loop IB/usnic: Expose flows via debugfs IB/usnic: Append documentation to usnic_transport.h and cleanup IB/usnic: Fix endianness-related warnings IB/usnic: Add d
Re: linux rdma 3.14 merge plans
On Tue, Jan 21, 2014 at 2:00 PM, Or Gerlitz wrote: > Roland, ping! the signature patches were posted > three months ago. We > deserve a response from the maintainer that goes beyond "I need to > think on that". > > Responsiveness was stated by Linus to be the #1 requirement from > kernel maintainers. Or, I'm not sure what response you're after from me. Linus has also said that maintainers should say "no" a lot more (http://lwn.net/Articles/571995/) so maybe you want me to say, "No, I won't merge this patch set, since it adds a bunch of complexity to support a feature no one really cares about." Is that it? (And yes I am skeptical about this stuff -- I work at an enterprise storage company and even here it's hard to find anyone who cares about DIF/DIX, especially offload features that stop it from being end-to-end) I'm sure you're not expecting me to say, "Sure, I'll merge it without understanding the problem it's solving or how it's doing that," especially given your recent history of pushing me to merge stuff like the IP-RoCE patches back when they broke the userspace ABI. I'd really rather spend my time on something actually useful like cleaning up softroce. - R.
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus The following changes since commit 374b105797c3d4f29c685f3be535c35f5689b30e: Linux 3.13-rc3 (2013-12-06 09:34:04 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus for you to fetch changes up to 22f12c60e12a4112fdca31582e66fe501600ee2b: Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus (2013-12-23 09:19:02 -0800) Last batch of InfiniBand/RDMA changes for 3.13 / 2014: - Additional checks for uverbs to ensure forward compatibility, handle malformed input better. - Fix potential use-after-free in iWARP connection manager. - Make a function static. Rashika (1): RDMA/cxgb4: Make _c4iw_write_mem_dma() static Roland Dreier (2): IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA() Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus Steve Wise (1): RDMA/iwcm: Don't touch cm_id after deref in rem_ref Yann Droneaud (7): IB/core: const'ify inbuf in struct ib_udata IB/uverbs: Check reserved field in extended command header IB/uverbs: Check comp_mask in destroy_flow IB/uverbs: Check reserved fields in create_flow IB/uverbs: Set error code when fail to consume all flow_spec items IB/uverbs: Check input length in flow steering uverbs IB/uverbs: Check access to userspace response buffer in extended command drivers/infiniband/core/iwcm.c| 11 +-- drivers/infiniband/core/uverbs.h | 10 +- drivers/infiniband/core/uverbs_cmd.c | 17 + drivers/infiniband/core/uverbs_main.c | 27 --- drivers/infiniband/hw/cxgb4/mem.c | 2 +- include/rdma/ib_verbs.h | 2 +- 6 files changed, 53 insertions(+), 16 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main batch of InfiniBand/RDMA changes for 3.13: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC Bart Van Assche (11): IB/srp: Keep rport as long as the IB transport layer scsi_transport_srp: Add transport layer error handling IB/srp: Use SRP transport layer error recovery IB/srp: Start timers if a transport layer error occurs scsi_transport_srp: Add periodic reconnect support IB/srp: Add periodic reconnect functionality IB/srp: Export sgid to sysfs IB/srp: Introduce srp_alloc_req_data() IB/srp: Make queue size configurable IB/srp: Avoid offlining operational SCSI devices IB/srp: Report receive errors correctly Ben Hutchings (1): IB/cxgb4: Fix formatting of physical address Dan Carpenter (1): RDMA/ocrdma: Silence an integer underflow warning Dave Jones (1): RDMA/nes: Remove self-assignment from nes_query_qp() Doug Ledford (2): IB/cma: Use cached gids IB/cma: Check for GID on listening device first Eli Cohen (17): IB/mlx5: Fix check of number of entries in create CQ IB/mlx5: Multithreaded create MR IB/mlx5: Fix overflow check in IB_WR_FAST_REG_MR IB/mlx5: Simplify mlx5_ib_destroy_srq mlx5: Fix cleanup flow when DMA mapping fails mlx5: Support communicating arbitrary host page size to firmware mlx5: Clear reserved area in set_hca_cap() IB/mlx5: Remove dead code in mr.c IB/mlx5: Remove "Always false" comparison IB/mlx5: Update opt param mask for RTS2RTS mlx5: Use enum to indicate adapter page size IB/mlx4: Fix endless loop in resize CQ IB/core: Encorce MR access rights rules on kernel consumers IB/mlx5: Remove dead code IB/mlx5: Fix list_del of empty list IB/mlx4: Fix device max capabilities check IB/mlx5: Fix page shift 
in create CQ for userspace Erez Shitrit (6): IPoIB: Fix crash in dev_open error flow IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush() IPoIB: Avoid flushing the driver workqueue on dev_down IPoIB: Fix usage of uninitialized multicast objects IPoIB: Add path query flushing in ipoib_ib_dev_cleanup IPoIB: Start multicast join process only on active ports Jack Wang (1): IB/srp: Add change_queue_depth and change_queue_type support Jan Kara (2): IB/ipath: Convert ipath_user_sdma_pin_pages() to use get_user_pages_fast() IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() Joe Perches (1): IB/ucma: Convert use of typedef ctl_table to struct ctl_table Latchesar Ionkov (1): IB/core: Pass imm_data from ib_uverbs_send_wr to ib_send_wr correctly Matan Barak (2): IB/core: clarify overflow/underflow checks on ib_create/destroy_flow IB/core: Re-enable create_flow/destroy_flow uverbs Mathias Krause (1): IB/netlink: Remove superfluous RDMA_NL_GET_OP() masking Michal Nazarewicz (1): RDMA/cma: Remove unused argument and minor dead code Michal Schmidt (1): IPoIB: lower NAPI weight Mike Marciniszyn (2): IB/qib: Fix checkpatch __packed warnings IB/qib: Fix txselect regression Moshe Lazer (2): IB/mlx5: Fix srq free in destroy qp mlx5_core: Change optimal_reclaimed_pages for better performance Naresh Gottumukkala (2): RDMA/ocrdma: Fix a crash in rmmod RDMA/ocrdma: Remove redundant check in ocrdma_build_fr() Roland Dreier (1): Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next Sean Hefty (1): RDMA/ucma: Discard events for IDs not yet claimed by user space Tal Alon (1): IPoIB: Change CM skb memory allocation to be non-atomic during init Upinder Malhi \(umalhi\) (1): IB/core: Add Cisco usNIC rdma node and transport types Vu Pham (2): IB/srp: Make transport layer retry count configurable IB/srp: Remove target from list before freeing Scsi_Host structure Yann Droneaud (5): IB/core: 
Rename 'flow' structs to match other uverbs structs IB/core: Make uverbs flow structure use names like verbs ones IB/core: Use a common header for uverbs flow_specs IB/core: Remove ib_uverbs_flow_spec structure from userspace IB/core: extended command: an improved infrastructure for uverbs command
Re: linux-next: build warning after merge of the infiniband tree
Sorry about that, folded in the fix I missed and will push out the tree shortly. On Mon, Nov 4, 2013 at 5:42 AM, Marciniszyn, Mike wrote: > This issue was caught by Tetsuo Handa and Acked on 10/30: > http://marc.info/?t=13831336458&r=1&w=2. > > Roland, I noticed that the Tetsuo's original message didn't cc the linux-rdma > list? > > Mike > >> -Original Message- >> From: Stephen Rothwell [mailto:s...@canb.auug.org.au] >> Sent: Sunday, November 03, 2013 11:55 PM >> To: Roland Dreier; linux-r...@vger.kernel.org >> Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org; Jan Kara; >> Marciniszyn, Mike >> Subject: linux-next: build warning after merge of the infiniband tree >> >> Hi all, >> >> After merging the infiniband tree, today's linux-next build (x86_64 >> allmodconfig) produced this warning: >> >> drivers/infiniband/hw/ipath/ipath_user_sdma.c: In function >> 'ipath_user_sdma_pin_pages': >> drivers/infiniband/hw/ipath/ipath_user_sdma.c:283:6: warning: 'j' is used >> uninitialized in this function [-Wuninitialized] >> ret = get_user_pages_fast(addr, j, 0, pages); >> ^ >> >> Introduced by commit 18fec3c6bdcb ("IB/ipath: Convert >> ipath_user_sdma_pin_pages() to use get_user_pages_fast()"). How did that >> pass review or testing? >> >> -- >> Cheers, >> Stephen Rothwells...@canb.auug.org.au
[GIT PULL] please pull infiniband.git
Hi Linus, This is the "disable ABI we don't want to freeze" pull I warned you about. Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Disable not-quite-ready userspace ABI for IB flow steering Yann Droneaud (1): IB/core: Temporarily disable create_flow/destroy_flow uverbs drivers/infiniband/Kconfig| 11 +++ drivers/infiniband/core/uverbs.h | 2 ++ drivers/infiniband/core/uverbs_cmd.c | 4 drivers/infiniband/core/uverbs_main.c | 6 ++ drivers/infiniband/hw/mlx4/main.c | 2 ++ include/uapi/rdma/ib_user_verbs.h | 6 ++ 6 files changed, 31 insertions(+)
Re: [GIT PULL] please pull infiniband.git
On Mon, Oct 14, 2013 at 5:52 PM, Linus Torvalds wrote: > So get your act together, and push back on the people you are supposed > to manage. Because this is *not* acceptable for post-rc5, and I'm > giving this single warning. Next time, I'll just ignore the sh*t you > send me. > > Comprende? Fair enough. I've been AWOL for a month due to real life / non-kernel stuff, and I didn't want the Mellanox guys to miss a kernel cycle just because I couldn't get my act together. So this one is totally on me -- I know it's late in the cycle and I tried to sneak it in. I do expect to send one more patch turning off a not-fully-baked new feature for 3.12, but other than that everything else will wait for 3.13. - R.
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Last batch of IB changes for 3.12: many mlx5 hardware driver fixes plus one trivial semicolon cleanup. Eli Cohen (12): IB/mlx5: Fix send work queue size calculation mlx5: Remove checksum on command interface commands IB/mlx5: Decrease memory consumption of mr caches IB/mlx5: Avoid async events on invalid port number mlx5: Keep polling to reclaim pages while any returned mlx5: Fix layout of struct mlx5_init_seg IB/mlx5: Disable atomic operations mlx5: Fix opt param mask for sq err to rts transition IB/mlx5: Fix opt param mask according to firmware spec mlx5: Fix error code translation from firmware to driver IB/mlx5: Fix alignment of reg umr gather buffers IB/mlx5: Ensure proper synchronization accessing memory Joe Perches (1): IB: Remove unnecessary semicolons Moshe Lazer (2): IB/mlx5: Flush cache workqueue before destroying it IB/mlx5: Fix memory leak in mlx5_ib_create_srq Roland Dreier (1): Merge branch 'misc' into for-next Sagi Grimberg (1): IB/mlx5: Fix eq names to display nicely in /proc/interrupts drivers/infiniband/hw/amso1100/c2_ae.c | 2 +- drivers/infiniband/hw/mlx5/main.c | 16 +++-- drivers/infiniband/hw/mlx5/mr.c| 70 +-- drivers/infiniband/hw/mlx5/qp.c| 80 -- drivers/infiniband/hw/mlx5/srq.c | 4 +- drivers/infiniband/hw/mthca/mthca_eq.c | 2 +- drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 6 +- drivers/infiniband/hw/ocrdma/ocrdma_main.c | 2 +- drivers/infiniband/hw/ocrdma/ocrdma_verbs.c| 6 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 28 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 21 ++ .../net/ethernet/mellanox/mlx5/core/pagealloc.c| 16 - include/linux/mlx5/device.h| 4 +- include/linux/mlx5/driver.h| 6 +- 15 files changed, 126 insertions(+), 141 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main batch of InfiniBand/RDMA changes for 3.12 merge window: - Large ocrdma HW driver update: add "fast register" work requests, fixes, cleanups - Add receive flow steering support for raw QPs - Fix IPoIB neighbour race that leads to crash - iSER updates including support for using "fast register" memory registration - IPv6 support for iWARP - XRC transport fixes CQ Tang (1): IB/qib: Improve SDMA performance Hadar Hen Zion (3): IB/core: Add receive flow steering support IB/core: Export ib_create/destroy_flow through uverbs IB/mlx4: Add receive flow steering support Igor Ivanov (1): IB/core: Infrastructure for extensible uverbs commands Ira Weiny (1): IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards Jim Foraker (1): IPoIB: Fix race in deleting ipoib_neigh entries Matan Barak (1): IB/core: Better checking of userspace values for receive flow steering Naresh Gottumukkala (19): RDMA/ocrdma: Style and redundant code cleanup RDMA/ocrdma: Remove redundant dev reference RDMA/ocrdma: Don't allow zero/invalid sgid usage RDMA/ocrdma: Remove driver QP state machine RDMA/ocrdma: Remove __packed RDMA/ocrdma: Cache recv DB until QP moved to RTR RDMA/ocrdma: Create IRD queue fix RDMA/ocrdma: Add support for fast register work requests (FRWR) RDMA/ocrdma: Remove the MTU check based on Ethernet MTU RDMA/ocrdma: Fix to work with even a single MSI-X vector RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24 RDMA/ocrdma: FRMA code cleanup RDMA/ocrdma: Dont use PD 0 for userpace CQ DB RDMA/ocrdma: Increase STAG array size RDMA/ocrdma: Fix for displaying proper link speed RDMA/ocrdma: Consider multiple SGES in case of DPP RDMA/ocrdma: Add ABI versioning support RDMA/ocrdma: Fill PVID in UMC case RDMA/ocrdma: Fix passing wrong opcode to modify_srq Or Gerlitz (1): IB/iser: Use proper debug level value for info prints Paul Bolle (1): IB/qib: 
Make qib_driver static Roi Dayan (1): IB/iser: Fix possible memory leak in iser_create_frwr_pool() Roland Dreier (2): RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next Sagi Grimberg (5): IB/iser: Generalize rdma memory registration IB/iser: Handle unaligned SG in separate function IB/iser: Place the fmr pool into a union in iser's IB conn struct IB/iser: Introduce fast memory registration model (FRWR) IB/iser: Fix redundant pointer check in dealloc flow Shlomo Pongratz (2): IB/iser: Restructure allocation/deallocation of connection resources IB/iser: Accept session->cmds_max from user space Steve Wise (9): RDMA/cma: Add IPv6 support for iWARP RDMA/cxgb4: Use correct bit shift macros for vlan filter tuples RDMA/cxgb4: Handle newer firmware changes RDMA/cxgb4: Fix QP flush logic RDMA/cxgb4: Fix accounting for unsignaled SQ WRs to deal with wrap RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages RDMA/cxgb4: Always do GTS write if cidx_inc == CIDXINC_MASK RDMA/cxgb4: Advertise ~0ULL as max MR size RDMA/cxgb4: Issue RI.FINI before closing when entering TERM Vipul Pandya (3): cxgb4: Add routines to create and remove listening IPv6 servers cxgb4: Add CLIP support to store compressed IPv6 address RDMA/cxgb4: Add support for active and passive open connection with IPv6 address Yijing Wang (1): IB/qib: Clean up unnecessary MSI/MSI-X capability find Yishai Hadas (3): mlx4_core: Fix XRC QPs detection in the resource tracker IB/core: Add locking around event dispatching on XRC target QPs IB/core: Fixes to XRC reference counting in uverbs drivers/infiniband/core/cma.c | 44 +- drivers/infiniband/core/uverbs.h | 4 + drivers/infiniband/core/uverbs_cmd.c | 250 +- drivers/infiniband/core/uverbs_main.c | 42 +- drivers/infiniband/core/verbs.c| 30 + drivers/infiniband/hw/amso1100/c2_ae.c | 18 +- drivers/infiniband/hw/amso1100/c2_cm.c | 16 +- 
drivers/infiniband/hw/cxgb3/iwch_cm.c | 46 +- drivers/infiniband/hw/cxgb4/Kconfig| 2 +- drivers/infiniband/hw/cxgb4/cm.c | 860 --- drivers/infiniband/hw/cxgb4/cq.c | 329 +--- drivers/infinib
Re: [PATCH 9/9] tcm_qla2xxx: Add special case for COMPARE_AND_WRITE data_direction
On Wed, Aug 21, 2013 at 7:38 AM, Roland Dreier wrote: > I don't understand this. In fact the whole patch series looks quite > confused. COMPARE AND WRITE is a normal Data-Out command, with no > requirement for special bidirectional handling or anything like that. > The only slightly unusual thing is that a CAW command with a NUMBER OF > LOGICAL BLOCKS equal to N will actually transfer 2*N worth of data -- > one set of data for the compare operation and a second set to write if > the compare succeeds. But just to be clear, the transfer of those 2*N > blocks happens as a single transfer during the Data-Out phase. OK, I understand the patch set a bit better. You're using the bidi infrastructure to have a place to stick the data that you internally read to implement the compare, but then you end up having places like this where you have to say, "oh it's not really a bidi command, it's just a compare and write." Shouldn't there be a way to confine the COMPARE AND WRITE handling to the actual implementation of that command? Or maybe make the bidi handling more generic so that this becomes clearer? - R.
Re: [PATCH 9/9] tcm_qla2xxx: Add special case for COMPARE_AND_WRITE data_direction
On Tue, Aug 20, 2013 at 1:08 PM, Nicholas A. Bellinger wrote: > Add a special case for COMPARE_AND_WRITE for the reverse data direction > mapping used for pci_map_sg() + friends. I don't understand this. In fact the whole patch series looks quite confused. COMPARE AND WRITE is a normal Data-Out command, with no requirement for special bidirectional handling or anything like that. The only slightly unusual thing is that a CAW command with a NUMBER OF LOGICAL BLOCKS equal to N will actually transfer 2*N worth of data -- one set of data for the compare operation and a second set to write if the compare succeeds. But just to be clear, the transfer of those 2*N blocks happens as a single transfer during the Data-Out phase. - R.
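As a quick worked example of the 2*N point above, here is a minimal sketch of the Data-Out size for a COMPARE AND WRITE. The function name and the explicit block_size parameter are assumptions of this sketch; the message itself only states that N logical blocks in the command means 2*N blocks on the wire, in one transfer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total Data-Out length in bytes for a COMPARE AND WRITE whose
 * NUMBER OF LOGICAL BLOCKS field is num_blocks: one set of blocks
 * to compare plus one set of blocks to write, sent as a single
 * transfer. (Hypothetical helper, named for this example only.)
 */
uint64_t caw_data_out_bytes(uint32_t num_blocks, uint32_t block_size)
{
	assert(block_size > 0);
	return 2ULL * num_blocks * block_size;
}
```

For instance, a one-block CAW on a device with 512-byte blocks moves 1024 bytes in the Data-Out phase, and an eight-block CAW with 4096-byte blocks moves 65536 bytes.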
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
Jens / James, do you guys plan to send this to Linus for 3.11? Triggering this bug is a bit esoteric but the impact is pretty nasty (corrupting an unrelated process). Thanks, Roland
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On Wed, Aug 7, 2013 at 9:31 AM, Douglas Gilbert wrote: > So what kind of signal was leading to your "stomping on the memory"? > Was it user generated or something like SIGIO, SIGPIPE or a RT signal? It was sometimes SIGHUP (for reopening log files) and sometimes SIGALRM (for various periodic things). > To get around the SG_IO ioctl restart problem (for non idempotent > SCSI commands) could we replace a -ERESTARTSYS return value > with -EINTR ? > > As I noted in a previous post, for robust user space code using the > SG_IO ioctl, masking signals during the IO may help. Yes, absolutely. But process A should be able to keep its memory uncorrupted even if process B is coded wrong :) > And what about bsg? Is it any better or worse than sg in the case > of interrupted SG_IO ioctls? Apart from the interface (sg_io_hdr > v3 versus v4) it should be a drop in replacement for sg. As far as I can tell bsg looks much better w.r.t. signals -- I don't see anywhere that it schedules work onto a workqueue or other kernel thread, and it looks like the SG_IO ioctl there actually has nowhere that checks for signals. All sleeps will be uninterruptible, which I guess may be better or worse depending on your perspective. - R.
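The "masking signals during the IO" suggestion quoted above could look roughly like the following in user space. This is a sketch under assumptions: call_with_signals_blocked() and the io_fn callback type are hypothetical names made up here, and the real ioctl(fd, SG_IO, &hdr) call is assumed to live inside the callback so the example stays independent of any sg device. The demo callback only reports whether SIGHUP is currently masked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical callback type: the caller would issue its SG_IO ioctl here. */
typedef int (*io_fn)(void *arg);

/* Run fn(arg) with every blockable signal masked, then restore the old mask. */
int call_with_signals_blocked(io_fn fn, void *arg)
{
	sigset_t all, saved;
	int ret;

	assert(fn != NULL);
	sigfillset(&all);
	if (sigprocmask(SIG_BLOCK, &all, &saved) != 0)
		return -1;
	ret = fn(arg);
	/* Signals that arrived meanwhile are delivered once the mask is back. */
	sigprocmask(SIG_SETMASK, &saved, NULL);
	return ret;
}

/* Demo callback standing in for the real I/O: is SIGHUP masked right now? */
int sighup_is_blocked(void *unused)
{
	sigset_t cur;

	(void)unused;
	if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
		return -1;
	return sigismember(&cur, SIGHUP);
}
```

A caller would pass its own function wrapping the ioctl instead of sighup_is_blocked(). In a multithreaded program pthread_sigmask() would be the call to use in place of sigprocmask(). As the reply notes, this only protects a well-behaved caller; it does nothing for the process on the receiving end of the stray copy.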
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On Wed, Aug 7, 2013 at 7:38 AM, David Milburn wrote: > I was able to succesfully test this patch overnight, I had been experimenting > with the > sg driver setting the BIO_NULL_MAPPED flag in sg_rq_end_io_usercontext for a > orphan process > which prevented the corruption, but your solution seems much better. Very cool, thanks for the testing. I actually looked at using BIO_NULL_MAPPED as well, but it seemed a bit too fragile to me -- it had the right effect of skipping __bio_copy_iov(), and skipping the __free_pages() stuff in there is OK because sg owns its pages rather than the bio layer, but all that seemed vulnerable to being broken by an unrelated change. Out of curiosity, were you already working on this bug? Because if you had fixed it a few weeks earlier we might not have spent so long wondering WTF was stomping on the memory of one of our processes :) - R.
[PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
From: Roland Dreier There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp->read_wait, (srp_done(sfp, srp) || sdp->detached)); but neither srp_done() nor sdp->detached is true, so we end up just setting srp->orphan and returning to userspace: srp->orphan = 1; write_unlock_irq(&sfp->rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(&sfp->rq_list_lock, iflags); if (unlikely(srp->orphan)) { if (sfp->keep_orphan) srp->sg_io_owned = 0; else done = 0; } srp->done = done; write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); kref_put(&sfp->f_ref, sg_remove_sfp); } else { INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); schedule_work(&srp->ew.work); } Since srp->orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. 
- In workqueue context we go through sg_rq_end_io_usercontext() -> sg_finish_rem_req() -> blk_rq_unmap_user() -> ... -> bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current->mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! As suggested by James Bottomley, add a check for current->mm (which is NULL if we're on a kernel thread without a real userspace address space) in bio_uncopy_user(), and skip the copy if we're on a kernel thread. There's no reason that I can think of for any caller of bio_uncopy_user() to want to do copying on a kernel thread with a random active userspace address space. Huge thanks to Costa Sapuntzakis for the original pointer to this bug in the sg code. Signed-off-by: Roland Dreier Cc: --- fs/bio.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/fs/bio.c b/fs/bio.c index 94bbc04..c5eae72 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs, int bio_uncopy_user(struct bio *bio) { struct bio_map_data *bmd = bio->bi_private; - int ret = 0; + struct bio_vec *bvec; + int ret = 0, i; - if (!bio_flagged(bio, BIO_NULL_MAPPED)) - ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, -bmd->nr_sgvecs, bio_data_dir(bio) == READ, -0, bmd->is_our_pages); + if (!bio_flagged(bio, BIO_NULL_MAPPED)) { + /* +* if we're in a workqueue, the request is orphaned, so +* don't copy into a random user address space, just free. 
+*/ + if (current->mm) + ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, +bmd->nr_sgvecs, bio_data_dir(bio) == READ, +0, bmd->is_our_pages); + else if (bmd->is_our_pages) + bio_for_each_segment_all(bvec, bio, i) + __free_page(bvec->bv_page); + }
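The branch at the end of sg_rq_end_io() described in the patch text above can be modelled in plain userspace C as below. This is only a sketch for following the argument, not kernel code: the enum, the function name sg_model_complete() and the assumption that done starts out as 1 are illustrative, and orphan / keep_orphan stand in for srp->orphan and sfp->keep_orphan.

```c
#include <stdbool.h>

/* Where the modelled completion path sends a finished request. */
enum sg_completion {
	SG_WAKE_READER,    /* done != 0: wake whoever sleeps on read_wait */
	SG_DEFER_TO_WORKQ, /* done == 0: clean up later from a workqueue  */
};

/*
 * Userspace model (hypothetical) of the orphan check quoted in the
 * patch description.  Assumes done is initialised to 1 before the check.
 */
static enum sg_completion sg_model_complete(bool orphan, bool keep_orphan)
{
	int done = 1;

	/* Orphaned request and nobody asked to keep it: nobody will read it. */
	if (orphan && !keep_orphan)
		done = 0;

	return done ? SG_WAKE_READER : SG_DEFER_TO_WORKQ;
}
```

The only combination that reaches the workqueue is an orphaned request without keep_orphan, which is exactly the interrupted-ioctl case the patch is about.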
Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On Mon, Aug 5, 2013 at 4:31 PM, James Bottomley wrote: > I agree with the analysis. The fix is a bit draconian, though. A > workqueue actually runs in a kernel thread and there's a simple test for > that (!current->mm), so how about this instead (which is much less > intrusive) > --- > diff --git a/fs/bio.c b/fs/bio.c > index 94bbc04..e2ab39c 100644 > --- a/fs/bio.c > +++ b/fs/bio.c > @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct > bio_vec *iovecs, > int bio_uncopy_user(struct bio *bio) > { > struct bio_map_data *bmd = bio->bi_private; > - int ret = 0; > + struct bio_vec *bvec; > + int ret = 0, i; > > - if (!bio_flagged(bio, BIO_NULL_MAPPED)) > - ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, > -bmd->nr_sgvecs, bio_data_dir(bio) == > READ, > -0, bmd->is_our_pages); > + if (!bio_flagged(bio, BIO_NULL_MAPPED)) { > + /* > +* if we're in a workqueue, the request is orphaned, so > +* don't copy into the kernel address space, just free > +*/ > + if (current->mm) > + ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, > +bmd->nr_sgvecs, > bio_data_dir(bio) == READ, > +0, bmd->is_our_pages); > + else if (bmd->is_our_pages) > + bio_for_each_segment_all(bvec, bio, i) > + __free_page(bvec->bv_page); > + } > bio_free_map_data(bmd); > bio_put(bio); > return ret; Yes, looks reasonable -- I can't think of any reason why anyone would ever want the bio code to copy to a random userspace address space. Acked-by: Roland Dreier
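To make the control flow of the proposed bio_uncopy_user() change easier to follow, here is a rough userspace model of its three-way decision. Nothing here is kernel API: the enum and uncopy_model() are invented for illustration, and has_mm stands in for the current->mm test in the quoted diff.

```c
#include <stdbool.h>

/* What the modelled bio_uncopy_user() does with the bounce pages. */
enum uncopy_action {
	UNCOPY_NOTHING,    /* BIO_NULL_MAPPED set: no data to copy back     */
	UNCOPY_COPY_BACK,  /* caller has a user mm: copy into its buffer    */
	UNCOPY_FREE_PAGES, /* kernel thread, bio owns the pages: free them  */
	UNCOPY_SKIP,       /* kernel thread, pages owned elsewhere (as sg)  */
};

/* Hypothetical model of the branches in the quoted diff. */
static enum uncopy_action uncopy_model(bool null_mapped, bool has_mm,
				       bool is_our_pages)
{
	if (null_mapped)
		return UNCOPY_NOTHING;
	if (has_mm)
		return UNCOPY_COPY_BACK;
	return is_our_pages ? UNCOPY_FREE_PAGES : UNCOPY_SKIP;
}
```

In the orphaned sg case the call arrives with has_mm false and, since sg owns its pages, is_our_pages false, so the model lands on UNCOPY_SKIP and no copy_to_user() happens.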
[PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
From: Roland Dreier There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp->read_wait, (srp_done(sfp, srp) || sdp->detached)); but neither srp_done() nor sdp->detached is true, so we end up just setting srp->orphan and returning to userspace: srp->orphan = 1; write_unlock_irq(&sfp->rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(&sfp->rq_list_lock, iflags); if (unlikely(srp->orphan)) { if (sfp->keep_orphan) srp->sg_io_owned = 0; else done = 0; } srp->done = done; write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); kref_put(&sfp->f_ref, sg_remove_sfp); } else { INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); schedule_work(&srp->ew.work); } Since srp->orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. 
- In workqueue context we go through sg_rq_end_io_usercontext() -> sg_finish_rem_req() -> blk_rq_unmap_user() -> ... -> bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current->mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! Fix this by telling sg_finish_rem_req() whether we're on a workqueue or not, and if we are, calling a new function blk_rq_unmap_user_nocopy() that does everything the original blk_rq_unmap_user() does except calling copy_{to,from}_user(). This requires a few levels of plumbing through a "copy" flag in the bio layer. I also considered fixing this by having the sg code just set BIO_NULL_MAPPED for bios that are unmapped from a workqueue, which happens to work because the __free_page() part of __bio_copy_iov() isn't needed for sg (because sg handles its own pages). However, this seems coincidental and fragile, so I preferred making the fix explicit, at the cost of minor tweaks to the bio code. Huge thanks to Costa Sapuntzakis for the original pointer to this bug in the sg code. 
Signed-off-by: Roland Dreier Cc: --- block/blk-map.c| 15 --- drivers/scsi/sg.c | 19 ++- fs/bio.c | 22 +++--- include/linux/bio.h| 2 +- include/linux/blkdev.h | 11 ++- 5 files changed, 44 insertions(+), 25 deletions(-) diff --git a/block/blk-map.c b/block/blk-map.c index 623e1cd..bd63201 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -25,7 +25,7 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq, return 0; } -static int __blk_rq_unmap_user(struct bio *bio) +static int __blk_rq_unmap_user(struct bio *bio, bool copy) { int ret = 0; @@ -33,7 +33,7 @@ static int __blk_rq_unmap_user(struct bio *bio) if (bio_flagged(bio, BIO_USER_MAPPED)) bio_unmap_user(bio); else - ret = bio_uncopy_user(bio); + ret = bio_uncopy_user(bio, copy); } return ret; @@ -80,7 +80,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq, /* if it was boucned we must
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus InfiniBand/RDMA fixes for 3.11-rc: - Fixes for the newly merged mlx5 hardware driver - Stack info leak fixes from Dan Carpenter - Fixes for pkey table handling with SR-IOV - A few other small things Andi Shyti (1): mlx5_core: Variable may be used uninitialized Dan Carpenter (6): RDMA/cxgb4: Fix stack info leak in c4iw_create_qp() RDMA/ocrdma: Fix several stack info leaks RDMA/nes: Fix info leaks in nes_create_qp() and nes_create_cq() RDMA/cxgb3: Fix stack info leak in iwch_create_cq() IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext() mlx5_core: Fix use after free in mlx5_cmd_comp_handler() Eli Cohen (1): mlx5_core: Implement new initialization sequence Erez Shitrit (1): IPoIB: Fix pkey change flow for virtualization environments Jack Morgenstein (2): IB/mlx4: Use default pkey when creating tunnel QPs IB/core: Create QP1 using the pkey index which contains the default pkey Mike Marciniszyn (1): IB/qib: Add err_decode() call for ring dump Or Gerlitz (1): IPoIB: Make sure child devices use valid/proper pkeys Paul Bolle (1): RDMA/cma: Fix gcc warning Roland Dreier (3): RDMA/ocrdma: Remove unused include Revert "RDMA/nes: Fix compilation error when nes_debug is enabled" Merge branches 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma' and 'qib' into for-next Sean Hefty (2): RDMA/cma: Fix accessing invalid private data for UD RDMA/cma: Only call cma_save_ib_info() for CM REQs Wei Yongjun (1): IB/mlx5: Fix error return code in init_one() drivers/infiniband/core/cma.c | 29 + drivers/infiniband/core/mad.c | 8 ++- drivers/infiniband/hw/cxgb3/iwch_provider.c| 1 + drivers/infiniband/hw/cxgb4/qp.c | 2 + drivers/infiniband/hw/mlx4/mad.c | 10 ++- drivers/infiniband/hw/mlx5/main.c | 11 ++-- drivers/infiniband/hw/mlx5/qp.c| 2 +- drivers/infiniband/hw/nes/nes_hw.c | 4 +- drivers/infiniband/hw/nes/nes_verbs.c | 3 +- 
drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 1 - drivers/infiniband/hw/ocrdma/ocrdma_verbs.c| 5 +- drivers/infiniband/hw/qib/qib_iba7322.c| 2 + drivers/infiniband/hw/qib/qib_sdma.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c| 76 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 9 +++ drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 19 -- drivers/net/ethernet/mellanox/mlx5/core/main.c | 69 ++-- .../net/ethernet/mellanox/mlx5/core/pagealloc.c| 20 -- include/linux/mlx5/device.h| 20 ++ include/linux/mlx5/driver.h| 4 +- 21 files changed, 239 insertions(+), 60 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main (updated) batch of InfiniBand/RDMA changes for 3.11 merge window: - AF_IB (native IB addressing) for CMA from Sean Hefty - New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - Resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - Other small changes and fixes Bart Van Assche (6): IB/srp: Avoid skipping srp_reset_host() after a transport error IB/srp: Skip host settle delay IB/srp: Fail I/O fast if target offline IB/srp: Maintain a single connection per I_T nexus IB/srp: Make HCA completion vector configurable IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline Dan Carpenter (2): RDMA/cxgb3: Timeout condition is never true mlx5: Return -EFAULT instead of -EPERM Dean Luick (1): IB/qib: Log all SDMA errors unconditionally Dotan Barak (1): IB/srp: Fix remove_one crash due to resource exhaustion Eli Cohen (1): mlx5: Add driver for Mellanox Connect-IB adapters Gottumukkala, Naresh (1): RDMA/ocrdma: Remove use_cnt for queues Jack Morgenstein (1): IB/core: Add reserved values to enums for low-level driver use Mike Marciniszyn (7): IB/qib: Add DCA support IB/qib: Remove atomic_inc_not_zero() from QP RCU IB/qib: Optimize CQ callbacks IB/qib: Convert opcode counters to per-context IB/qib: Add per-context stats interface IB/qib: Add qp_stats debug file IB/qib: Fix module-level leak Mitko Haralanov (1): IB/qib: New transmitter tunning settings for Dell 1.1 backplane Moshe Lazer (1): mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec Naresh Gottumukkala (5): RDMA/ocrdma: Use MCC_CREATE_EXT_V1 for MCC create RDMA/ocrdma: Replace ocrdma_err with pr_err RDMA/ocrdma: Set bad_wr in error case RDMA/ocrdma: Change macros to inline funtions RDMA/ocrdma: Reorg structures to 
avoid padding Ramkrishna Vepa (2): IB/qib: Add optional NUMA affinity IB/qib: Add dual-rail NUMA awareness for PSM processes Roland Dreier (6): mlx5: Fix parameter type of health_handler_t IB/mlx5: Make profile[] static in main.c mlx5_core: Fixes for sparse warnings IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for-next Merge branches 'mlx5', 'qib' and 'srp' into for-next Sean Hefty (28): RDMA/cma: Define native IB address RDMA/cma: Allow enabling reuseaddr in any state RDMA/cma: Include AF_IB in loopback and any address checks IB/addr: Add AF_IB support to ip_addr_size RDMA/cma: Update port reservation to support AF_IB RDMA/cma: Allow user to specify AF_IB when binding RDMA/cma: Do not modify sa_family when setting loopback address RDMA/cma: Add helper functions to return id address information RDMA/cma: Restrict AF_IB loopback to binding to IB devices only RDMA/cma: Verify that source and dest sa_family are the same RDMA/cma: Add support for AF_IB to rdma_resolve_addr() RDMA/cma: Add support for AF_IB to rdma_resolve_route() RDMA/cma: Add support for AF_IB to cma_get_service_id() RDMA/cma: Remove unused SDP related code RDMA/cma: Merge cma_get/save_net_info RDMA/cma: Expose private data when using AF_IB RDMA/cma: Set qkey for AF_IB RDMA/cma: Only listen on IB devices when using AF_IB RDMA/ucma: Support querying for AF_IB addresses IB/sa: Export function to pack a path record into wire format RDMA/ucma: Support querying when IB paths are not reversible RDMA/cma: Export cma_get_service_id() RDMA/ucma: Add ability to query GID addresses RDMA/ucma: Name changes to indicate only IP addresses supported RDMA/ucma: Allow user space to bind to AF_IB RDMA/ucma: Allow user space to pass AF_IB into resolve RDMA/ucma: Allow user space to specify AF_IB when joining multicast RDMA/cma: Export AF_IB statistics Vinit Agnihotri (1): IB/qib: Update minor version number Vu Pham 
(1): IB/srp: Bump driver version and release date Wei Yongjun (3): IB/ehca: Fix error return code in ehca_create_slab_caches() RDMA/ocrdma: Fix error return code in ocrdma_set_create_qp_rq_cmd() IB/core: Fix error return code in add_port() Documentation/ABI/stable/sysfs-driver-ib_srp |7 + MAINTAINERS
Re: [GIT PULL] please pull infiniband.git
On Wed, Jul 10, 2013 at 7:35 AM, Sebastian Riemer wrote: > > I've checked the commits on that tag and the following commit is not > what we've agreed on: Sorry about that. The discussion was long and complex and I probably made a mistake in applying the patches. Please send me a patch to fix the driver to what it should be, and I will merge it ASAP. - Roland
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus Main batch of InfiniBand/RDMA changes for 3.11 merge window: - AF_IB (native IB addressing) for CMA from Sean Hefty - New mlx5 driver for Mellanox Connect-IB adapters - SRP fixes from Bart Van Assche - qib HW driver updates - Resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - Other small changes and fixes Bart Van Assche (5): IB/srp: Avoid skipping srp_reset_host() after a transport error IB/srp: Skip host settle delay IB/srp: Fail I/O fast if target offline IB/srp: Maintain a single connection per I_T nexus IB/srp: Make HCA completion vector configurable Dan Carpenter (1): RDMA/cxgb3: Timeout condition is never true Dotan Barak (1): IB/srp: Fix remove_one crash due to resource exhaustion Eli Cohen (1): mlx5: Add driver for Mellanox Connect-IB adapters Gottumukkala, Naresh (1): RDMA/ocrdma: Remove use_cnt for queues Jack Morgenstein (1): IB/core: Add reserved values to enums for low-level driver use Mike Marciniszyn (6): IB/qib: Add DCA support IB/qib: Remove atomic_inc_not_zero() from QP RCU IB/qib: Optimize CQ callbacks IB/qib: Convert opcode counters to per-context IB/qib: Add per-context stats interface IB/qib: Add qp_stats debug file Mitko Haralanov (1): IB/qib: New transmitter tunning settings for Dell 1.1 backplane Naresh Gottumukkala (5): RDMA/ocrdma: Use MCC_CREATE_EXT_V1 for MCC create RDMA/ocrdma: Replace ocrdma_err with pr_err RDMA/ocrdma: Set bad_wr in error case RDMA/ocrdma: Change macros to inline funtions RDMA/ocrdma: Reorg structures to avoid padding Ramkrishna Vepa (2): IB/qib: Add optional NUMA affinity IB/qib: Add dual-rail NUMA awareness for PSM processes Roland Dreier (5): mlx5: Fix parameter type of health_handler_t IB/mlx5: Make profile[] static in main.c mlx5_core: Fixes for sparse warnings IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() Merge branches 
'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for-next Sean Hefty (28): RDMA/cma: Define native IB address RDMA/cma: Allow enabling reuseaddr in any state RDMA/cma: Include AF_IB in loopback and any address checks IB/addr: Add AF_IB support to ip_addr_size RDMA/cma: Update port reservation to support AF_IB RDMA/cma: Allow user to specify AF_IB when binding RDMA/cma: Do not modify sa_family when setting loopback address RDMA/cma: Add helper functions to return id address information RDMA/cma: Restrict AF_IB loopback to binding to IB devices only RDMA/cma: Verify that source and dest sa_family are the same RDMA/cma: Add support for AF_IB to rdma_resolve_addr() RDMA/cma: Add support for AF_IB to rdma_resolve_route() RDMA/cma: Add support for AF_IB to cma_get_service_id() RDMA/cma: Remove unused SDP related code RDMA/cma: Merge cma_get/save_net_info RDMA/cma: Expose private data when using AF_IB RDMA/cma: Set qkey for AF_IB RDMA/cma: Only listen on IB devices when using AF_IB RDMA/ucma: Support querying for AF_IB addresses IB/sa: Export function to pack a path record into wire format RDMA/ucma: Support querying when IB paths are not reversible RDMA/cma: Export cma_get_service_id() RDMA/ucma: Add ability to query GID addresses RDMA/ucma: Name changes to indicate only IP addresses supported RDMA/ucma: Allow user space to bind to AF_IB RDMA/ucma: Allow user space to pass AF_IB into resolve RDMA/ucma: Allow user space to specify AF_IB when joining multicast RDMA/cma: Export AF_IB statistics Vinit Agnihotri (1): IB/qib: Update minor version number Vu Pham (1): IB/srp: Bump driver version and release date Wei Yongjun (3): IB/ehca: Fix error return code in ehca_create_slab_caches() RDMA/ocrdma: Fix error return code in ocrdma_set_create_qp_rq_cmd() IB/core: Fix error return code in add_port() Documentation/ABI/stable/sysfs-driver-ib_srp|7 + MAINTAINERS | 22 ++ drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile |1 + drivers/infiniband/core/addr.c | 
20 +- drivers/infiniband/core/cma.c | 906 ++- drivers/infiniband/core/sa_query.c |6 + drivers/infiniband/core/sysfs.c
Re: [PATCH 03/13] infiniband: use get_unused_fd_flags(0) instead of get_unused_fd()
Thanks, I just applied a patch to convert to get_unused_fd_flags(O_CLOEXEC) in uverbs, since there isn't anything useful that can be done with uverbs fds across an exec. - R.
Re: [PATCH] mm: Revert pinned_vm braindamage
On Thu, Jun 20, 2013 at 7:48 AM, Christoph Lameter wrote: > There is no way that user space can initiate a page pin right now. Perf is > pinning the page from the kernel. Similarly the IB subsystem pins memory > needed for device I/O. Christoph, your argument would be a lot more convincing if you stopped repeating this nonsense. Sure, in a strict sense, it might be true that the IB subsystem in the kernel is the code that actually pins memory, but given that unprivileged userspace can tell the kernel to pin arbitrary parts of its memory for any amount of time, is that relevant? And in fact taking your "initiate" word choice above, I don't even think your statement is true -- userspace initiates the pinning by, for example, doing an IB memory registration (libibverbs ibv_reg_mr() call), which turns into a system call, which leads to the kernel trying to pin pages. The pages aren't unpinned until userspace unregisters the memory (or causes a cleanup by closing the context fd). Here's an argument by analogy. Would it make any sense for me to say userspace can't mlock memory, because only the kernel can set VM_LOCKED on a vma? Of course not. Userspace has the mlock() system call, and although the actual work happens in the kernel, we clearly want to be able to limit the amount of memory locked by the kernel ON BEHALF OF USERSPACE. - R.
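The limit in the mlock() analogy is RLIMIT_MEMLOCK. As a rough illustration of limiting locked memory on behalf of userspace, the sketch below checks a request against the soft limit from userspace. It is not the kernel's accounting code; memlock_would_fit() and its already_locked parameter are hypothetical, and a real program would simply call mlock() and handle the error.

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical check: would locking `bytes` more memory, on top of
 * `already_locked` bytes, stay within the soft RLIMIT_MEMLOCK limit?
 * Returns 1 if it fits, 0 if it would exceed the limit, -1 on error.
 */
static int memlock_would_fit(size_t already_locked, size_t bytes)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_MEMLOCK, &lim) < 0)
		return -1;
	if (lim.rlim_cur == RLIM_INFINITY)
		return 1;
	if (already_locked > lim.rlim_cur)
		return 0;
	/* Subtract first so the comparison cannot overflow. */
	return bytes <= lim.rlim_cur - already_locked;
}
```

The point of the analogy is that the same kind of per-process ceiling makes sense for pages pinned through a memory registration, whichever counter the kernel ends up charging them to.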
Re: linux-next: just chaeking correctness of the infiniband tree
On Thu, Jun 20, 2013 at 5:09 PM, Stephen Rothwell wrote: > I noticed that the infiniband tree is now based on the net-next tree. I > assume that is deliberate? I do have to question how much testing that > tree has had since it is now based on a tree that Dave only released in > the last 24 hours ... That is intentional since there is work coming that relies on net-next prerequisites. The tree hasn't had much testing, but pushing it out a few weeks before the merge window is the way it gets testing.
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus InfiniBand fixes for 3.10-rc: - qib RCU/lockdep fix - iser device removal fix, plus doc fixes Mike Marciniszyn (1): IB/qib: Fix lockdep splat in qib_alloc_lkey() Or Gerlitz (2): IB/iser: Add Mellanox copyright MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator Roi Dayan (1): IB/iser: Fix device removal flow Roland Dreier (1): Merge branches 'iser' and 'qib' into for-next MAINTAINERS | 10 ++ drivers/infiniband/hw/qib/qib_keys.c | 2 +- drivers/infiniband/ulp/iser/iscsi_iser.c | 1 + drivers/infiniband/ulp/iser/iscsi_iser.h | 1 + drivers/infiniband/ulp/iser/iser_initiator.c | 1 + drivers/infiniband/ulp/iser/iser_memory.c| 1 + drivers/infiniband/ulp/iser/iser_verbs.c | 16 +--- 7 files changed, 24 insertions(+), 8 deletions(-)
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus InfiniBand/RDMA changes for the 3.10 merge window: - XRC transport fixes - Fix DHCP on IPoIB - mlx4 preparations for flow steering - iSER fixes - miscellaneous other fixes Sorry for being so late on this -- I moved houses and the system with all my private keys was offline for a week or so. Akinobu Mita (1): RDMA: Rename random32() to prandom_u32() Cong Ding (1): RDMA/cxgb3: Fix uninitialized variable Dotan Barak (1): IB/mlx4: Disable VLAN stripping for RAW PACKET QPs Doug Ledford (1): IPoIB: Fix ipoib_hard_header() return value Eli Cohen (1): IB/mlx4: Set link type for RAW PACKET QPs in the QP context Grant Grundler (1): SRPT: Fix odd use of WARN_ON() Hadar Hen Zion (5): mlx4_core: Move DMFS HW structs to common header file mlx4: Match DMFS promiscuous field names to firmware spec mlx4_core: Change a few DMFS fields names to match firmare spec mlx4_core: Directly expose fields of DMFS HW rule control segment mlx4_core: Expose a few helpers to fill DMFS HW strucutures Jack Morgenstein (1): mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level Mike Marciniszyn (2): IB/ipath: Correct ipath_verbs_register_sysfs() error handling IB/qib: Correct qib_verbs_register_sysfs() error handling Or Gerlitz (2): IB/iser: Return error to upper layers on EAGAIN registration failures IB/iser: Add support for iser CM REQ additional info Roi Dayan (2): IB/iser: Add module version IB/iser: Move informational messages from error to info level Roland Dreier (1): Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next Shlomo Pongratz (3): IB/core: Verify that QP handler is valid before dispatching events mlx4_core: Implement SRQ object lookup from srqn IB/mlx4: Fetch XRC SRQ in the CQ polling code Steve Wise (1): RDMA/iwcm: Don't touch cmid after dropping reference Thadeu Lima de Souza Cascardo (1): RDMA/cxgb4: Fix SQ allocation 
when on-chip SQ is disabled drivers/infiniband/core/iwcm.c | 2 + drivers/infiniband/core/verbs.c | 3 +- drivers/infiniband/hw/cxgb3/cxio_resource.c | 4 +- drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 +- drivers/infiniband/hw/cxgb4/id_table.c | 4 +- drivers/infiniband/hw/cxgb4/qp.c| 25 ++--- drivers/infiniband/hw/ipath/ipath_verbs.c | 19 ++-- drivers/infiniband/hw/mlx4/cq.c | 21 + drivers/infiniband/hw/mlx4/mad.c| 2 +- drivers/infiniband/hw/mlx4/qp.c | 6 ++ drivers/infiniband/hw/qib/qib_sysfs.c | 6 +- drivers/infiniband/hw/qib/qib_verbs.c | 3 +- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- drivers/infiniband/ulp/iser/iscsi_iser.c| 24 ++--- drivers/infiniband/ulp/iser/iscsi_iser.h| 24 - drivers/infiniband/ulp/iser/iser_memory.c | 3 +- drivers/infiniband/ulp/iser/iser_verbs.c| 36 --- drivers/infiniband/ulp/srpt/ib_srpt.c | 2 +- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 2 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 16 ++-- drivers/net/ethernet/mellanox/mlx4/eq.c | 4 +- drivers/net/ethernet/mellanox/mlx4/mcg.c| 120 +++- drivers/net/ethernet/mellanox/mlx4/mlx4.h | 79 drivers/net/ethernet/mellanox/mlx4/srq.c| 15 +++ include/linux/mlx4/device.h | 104 ++-- include/linux/mlx4/srq.h| 2 + 27 files changed, 328 insertions(+), 204 deletions(-)
Re: linux-next: Tree for Apr 17 (infiniband/rdma)
On Wed, Apr 17, 2013 at 11:06 AM, Randy Dunlap wrote: > on x86_64: > > drivers/built-in.o: In function `isert_free_np': > ib_isert.c:(.text+0x6e8a77): undefined reference to `rdma_destroy_id' > drivers/built-in.o: In function `isert_conn_setup_qp': > ib_isert.c:(.text+0x6e9038): undefined reference to `rdma_create_qp' Nic, I think isert needs a "depends on INFINIBAND_ADDR_TRANS" to avoid this. (this is coming from the SCSI target tree, not the InfiniBand/RDMA tree) - R.
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines wrote: > Sorry, I was wrong. ignore the comments about cgroups. That's still broken. > (i.e. trying to register RDMA memory while using a cgroup swap limit causes > the process to get killed). > > But the GIFT flag patch works (my understanding is that GIFT flag allows the > adapter to transmit stale memory information, it does not have anything to > do with cgroups specifically). The point of the GIFT patch is to avoid triggering copy-on-write so that memory doesn't blow up during migration. If that doesn't work then there's no point to the patch. - R.
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Fri, Apr 5, 2013 at 1:17 PM, Michael R. Hines wrote: > I also removed the IBV_*_WRITE flags on the sender-side and activated > cgroups with the "memory.memsw.limit_in_bytes" activated and the migration > with RDMA also succeeded without any problems (both with *and* without GIFT > also worked). Not sure I'm interpreting this correctly. Are you saying that things worked without actually setting the GIFT flag? In which case why are we adding this flag? - R.
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Tue, Apr 2, 2013 at 8:51 AM, Michael S. Tsirkin wrote:
>> At the moment registering an MR breaks COW. This breaks memory
>> overcommit for users such as KVM: we have a lot of COW pages, e.g.
>> instances of the zero page or pages shared using KSM.
>>
>> If the application does not care that adapter sees stale data (for
>> example, it tracks writes reregisters and resends), it can use a new
>> IBV_ACCESS_GIFT flag to prevent registration from breaking COW.
>>
>> The semantics are similar to that of SPLICE_F_GIFT thus the name.
>>
>> Signed-off-by: Michael S. Tsirkin
>
> Roland, Michael is yet to test this but could you please
> confirm whether this looks acceptable to you?

The patch itself is reasonable I guess, given the needs of this particular app.

I'm not particularly happy with the name of the flag. The analogy with SPLICE_F_GIFT doesn't seem particularly strong and I'm not convinced even the splice flag name is very understandable. But in the RDMA case there's not really any sense in which we're "gifting" memory to the adapter -- we're just telling the library "please don't trigger copy-on-write" and it doesn't seem particularly easy for users to understand that from the flag name.

 - R.
[GIT PULL] please pull infiniband.git
Hi Linus,

Please pull from

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

Small batch of InfiniBand/RDMA fixes for 3.9:
 - Fix for TX lockup in IPoIB
 - QLogic -> Intel update for qib driver
 - Small static checker fix for qib
 - Fix error path return value in cxgb4

Dan Carpenter (1):
      IB/ipath: Silence a static checker warning

Mike Marciniszyn (1):
      IPoIB: Fix send lockup due to missed TX completion

Roland Dreier (1):
      Merge branches 'cxgb4', 'ipoib' and 'qib' into for-next

Vinit Agnihotri (1):
      IB/qib: change QLogic to Intel

Wei Yongjun (1):
      RDMA/cxgb4: Fix error return code in create_qp()

 drivers/infiniband/hw/cxgb4/qp.c | 4 +++-
 drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +-
 drivers/infiniband/hw/qib/Kconfig | 6 +++---
 drivers/infiniband/hw/qib/qib_driver.c | 5 +++--
 drivers/infiniband/hw/qib/qib_iba6120.c | 3 ++-
 drivers/infiniband/hw/qib/qib_init.c | 8 ++++----
 drivers/infiniband/hw/qib/qib_sd7220.c | 4 ++--
 drivers/infiniband/hw/qib/qib_verbs.c | 4 ++--
 drivers/infiniband/ulp/ipoib/ipoib_cm.c | 8 ++++++--
 firmware/Makefile | 2 +-
 firmware/{qlogic => intel}/sd7220.fw.ihex | 0
 11 files changed, 27 insertions(+), 19 deletions(-)
 rename firmware/{qlogic => intel}/sd7220.fw.ihex (100%)
Re: [PATCH] rdma: don't make pages writeable if not requested
On Thu, Mar 21, 2013 at 1:51 AM, Michael S. Tsirkin wrote:
>> In that case, no, I don't see any reason for LOCAL_WRITE, since the
>> only RDMA operations that will access this memory are remote reads.
>
> What is the meaning of LOCAL_WRITE then? There are no local
> RDMA writes as far as I can see.

Umm, it means you're giving the local adapter permission to write to that memory. So you can use it as a receive buffer or as the target for remote data from an RDMA read operation.

> OK then what we need is a new flag saying "I really do not
> intend to write into this memory please do not break
> COW or do anything else just in case I do".

Isn't that a shared read-only mapping?

 - R.
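
To make the distinction in this reply concrete, here is a minimal sketch of which registration permissions imply that the adapter itself may write to the memory. The flag names and values are made up for this example (they only mirror the spirit of the verbs access flags), and `adapter_may_write()` is a hypothetical helper, not kernel or libibverbs code. The assumption encoded here is that every permission other than remote read can lead to the adapter modifying the region.

```c
#include <stdbool.h>

/* Hypothetical permission bits for this sketch; not the real verbs values. */
enum sketch_access {
	SKETCH_LOCAL_WRITE   = 1 << 0, /* adapter may write: receive buffers, RDMA read targets */
	SKETCH_REMOTE_WRITE  = 1 << 1, /* remote peer may RDMA-write into the region */
	SKETCH_REMOTE_READ   = 1 << 2, /* remote peer may RDMA-read from the region */
	SKETCH_REMOTE_ATOMIC = 1 << 3, /* remote atomics, which modify memory */
};

/*
 * Assumption: the adapter may modify the region for any permission
 * except remote read. Stores done by the owning process through the CPU
 * are not described by any of these bits.
 */
static bool adapter_may_write(unsigned int access)
{
	return (access & ~(unsigned int)SKETCH_REMOTE_READ) != 0;
}
```

With these definitions a region registered with only `SKETCH_REMOTE_READ` is never written by the adapter, even though the owning process may still fill it in through ordinary CPU stores.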
Re: [PATCH] rdma: don't make pages writeable if not requested
>> I think this change will break the case where userspace tries to
>> register an MR with read-only permission, but intends locally through
>> the CPU to write to the memory.
> Shouldn't it set LOCAL_WRITE then?

We're talking about the permissions for the register MR operation, right? (That's what the kernel RDMA driver code that does get_user_pages() sees)

In that case, no, I don't see any reason for LOCAL_WRITE, since the only RDMA operations that will access this memory are remote reads. The writing (that triggers COW) is coming from normal process access triggering a page fault, etc.

This is a pretty standard way of using RDMA... For example, I allocate some memory and register it for RDMA read (and pass the R_Key to the remote system) with only REMOTE_READ permission. Then I fill in the memory with the results of some computation and the remote system does an RDMA read to get those results.

 - R.
Re: [PATCH] rdma: don't make pages writeable if not requested
On Wed, Mar 20, 2013 at 11:18 PM, Michael S. Tsirkin wrote:
> core/umem.c seems to get the arguments to get_user_pages
> in the reverse order: it sets writeable flag and
> breaks COW for MAP_SHARED if and only if hardware needs to
> write the page.
>
> This breaks memory overcommit for users such as KVM:
> each time we try to register a page to send it to remote, this
> breaks COW. It seems that for applications that only have
> REMOTE_READ permission, there is no reason to break COW at all.

I proposed a similar (but not exactly the same, see below) patch a while ago: https://lkml.org/lkml/2012/1/26/7 but read the thread, especially https://lkml.org/lkml/2012/2/6/265

I think this change will break the case where userspace tries to register an MR with read-only permission, but intends locally through the CPU to write to the memory. If the memory registration is done while the memory is mapped read-only but has VM_MAYWRITE, then userspace gets into trouble when COW happens.

In the case you're describing (although I'm not sure where in KVM we're talking about using RDMA), what happens if you register memory with only REMOTE_READ and then COW is triggered because of a local write? (I'm assuming you don't want remote access to continue to get the old contents of the page)

I have to confess that I still haven't had a chance to implement the proposed FOLL_FOLLOW solution to all of this.

> If the page that is COW has lots of copies, this makes the user process
> quickly exceed the cgroups memory limit. This makes RDMA mostly useless
> for virtualization, thus the stable tag.

The actual problem description here is a bit too terse for me to understand. How do we end up with lots of copies of a COW page? Why is RDMA registering the memory any more special than having everyone who maps this page actually writing to it and triggering COW?

>  		ret = get_user_pages(current, current->mm, cur_base,
>  				     min_t(unsigned long, npages,
>  					   PAGE_SIZE / sizeof (struct page *)),
> -				     1, !umem->writable, page_list, vma_list);
> +				     !umem->writable, 1, page_list, vma_list);

The first two parameters in this line being changed are "write" and "force". I think if we do change this, then we need to pass umem->writable (as opposed to !umem->writable) for the "write" parameter. Not sure whether "force" makes sense or not.

 - R.
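
The copy-on-write behaviour this thread keeps coming back to can be observed from plain userspace, with no RDMA hardware involved. The sketch below is only an analogy under stated assumptions: the parent process stands in for an observer that still refers to the original page (as an adapter would if registration did not break COW), and the child stands in for a process whose later CPU write triggers COW. `parent_keeps_old_byte()` is a hypothetical helper name, and the code assumes a Linux or other POSIX system with `MAP_ANONYMOUS`.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent still reads its original byte after the child
 * wrote to the same private mapping, 0 if it does not, -1 on error.
 */
static int parent_keeps_old_byte(void)
{
	int status;
	int result;
	pid_t pid;
	char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return -1;

	buf[0] = 'A';		/* populate the page before it becomes shared */

	pid = fork();		/* page is now shared copy-on-write */
	if (pid < 0) {
		munmap(buf, 4096);
		return -1;
	}
	if (pid == 0) {
		buf[0] = 'B';	/* write fault: the child gets a private copy */
		_exit(buf[0] == 'B' ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) < 0) {
		munmap(buf, 4096);
		return -1;
	}

	/* The parent never wrote after fork(), so it still sees the old data. */
	result = WIFEXITED(status) && WEXITSTATUS(status) == 0 && buf[0] == 'A';
	munmap(buf, 4096);
	return result;
}
```

The point of the analogy is the last check: whoever holds the pre-COW page keeps seeing the old contents, which is the stale-data situation the question about a local write after a REMOTE_READ-only registration is asking about.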
[GIT PULL] please pull infiniband.git
Hi Linus,

Please pull from

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

Main batch of InfiniBand/RDMA changes for 3.9:
 - SRP error handling fixes from Bart Van Assche
 - Implementation of memory windows for mlx4 from Shani Michaeli
 - Lots of cxgb4 HW driver fixes from Vipul Pandya
 - Make iSER work for virtual functions, other fixes from Or Gerlitz
 - Fix for bug in qib HW driver from Mike Marciniszyn
 - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
 - Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun

Bart Van Assche (4):
      IB/srp: Track connection state properly
      IB/srp: Avoid sending a task management function needlessly
      IB/srp: Avoid endless SCSI error handling loop
      IB/srp: Fail I/O requests if the transport is offline

Dan Carpenter (1):
      IB/mlx4: Fix bug unwinding on error in mlx4_ib_init_sriov()

Itai Garbi (1):
      IPoIB: Don't attempt to release resources on error flow

Julia Lawall (1):
      IB/mlx4: Adjust duplicate test

Mike Marciniszyn (1):
      IB/qib: Fix QP locate/remove race

Or Gerlitz (3):
      IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
      IB/iser: Avoid error prints on EAGAIN registration failures
      IB/iser: Enable iser when FMRs are not supported

Paul Bolle (2):
      RDMA/cxgb4: "cookie" can stay in host endianness
      IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable

Roland Dreier (3):
      IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
      IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
      Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next

Shani Michaeli (10):
      IB/mlx4_ib: Remove local invalidate segment unused fields
      mlx4_core: Rename MPT-related functions to have mpt_ prefix
      mlx4_core: Propagate MR deregistration failures to caller
      IB/core: Add "type 2" memory windows support
      IB/uverbs: Implement memory windows support in uverbs
      mlx4_core: Disable memory windows for virtual functions
      mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
      mlx4: Implement memory windows allocation and deallocation
      IB/mlx4: Support memory window binding
      IB/mlx4: Advertise MW support

Shlomo Pongratz (1):
      IPoIB: Fix ipoib_neigh hashing to use the correct daddr octets

Stefan Hasko (1):
      RDMA/cxgb4: Fix cast warning

Syam Sidhardhan (1):
      IB/mlx4: Remove redundant NULL check before kfree

Vipul Pandya (11):
      RDMA/cxgb4: Abort connections that receive unexpected streaming mode data
      RDMA/cxgb4: Abort connections when moving to ERROR state
      RDMA/cxgb4: Display streaming mode error only if detected in RTS
      RDMA/cxgb4: Keep QP referenced until TID released
      RDMA/cxgb4: Always log async errors
      RDMA/cxgb4: Only log rx_data warnings if cpl status is non-zero
      RDMA/cxgb4: Fix endpoint timeout race condition
      RDMA/cxgb4: Don't reconnect on abort for mpa_rev 1
      RDMA/cxgb4: Don't wakeup threads for MPAv2
      RDMA/cxgb4: Insert hwtid in pass_accept_req instead in pass_establish
      RDMA/cxgb4: Address sparse warnings

Wei Yongjun (1):
      RDMA/amso1100: Use module_pci_driver() to simplify the code

Yan Burman (1):
      IPoIB: Add version and firmware info to ethtool reporting

 drivers/infiniband/core/uverbs.h | 2 +
 drivers/infiniband/core/uverbs_cmd.c | 121 ++
 drivers/infiniband/core/uverbs_main.c | 13 +-
 drivers/infiniband/core/verbs.c | 5 +-
 drivers/infiniband/hw/amso1100/c2.c | 13 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 5 +-
 drivers/infiniband/hw/cxgb3/iwch_qp.c | 15 +-
 drivers/infiniband/hw/cxgb4/cm.c | 170 +++
 drivers/infiniband/hw/cxgb4/device.c | 5 +-
 drivers/infiniband/hw/cxgb4/ev.c | 8 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 4 +-
 drivers/infiniband/hw/cxgb4/mem.c | 5 +-
 drivers/infiniband/hw/cxgb4/qp.c | 1 +
 drivers/infiniband/hw/ehca/ehca_iverbs.h | 2 +-
 drivers/infiniband/hw/ehca/ehca_mrmw.c | 5 +-
 drivers/infiniband/hw/mlx4/mad.c | 7 +-
 drivers/infiniband/hw/mlx4/main.c | 22 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h | 18 +-
 drivers/infiniband/hw/mlx4/mr.c | 87 +-
 drivers/infiniband/hw/mlx4/qp.c | 49 --
 drivers/infiniband/hw/mlx4/sysfs
Re: [PATCH v2] IB/mlx4: silence GCC warning
On Mon, Feb 25, 2013 at 8:54 AM, Roland Dreier wrote:
> I'm finally noticing that this is in the build_mlx_header() function,
> which is pretty much a slow path. Certainly another compare isn't
> going to change performance given all the other stuff we do there.
>
> Let me look at the patches that have gone by and see what the cleanest
> way to handle this is.

OK, after playing around a bit, I see that just initializing vlan doesn't really change the generated code (my gcc at least was already in effect setting vlan in the generated assembly code), so I'll just merge that.

 - R.
Re: [PATCH v2] IB/mlx4: silence GCC warning
On Sun, Feb 24, 2013 at 4:34 AM, Jack Morgenstein wrote:
> However, this approach does add the line below to processing for an IB port
> (ETH/RoCE port stays same, more or less).
> Processing time is therefore increased (at least on the IB side) relative to
> just living with the warning.
>
> Roland?

I'm finally noticing that this is in the build_mlx_header() function, which is pretty much a slow path. Certainly another compare isn't going to change performance given all the other stuff we do there.

Let me look at the patches that have gone by and see what the cleanest way to handle this is.

 - R.
[GIT PULL] please pull infiniband.git
Hi Linus,

Please pull from

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

IB regression fixes for 3.8:
 - Fix mlx4 VFs not working on old guests because of 64B CQE changes
 - Fix ill-considered sparse fix for qib
 - Fix IPoIB crash due to skb double destruct introduced in 3.8-rc1

Mike Marciniszyn (1):
      IB/qib: Fix for broken sparse warning fix

Or Gerlitz (1):
      mlx4_core: Fix advertisement of wrong PF context behaviour

Roland Dreier (1):
      Merge branches 'ipoib', 'mlx4' and 'qib' into for-next

Shlomo Pongratz (1):
      IPoIB: Fix crash due to skb double destruct

 drivers/infiniband/hw/qib/qib_qp.c | 11 +++--------
 drivers/infiniband/ulp/ipoib/ipoib_cm.c | 6 +++---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 6 +++---
 drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
 4 files changed, 10 insertions(+), 15 deletions(-)
Re: [PATCH] printk: Fix incorrect length from print_time() when seconds > 99999
On Sat, Dec 29, 2012 at 12:08 PM, Joe Perches wrote:
> Sylvain Munaut did something similar
> https://lkml.org/lkml/2012/12/5/168

Missed that and duplicated the debugging :( Sorry Sylvain.

I guess my patch may be preferable, since I happened to use the snprintf() method that you suggest -- all the open-coded digit-counting seems a bit verbose and perhaps hard to read and see the equivalence to the sprintf. But certainly Sylvain fixed this quite a bit earlier and he should get credit.

 - R.
Re: [PATCH] printk: Fix incorrect length from print_time() when seconds > 99999
On Sat, Dec 29, 2012 at 9:56 AM, Greg Kroah-Hartman wrote:
> Nice work. When did you start seeing this problem, 3.6 or so? I ask as
> it's probably something that should go to stable as well if so.

We happened to see it when we rebased to the 3.6 kernel, but as far as I can see, the bug has been there as long as print_time(), which comes from 084681d14e42 ("printk: flush continuation lines immediately to console") in 3.5-rc5.

When I was doing a web search for info on the problem, I found at least the following reports:

    https://bbs.archlinux.org/viewtopic.php?id=148100
    https://lkml.org/lkml/2012/10/17/81

so yeah this seems to be stable material.

> Andrew seems to be keeping the printk patches these days, so I'll let
> him pick this up with:
>
> Signed-off-by: Greg Kroah-Hartman
[PATCH] printk: Fix incorrect length from print_time() when seconds > 99999
From: Roland Dreier

print_prefix() passes a NULL buf to print_time() to get the length of the time prefix; when printk times are enabled, the current code just returns the constant 15, which matches the format "[%5lu.%06lu] " used to print the time value. However, this is obviously incorrect when the whole seconds part of the time gets beyond 5 digits (100000 seconds is a bit more than a day of uptime).

The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual length of the time prefix. This could be micro-optimized but it seems better to have simpler, more readable code here.

The bug leads to the syslog system call miscomputing which messages fit into the userspace buffer. If there are enough messages to fill log_buf_len and some have a timestamp >= 100000, dmesg may fail with:

    # dmesg
    klogctl: Bad address

When this happens, strace shows that the failure is indeed EFAULT due to the kernel mistakenly accessing past the end of dmesg's buffer, since dmesg asks the kernel how big a buffer it needs, allocates a bit more, and then gets an error when it asks the kernel to fill it:

    syslog(0xa, 0, 0) = 1048576
    mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
    syslog(0x3, 0x7fa4d25d2010, 0x100008) = -1 EFAULT (Bad address)

Signed-off-by: Roland Dreier
---
 kernel/printk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 19c0d7b..357f714 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -870,10 +870,11 @@ static size_t print_time(u64 ts, char *buf)
 	if (!printk_time)
 		return 0;
 
+	rem_nsec = do_div(ts, 1000000000);
+
 	if (!buf)
-		return 15;
+		return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
 
-	rem_nsec = do_div(ts, 1000000000);
 	return sprintf(buf, "[%5lu.%06lu] ",
 		       (unsigned long)ts, rem_nsec / 1000);
 }
--
1.8.0
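
The length mismatch described in this patch is easy to reproduce in userspace. The following is a minimal sketch, assuming a format string with the same layout as the kernel's time prefix; `prefix_len_fixed()` and `prefix_len_measured()` are illustrative names, not kernel functions. It relies only on the standard C behaviour that snprintf() with a NULL buffer and size 0 returns the number of characters that would have been written.

```c
#include <stddef.h>
#include <stdio.h>

/* Old behaviour: assume the time prefix is always 15 characters long. */
static size_t prefix_len_fixed(unsigned long secs)
{
	(void)secs;	/* the constant ignores the actual value */
	return 15;
}

/* Patched behaviour: let snprintf() report how long the prefix would be. */
static size_t prefix_len_measured(unsigned long secs)
{
	/* NULL buffer with size 0 writes nothing and returns the needed length. */
	return (size_t)snprintf(NULL, 0, "[%5lu.000000] ", secs);
}
```

Up to 99999 seconds both helpers agree on 15 characters; from 100000 seconds on the measured length grows to 16 while the constant stays at 15, which is the one-byte-per-message undercount that eventually overruns dmesg's buffer.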
[GIT PULL] please pull infiniband.git
Hi Linus,

Please pull from

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

Second batch of InfiniBand/RDMA changes for 3.8:
 - cxgb4 changes to fix lookup engine hash collisions
 - mlx4 changes to make flow steering usable
 - fix to IPoIB to avoid pinning dst reference for too long

Hadar Hen Zion (2):
      mlx4_core: Add QPN enforcement for flow steering rules set by VFs
      mlx4_core: Fix error flow in the flow steering wrapper

Jack Morgenstein (2):
      mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
      mlx4_core: Allow choosing flow steering mode

Roland Dreier (2):
      IPoIB: Call skb_dst_drop() once skb is enqueued for sending
      Merge branches 'cxgb4', 'ipoib' and 'mlx4' into for-next

Vipul Pandya (5):
      cxgb4: Add T4 filter support
      cxgb4: Add LE hash collision bug fix path in LLD driver
      RDMA/cxgb4: Fix LE hash collision bug for active open connection
      RDMA/cxgb4: Fix LE hash collision bug for passive open connection
      RDMA/cxgb4: Fix bug for active and passive LE hash collision path

 drivers/infiniband/hw/cxgb4/cm.c | 791 ++---
 drivers/infiniband/hw/cxgb4/device.c | 210 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 33 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c | 3 +
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 3 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 136
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 459 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h | 23 +-
 drivers/net/ethernet/chelsio/cxgb4/l2t.c | 32 +
 drivers/net/ethernet/chelsio/cxgb4/l2t.h | 3 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 22 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h | 66 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 37 +
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 418 +++
 drivers/net/ethernet/mellanox/mlx4/fw.c | 15 +-
 drivers/net/ethernet/mellanox/mlx4/fw.h | 1 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 115 ++-
 drivers/net/ethernet/mellanox/mlx4/mcg.c | 7 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h | 6 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c | 28 +-
 drivers/scsi/csiostor/t4fw_api_stor.h | 39 -
 include/linux/mlx4/device.h | 1 +
 22 files changed, 2234 insertions(+), 214 deletions(-)
Re: linux-next: build failure after merge of the infiniband tree
On Wed, Dec 19, 2012 at 2:44 PM, Stephen Rothwell wrote:
> Hi all,
>
> After merging the infiniband tree, today's linux-next build (x86_64_
> allmodconfig) failed like this:
>
> In file included from drivers/scsi/csiostor/csio_wr.h:42:0,
> from drivers/scsi/csiostor/csio_scsi.h:49,
> from drivers/scsi/csiostor/csio_init.h:45,
> from drivers/scsi/csiostor/csio_attr.c:45:
> drivers/scsi/csiostor/t4fw_api_stor.h:43:6: error: nested redefinition of
> 'enum fw_retval'
> drivers/scsi/csiostor/t4fw_api_stor.h:43:6: error: redeclaration of 'enum
> fw_retval'
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h:42:6: note: originally defined
> here
> drivers/scsi/csiostor/t4fw_api_stor.h:48:2: error: redeclaration of
> enumerator 'FW_ENOEXEC'
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h:39:2: note: previous definition
> of 'FW_ENOEXEC' was here
> drivers/scsi/csiostor/t4fw_api_stor.h:50:2: error: redeclaration of
> enumerator 'FW_ENOMEM'
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h:43:2: note: previous definition
> of 'FW_ENOMEM' was here
> drivers/scsi/csiostor/t4fw_api_stor.h:58:2: error: redeclaration of
> enumerator 'FW_EADDRINUSE'
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h:44:2: note: previous definition
> of 'FW_EADDRINUSE' was here
>
> And several others similar.
>
> Caused by commit f65b56b15931 ("RDMA/cxgb4: Fix LE hash collision bug for
> active open connection").

Vipul, is the right fix to pull the full list of FW return values from t4fw_api_stor.h into t4fw_api.h as part of this patch (f65b56b15931)?