Re: [PATCH] clocksource: Add heuristics to avoid switching away from TSC due to timer delay

2018-12-04 Thread Roland Dreier
>  /*
>   * Proper multiline comments look like this not like
>   * the above.
>   */

Got it, will fix next time around.

> That aside. Why are you trying to do heuristics on the delta?
>
> We have way better information than that. The watchdog timer expiry time is
> known and we can determine the exact delay of the timer.
>
> The watchdog clocksource provides the maximum 'idle' time, i.e. the time
> between two reads, in clocksource::max_idle_ns. That value is filled in
> when the clocksource is configured.
>
> So without doing speculation we can make an informed decision:
>
> elapsed = jiffies_to_nsec(jiffies - watchdog_timer->expires) +
>   WATCHDOG_INTERVAL_NS;
>
> if (elapsed > wdcs->max_idle_ns) {
> Skip ..
> }

Yes, that makes more sense than what I was doing, although I'm not
sure on the details.  Just missed that idea.

Why are you adding the watchdog interval to the calculated elapsed
time?  It seems we have a problem exactly when jiffies -
watchdog_timer->expires is too big, even before adding on top the
interval we meant to wait.  Also I think we should make sure that
jiffies is >= the expires time - or is it not possible for a timer to
fire one jiffy early?

Also for full generality it seems we should check against the
clocksource max_idle_ns as well - for x86 TSC is wider than HPET but
there may be other architectures that could hit the same problem, just
with the clocksource being checked wrapping around instead of the
watchdog clocksource.  Right?
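
To make the questions above concrete, here is a user-space sketch of the check, not kernel code.  The names wd_timer_delay_ns(), wd_should_skip() and WD_INTERVAL_NS, and the 0.5 s interval, are assumptions made up for this example.  It keeps the interval term from the quoted snippet, clamps a timer that fired early to zero delay, and compares against the max_idle_ns of both clocksources:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only: a 0.5 s watchdog period. */
#define WD_INTERVAL_NS 500000000ULL

/*
 * How late did the timer fire, in ns?  Jiffies counters wrap, so take the
 * difference as a signed value; a timer that fired early yields 0.
 */
static uint64_t wd_timer_delay_ns(unsigned long now, unsigned long expires,
                                  uint64_t ns_per_jiffy)
{
    long late = (long)(now - expires);

    assert(ns_per_jiffy > 0);
    return late > 0 ? (uint64_t)late * ns_per_jiffy : 0;
}

/*
 * Skip the comparison if the time since the previous reading (one interval
 * plus the delay) exceeds what either counter can cover without wrapping.
 */
static bool wd_should_skip(uint64_t delay_ns, uint64_t wd_max_idle_ns,
                           uint64_t cs_max_idle_ns)
{
    uint64_t elapsed = delay_ns + WD_INTERVAL_NS;

    return elapsed > wd_max_idle_ns || elapsed > cs_max_idle_ns;
}
```

With a 4 ms jiffy, a timer that fires 5 jiffies late gives a 20 ms delay, and one that fires a jiffy early gives 0 rather than a huge unsigned value.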

Thanks!
  Roland


[tip:x86/timers] x86/hpet: Remove unused FSEC_PER_NSEC define

2018-12-04 Thread tip-bot for Roland Dreier
Commit-ID:  d999c0ec2498e54b9328db6b2c1037710025add1
Gitweb: https://git.kernel.org/tip/d999c0ec2498e54b9328db6b2c1037710025add1
Author: Roland Dreier 
AuthorDate: Fri, 30 Nov 2018 13:14:50 -0800
Committer:  Borislav Petkov 
CommitDate: Tue, 4 Dec 2018 12:17:21 +0100

x86/hpet: Remove unused FSEC_PER_NSEC define

The FSEC_PER_NSEC macro has had zero users since commit

  ab0e08f15d23 ("x86: hpet: Cleanup the clockevents init and register code").

Remove it.

Signed-off-by: Roland Dreier 
Signed-off-by: Borislav Petkov 
Acked-by: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20181130211450.5200-1-rol...@purestorage.com
---
 arch/x86/kernel/hpet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b0acb22e5a46..dfd3aca82c61 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -21,10 +21,6 @@
 
 #define HPET_MASK  CLOCKSOURCE_MASK(32)
 
-/* FSEC = 10^-15
-   NSEC = 10^-9 */
-#define FSEC_PER_NSEC  1000000L
-
 #define HPET_DEV_USED_BIT  2
 #define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
 #define HPET_DEV_VALID 0x8


[PATCH] x86/hpet: Remove unused FSEC_PER_NSEC define

2018-12-04 Thread Roland Dreier
The FSEC_PER_NSEC macro has had zero users since commit ab0e08f15d23
("x86: hpet: Cleanup the clockevents init and register code").

Signed-off-by: Roland Dreier 
---
 arch/x86/kernel/hpet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b0acb22e5a46..dfd3aca82c61 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -21,10 +21,6 @@
 
 #define HPET_MASK  CLOCKSOURCE_MASK(32)
 
-/* FSEC = 10^-15
-   NSEC = 10^-9 */
-#define FSEC_PER_NSEC  1000000L
-
 #define HPET_DEV_USED_BIT  2
 #define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
 #define HPET_DEV_VALID 0x8
-- 
2.19.1



[PATCH] clocksource: Add heuristics to avoid switching away from TSC due to timer delay

2018-11-30 Thread Roland Dreier
On a modern x86 system, the TSC is used as a clocksource, with HPET
used in the clocksource watchdog to make sure that the TSC is stable.

If the clocksource watchdog_timer is delayed for an extremely long
time (for example if softirqs are being serviced in ksoftirqd, and
realtime threads are starving ksoftirqd), then the 32-bit HPET counter
may wrap around.  For example, with an HPET running at 24 MHz, 2^32
cycles is about 179 seconds - a long time for timers to be starved,
but possible with a poorly behaved realtime thread.
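
The 179 second figure can be checked with a few lines of user-space C.  This is only a sketch of the arithmetic; counter_wrap_seconds() is a name invented for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a free-running counter of the given width wraps. */
static double counter_wrap_seconds(unsigned bits, double freq_hz)
{
    assert(bits > 0 && bits < 64 && freq_hz > 0.0);
    return (double)(UINT64_C(1) << bits) / freq_hz;
}
```

counter_wrap_seconds(32, 24e6) comes out just under 179 seconds, matching the number above.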

If this happens, since the TSC is a 64-bit counter and won't wrap, the
watchdog will detect skew - the TSC interval will be 179 seconds
longer than the HPET interval - and will mark the TSC as unstable.
This causes the system to switch to the HPET as a clocksource, which
has a huge negative performance impact.

In this case, switching to the HPET turns a bad situation (timers
starved) that the system might recover from into a permanently worse
one (more expensive clock_gettime() calls), all because of a spurious
detection of TSC instability.

To improve this, add some heuristics to detect cases where the
watchdog is delayed long enough for the instability detection to be
likely to be wrong:

 - If the clocksource being tested (eg TSC) has counted so many cycles
   that converting to nsecs will overflow multiplication, *AND* the
   watchdog clocksource (eg HPET) shows that the watchdog timer has
   missed its interval by at least a factor of 3, skip marking the
   clocksource as unstable for a timer iteration.  This is not
   perfect - for example it is possible for the watchdog clocksource
   to wrap around and show a small interval - but at least in the
   specific x86 case it is unlikely, since the watchdog interval is a
   small fraction of the wraparound interval.

 - If there is a skew between the clocksource being tested and the
   watchdog clocksource that is at least as big as the wraparound
   interval for the watchdog clocksource, then don't mark the
   clocksource as unstable.  Again, this might fail to mark a
   clocksource as unstable for one iteration, but it is unlikely that
   the instability is bad enough that we will see a larger skew than
   the wraparound interval for many iterations.

These heuristics are imperfect but are chosen to make false detection
of instability much less likely, while leaving detection of true
instability very likely within a few clocksource watchdog iterations.
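
As a rough user-space model of the two heuristics described above (not the patch itself), the following sketch takes already-converted nanosecond values as plain arguments.  The function names and parameter lists are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Heuristic 1: the cycle delta is too large to convert safely AND the
 * watchdog clocksource says the timer overslept by at least a factor of 3.
 */
static bool skip_overslept(uint64_t cs_delta, uint64_t cs_max_cycles,
                           int64_t wd_nsec, int64_t interval_nsec)
{
    assert(interval_nsec > 0);
    return cs_delta > cs_max_cycles && wd_nsec > 3 * interval_nsec;
}

/*
 * Heuristic 2: the skew is about as large as the watchdog clocksource's
 * wraparound interval, so the watchdog counter probably wrapped.
 */
static bool skip_wrapped(int64_t cs_nsec, int64_t wd_nsec,
                         int64_t wd_wrap_nsec, int64_t threshold_nsec)
{
    return llabs(cs_nsec - wd_nsec) > wd_wrap_nsec - threshold_nsec;
}
```

For example, if the timer is delayed by 200 seconds and a 24 MHz HPET has wrapped once, the TSC reports 200 s while the HPET reports about 21 s; the skew of roughly 179 s trips skip_wrapped(), while an ordinary 0.5 s interval with 100 us of skew does not.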

Signed-off-by: Roland Dreier 
---
 kernel/time/clocksource.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ffe081623aec..f1b3d8ff2437 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -243,12 +243,47 @@ static void clocksource_watchdog(struct timer_list *unused)
 watchdog->shift);
 
delta = clocksource_delta(csnow, cs->cs_last, cs->mask);
+
+   /* If the cycle delta is beyond what we can safely
+* convert to nsecs, and the watchdog clocksource
+* suggests that we've overslept, skip checking this
+* iteration to avoid marking a clocksource as
+* unstable because of a severely delayed timer. */
+   if (delta > cs->max_cycles &&
+   wd_nsec > 3 * jiffies_to_nsecs(WATCHDOG_INTERVAL)) {
+   pr_warn("timekeeping watchdog: Clocksource '%s' not checked due to apparent long timer delay:\n",
+   cs->name);
+   pr_warn("  Delta %llx > max_cycles %llx, wd_nsec %lld\n",
+   delta, cs->max_cycles, wd_nsec);
+   continue;
+   }
+
cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
wdlast = cs->wd_last; /* save these in case we print them */
cslast = cs->cs_last;
cs->cs_last = csnow;
cs->wd_last = wdnow;
 
+   /* If the clocksource interval is far off from the
+* watchdog clocksource interval but the interval is
+* big enough that the watchdog may have wrapped
+* around (again due to a severely delayed timer),
+* skip this iteration.  For example, this saves us
+* from marking the TSC as unstable just because the
+* 32-bit HPET wrapped around on x86. */
+   if (abs(cs_nsec - wd_nsec) >
+   clocksource_cyc2ns(watchdog->max_cycles, watchdog->mult,
+  watchdog->shift) - WATCHDOG_THRESHOLD) {
+   pr_warn("timekeeping watchdog: Clocksource '%s&#

[PATCH] x86/hpet: Remove unused FSEC_PER_NSEC define

2018-11-30 Thread Roland Dreier
The FSEC_PER_NSEC macro has had zero users since commit ab0e08f15d23
("x86: hpet: Cleanup the clockevents init and register code").

Signed-off-by: Roland Dreier 
---
 arch/x86/kernel/hpet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b0acb22e5a46..dfd3aca82c61 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -21,10 +21,6 @@
 
 #define HPET_MASK  CLOCKSOURCE_MASK(32)
 
-/* FSEC = 10^-15
-   NSEC = 10^-9 */
-#define FSEC_PER_NSEC  1000000L
-
 #define HPET_DEV_USED_BIT  2
 #define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
 #define HPET_DEV_VALID 0x8
-- 
2.19.1



Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-06-05 Thread Roland Dreier
> The sensible thing to do in nvme is to use different paths for
> different queues.  That is e.g. in the RDMA case use the HCA closer
> to a given CPU by default.  We might allow to override this for
> cases where there is a good reason, but what I really don't want is
> configurability for configurability's sake.

That makes sense but I'm not sure it covers everything.  Probably the
most common way to do NVMe/RDMA will be with a single HCA that has
multiple ports, so there's no sensible CPU locality.  On the other
hand we want to keep both ports to the fabric busy.  Setting different
paths for different queues makes sense, but there may be
single-threaded applications that want a different policy.

I'm not saying anything very profound, but we have to find the right
balance between too many and too few knobs.

 - R.


Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-06-04 Thread Roland Dreier
> Moreover, I also wanted to point out that fabrics array vendors are
> building products that rely on standard nvme multipathing (and probably
> multipathing over dispersed namespaces as well), and keeping a knob that
> will keep nvme users with dm-multipath will probably not help them
> educate their customers as well... So there is another angle to this.

As a vendor who is building an NVMe-oF storage array, I can say that
clarity around how Linux wants to handle NVMe multipath would
definitely be appreciated.  It would be great if we could all converge
around the upstream native driver but right now it doesn't look
adequate - having only a single active path is not the best way to use
a multi-controller storage system.  Unfortunately it looks like we're
headed to a world where people have to write separate "best practices"
documents to cover RHEL, SLES and other vendors.

We plan to implement all the fancy NVMe standards like ANA, but it
seems that there is still a requirement to let the host side choose
policies about how to use paths (round-robin vs least queue depth for
example).  Even in the modern SCSI world with VPD pages and ALUA,
there are still knobs that are needed.  Maybe NVMe will be different
and we can find defaults that work in all cases but I have to admit
I'm skeptical...

 - R.


Re: KASAN: use-after-free Read in __list_add_valid (5)

2018-05-15 Thread Roland Dreier
> Still reproducible on Linus' tree (commit 66e1c94db3cd4e) and on linux-next
> (next-20180511).  Here's a simplified reproducer:

Thanks!  That's a fantastic test case.

The issue is a race where rdma_listen() sees invalid state in the
middle of an rdma_bind_addr() call that will ultimately fail.  I'll
send a proposed patch shortly.

 - R.


Re: [Patch v2 00/19] CIFS: Implement SMBDirect

2017-08-29 Thread Roland Dreier
> Starting with SMB2 dialect 3.0, Microsoft introduced SMBDirect transport 
> protocol for transferring upper layer (SMB2) payload over RDMA via 
> Infiniband, RoCE or iWARP. The protocol is published in [MS-SMBD]
> (https://msdn.microsoft.com/en-us/library/hh536346.aspx).

This is great to see.  Is there a Linux implementation of the server
side (in Samba?) so that the client can be tested without needing a
Windows server?

 - R.


Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-08 Thread Roland Dreier
On Fri, Jul 8, 2016 at 9:51 AM, Jason Gunthorpe
 wrote:
> So, it appears, the dst and neigh can be used for all performance cases.
>
> For the non performance dst == null case, can we just burn cycles and
> stuff the daddr in front of the packet at hardheader time, even if we
> have to copy?

OK, sounds interesting.

Unfortunately the scope of this work has gotten to the point where I
can't take it on right now.  My system is running 4.4.y for now
(before struct skb_gso_cb grew) so I think shrinking struct skb_gso_cb
to 8 bytes plus changing SKB_SGO_CB_OFFSET to 20 will work for now.
Hope someone is able to come up with a real fix before I need to
upgrade to 4.10.y...

 - R.


Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-08 Thread Roland Dreier
On Thu, Jul 7, 2016 at 4:14 PM, Jason Gunthorpe
 wrote:
> We have neighbour_priv, and ndo_neigh_construct/destruct now ..
>
> At first blush that would seem to be enough to let ipoib store the AH
> and other path information in the neigh and avoid the cb? At least the
> example in clip sure looks like what ipoib needs to do.

Do you think those new facilities let us go back to using the neigh
and still avoid the issues that led to commit b63b70d87741 ("IPoIB:
Use a private hash table for path lookup in xmit path")?

 - R.


Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-07 Thread Roland Dreier
>> struct skb_gso_cb {
>> int mac_offset;
>> int encap_level;
>> __u16   csum_start;
>> };

> This is based on an out-dated version of this struct.  The 4.7 RC
> kernel has a few more fields that were added to support local checksum
> offload for encapsulated frames.

Thanks for pointing that out.  I hit the perf regression on 4.4.y
(stable) and looked at the struct there.  I see that latest upstream
has changed, and I agree that this struct really can't shrink below 10
bytes.

Since IP needs 20 bytes, GSO needs 10 bytes and IPoIB needs 20 bytes,
we're 2 bytes over the 48 that are available in cb[].  So this is
harder to fix than just changing skb_gso_cb and SKB_SGO_CB_OFFSET
unfortunately.
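
Spelled out as code, the budget looks like this.  The sizes are the ones quoted in this mail, taken as given; cb_overrun() and the enum names are invented for the example:

```c
#include <assert.h>

/* Sizes as quoted in this thread; treated as assumptions here. */
enum {
    CB_SIZE       = 48,  /* bytes available in skb->cb[] */
    IP_CB_SIZE    = 20,  /* IP control block */
    GSO_CB_SIZE   = 10,  /* struct skb_gso_cb on latest upstream */
    IPOIB_CB_SIZE = 20,  /* IPoIB hardware address */
};

/*
 * Bytes by which the three users exceed cb[] when laid out back to back.
 * A result <= 0 means everything fits.
 */
static int cb_overrun(int ip, int gso, int ipoib)
{
    assert(ip >= 0 && gso >= 0 && ipoib >= 0);
    return ip + gso + ipoib - CB_SIZE;
}
```

cb_overrun(IP_CB_SIZE, GSO_CB_SIZE, IPOIB_CB_SIZE) is 2, and it only reaches 0 if one of the three users gives up 2 bytes.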

>> What is the best way to keep the crash fix but not kill IPoIB performance?
>
> It seems like what would probably need to happen is to move where the
> IPoIB address is stored.  I'm not sure the control buffer is really
> the best place for it since the cb gets overwritten at various levels,
> and storing 20 bytes makes it hard to avoid bumping up against the
> size restrictions of the buffer.  Seeing as how the IPoIB hwaddr is
> generated around the same time we generate the L2 header for the
> frame, I wonder if you couldn't get away with using a bit of extra skb
> headroom to store it and then use an offset from the MAC header to
> access it.  An added bonus would be that with a few tricks with
> SKB_GSO_CB(skb)->mac_offset you might even be able to set things up so
> that you copy the hwaddr when you copy the header for each fragment
> instead of having to go and copy the hwaddr out of the cb and clone it
> for each frame.

Can we assume there are 20 bytes of skb headroom available?  What if
we're forwarding an skb received on an Ethernet device?

The reason we moved to the cb storage is that in the past, trying to
hide some data in the actual skb buffer that we don't actually send
led to some awkward-at-best code.  (As I recall GRO was difficult to
handle before commit 936d7de3d736 "IPoIB: Stop lying about
hard_header_len and use skb->cb to stash LL addresses")  But maybe
there's a third way to handle this other than the old way and the
skb->cb way.

 - R.


Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-06 Thread Roland Dreier
On Thu, Jan 7, 2016 at 3:00 AM, Konstantin Khlebnikov  wrote:
> Or just shift GSO CB and add couple checks like
> BUILD_BUG_ON(sizeof(SKB_GSO_CB(skb)->room) < sizeof(*IPCB(skb)));

Resurrecting this old thread, because the patch that ultimately went
upstream (commit 9207f9d45b0a / net: preserve IP control block during
GSO segmentation) causes a huge IPoIB performance regression (to the
point of being unusable):
https://bugzilla.kernel.org/show_bug.cgi?id=111921

I don't think anyone has explained what goes wrong or why IPoIB works
the way it does.  The underlying difference that IPoIB has from other
drivers is that there are two levels of address resolution.  First,
normal ARP / ND resolves an IP address to a "hardware" address.  The
difference is that in IPoIB, the hardware address is an IB GID (plus a
QPN, but we can ignore that).  To actually send data to that GID, the
IPoIB driver has to do a second lookup - it needs to ask the IB subnet
manager for a path record that tells it how to reach that GID.

In particular this means that "destination address" (as the IP / ARP
layer understands it) actually isn't in the packet anywhere - there's
nothing like an ethernet header as there is for "normal" network
drivers.  Instead, the driver stashes the address in skb->cb during
hard_header_ops->create() and then looks at it in the xmit routine -
this was designed way back around when commit a0417fa3a18a / net: Make
qdisc_skb_cb upper size bound explicit. was merged.  The expectation
was that the part of the cb after sizeof (struct qdisc_skb_cb) would
be preserved.

The problem with commit 9207f9d45b0a is that GSO operations now access
cb after SKB_SGO_CB_OFFSET==32, which lands right in the middle of
where IPoIB stashes its hwaddr.

It seems that the intent of the commit is to preserve the IP control
block - struct inet_skb_parm (and presumably struct inet6_skb_parm) -
even when using SKB_GSO_CB().  Seems like both inet_skb_parm and
inet6_skb_parm are 20 bytes.  IPoIB uses the part of cb after 28
bytes, so if we could squeeze struct skb_gso_cb down to 8 bytes and
set SKB_SGO_CB_OFFSET to 20, then everything would work.  The struct
is

struct skb_gso_cb {
int mac_offset;
int encap_level;
__u16   csum_start;
};

is it feasible to make encap_level a __u16 (which would make the
overall struct exactly 8 bytes)?  If I understand this correctly, 64K
nested encapsulations seems like quite a bit for a packet...
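
Here is a small sketch of the layout idea, compilable in user space.  The struct and function names are hypothetical, fixed-width types stand in for the kernel's int and __u16, and the 20 and 28 byte boundaries are the ones stated in this mail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The struct as quoted above, with fixed-width stand-ins. */
struct gso_cb_current {
    int32_t  mac_offset;
    int32_t  encap_level;
    uint16_t csum_start;
};

/* Hypothetical variant with encap_level narrowed to 16 bits. */
struct gso_cb_narrow {
    int32_t  mac_offset;
    uint16_t encap_level;
    uint16_t csum_start;
};

/* Compile-time check in the spirit of BUILD_BUG_ON(). */
_Static_assert(sizeof(struct gso_cb_narrow) == 8, "narrow GSO cb must be 8 bytes");

enum {
    IP_CB_END      = 20,  /* end of the IP control block, per this mail */
    IPOIB_CB_START = 28,  /* IPoIB uses cb[] from here on, per this mail */
};

/* Does a GSO block at gso_offset stay clear of both neighbours? */
static int gso_cb_fits(size_t gso_offset, size_t gso_size)
{
    assert(gso_size > 0);
    return gso_offset >= IP_CB_END &&
           gso_offset + gso_size <= IPOIB_CB_START;
}
```

The narrowed struct fits exactly at offset 20, while anything placed at offset 32 runs into the area where IPoIB keeps its address.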

Or, earlier in this thread, having the GSO in ip_output and other gso
paths save and restore the IP/IP6 control block was suggested as an
alternate approach.  I don't know if there are performance
implications to that.

What is the best way to keep the crash fix but not kill IPoIB performance?

Thanks!
 - R.


[PATCH] iommu/vt-d: Don't reject NTB devices due to scope mismatch

2016-06-02 Thread Roland Dreier
From: Roland Dreier 

On a system with an Intel PCIe port configured as an NTB device, iommu
initialization fails with

DMAR: Device scope type does not match for 0000:80:03.0

This is because the DMAR table reports this device as having scope 2
(ACPI_DMAR_SCOPE_TYPE_BRIDGE):

[0A0h 0160   1]  Device Scope Entry Type : 02
[0A1h 0161   1] Entry Length : 08
[0A2h 0162   2] Reserved : 0000
[0A4h 0164   1]   Enumeration ID : 00
[0A5h 0165   1]   PCI Bus Number : 80

[0A6h 0166   2] PCI Path : 03,00

but the device has a type 0 PCI header:

80:03.0 Bridge [0680]: Intel Corporation Device [8086:2f0d] (rev 02)
00: 86 80 0d 2f 00 00 10 00 02 00 80 06 10 00 80 00
10: 0c 00 c0 00 c0 38 00 00 0c 00 00 00 80 38 00 00
20: 00 00 00 c8 00 00 10 c8 00 00 00 00 86 80 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 01 00 00

VT-d works perfectly on this system, so there's no reason to bail out
on initialization due to this apparent scope mismatch.  Use the class
0x0680 ("Other bridge device") as a heuristic for allowing DMAR
initialization for non-bridge PCI devices listed with scope bridge.

Signed-off-by: Roland Dreier 
---
 drivers/iommu/dmar.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 6a86b5d1defa..2eff7b6c6c98 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -241,8 +241,20 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info 
*info,
if (!dmar_match_pci_path(info, scope->bus, path, level))
continue;
 
-   if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT) ^
-   (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL)) {
+   /*
+* We expect devices with endpoint scope to have normal PCI
+* headers, and devices with bridge scope to have bridge PCI
+* headers.  However PCI NTB devices may be listed in the
+* DMAR table with bridge scope, even though they have a
+* normal PCI header.  NTB devices are identified by class
+* "BRIDGE_OTHER" (0680h) - we don't declare a scope mismatch
+* for this special case.
+*/
+   if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT &&
+info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
+   (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
+(info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
+ info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
pr_warn("Device scope type does not match for %s\n",
pci_name(info->dev));
return -EINVAL;
-- 
2.7.4
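
For reference, the condition in the patch above can be modelled as a standalone predicate.  This is a user-space sketch, not the driver code; the enum and macro names are local stand-ins, and their numeric values are what I believe the ACPI and PCI headers define, so treat them as assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; values assumed to match the kernel headers. */
enum { SCOPE_ENDPOINT = 1, SCOPE_BRIDGE = 2 };  /* ACPI_DMAR_SCOPE_TYPE_* */
enum { HDR_NORMAL = 0, HDR_BRIDGE = 1 };        /* PCI_HEADER_TYPE_* */
#define CLASS_BRIDGE_OTHER 0x0680u              /* PCI_CLASS_BRIDGE_OTHER */

/* dev_class is the 24-bit class code: base class, subclass, prog-if. */
static bool scope_mismatch(int scope_type, int hdr_type, uint32_t dev_class)
{
    bool normal = (hdr_type == HDR_NORMAL);
    bool ntb_like = (dev_class >> 8) == CLASS_BRIDGE_OTHER;

    assert((dev_class >> 24) == 0);

    if (scope_type == SCOPE_ENDPOINT)
        return !normal;               /* endpoint scope wants a type 0 header */
    if (scope_type == SCOPE_BRIDGE)
        return normal && !ntb_like;   /* tolerate type 0 only for 0680h */
    return false;
}
```

The device from the lspci dump above (class 0680, type 0 header, bridge scope) passes, while an ordinary type 0 device listed with bridge scope is still rejected.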



Re: Regression in IO resource allocation

2016-06-01 Thread Roland Dreier
On Tue, May 31, 2016 at 3:31 PM, Rafael J. Wysocki  wrote:
> It may not be called at all if _PTC is used on that system, for example.

Yes, that's exactly the case on my system.

So from my POV:

Tested-by: Roland Dreier 

Thanks!


Re: Regression in IO resource allocation

2016-05-31 Thread Roland Dreier
On Tue, May 31, 2016 at 2:11 PM, Rafael J. Wysocki  wrote:
> Can you please try the appended patch (untested)?

Thanks for the quick reply.  Patch looks OK on my system... it boots
(which is very good :) and I see

system 00:01: [io  0x0400-0x047f] has been reserved

however I don't see the "ACPI CPU throttle" region reserved in
/proc/ioports... haven't debugged why acpi_processor_get_throttling()
isn't getting called or what is happening yet.

Will dig a bit deeper and let you know.

 - R.


Regression in IO resource allocation

2016-05-31 Thread Roland Dreier
Hi,

I recently updated one of my systems from 3.10.y to 4.4.11, and
discovered a regression that stops it from booting.  It's actually
very similar to https://bugzilla.kernel.org/show_bug.cgi?id=99831
(which I reported about the same system last year).

The problem is that commit ac212b6980d8 ("ACPI / processor: Use common
hotplug infrastructure") changes the order that the ACPI processor and
PnP initialization run.  pnp_system_init() is run at fs_initcall time,
while acpi_processor_init() is run from acpi_scan_init(), earlier at
subsys_initcall time.  Pre-ac212b6980d8, the ACPI processor
initialization all ran from acpi_processor_init() at module_init time.
So the processor driver initialization has flipped from after to
before pnp_system_init().

Just as before, the failure is that the resource allocation code puts
some AHCI IO BARs around 0x400, and reservation fails because some
other ACPI stuff is also there.  The problem is that when acpi_processor_init()
runs, it reserves a range 0x410 - 0x415 for "ACPI CPU throttle", and
if that happens before pnp_system_init(), then I get

system 00:01: [io  0x0400-0x047f] could not be reserved

because that overlaps the already-reserved range.  Then the PCI
resource allocation code is free to put PCI resources into that range
and tons of things go south after that.
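
The ordering dependence can be shown with a toy model.  This is deliberately much simpler than the kernel's resource tree; the rule it uses (a request may nest inside an existing region but may not enclose or straddle one) and all names are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct region { unsigned start, end; };   /* inclusive bounds */

#define MAX_REGIONS 16
struct io_map { struct region r[MAX_REGIONS]; size_t n; };

/*
 * Toy rule: a request succeeds if it overlaps nothing, or if it lies
 * entirely inside every region it overlaps.  A request that would enclose
 * or straddle an existing region fails.
 */
static bool toy_request(struct io_map *m, unsigned start, unsigned end)
{
    assert(start <= end);
    for (size_t i = 0; i < m->n; i++) {
        const struct region *x = &m->r[i];
        bool overlaps = start <= x->end && x->start <= end;
        bool nested = start >= x->start && end <= x->end;

        if (overlaps && !nested)
            return false;
    }
    if (m->n == MAX_REGIONS)
        return false;
    m->r[m->n++] = (struct region){ start, end };
    return true;
}

/* Returns true if both reservations succeed in the given order. */
static bool toy_boot(bool acpi_first)
{
    struct io_map m = { .n = 0 };

    if (acpi_first)
        return toy_request(&m, 0x410, 0x415) &&   /* ACPI CPU throttle */
               toy_request(&m, 0x400, 0x47f);     /* PnP system 00:01 */
    return toy_request(&m, 0x400, 0x47f) &&
           toy_request(&m, 0x410, 0x415);
}
```

toy_boot(false), with PnP first, succeeds; toy_boot(true) fails on the 0x0400-0x047f request, which is the "could not be reserved" case above.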

For now I've worked around it by commenting out the request_region()
in acpi_processor.c but that doesn't seem like a very good long-term
solution.  Does it make sense to resurrect the patches you had to let
ACPI and PnP coexist in resource reservation?  Or could we move the
request_region() for CPU throttle into the still-modular
initialization done from acpi_processor_driver_init()?

Thanks!
  Roland


Re: Running out of IO space because of innocuous-looking DSDT change

2015-10-19 Thread Roland Dreier
On Mon, Oct 19, 2015 at 10:00 AM, Yinghai Lu  wrote:
> I would suggest to expand standard_io_resources[] to include all
> possible conflict that we should avoid, like the io port for serial and 
> cf8/cf9.
>
> Then we could just set PCIBIOS_MIN_IO to 0 for x86.

That would work on my system, which is a well-behaved standard server.
But I thought the issue was weird vendor-specific stuff (Sony
laptops?) where there are undocumented nonstandard IO resources that
also aren't reserved in ACPI?

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Running out of IO space because of innocuous-looking DSDT change

2015-10-19 Thread Roland Dreier
I recently ran into an interesting issue with IO space allocation, and
I'm looking for opinions on whether this is a BIOS issue, a kernel
issue, both, or neither ;)

What happened is that a BIOS update for my system changed the DSDT
from having three ranges in PCI0:

WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
0x0000, // Granularity
0x0000, // Range Minimum
0x03AF, // Range Maximum
0x0000, // Translation Offset
0x03B0, // Length
,, , TypeStatic)
WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
0x0000, // Granularity
0x03E0, // Range Minimum
0x0CF7, // Range Maximum
0x0000, // Translation Offset
0x0918, // Length
,, , TypeStatic)
WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
0x0000, // Granularity
0x03B0, // Range Minimum
0x03DF, // Range Maximum
0x0000, // Translation Offset
0x0030, // Length
,, , TypeStatic)

to a single range:

WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
0x0000, // Granularity
0x0000, // Range Minimum
0x0CF7, // Range Maximum
0x0000, // Translation Offset
0x0CF8, // Length
,, , TypeStatic)

Naively it seems like this shouldn't make a difference, since in the
end we've covered the space 0...0xCF7.  However because of the code

min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;

/* First, try exact prefetching match.. */
ret = pci_bus_alloc_resource(bus, res, size, align, min,
 IORESOURCE_PREFETCH,
 pcibios_align_resource, dev);

in pci_bus_alloc_resource(), the single range ultimately means we end
up running out of IO space for our devices (we have various devices
asking for IO space as well as quite a few downstream PCI switch ports
that get allocated IO space).

What happens is that PCIBIOS_MIN_IO is 0x1000, so that code means with
the new BIOS we can't allocate any IO in the range 0...0xCF7; with the
old BIOS we only ruled out the range 0...0x3AF and happily put small
IO resources (for SMBus controller devices etc) at places like 0x480 etc.
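
A small sketch of the effect, under the assumption (my reading of the allocator, not verified here) that the lower bound for a window is the window's own start when that is non-zero and PCIBIOS_MIN_IO otherwise.  toy_usable_ports() and TOY_PCIBIOS_MIN_IO are names made up for the example:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PCIBIOS_MIN_IO 0x1000u  /* value quoted in this mail */

/*
 * Assumed rule: allocation in a window starts at the window's own start if
 * that is non-zero, otherwise at min.  Returns the number of allocatable
 * I/O ports in [start, end], or 0 if the lower bound is past the window.
 */
static uint32_t toy_usable_ports(uint32_t start, uint32_t end, uint32_t min)
{
    assert(start <= end);
    uint32_t lo = start ? start : min;

    return lo > end ? 0 : end - lo + 1;
}
```

With the old DSDT the 0x03E0-0x0CF7 window yields 0x918 usable ports, whereas the new single 0x0000-0x0CF7 window yields none because the lower bound becomes 0x1000.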

Looking at the code and history, I see that the code with PCIBIOS_MIN_IO
is there to deal with systems where not all resources are declared
and the kernel might accidentally allocate something that clashes with
strange hardware.  However in my case I'm pretty confident there isn't
anything in the range we used to use (since my system didn't blow up,
and I know there isn't any weird proprietary stuff anyway).

Would it make sense to change the kernel to reduce PCIBIOS_MIN_IO in
my case?  I could make it generic and send it upstream, or just hack
it locally.  Or (given my ignorance of ACPI in the real world) is this
a broken BIOS change that I should ask my BIOS vendor to revert?
Or... ?

Thanks!
  Roland


Re: [PATCH] target/iscsi: fix digest computation for chained SGs

2015-07-21 Thread Roland Dreier
On Tue, Jul 21, 2015 at 1:57 AM, Sagi Grimberg  wrote:
> How were you able to get a chained SG list in the target code?

Local hack.  So this bug can't be hit in current mainline code, but the
patch improves the code and removes a hidden booby-trap, so I think it
makes sense to apply.


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-15 Thread Roland Dreier
On Sat, Jun 13, 2015 at 9:56 AM, Roland Dreier  wrote:
> Below is a more sophisticated, so to speak, version of it with a changelog and
> all.  It works for me, but more testing would be much appreciated.

Yes, the patch works as expected:

Tested-by: Roland Dreier 


It does change /proc/ioports hierarchy to

  0400-0403 : ACPI PM1a_EVT_BLK
  0404-0405 : ACPI PM1a_CNT_BLK
  0406-0407 : pnp 00:06
  0408-040b : ACPI PM_TMR
  040c-041f : pnp 00:06
0410-0415 : ACPI CPU throttle
  0420-042f : ACPI GPE0_BLK
  0430-044f : pnp 00:06
0430-0433 : iTCO_wdt
  0430-0433 : iTCO_wdt
  0450-0450 : ACPI PM2_CNT_BLK
  0451-047f : pnp 00:06
0460-047f : iTCO_wdt
  0460-047f : iTCO_wdt

where the old kernel had

  0400-047f : pnp 00:06
0400-0403 : ACPI PM1a_EVT_BLK
0404-0405 : ACPI PM1a_CNT_BLK
0408-040b : ACPI PM_TMR
0410-0415 : ACPI CPU throttle
0420-042f : ACPI GPE0_BLK
0430-0433 : iTCO_wdt
  0430-0433 : iTCO_wdt
0450-0450 : ACPI PM2_CNT_BLK
0460-047f : iTCO_wdt
  0460-047f : iTCO_wdt

but I don't think that matters.
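
The new layout is what you get from requesting only the parts of the pnp range that ACPI has not already taken.  A user-space sketch of that subtraction follows; it is not the patch, and free_parts() and pnp_example() are names invented here:

```c
#include <assert.h>
#include <stddef.h>

struct range { unsigned start, end; };   /* inclusive bounds */

/*
 * Write the parts of 'want' not covered by 'taken' into 'out'.
 * 'taken' must be sorted by start and non-overlapping.
 * Returns the number of ranges written (at most ntaken + 1).
 */
static size_t free_parts(struct range want, const struct range *taken,
                         size_t ntaken, struct range *out)
{
    size_t n = 0;
    unsigned pos = want.start;

    assert(want.start <= want.end);
    for (size_t i = 0; i < ntaken; i++) {
        if (taken[i].end < pos)
            continue;                 /* entirely before the cursor */
        if (taken[i].start > want.end)
            break;                    /* entirely after the wanted range */
        if (taken[i].start > pos)
            out[n++] = (struct range){ pos, taken[i].start - 1 };
        if (taken[i].end >= want.end)
            return n;                 /* nothing left after this one */
        pos = taken[i].end + 1;
    }
    out[n++] = (struct range){ pos, want.end };
    return n;
}

/* The pnp 00:06 range minus the ACPI blocks from the listing above. */
static size_t pnp_example(struct range *out)
{
    static const struct range acpi[] = {
        { 0x400, 0x403 }, { 0x404, 0x405 }, { 0x408, 0x40b },
        { 0x420, 0x42f }, { 0x450, 0x450 },
    };

    return free_parts((struct range){ 0x400, 0x47f }, acpi,
                      sizeof(acpi) / sizeof(acpi[0]), out);
}
```

pnp_example() produces 0406-0407, 040c-041f, 0430-044f and 0451-047f, the four "pnp 00:06" entries in the new listing.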

Thanks,
 - R.


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-13 Thread Roland Dreier
On Fri, Jun 12, 2015 at 7:52 PM, Rafael J. Wysocki  wrote:
> Below is a more sophisticated, so to speak, version of it with a changelog and
> all.  It works for me, but more testing would be much appreciated.

Great, I'm convinced by your reasoning that this makes sense.  I'm
building 3.10.80 patched with this (needed a tiny bit of context
adjustment because acpi_dev_filter_resource_type() hadn't been added
to 3.10 yet), and will confirm that it fixes the issue I saw.

Thanks!
  Roland


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-12 Thread Roland Dreier
On Thu, Jun 11, 2015 at 1:50 PM, Rafael J. Wysocki  wrote:
> Changing the ordering between those two routines would work around that 
> problem,
> but in my view that wouldn't be a proper fix.  In fact, the role of 
> reserve_range()
> is to reserve the resources so as to prevent them from being used going 
> forward,
> so they need not be reserved each in one piece.  Instead, we can just check 
> if they
> overlap with the ones reserved by acpi_reserve_resources() and only request 
> the
> non-overlapping parts of them to avoid conflicts.
>
> So I wonder if the patch below makes any difference?

I will give this a try and make sure it fixes my system, although I'm
pretty sure it will.

However I'm not sure I agree that this is a better fix than just
having pnp reserve ranges before acpi.  It already creates a special
relationship between pnp and acpi, and acpi_reserve_region is a bunch
of extra code.  Could we really have a system where the hierarchy of
acpi being a subset of a pnp bus doesn't work?  I looked at a few
other systems I have, and things like the following seem quite common:

supermicro:

03e0-0cf7 : PCI Bus 0000:00
  03f8-03ff : serial
  0400-0453 : pnp 00:0c
0400-0403 : ACPI PM1a_EVT_BLK
0404-0405 : ACPI PM1a_CNT_BLK
0408-040b : ACPI PM_TMR
0410-0415 : ACPI CPU throttle
0420-042f : ACPI GPE0_BLK
0430-0433 : iTCO_wdt
0450-0450 : ACPI PM2_CNT_BLK

dell:

03e0-0cf7 : PCI Bus :00
  03f8-03ff : serial
  0800-087f : pnp 00:06
    0800-0803 : ACPI PM1a_EVT_BLK
    0804-0805 : ACPI PM1a_CNT_BLK
    0808-080b : ACPI PM_TMR
    0810-0815 : ACPI CPU throttle
    0820-082f : ACPI GPE0_BLK
    0830-0833 : iTCO_wdt
      0830-0833 : iTCO_wdt
    0850-0850 : ACPI PM2_CNT_BLK
    0860-087f : iTCO_wdt
      0860-087f : iTCO_wdt

but I wasn't able to find anything that required more generality...
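
For reference, the "only request the non-overlapping parts" idea from the
quoted proposal can be sketched in plain userspace C.  This is only an
illustration, not the actual patch: struct range and subtract_ranges() are
made-up names, and the busy list is assumed to be sorted and
non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive I/O port range, like the lines in /proc/ioports. */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Split 'want' into the pieces that do not overlap any range in 'busy'.
 * 'busy' must be sorted by start and non-overlapping.  At most 'max'
 * pieces are written to 'out'; the return value is the number of pieces
 * found, which may exceed 'max'.
 */
static size_t subtract_ranges(struct range want,
			      const struct range *busy, size_t nbusy,
			      struct range *out, size_t max)
{
	unsigned long cur = want.start;
	size_t n = 0;
	size_t i;

	assert(want.start <= want.end);

	for (i = 0; i < nbusy; i++) {
		if (busy[i].end < cur)
			continue;	/* entirely below what is left */
		if (busy[i].start > want.end)
			break;		/* entirely above the request */
		if (busy[i].start > cur) {
			/* free gap between 'cur' and this busy range */
			if (n < max)
				out[n] = (struct range){ cur, busy[i].start - 1 };
			n++;
		}
		if (busy[i].end >= want.end)
			return n;	/* nothing left above this one */
		cur = busy[i].end + 1;
	}

	/* whatever remains above the last busy range */
	if (n < max)
		out[n] = (struct range){ cur, want.end };
	n++;
	return n;
}
```

Fed the supermicro listing above (pnp 0400-0453 minus the six ACPI
entries), this leaves five free pieces, the first being 0406-0407 and the
last 0451-0453.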


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-11 Thread Roland Dreier
On Wed, Jun 10, 2015 at 4:23 PM, Rafael J. Wysocki  wrote:
> Can you please file a bug at bugzilla.kernel.org to track this and attach
> the output of acpidump from the affected system in there?

Done: https://bugzilla.kernel.org/show_bug.cgi?id=99831

Thanks!


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-09 Thread Roland Dreier
On Tue, Jun 9, 2015 at 4:43 PM, Roland Dreier  wrote:
> I understand that the change here fixed another regression, but I'm
> wondering if there's a way to make everyone happy here?  I can provide
> debugging info from my system as required...

Maybe I sent my mail too quickly, as I have some thoughts after looking
at the code.

From the link order, drivers/acpi init will be called before
drivers/pnp init, right?  In my case, the acpi resources ("ACPI
PM1a_EVT_BLK" etc.) are under a pnp bus.  But if acpi requests the
resources first, then pnp can't request the enclosing range.

Is the right fix to make sure the pnp init happens before acpi
requests resources?
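
To make the ordering problem concrete, here is a toy userspace model.  It
is a deliberate simplification and not how kernel/resource.c works in
detail: try_reserve() and struct region_table are invented for this
sketch, and the rule assumed is that a new region may nest inside an
existing one but may not enclose or partially overlap one that is already
there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 16

struct region {
	unsigned long start, end;	/* inclusive */
};

struct region_table {
	struct region r[MAX_REGIONS];
	size_t n;
};

/*
 * Toy rule: a new region may sit entirely inside an existing one (it
 * becomes a child) or be disjoint from it.  Partially overlapping an
 * existing region, or enclosing one that was registered earlier, fails.
 */
static bool try_reserve(struct region_table *t,
			unsigned long start, unsigned long end)
{
	size_t i;

	assert(start <= end);

	for (i = 0; i < t->n; i++) {
		const struct region *o = &t->r[i];
		bool disjoint = end < o->start || start > o->end;
		bool inside = start >= o->start && end <= o->end;

		if (!disjoint && !inside)
			return false;
	}
	if (t->n == MAX_REGIONS)
		return false;
	t->r[t->n++] = (struct region){ start, end };
	return true;
}
```

With the numbers from the original report, reserving 0x0400-0x047f and
then 0x0410-0x0415 succeeds in this model, while the reverse order fails
on the second call.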


Regression in 3.10.80 vs. 3.10.79

2015-06-09 Thread Roland Dreier
Hi, I recently updated from 3.10.79 to 3.10.80, and my system wouldn't
boot any more.  I tracked this down to commit 92c934b10ec3 ("ACPI /
init: Fix the ordering of acpi_reserve_resources()").  With that
commit reverted, my system is OK again.

What happens is that ahci fails to initialize because
pcim_iomap_regions_request_all() fails with EBUSY, due to a resource
conflict on the first IO region of the ahci device.  Since my root
device is on ahci, that's the end of that.  I'm sure this is due to a
BIOS / ACPI table bug on my particular platform, but that's scant
comfort when the system won't boot :)

I patched 3.10.80 so that ahci continues to initialize after the
EBUSY, and relevant parts of the kernel log seem to be:

[3.836643,26] system 00:06: [io  0x0400-0x047f] could not be reserved
...
[3.844112,26] pci :00:1f.2: BAR 0: assigned [io  0x0410-0x0417]
...
[6.020040,00] ahci :00:1f.2: BAR 0: can't reserve [io  0x0410-0x0417]

and /proc/ioports shows

0410-0415 : ACPI CPU throttle

So if I'm understanding properly, for some reason we discover but fail
to reserve the region with the ACPI resources, then PCI decides to
assign ahci IO ports into that range, then ACPI loads and reserves
0x0410-0x0415, and then ahci fails to load.

If I fully revert the patch, then I see

[3.853857,08] system 00:06: [io  0x0400-0x047f] has been reserved
...
[3.861806,08] pci :00:1f.2: BAR 0: assigned [io  0x0820-0x0827]

We're able to reserve the range, and then PCI assigns ahci into a
non-conflicting range.
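
The conflict itself is just an overlap of two inclusive port ranges.  A
minimal check, with ranges_overlap() being a name made up for this
illustration, might look like:

```c
#include <assert.h>
#include <stdbool.h>

/* Two inclusive ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(unsigned long a_start, unsigned long a_end,
			   unsigned long b_start, unsigned long b_end)
{
	assert(a_start <= a_end && b_start <= b_end);
	return a_start <= b_end && b_start <= a_end;
}
```

The BAR assigned in the failing case (0x0410-0x0417) overlaps the ACPI CPU
throttle region (0x0410-0x0415); the BAR assigned with the patch reverted
(0x0820-0x0827) does not.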

I understand that the change here fixed another regression, but I'm
wondering if there's a way to make everyone happy here?  I can provide
debugging info from my system as required...

Thanks,
  Roland


[GIT PULL] please pull infiniband.git

2015-04-22 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


InfiniBand/RDMA updates for 4.1:
 - IPoIB fixes from Doug Ledford and Erez Shitrit
 - iSER updates from Sagi Grimberg
 - mlx4 GUID handling changes from Yishai Hadas
 - other misc fixes


Bart Van Assche (1):
  IB/srp: Use P_Key cache for P_Key lookups

Doug Ledford (11):
  IB/ipoib: factor out ah flushing
  IB/ipoib: change init sequence ordering
  IB/ipoib: Consolidate rtnl_lock tasks in workqueue
  IB/ipoib: Make the carrier_on_task race aware
  IB/ipoib: Use dedicated workqueues per interface
  IB/ipoib: No longer use flush as a parameter
  IB/ipoib: fix MCAST_FLAG_BUSY usage
  IB/ipoib: deserialize multicast joins
  IB/ipoib: drop mcast_mutex usage
  ib_srpt: convert printk's to pr_* functions
  Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1

Erez Shitrit (6):
  IB/ipoib: Use one linear skb in RX flow
  IB/ipoib: Update broadcast record values after each successful join request
  IB/ipoib: Handle QP in SQE state
  IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's
  IB/ipoib: Remove IPOIB_MCAST_RUN bit
  IB/mlx4: Fix WQE LSO segment calculation

Honggang LI (1):
  mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures

Sagi Grimberg (18):
  IB/iser: Fix unload during ep_poll wrong dereference
  IB/iser: Handle fastreg/local_inv completion errors
  IB/iser: Fix wrong calculation of protection buffer length
  IB/iser: Remove redundant cmd_data_len calculation
  IB/iser: Remove a redundant struct iser_data_buf
  IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
  IB/iser: Move memory reg/dereg routines to iser_memory.c
  IB/iser: Remove redundant assignments in iser_reg_page_vec
  IB/iser: Get rid of struct iser_rdma_regd
  IB/iser: Merge build page-vec into register page-vec
  IB/iser: Move fastreg descriptor pool get/put to helper functions
  IB/iser: Move PI context alloc/free to routines
  IB/iser: Make fastreg pool cache friendly
  IB/iser: Modify struct iser_mem_reg members
  IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr
  IB/iser: Remove code duplication for a single DMA entry
  IB/iser: Bump version to 1.6
  IB/iser: Rewrite bounce buffer code path

Sebastian Ott (1):
  infiniband/mlx4: check for mapping error

Selvin Xavier (1):
  MAINTAINERS: Adding list of maintainers for ocrdma

Stephen Hemminger (1):
  rdma: replace deprecated ifconfig in doc

Sébastien Dugué (1):
  ib_uverbs: Fix pages leak when using XRC SRQs

Yann Droneaud (2):
  IB/core: disallow registering 0-sized memory region
  IB/core: don't disallow registering region starting at 0x0

Yishai Hadas (9):
  IB/mlx4: Alias GUID adding persistency support
  net/mlx4_core: Manage alias GUID per VF
  net/mlx4_core: Set initial admin GUIDs for VFs
  IB/mlx4: Manage admin alias GUID upon admin request
  IB/mlx4: Change init flow to request alias GUIDs for active VFs
  IB/mlx4: Request alias GUID on demand
  net/mlx4_core: Raise slave shutdown event upon FLR
  net/mlx4_core: Return the admin alias GUID upon host view request
  IB/mlx4: Change alias guids default to be host assigned

 Documentation/filesystems/nfs/nfs-rdma.txt |   9 +-
 MAINTAINERS|   9 +
 drivers/infiniband/core/umem.c |   7 +-
 drivers/infiniband/core/uverbs_main.c  |  22 +-
 drivers/infiniband/hw/mlx4/alias_GUID.c| 457 +-
 drivers/infiniband/hw/mlx4/mad.c   |   9 +
 drivers/infiniband/hw/mlx4/main.c  |  26 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |  14 +-
 drivers/infiniband/hw/mlx4/qp.c|   7 +-
 drivers/infiniband/hw/mlx4/sysfs.c |  44 +-
 drivers/infiniband/ulp/ipoib/ipoib.h   |  31 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  18 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c| 195 
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  73 ++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 520 ++--
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |  44 +-
 drivers/infiniband/ulp/iser/iscsi_iser.h   |  66 +--
 drivers/infiniband/ulp/iser/iser_initiator.c   |  66 ++-
 drivers/infiniband/ulp/iser/iser_memory.c  | 523 -
 drivers/infiniband/ulp/iser/iser_verbs.c   | 220 +++--
 drivers/infiniband/ulp/srp/ib_srp.c|   9 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c  | 188 
 drive

Re: [PATCH v3 07/28] IB/Verbs: Reform IB-ulp ipoib

2015-04-16 Thread Roland Dreier
On Thu, Apr 16, 2015 at 9:44 AM, Jason Gunthorpe wrote:
>> We can give client->add() callback a return value and make
>> ib_register_device() return -ENOMEM when it failed, just wondering
>> why we don't do this at first, any special reason?

> No idea, but having ib_register_device fail and unwind if a client
> fails to attach makes sense to me.

It seems a bit unfriendly to fail an entire device if one ULP has a
problem.  Let's say you have a system whose main network connection is
IPoIB.  Would you want that connection to come up even if, say, the
NFS/RDMA server fails to find the memory registration type it likes?
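
As a rough illustration of the alternative (a sketch only, using made-up
stand-in types rather than the real struct ib_device and struct
ib_client), registration could record which clients failed to attach and
carry on:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real ib_device/ib_client types. */
struct device {
	const char *name;
};

struct client {
	const char *name;
	int (*add)(struct device *dev);	/* returns 0 or a negative errno */
	int attached;
};

/* Example callbacks: one ULP attaches, one runs out of memory. */
static int ipoib_like_add(struct device *dev)
{
	(void)dev;
	return 0;
}

static int nfs_rdma_like_add(struct device *dev)
{
	(void)dev;
	return -ENOMEM;
}

/*
 * Offer the device to every client.  A client whose add() fails is
 * marked as not attached, but the device registration itself goes on.
 * Returns the number of clients that attached.
 */
static size_t register_device(struct device *dev,
			      struct client *clients, size_t nclients)
{
	size_t i, ok = 0;

	assert(dev != NULL);

	for (i = 0; i < nclients; i++) {
		clients[i].attached = (clients[i].add(dev) == 0);
		if (clients[i].attached)
			ok++;
	}
	return ok;
}
```

In this model the IPoIB-like client stays attached even though the
NFS/RDMA-like client returned -ENOMEM.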

 - R.


[GIT PULL] please pull infiniband.git

2015-04-02 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


One 4.0 RDMA change:
 - Fix for exploitable integer overflow in uverbs interface.


Shachar Raindel (1):
  IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

 drivers/infiniband/core/umem.c | 8 
 1 file changed, 8 insertions(+)
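
For readers unfamiliar with this class of bug, the kind of check involved
can be sketched in userspace C.  This is an illustration of guarding
address arithmetic against wraparound, not the code from the commit
above; region_overflows() and its parameters are names invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if addr + size, rounded up to a page boundary, cannot be
 * represented in a uintptr_t.  page_size must be a power of two.
 */
static bool region_overflows(uintptr_t addr, uintptr_t size,
			     uintptr_t page_size)
{
	uintptr_t end;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (size > UINTPTR_MAX - addr)
		return true;		/* addr + size wraps */
	end = addr + size;
	if (end > UINTPTR_MAX - (page_size - 1))
		return true;		/* rounding up to a page wraps */
	return false;
}
```

The point is to compare against the maximum before adding, so the check
itself never performs the overflowing addition.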


[GIT PULL] please pull infiniband.git

2015-02-20 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


InfiniBand/RDMA changes for 3.20 merge window:
 - Re-enable on-demand paging changes with stable ABI
 - Fairly large set of ocrdma HW driver fixes
 - Some qib HW driver fixes
 - Other miscellaneous changes


Andreea-Cristina Bernat (2):
  IB/qib: Replace rcu_assign_pointer() with RCU_INIT_POINTER() in qib_qp.c
  IB/qib: Replace rcu_assign_pointer() with RCU_INIT_POINTER() in qib_keys.c

Ariel Nahum (1):
  IB/iser: Release the iscsi endpoint if ep_disconnect wasn't called

Bart Van Assche (1):
  MAINTAINERS: Update SRP initiator entry

Dan Carpenter (2):
  IB/mlx5: Fix error code in get_port_caps()
  RDMA/ocrdma: Fix off by one in ocrdma_query_gid()

Devesh Sharma (4):
  RDMA/ocrdma: Report correct count of interrupt vectors while registering ocrdma device
  RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE
  RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac
  RDMA/ocrdma: set vlan present bit for user AH

Eli Cohen (1):
  IB/core: Add support for extended query device caps

Haggai Eran (3):
  IB/core: Properly handle registration of on-demand paging MRs after dereg
  IB/core: Add on demand paging caps to ib_uverbs_ex_query_device
  IB/mlx5: Enable the ODP capability query verb

Hariprasad S (2):
  RDMA/cxgb4: Serialize CQ event upcalls with CQ destruction
  RDMA/cxgb4: Don't hang threads forever waiting on WR replies

Ilya Nelkenbaum (1):
  IB/core: When marshaling ucma path from user-space, clear unused fields

Jack Morgenstein (1):
  IB/mlx4: In mlx4_ib_demux_cm, print out GUID in host-endian order

Majd Dibbiny (3):
  IB/mlx4: Fix memory leak in __mlx4_ib_modify_qp
  IB/mlx4: Bug fixes in mlx4_ib_resize_cq
  IB/mlx5: Update the dev in reg_create

Mike Marciniszyn (3):
  IB/qib: Fix sizeof checkpatch warnings
  IB/qib: Fix checkpatch warnings
  IB/qib: Add blank line after declaration

Mitesh Ahuja (7):
  RDMA/ocrdma: Add support for IB stack compliant stats in sysfs.
  RDMA/ocrdma: Increase the GID table size.
  RDMA/ocrdma: Move PD resource management to driver.
  RDMA/ocrdma: Host crash on destroying device resources
  RDMA/ocrdma: Add support for interrupt moderation
  RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure
  RDMA/ocrdma: Update the ocrdma module version string

Mitko Haralanov (1):
  IB/qib: Do not write EEPROM

Moshe Lazer (1):
  IB/core: Fix deadlock on uverbs modify_qp error flow

Or Gerlitz (1):
  IB/mlx4: Fix wrong usage of IPv4 protocol for multicast attach/detach

Padmanabh Ratnakar (1):
  RDMA/ocrdma: Report correct state in ibv_query_qp

Rasmus Villemoes (2):
  RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit
  RDMA/ocrdma: Use unsigned for bit index

Rickard Strandqvist (1):
  IB/ipath: Remove unused function in ipath_wc_ppc64

Roi Dayan (1):
  IB/iser: Use correct dma direction when unmapping SGs

Roland Dreier (1):
  Merge branches 'core', 'cxgb4', 'iser', 'mlx4', 'mlx5', 'ocrdma', 'odp', 'qib' and 'srp' into for-next

Sagi Grimberg (1):
  IB/iser: Fix memory regions possible leak

Selvin Xavier (2):
  RDMA/ocrdma: Debugfs enhancments for ocrdma driver
  RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP

Vinit Agnihotri (1):
  IB/qib: Add support for the new QMH7360 card

 MAINTAINERS   |   2 +-
 drivers/infiniband/core/ucma.c|   3 +
 drivers/infiniband/core/umem_odp.c|   3 +-
 drivers/infiniband/core/uverbs.h  |   1 +
 drivers/infiniband/core/uverbs_cmd.c  | 158 +
 drivers/infiniband/core/uverbs_main.c |   1 +
 drivers/infiniband/hw/cxgb4/ev.c  |   9 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h|  29 ++-
 drivers/infiniband/hw/ipath/ipath_kernel.h|   3 -
 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c  |  13 --
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c |  15 --
 drivers/infiniband/hw/mlx4/cm.c   |   2 +-
 drivers/infiniband/hw/mlx4/cq.c   |   7 +-
 drivers/infiniband/hw/mlx4/main.c |  10 +-
 drivers/infiniband/hw/mlx4/qp.c   |   6 +-
 drivers/infiniband/hw/mlx5/main.c |   4 +-
 drivers/infiniband/hw/mlx5/mr.c   |   1 +
 drivers/infiniband/hw/ocrdma/ocrdma.h |  38 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c  |  38 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h  |   6 +
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c  | 312 ++
 driver

Re: linux-next: build failure after merge of the infiniband tree

2015-02-17 Thread Roland Dreier
On Tue, Feb 17, 2015 at 6:32 PM, Stephen Rothwell  wrote:
> After merging the livepatching tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
>
> In file included from drivers/infiniband/hw/qib/qib_cq.c:41:0:
> drivers/infiniband/hw/qib/qib.h: In function 'qib_flush_wc':
> drivers/infiniband/hw/qib/qib.h:1470:1: error: expected ';' before '}' token
>  }
>  ^
>
> and it went badly down hill from there :-(


Weird, I could have sworn I fixed that before I pushed the tree out.
Anyway I'll try adding the missing ';' again and push it out again :(


Re: [PATCH 1/1] IB/mthca: remove deprecated use of pci api

2015-02-17 Thread Roland Dreier
On Wed, Feb 4, 2015 at 6:09 AM, Quentin Lambert wrote:
> -   dev->eq_table.icm_dma  = pci_map_page(dev->pdev, dev->eq_table.icm_page, 0,
> - PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -   if (pci_dma_mapping_error(dev->pdev, dev->eq_table.icm_dma)) {
> +   dev->eq_table.icm_dma  = dma_map_page(&dev->pdev->dev,
> + dev->eq_table.icm_page, 0,
> + PAGE_SIZE,
> + (enum dma_data_direction)PCI_DMA_BIDIRECTIONAL);

Surely this can't be right?  Shouldn't the direction just change to
DMA_BIDIRECTIONAL?

Are we really sweeping through the kernel and getting rid of pci_map_
etc. calls?

If so please respin your semantic patch so that it doesn't add crazy stuff like

(enum dma_data_direction)PCI_DMA_BIDIRECTIONAL

and resend the change.
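
To spell out the mapping a conversion would want to apply, here is a
userspace sketch.  The enum and the PCI_DMA_* macros below are local
stand-ins whose values are assumed to mirror the kernel headers of that
era, and pci_dir_to_dma_dir() is a hypothetical helper used only for
illustration.

```c
#include <assert.h>

/* Local stand-ins; values assumed to match the kernel definitions. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

#define PCI_DMA_BIDIRECTIONAL	0
#define PCI_DMA_TODEVICE	1
#define PCI_DMA_FROMDEVICE	2
#define PCI_DMA_NONE		3

/* Map a legacy PCI_DMA_* constant to the enum value to name instead. */
static enum dma_data_direction pci_dir_to_dma_dir(int pci_dir)
{
	switch (pci_dir) {
	case PCI_DMA_BIDIRECTIONAL:
		return DMA_BIDIRECTIONAL;
	case PCI_DMA_TODEVICE:
		return DMA_TO_DEVICE;
	case PCI_DMA_FROMDEVICE:
		return DMA_FROM_DEVICE;
	default:
		assert(pci_dir == PCI_DMA_NONE);
		return DMA_NONE;
	}
}
```

If the values line up as assumed, the cast is a no-op, so the converted
call can simply name DMA_BIDIRECTIONAL directly.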

Thanks,
  Roland


[GIT PULL] please pull infiniband.git

2015-02-06 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


One more last-second RDMA change for 3.19:
 - Yann realized that the previous revert of new userspace ABI did not
   go far enough, and we're still exposing a change that we don't want.
   Revert even closer to 3.18 interface to make sure we get things right
   in the long run.

Sorry for sending this at the very end of the release cycle, but we
didn't realize the scope of the required fix until just now.


Yann Droneaud (1):
  Revert "IB/core: Add support for extended query device caps"

 drivers/infiniband/core/uverbs.h |   1 -
 drivers/infiniband/core/uverbs_cmd.c | 137 +++
 drivers/infiniband/hw/mlx5/main.c|   2 -
 include/rdma/ib_verbs.h  |   5 +-
 include/uapi/rdma/ib_user_verbs.h|  27 ---
 5 files changed, 42 insertions(+), 130 deletions(-)


[GIT PULL] please pull infiniband.git

2015-02-03 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


Last minute InfiniBand/RDMA changes for 3.19:
 - Revert IPoIB driver back to 3.18 state.  We had a number of fixes go
   into 3.19, but they introduced regressions.  We tried to get everything
   fixed up but ran out of time, so we'll try again for 3.20.
 - Similarly, turn off the new "extended query device" verb.  Late in the
   cycle we realized the ABI is not quite right, and rather than freeze
   something in a rush and make a mistake, we'll take a bit more time
   and get it right in 3.20.


Haggai Eran (1):
  IB/core: Temporarily disable ex_query_device uverb

Roland Dreier (9):
  Revert "IPoIB: No longer use flush as a parameter"
  Revert "IPoIB: Make ipoib_mcast_stop_thread flush the workqueue"
  Revert "IPoIB: Use dedicated workqueues per interface"
  Revert "IPoIB: change init sequence ordering"
  Revert "IPoIB: fix mcast_dev_flush/mcast_restart_task race"
  Revert "IPoIB: fix MCAST_FLAG_BUSY usage"
  Revert "IPoIB: Make the carrier_on_task race aware"
  Revert "IPoIB: Consolidate rtnl_lock tasks in workqueue"
  Merge branches 'ipoib' and 'odp' into for-next

 drivers/infiniband/core/uverbs_main.c  |   1 -
 drivers/infiniband/ulp/ipoib/ipoib.h   |  19 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  18 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|  27 +--
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  49 ++---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 239 +
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |  22 +--
 7 files changed, 134 insertions(+), 241 deletions(-)


[GIT PULL] please pull infiniband.git

2014-12-18 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


Main batch of InfiniBand/RDMA changes for 3.19:

 - On-demand paging support in core midlayer and mlx5 driver.  This lets
   userspace create non-pinned memory regions and have the adapter HW
   trigger page faults.
 - iSER and IPoIB updates and fixes.
 - Low-level HW driver updates for cxgb4, mlx4 and ocrdma.
 - Other miscellaneous fixes.


Ariel Nahum (2):
  IB/iser: Collapse cleanup and disconnect handlers
  IB/iser: Fix possible NULL derefernce ib_conn->device in session_create

Devesh Sharma (1):
  RDMA/ocrdma: Always resolve destination mac from GRH for UD QPs

Doug Ledford (8):
  IPoIB: Consolidate rtnl_lock tasks in workqueue
  IPoIB: Make the carrier_on_task race aware
  IPoIB: fix MCAST_FLAG_BUSY usage
  IPoIB: fix mcast_dev_flush/mcast_restart_task race
  IPoIB: change init sequence ordering
  IPoIB: Use dedicated workqueues per interface
  IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
  IPoIB: No longer use flush as a parameter

Eli Cohen (1):
  IB/core: Add support for extended query device caps

Haggai Eran (14):
  IB/mlx5: Remove per-MR pas and dma pointers
  IB/mlx5: Enhance UMR support to allow partial page table update
  IB/core: Replace ib_umem's offset field with a full address
  IB/core: Add umem function to read data from user-space
  IB/mlx5: Add function to read WQE from user-space
  IB/core: Implement support for MMU notifiers regarding on demand paging regions
  mlx5_core: Add support for page faults events and low level handling
  IB/mlx5: Implement the ODP capability query verb
  IB/mlx5: Changes in memory region creation to support on-demand paging
  IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation
  IB/mlx5: Page faults handling infrastructure
  IB/mlx5: Handle page faults
  IB/mlx5: Add support for RDMA read/write responder page faults
  IB/mlx5: Implement on demand paging by adding support for MMU notifiers

Hariprasad S (1):
  RDMA/cxgb4: Handle NET_XMIT return codes

Hariprasad Shenai (2):
  RDMA/cxgb4: Fix locking issue in process_mpa_request
  RDMA/cxgb4: Limit MRs to < 8GB for T4/T5 devices

Jack Morgenstein (2):
  IB/core: Fix mgid key handling in SA agent multicast data-base
  IB/mlx4: Fix an incorrectly shadowed variable in mlx4_ib_rereg_user_mr

Max Gurtovoy (1):
  IB/iser: Fix possible SQ overflow

Minh Tran (1):
  IB/iser: Re-adjust CQ and QP send ring sizes to HW limits

Mitesh Ahuja (1):
  RDMA/ocrdma: Fix ocrdma_query_qp() to report q_key value for UD QPs

Moni Shoua (1):
  IB/core: Do not resolve VLAN if already resolved

Or Gerlitz (1):
  IB/iser: Bump version to 1.5

Or Kehati (1):
  IB/addr: Improve address resolution callback scheduling

Pramod Kumar (2):
  RDMA/cxgb4: Increase epd buff size for debug interface
  RDMA/cxgb4: Configure 0B MRs to match HW implementation

Roland Dreier (2):
  mlx5_core: Re-add MLX5_DEV_CAP_FLAG_ON_DMND_PG flag
  Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next

Sagi Grimberg (13):
  IB/iser: Fix catastrophic error flow hang
  IB/iser: Decrement CQ's active QPs accounting when QP creation fails
  IB/iser: Fix sparse warnings
  IB/iser: Fix race between iser connection teardown and scsi TMFs
  IB/iser: Terminate connection before cleaning inflight tasks
  IB/iser: Centralize memory region invalidation to a function
  IB/iser: Remove redundant is_mr indicator
  IB/iser: Use more completion queues
  IB/iser: Micro-optimize iser logging
  IB/iser: Micro-optimize iser_handle_wc
  IB/iser: DIX update
  IB/core: Add flags for on demand paging support
  IB/srp: Allow newline separator for connection string

Shachar Raindel (1):
  IB/core: Add support for on demand paging regions

Steve Wise (1):
  RDMA/cxgb4: Wake up waiters after flushing the qp

Yuval Shaia (1):
  mlx4_core: Check for DPDP violation only when DPDP is not supported

 drivers/infiniband/Kconfig |  11 +
 drivers/infiniband/core/Makefile   |   1 +
 drivers/infiniband/core/addr.c |   4 +-
 drivers/infiniband/core/multicast.c|  11 +-
 drivers/infiniband/core/umem.c |  72 ++-
 drivers/infiniband/core/umem_odp.c | 668 +
 drivers/infiniband/core/umem_rbtree.c  |  94 +++
 drivers/infiniband/core/uverbs.h   |   1 +
 drivers/infiniband/core/uverbs_cmd.c   | 171 --
 drivers/infiniband/core/u

Re: linux-next: build failure after merge of the infiniband tree

2014-12-15 Thread Roland Dreier
On Mon, Dec 15, 2014 at 5:56 PM, Roland Dreier  wrote:
> I'll add a partial revert of that patch to my tree to get back the
> now-used enum values.

I rebased my tree on top of the merge-window merge of davem's tree,
and added the missing flag on top of the "remove this flag" commit.

 - R.


Re: linux-next: build failure after merge of the infiniband tree

2014-12-15 Thread Roland Dreier
On Mon, Dec 15, 2014 at 5:47 PM, Stephen Rothwell  wrote:
> Hi all,
>
> After merging the infiniband tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/infiniband/hw/mlx5/main.c: In function 'mlx5_ib_query_device':
> drivers/infiniband/hw/mlx5/main.c:248:34: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (dev->mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
>   ^
> drivers/net/ethernet/mellanox/mlx5/core/fw.c: In function 'mlx5_query_odp_caps':
> drivers/net/ethernet/mellanox/mlx5/core/fw.c:79:30: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (!(dev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG))
>   ^
> drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function 'mlx5_start_eqs':
> drivers/net/ethernet/mellanox/mlx5/core/eq.c:459:28: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (dev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
> ^
>
> Really?  Code added half way through the merge window not even build
> tested?

It's not quite as bad as it seems.  The infiniband tree itself builds,
the problem is the merged tree.

The Mellanox guys merged the "cleanup"

commit 0c7aac854f52
Author: Eli Cohen
Date:   Tue Dec 2 02:26:14 2014

    net/mlx5_core: Remove unused dev cap enum fields

    These enumerations are not used so remove them.

    Signed-off-by: Eli Cohen
    Signed-off-by: David S. Miller

through davem's tree, and then went ahead and used at least
MLX5_DEV_CAP_FLAG_ON_DMND_PG (which that patch removes) in patches
they merged through my tree.

I'll add a partial revert of that patch to my tree to get back the
now-used enum values.

 - R.


[GIT PULL] please pull infiniband.git

2014-10-16 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


Main set of InfiniBand/RDMA updates for 3.18 merge window:

 - Large set of iSER initiator improvements
 - Hardware driver fixes for cxgb4, mlx5 and ocrdma
 - Small fixes to core midlayer


Ariel Nahum (3):
  IB/iser: Unbind at conn_stop stage
  IB/iser: Use iser_warn instead of BUG_ON in iser_conn_release
  IB/iser: Change iscsi_conn_stop log level to info

Devesh Sharma (3):
  RDMA/ocrdma: Add default GID at index 0
  RDMA/ocrdma: Convert kernel VA to PA for mmap in user
  IB/core: Clear AH attr variable to prevent garbage data

Eli Cohen (5):
  IB/mlx5: Clear umr resources after ib_unregister_device
  IB/mlx5: Improve debug prints in mlx5_ib_reg_user_mr
  IB/core: Avoid leakage from kernel to user space
  IB/mlx5: Fix possible array overflow
  IB/mlx5: Remove duplicate code from mlx5_set_path

Hariprasad S (3):
  RDMA/cxgb4: Take IPv6 into account for best_mtu and set_emss
  RDMA/cxgb4: Add missing neigh_release in find_route
  RDMA/cxgb4: Fix ntuple calculation for ipv6 and remove duplicate line

Jack Morgenstein (1):
  IB/core: Fix XRC race condition in ib_uverbs_open_qp

Jes Sorensen (3):
  RDMA/ocrdma: Don't memset() buffers we just allocated with kzalloc()
  RDMA/ocrdma: The kernel has a perfectly good BIT() macro - use it
  RDMA/ocrdma: Save the bit environment, spare unncessary parenthesis

Li RongQing (1):
  RDMA/ocrdma: Remove a unused-label warning

Or Gerlitz (1):
  IB/iser: Bump version, add maintainer

Roi Dayan (1):
  IB/iser: Remove unused variables and dead code

Roland Dreier (1):
  Merge branches 'core', 'cxgb4', 'iser', 'mlx5' and 'ocrdma' into for-next

Sagi Grimberg (23):
  IB/iser: Rename ib_conn -> iser_conn
  IB/iser: Re-introduce ib_conn
  IB/iser: Extend iser_free_ib_conn_res()
  IB/iser: Fix DEVICE REMOVAL handling in the absence of iscsi daemon
  IB/iser: Don't bound release_work completions timeouts
  IB/iser: Protect tasks cleanup in case IB device was already released
  IB/iser: Signal iSCSI layer that transport is broken in error completions
  IB/iser: Centralize iser completion contexts
  IB/iser: Use internal polling budget to avoid possible live-lock
  IB/iser: Use single CQ for RX and TX
  IB/iser: Use beacon to indicate all completions were consumed
  IB/iser: Optimize completion polling
  IB/iser: Suppress scsi command send completions
  IB/iser: Nit - add space after __func__ in iser logging
  IB/iser: Add/Fix kernel doc style descriptions in iscsi_iser.h
  IB/iser: Fix/add kernel-doc style description in iscsi_iser.c
  IB/mlx5: Use enumerations for PI copy mask
  IB/iser: Remove redundant assignment
  IB/iser: Set IP_CSUM as default guard type
  IB/mlx5: Use extended internal signature layout
  IB/iser: Centralize ib_sig_domain settings
  Target/iser: Centralize ib_sig_domain setting
  IB/mlx5, iser, isert: Add Signature API additions

Selvin Xavier (1):
  RDMA/ocrdma: Get vlan tag from ib_qp_attrs

Steve Wise (1):
  RDMA/cxgb4: Make c4iw_wr_log_size_order static

Yishai Hadas (1):
  IB/mlx5: Modify to work with arbitrary page size

 MAINTAINERS  |   1 +
 drivers/infiniband/core/uverbs_cmd.c |   2 +
 drivers/infiniband/core/uverbs_main.c|   5 +
 drivers/infiniband/hw/cxgb4/cm.c |  32 +-
 drivers/infiniband/hw/cxgb4/device.c |   2 +-
 drivers/infiniband/hw/mlx5/main.c|   8 +-
 drivers/infiniband/hw/mlx5/mem.c |  18 +-
 drivers/infiniband/hw/mlx5/mr.c  |   6 +-
 drivers/infiniband/hw/mlx5/qp.c  | 149 +++---
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |  25 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |  12 +
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h| 238 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |  10 +-
 drivers/infiniband/ulp/iser/iscsi_iser.c | 313 ++---
 drivers/infiniband/ulp/iser/iscsi_iser.h | 408 +++-
 drivers/infiniband/ulp/iser/iser_initiator.c | 198 
 drivers/infiniband/ulp/iser/iser_memory.c|  99 ++--
 drivers/infiniband/ulp/iser/iser_verbs.c | 667 +++
 drivers/infiniband/ulp/isert/ib_isert.c  |  65 ++-
 include/linux/mlx5/qp.h  |  35 +-
 include/rdma/ib_verbs.h  |  32 +-
 21 files changed, 1372 insertions(+), 953 deletions(-)


[GIT PULL] please pull infiniband.git

2014-09-23 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

This is later and bigger than I would like, and the blame is all on
me: I got very busy with other stuff for a few weeks during the 3.17
cycle, and didn't prepare this tree as soon as I should have.  However
I don't think there's anything risky here, and no one really cares if
we break InfiniBand in 3.17 anyway...


Last late set of InfiniBand/RDMA fixes for 3.17:

 - Fixes for the new memory region re-registration support
 - iSER initiator error path fixes
 - Grab bag of small fixes for the qib and ocrdma hardware drivers
 - Larger set of fixes for mlx4, especially in RoCE mode


Alex Estrin (1):
  IPoIB: Remove unnecessary port query

Devesh Sharma (2):
  RDMA/ocrdma: Report correct value of max_fast_reg_page_list_len
  RDMA/ocrdma: Do not skip setting deferred_arm

Jack Morgenstein (6):
  IB/mlx4: Fix lockdep splat for the iboe lock
  mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses
  IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header
  IB/mlx4: Don't update QP1 in native mode
  IB/mlx4: Do not allow APM under RoCE
  IB/mlx4: Fix VF mac handling in RoCE

Markus Stockhausen (1):
  IB/mlx4: Disable TSO for Connect-X rev. A0 HCAs

Matan Barak (2):
  mlx4: Correct error flows in rereg_mr
  IB/core: When marshaling uverbs path, clear unused fields

Mike Marciniszyn (3):
  IB/ipath: Change get_user_pages() usage to always NULL vmas
  IB/qib: Change get_user_pages() usage to always NULL vmas
  IB/qib: Correct reference counting in debugfs qp_stats

Moni Shoua (5):
  IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs()
  IB/mlx4: Don't duplicate the default RoCE GID
  IB/mlx4: Reorder steps in RoCE GID table initialization
  IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up
  IB/mlx4: Avoid executing gid task when device is being removed

Or Gerlitz (1):
  IB/iser: Bump version to 1.4.1

Roi Dayan (1):
  IB/iser: Fix RX/TX CQ resource leak on error flow

Roland Dreier (1):
  Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next

Sagi Grimberg (1):
  IB/iser: Allow bind only when connection state is UP

Shawn Bohrer (1):
  IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get

devesh.sha...@emulex.com (2):
  RDMA/ocrdma: Resolve L2 address when creating user AH
  RDMA/ocrdma: Use right macro in query AH

 drivers/infiniband/core/umem.c |  19 ++-
 drivers/infiniband/core/uverbs_marshall.c  |   4 +
 drivers/infiniband/hw/ipath/ipath_user_pages.c |   6 +-
 drivers/infiniband/hw/mlx4/main.c  | 169 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |   1 +
 drivers/infiniband/hw/mlx4/mr.c|   7 +-
 drivers/infiniband/hw/mlx4/qp.c|  60 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c   |  43 +--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c|   6 +-
 drivers/infiniband/hw/qib/qib_debugfs.c|   3 +-
 drivers/infiniband/hw/qib/qib_qp.c |   8 --
 drivers/infiniband/hw/qib/qib_user_pages.c |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  10 +-
 drivers/infiniband/ulp/iser/iscsi_iser.c   |  19 ++-
 drivers/infiniband/ulp/iser/iscsi_iser.h   |   2 +-
 drivers/infiniband/ulp/iser/iser_verbs.c   |  24 ++--
 drivers/net/ethernet/mellanox/mlx4/mr.c|  33 +++--
 drivers/net/ethernet/mellanox/mlx4/port.c  |  11 +-
 include/rdma/ib_umem.h |   1 +
 19 files changed, 277 insertions(+), 155 deletions(-)


Re: [PATCH v1 for-next 00/16] On demand paging

2014-09-03 Thread Roland Dreier
> I would like to note that we at Los Alamos National Laboratory are very
> interested in this functionality and it would be great if it gets accepted.

Have you done any review or testing of these changes?  If so can you
share the results?

 - R.


[GIT PULL] please pull infiniband.git

2014-08-14 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


Main set of InfiniBand/RDMA updates for 3.17 merge window:

 - MR reregistration support
 - MAD support for RMPP in userspace
 - iSER and SRP initiator updates
 - ocrdma hardware driver updates
 - other fixes...


Alex Estrin (1):
  IB/ipoib: Avoid multicast join attempts with invalid P_key

Ariel Nahum (3):
  IB/iser: Seperate iser_conn and iscsi_endpoint storage space
  IB/iser: Protect iser state machine with a mutex
  IB/iser: Replace connection waitqueue with completion object

Bart Van Assche (3):
  scsi_transport_srp: Fix fast_io_fail_tmo=dev_loss_tmo=off behavior
  IB/srp: Fix deadlock between host removal and multipathd
  IB/srp: Fix residual handling

Dan Carpenter (1):
  RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf()

Devesh Sharma (7):
  RDMA/ocrdma: Avoid posting DPP requests for RDMA READ
  be2net: Issue shutdown event to ocrdma driver
  RDMA/ocrdma: Handle shutdown event from be2net driver
  RDMA/ocrdma: Remove hardcoding of the max DPP QPs supported
  RDMA/ocrdma: Delete AH table if ocrdma_init_hw fails after AH table creation
  RDMA/ocrdma: Obtain SL from device structure
  RDMA/ocrdma: Update sli data structure for endianness

Doug Ledford (2):
  IB/srpt: Handle GID change events
  RDMA/uapi: Include socket.h in rdma_user_cm.h

Erez Shitrit (2):
  IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
  IB/ipoib: Avoid flushing the workqueue from worker context

Fabian Frederick (3):
  IPoIB: Remove unnecessary test for NULL before debugfs_remove()
  IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0]
  IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0]

Ira Weiny (5):
  IB/umad: Update module to [pr|dev]_* style print messages
  IB/mad: Update module to [pr|dev]_* style print messages
  IB/mad: Add dev_notice messages for various umad/mad registration failures
  IB/mad: add new ioctl to ABI to support new registration options
  IB/mad: Add user space RMPP support

Jack Morgenstein (1):
  mlx4_core: Add support for secure-host and SMP firewall

Matan Barak (3):
  IB/core: Add user MR re-registration support
  mlx4_core: Add helper functions to support MR re-registration
  IB/mlx4_ib: Add support for user MR re-registration

Mitesh Ahuja (4):
  RDMA/ocrdma: Allow only SEND opcode in case of UD QPs
  RDMA/ocrdma: Do proper cleanup even if FW is in error state
  RDMA/ocrdma: Return proper value for max_mr_size
  RDMA/ocrdma: report asic-id in query device

Or Gerlitz (1):
  IB/ipath: Add P_Key change event support

Roi Dayan (3):
  IB/iser: Support IPv6 address family
  IB/iser: Add TIMEWAIT_EXIT event handling
  IB/iser: Clarify a duplicate counters check

Roland Dreier (1):
  Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', 'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next

Sagi Grimberg (2):
  IB/iser: Fix responder resources advertisement
  IB/iser: Remove redundant return code in iser_free_ib_conn_res()

Selvin Xavier (8):
  RDMA/ocrdma: Query and initalize the PFC SL
  RDMA/ocrdma: Add hca_type and fixing fw_version string in device atrributes
  RDMA/ocrdma: Avoid reporting wrong completions in case of error CQEs
  RDMA/ocrdma: Add missing adapter mailbox opcodes
  RDMA/ocrdma: Increase the size of STAG array in dev structure to 16K
  RDMA/ocrdma: Initialize the GID table while registering the device
  RDMA/ocrdma: Fix a sparse warning
  RDMA/ocrdma: Update the ocrdma module version string

Steve Wise (2):
  RDMA/cxgb4: Only call CQ completion handler if it is armed
  RDMA/iwcm: Use a default listen backlog if needed

Wei Yongjun (1):
  IB/srp: Fix return value check in srp_init_module()

 Documentation/infiniband/user_mad.txt  |  13 +-
 drivers/infiniband/core/agent.c|  16 +-
 drivers/infiniband/core/cm.c   |   5 +-
 drivers/infiniband/core/iwcm.c |  27 ++
 drivers/infiniband/core/mad.c  | 283 +---
 drivers/infiniband/core/mad_priv.h |   3 -
 drivers/infiniband/core/sa_query.c |   2 +-
 drivers/infiniband/core/user_mad.c | 188 +++--
 drivers/infiniband/core/uverbs.h   |   1 +
 drivers/infiniband/core/uverbs_cmd.c   |  93 +++
 drivers/infiniband/core/uverbs_main.c  |   1 +
 drivers/infiniband/hw/amso1100/c2_cq.c |   7 +-
 drivers/infiniband/hw/cxgb4/ev

[GIT PULL] please pull infiniband.git

2014-07-18 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


InfiniBand/RDMA fixes for 3.16

 - cxgb4 hardware driver regression fixes
 - mlx5 hardware driver regression fixes


Hariprasad S (2):
  RDMA/cxgb4: Fix skb_leak in reject_cr()
  RDMA/cxgb4: Clean up connection on ARP error

Or Gerlitz (1):
  IB/mlx5: Enable "block multicast loopback" for kernel consumers

Roland Dreier (1):
  Merge branches 'cxgb4' and 'mlx5' into for-next

Sagi Grimberg (1):
  mlx5_core: Fix possible race between mr tree insert/delete

Steve Wise (2):
  RDMA/cxgb4: Initialize the device status page
  RDMA/cxgb4: Call iwpm_init() only once

 drivers/infiniband/hw/cxgb4/cm.c | 14 +++---
 drivers/infiniband/hw/cxgb4/device.c | 18 +++---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |  2 +-
 drivers/infiniband/hw/mlx5/qp.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/mr.c | 19 +++
 5 files changed, 39 insertions(+), 16 deletions(-)


[GIT PULL] please pull infiniband.git

2014-06-10 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.16:

 - Add iWARP port mapper to avoid conflicts between RDMA and normal
   stack TCP connections.

 - Fixes for i386 / x86-64 structure padding differences (ABI
   compatibility for 32-on-64) from Yann Droneaud.

 - A pile of SRP initiator fixes from Bart Van Assche.

 - Fixes for a writeback / memory allocation deadlock with NFS over
   IPoIB connected mode from Jiri Kosina.

 - The usual fixes and cleanups to mlx4, mlx5, cxgb4 and other
   low-level drivers.


Ariel Nahum (2):
  IB/iser: Simplify connection management
  IB/iser: Fix a possible race in iser connection states transition

Bart Van Assche (11):
  IB/srp: Fix a sporadic crash triggered by cable pulling
  IB/srp: Fix kernel-doc warnings
  IB/srp: Introduce an additional local variable
  IB/srp: Introduce srp_map_fmr()
  IB/srp: Introduce srp_finish_mapping()
  IB/srp: Introduce the 'register_always' kernel module parameter
  IB/srp: One FMR pool per SRP connection
  IB/srp: Rename FMR-related variables
  IB/srp: Add fast registration support
  IB/umad: Fix error handling
  IB/umad: Fix use-after-free on close

Christoph Jaeger (1):
  RDMA/cxgb4: Fix memory leaks in c4iw_alloc() error paths

Colin Ian King (1):
  IB/mlx4: fix unitialised variable is_mcast

Dan Carpenter (2):
  RDMA/cxgb3: Fix information leak in send_abort()
  RDMA/cxgb3: Remove a couple unneeded conditions

Dennis Dalessandro (1):
  IB/ipath: Translate legacy diagpkt into newer extended diagpkt

Dotan Barak (1):
  mlx4_core: Fix memory leaks in SR-IOV error paths

Duan Jiong (1):
  RDMA/ocrdma: Convert to use simple_open()

Haggai Eran (7):
  IB/mlx5: Fix error handling in reg_umr
  IB/mlx5: Add MR to radix tree in reg_mr_callback
  mlx5_core: Store MR attributes in mlx5_mr_core during creation and after UMR
  IB/mlx5: Set QP offsets and parameters for user QPs and not just for kernel QPs
  IB/core: Remove unneeded kobject_get/put calls
  IB/core: Fix port kobject deletion during error flow
  IB/core: Fix kobject leak on device register error flow

Jack Morgenstein (5):
  mlx4_core: Fix incorrect FLAGS1 bitmap test in mlx4_QUERY_FUNC_CAP
  IB/mlx4: SET_PORT called by mlx4_ib_modify_port should be wrapped
  IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responses
  mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs
  IB/mlx4: Add interface for selecting VFs to enable QP0 via MLX proxy QPs

Jiri Kosina (2):
  IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO
  IB/mlx4: Fix gfp passing in create_qp_common()

Joe Perches (1):
  IB/srp: Avoid problems if a header uses pr_fmt

Manuel Schölling (1):
  IB/ipath: Use time_before()/_after()

Mike Marciniszyn (1):
  IB/qib: Fix port in pkey change event

Or Gerlitz (3):
  IB/iser: Bump version to 1.4
  IB: Return error for unsupported QP creation flags
  IB: Add a QP creation flag to use GFP_NOIO allocations

Roi Dayan (1):
  IB/iser: Add missing newlines to logging messages

Roland Dreier (6):
  IB/mlx5: Fix warning about cast of wr_id back to pointer on 32 bits
  mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement
  IB/mad: Fix sparse warning about gfp_t use
  IB/core: Fix sparse warnings about redeclared functions
  mlx4_core: Fix GFP flags parameters to be gfp_t
  Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next

Sagi Grimberg (3):
  mlx5_core: Fix signature handover operation for interleaved buffers
  mlx5_core: Simplify signature handover wqe for interleaved buffers
  mlx5_core: Copy DIF fields only when input and output space values match

Shachar Raindel (1):
  IB/mlx5: Refactor UMR to have its own context struct

Steve Wise (2):
  RDMA/cxgb4: Fix vlan support
  RDMA/cxgb4: Add support for iWARP Port Mapper user space service

Tatyana Nikolova (2):
  RDMA/core: Add support for iWARP Port Mapper user space service
  RDMA/nes: Add support for iWARP Port Mapper user space service

Upinder Malhi (1):
  IB/usnic: Fix source file missing copyright and license

Vinit Agnihotri (1):
  IB/qib: Additional Intel branding changes

Yann Droneaud (5):
  IB/mlx5: add missing padding at end of struct mlx5_ib_create_cq
  IB/mlx5: add missing padding at end of struct mlx5_ib_create_srq
  RDMA/cxgb4: Add missing padding at end of struct c

Re: [PATCH v1 for-next 0/3] IB: Use GFP_NOIO calls in IPoIB connected mode TX path

2014-05-19 Thread Roland Dreier
On Sat, May 17, 2014 at 1:52 PM, Or Gerlitz wrote:
> Roland, we're soon on -rc6 and there's no reason for this to miss
> 3.16, could you please comment whether you want it to go through your
> tree or net-next?

I will pick it up.


[tip:x86/mm] x86, ioremap: Speed up check for RAM pages

2014-05-02 Thread tip-bot for Roland Dreier
Commit-ID:  c81c8a1eeede61e92a15103748c23d100880cc8a
Gitweb: http://git.kernel.org/tip/c81c8a1eeede61e92a15103748c23d100880cc8a
Author: Roland Dreier 
AuthorDate: Fri, 2 May 2014 11:18:41 -0700
Committer:  H. Peter Anvin 
CommitDate: Fri, 2 May 2014 11:52:26 -0700

x86, ioremap: Speed up check for RAM pages

In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and check each one individually with page_is_ram().
For large ioremaps, this can be very slow.  For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!

Internally, page_is_ram() calls walk_system_ram_range() on a single
page.  Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find.  For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.

With this change, ioremap on our 256 GiB BAR takes less than 1 second.

Signed-off-by: Roland Dreier 
Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-rol...@kernel.org
Signed-off-by: H. Peter Anvin 
---
 arch/x86/mm/ioremap.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 597ac15..bc7527e 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -50,6 +50,21 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 	return err;
 }
 
+static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
+			       void *arg)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (pfn_valid(start_pfn + i) &&
+		    !PageReserved(pfn_to_page(start_pfn + i)))
+			return 1;
+
+	WARN_ONCE(1, "ioremap on RAM pfn 0x%lx\n", start_pfn);
+
+	return 0;
+}
+
 /*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. Needed when the kernel wants to access high addresses
@@ -93,14 +108,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
+	pfn      = phys_addr >> PAGE_SHIFT;
 	last_pfn = last_addr >> PAGE_SHIFT;
-	for (pfn = phys_addr >> PAGE_SHIFT; pfn <= last_pfn; pfn++) {
-		int is_ram = page_is_ram(pfn);
-
-		if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
-			return NULL;
-		WARN_ON_ONCE(is_ram);
-	}
+	if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
+				  __ioremap_check_ram) == 1)
+		return NULL;
 
 	/*
 	 * Mappings have to be page-aligned


[PATCH] x86, ioremap: Speed up check for RAM pages

2014-05-02 Thread Roland Dreier
From: Roland Dreier 

In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and check each one individually with page_is_ram().
For large ioremaps, this can be very slow.  For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!

Internally, page_is_ram() calls walk_system_ram_range() on a single
page.  Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find.  For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.

With this change, ioremap on our 256 GiB BAR takes less than 1 second.

Signed-off-by: Roland Dreier 
---
 arch/x86/mm/ioremap.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 597ac155c91c..bc7527e109c8 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -50,6 +50,21 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 	return err;
 }
 
+static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
+			       void *arg)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (pfn_valid(start_pfn + i) &&
+		    !PageReserved(pfn_to_page(start_pfn + i)))
+			return 1;
+
+	WARN_ONCE(1, "ioremap on RAM pfn 0x%lx\n", start_pfn);
+
+	return 0;
+}
+
 /*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. Needed when the kernel wants to access high addresses
@@ -93,14 +108,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
+	pfn      = phys_addr >> PAGE_SHIFT;
 	last_pfn = last_addr >> PAGE_SHIFT;
-	for (pfn = phys_addr >> PAGE_SHIFT; pfn <= last_pfn; pfn++) {
-		int is_ram = page_is_ram(pfn);
-
-		if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
-			return NULL;
-		WARN_ON_ONCE(is_ram);
-	}
+	if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
+				  __ioremap_check_ram) == 1)
+		return NULL;
 
 	/*
 	 * Mappings have to be page-aligned
-- 
1.9.1
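
[Archive note] The cost difference described in the changelog above can be
illustrated with a small userspace model.  This is only a sketch, not kernel
code: ram_map, walk_ram_range(), found_ram() and the lookups counter are
hypothetical stand-ins invented for illustration, and the model assumes the
range walker returns -1 when no RAM intersects the request.  Assuming 4 KiB
pages, a 256 GiB BAR covers 2^26 (67,108,864) pfns, so the old loop performed
that many separate range lookups where the new code performs one.

```c
#include <stddef.h>

/* Hypothetical stand-in for the system RAM resource list: [start, end) in pfns. */
struct ram_range {
	unsigned long start;
	unsigned long end;
};

static const struct ram_range ram_map[] = {
	{ 0x0,   0x9f    },
	{ 0x100, 0x80000 },
};

#define RAM_MAP_LEN (sizeof(ram_map) / sizeof(ram_map[0]))

typedef int (*ram_cb)(unsigned long start_pfn, unsigned long nr_pages, void *arg);

/* Counts how many times the RAM map is walked, to compare the two shapes. */
static unsigned long lookups;

/*
 * Model of a range walker: call cb once per chunk of RAM that overlaps
 * [start_pfn, start_pfn + nr_pages).  Stops early if cb returns non-zero.
 * Returns -1 if no RAM overlaps the range at all (assumption of this model).
 */
static int walk_ram_range(unsigned long start_pfn, unsigned long nr_pages,
			  void *arg, ram_cb cb)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	int ret = -1;
	size_t i;

	lookups++;
	for (i = 0; i < RAM_MAP_LEN; i++) {
		unsigned long s = ram_map[i].start > start_pfn ?
				  ram_map[i].start : start_pfn;
		unsigned long e = ram_map[i].end < end_pfn ?
				  ram_map[i].end : end_pfn;

		if (s >= e)
			continue;	/* no overlap with this RAM range */
		ret = cb(s, e - s, arg);
		if (ret)
			break;
	}
	return ret;
}

/* Callback that reports "found RAM" for any overlapping chunk. */
static int found_ram(unsigned long start_pfn, unsigned long nr_pages, void *arg)
{
	(void)start_pfn;
	(void)nr_pages;
	(void)arg;
	return 1;
}

/* Old shape: one walk of the RAM map per page in the range. */
static int range_has_ram_per_page(unsigned long pfn, unsigned long last_pfn)
{
	for (; pfn <= last_pfn; pfn++)
		if (walk_ram_range(pfn, 1, NULL, found_ram) == 1)
			return 1;
	return 0;
}

/* New shape: a single walk covering the whole range. */
static int range_has_ram_single_walk(unsigned long pfn, unsigned long last_pfn)
{
	return walk_ram_range(pfn, last_pfn - pfn + 1, NULL, found_ram) == 1;
}
```

For an MMIO-only range both helpers give the same answer, but the per-page
version walks the map once per pfn while the single-walk version walks it
once; the per-chunk callback only runs when RAM actually overlaps the range.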



[GIT PULL] please pull infiniband.git

2014-05-01 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


InfiniBand/RDMA updates for 3.15-rc4:

 - cxgb4 hardware driver fixes


Hariprasad S (1):
  RDMA/cxgb4: Update Kconfig to include Chelsio T5 adapter

Steve Wise (3):
  RDMA/cxgb4: Fix endpoint mutex deadlocks
  RDMA/cxgb4: Force T5 connections to use TAHOE congestion control
  RDMA/cxgb4: Only allow kernel db ringing for T4 devs

 drivers/infiniband/hw/cxgb4/Kconfig   |  6 ++---
 drivers/infiniband/hw/cxgb4/cm.c  | 39 ++-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h|  1 +
 drivers/infiniband/hw/cxgb4/qp.c  | 13 +++
 drivers/infiniband/hw/cxgb4/t4fw_ri_api.h | 14 +++
 5 files changed, 55 insertions(+), 18 deletions(-)


[GIT PULL] please pull infiniband.git

2014-04-18 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



InfiniBand/RDMA updates for 3.15-rc2:

 - Mostly cxgb4 fixes unblocked by the merge of some prerequisites via
   the net tree.

 - Drop deprecated MSI-X API use.

 - A couple other miscellaneous things.


Alexander Gordeev (2):
  IB/qib: Use pci_enable_msix_range() instead of pci_enable_msix()
  IB/mthca: Use pci_enable_msix_exact() instead of pci_enable_msix()

Eli Cohen (1):
  IB/mlx5: Add block multicast loopback support

Hariprasad Shenai (1):
  RDMA/cxgb4: Use pr_warn_ratelimited

Roland Dreier (1):
  Merge branches 'cxgb4', 'misc', 'mlx5' and 'qib' into for-next

Steve Wise (9):
  RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices
  RDMA/cxgb4: Endpoint timeout fixes
  RDMA/cxgb4: rmb() after reading valid gen bit
  RDMA/cxgb4: SQ flush fix
  RDMA/cxgb4: Max fastreg depth depends on DSGL support
  RDMA/cxgb4: Initialize reserved fields in a FW work request
  RDMA/cxgb4: Add missing debug stats
  RDMA/cxgb4: Use uninitialized_var()
  RDMA/cxgb4: Fix over-dereference when terminating

 drivers/infiniband/hw/cxgb4/cm.c | 89 
 drivers/infiniband/hw/cxgb4/cq.c | 24 -
 drivers/infiniband/hw/cxgb4/device.c | 41 ---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |  2 +
 drivers/infiniband/hw/cxgb4/mem.c|  6 ++-
 drivers/infiniband/hw/cxgb4/provider.c   |  2 +-
 drivers/infiniband/hw/cxgb4/qp.c | 70 +++--
 drivers/infiniband/hw/cxgb4/resource.c   | 10 ++--
 drivers/infiniband/hw/cxgb4/t4.h | 72 --
 drivers/infiniband/hw/mlx5/main.c|  2 +
 drivers/infiniband/hw/mlx5/qp.c  | 12 +
 drivers/infiniband/hw/mthca/mthca_main.c |  8 +--
 drivers/infiniband/hw/qib/qib_pcie.c | 55 ++--
 include/linux/mlx5/device.h  |  1 +
 include/linux/mlx5/qp.h  |  1 +
 15 files changed, 270 insertions(+), 125 deletions(-)


[GIT PULL] please pull infiniband.git

2014-04-03 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.15:

 - The biggest change is core API extensions and mlx5 low-level driver
   support for handling DIF/DIX-style protection information, and the
   addition of PI support to the iSER initiator.  Target support will be
   arriving shortly through the SCSI target tree.

 - A nice simplification to the "umem" memory pinning library now that
   we have chained sg lists.  Kudos to Yishai Hadas for realizing our
   code didn't have to be so crazy.

 - Another nice simplification to the sg wrappers used by qib, ipath and
   ehca to handle their mapping of memory to adapter.

 - The usual batch of fixes to bugs found by static checkers etc. from
   intrepid people like Dan Carpenter and Yann Droneaud.

 - A large batch of cxgb4, ocrdma, qib driver updates.


Alex Tabachnik (2):
  IB/iser: Introduce pi_enable, pi_guard module parameters
  IB/iser: Initialize T10-PI resources

Ariel Nahum (1):
  IB/iser: Remove struct iscsi_iser_conn

Bart Van Assche (7):
  IB/mlx4: Fix a sparse endianness warning
  scsi_transport_srp: Fix two kernel-doc warnings
  IB/srp: Add more logging
  IB/srp: Avoid duplicate connections
  IB/srp: Make writing into the "add_target" sysfs attribute interruptible
  IB/srp: Avoid that writing into "add_target" hangs due to a cable pull
  IB/srp: Fix a race condition between failing I/O and I/O completion

CQ Tang (1):
  IB/qib: Change SDMA progression mode depending on single- or multi-rail

Dan Carpenter (7):
  IB/qib: Remove duplicate check in get_a_ctxt()
  RDMA/nes: Clean up a condition
  RDMA/cxgb4: Fix underflows in c4iw_create_qp()
  RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()
  IB/qib: Cleanup qib_register_observer()
  mlx4_core: Fix some indenting in mlx4_ib_add()
  mlx4_core: Make buffer larger to avoid overflow warning

Dennis Dalessandro (3):
  IB/qib: Fix potential buffer overrun in sending diag packet routine
  IB/ipath: Fix potential buffer overrun in sending diag packet routine
  IB/qib: Fix memory leak of recv context when driver fails to initialize.

Devesh Sharma (9):
  RDMA/ocrdma: EQ full catastrophe avoidance
  RDMA/ocrdma: SQ and RQ doorbell offset clean up
  RDMA/ocrdma: Read ASIC_ID register to select asic_gen
  RDMA/ocrdma: Allow DPP QP creation
  RDMA/ocrdma: ABI versioning between ocrdma and be2net
  be2net: Add abi version between be2net and ocrdma
  RDMA/ocrdma: Update version string
  RDMA/ocrdma: Increment abi version count
  RDMA/ocrdma: Code clean-up

Fabio Estevam (1):
  IB/usnic: Remove '0x' when using %pa format

Mike Marciniszyn (7):
  IB/qib: Fix debugfs ordering issue with multiple HCAs
  IB/qib: Add percpu counter replacing qib_devdata int_counter
  IB/qib: Modify software pma counters to use percpu variables
  IB/qib: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
  IB/ipath: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
  IB/ehca: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
  IB/core: Remove overload in ib_sg_dma*

Moni Shoua (1):
  IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler

Or Gerlitz (3):
  IB/iser: Print QP information once connection is established
  IB/iser: Update Mellanox copyright note
  IB/iser: Bump driver version to 1.3

Prarit Bhargava (1):
  RDMA/ocrdma: Fix compiler warning

Randy Dunlap (1):
  IB/iser: Fix sector_t format warning

Roi Dayan (1):
  IB/iser: Drain the tx cq once before looping on the rx cq

Roland Dreier (2):
  RDMA/ocrdma: Fix warnings about pointer <-> integer casts
  Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next

Sagi Grimberg (23):
  IB/core: Introduce protected memory regions
  IB/core: Introduce signature verbs API
  mlx5: Implement create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature-related members
  IB/mlx5: Break up wqe handling into begin & finish routines
  IB/mlx5: Remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs in a radix tree under device
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Expose support for signature MR feature
  IB/iser: Suppress completions for fast registration work requests
  IB/iser: Avoid FRWR notation, use fastreg instead
  IB/i

Re: linux rdma 3.14 merge plans

2014-03-07 Thread Roland Dreier
Sure, no problem.

Do you have a git tree with the latest versions of all the changes you
want for 3.15 in a branch?  That would be helpful as I catch up on
applying things, so that I don't miss anything.

If you don't have one, taking a little time to set one up on github or
wherever would be nice.  You can base your set of changes on Linus's
latest tree.

Thanks!
  Roland

On Thu, Mar 6, 2014 at 9:07 PM, Devesh Sharma wrote:
> Hi Roland,
>
> Is it okay to send next series of patches even if previous series is not
> accepted yet in your tree? Of course I will cut patches on top of previous
> series of patches.
>
> -Regards
>  Devesh
>
> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> Sent: Thursday, March 06, 2014 12:34 AM
> To: Roland Dreier
> Cc: Or Gerlitz; Hefty Sean; linux-rdma; Martin K. Petersen; target-devel; 
> Sagi Grimberg; linux-kernel
> Subject: Re: linux rdma 3.14 merge plans
>
> On Wed, 2014-03-05 at 07:18 -0800, Roland Dreier wrote:
>> On Wed, Mar 5, 2014 at 1:54 AM, Nicholas A. Bellinger
>>  wrote:
>> > That all said, do you have an objection wrt taking this bits through
>> > target-pending..?  Given the dependencies involved, that would seem
>> > the most logical path to take.
>>
>> Perhaps not surprisingly, I would prefer to get a chance to review a
>> major change to the core RDMA midlayer rather than having you merge it
>> through your tree.  So yes I do object.  Please give me a chance to
>> review and merge it.  I am working on that this week.
>>
>
> Great.  We'll be looking for a response by the end of the week.
>
> Otherwise if you end up not having time, we'd still like to move forward for 
> v3.15 given the amount of review the series has already gotten on the list.
>
> Thank you,
>
> --nab
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the 
> body of a message to majord...@vger.kernel.org More majordomo info at  
> http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mlx4: Use GFP_NOFS calls during the ipoib TX path when creating the QP

2014-03-05 Thread Roland Dreier
On Thu, Feb 27, 2014 at 2:42 AM, Jiri Kosina wrote:
> Whatever suits you best. To sum it up:
>
> - mlx4 is confirmed to have this problem, and we know how that problem
>   happens -- see the paragraph in the changelog explaining the dependency
>   between memory reclaim and allocation of TX ring
>
> - we have a work around which requires human interaction in order
>   to provide the information whether GFP_NOFS should be used or not
>
> - I can very well understand why Mellanox would see that as a hack, but if
>   more comprehensive fix is necessary, I'd expect those who understand
>   the code the best to come up with a solution/proposal. I'd assume that
>   you don't  want to keep the code with known and easily triggerable
>   deadlock out there unfixed.
>
> - where I see the potential for layering violation in any 'general'
>   solution is that it's the filesystem that has to be "talking" to the
>   underlying netdevice, i.e. you'll have to make filesystem
>   netdevice-aware, right?

It's quite clear that this is a general problem with IPoIB connected
mode on any IB device.  In connected mode, a packet send can trigger
establishing a new connection, which will allocate a new QP, which in
particular will allocate memory for the QP in the low-level IB device
driver.  Currently I'm positive that every driver will do GFP_KERNEL
allocations when allocating a QP (ehca does both a GFP_KERNEL
kmem_cache allocation and vmalloc in internal_create_qp(), mlx5 and
mthca are similar to mlx4 and qib does vmalloc() in qib_create_qp()).
So this patch needs to be extended to the other 4 IB device drivers in
the tree.

Also, I don't think GFP_NOFS is enough -- it seems we need GFP_NOIO,
since we could be swapping to a block device over iSCSI over IPoIB-CM,
so even non-FS stuff could deadlock.
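
To illustrate the difference, here is a minimal userspace model, not
kernel code.  The MODEL_* names and bit values are invented for this
sketch; only the relationship between the three masks is meant to mirror
GFP_KERNEL, GFP_NOFS and GFP_NOIO.

```c
#include <stdbool.h>

/* Hypothetical permission bits; the real kernel values differ. */
enum {
	MODEL_RECLAIM = 1u << 0,	/* allocator may try to reclaim memory */
	MODEL_IO      = 1u << 1,	/* reclaim may start block I/O */
	MODEL_FS      = 1u << 2,	/* reclaim may call into filesystems */
};

#define MODEL_GFP_KERNEL (MODEL_RECLAIM | MODEL_IO | MODEL_FS)
#define MODEL_GFP_NOFS   (MODEL_RECLAIM | MODEL_IO)
#define MODEL_GFP_NOIO   (MODEL_RECLAIM)

/* Reclaim that writes back through a filesystem (e.g. NFS over IPoIB). */
bool may_recurse_via_fs(unsigned int gfp)
{
	return (gfp & MODEL_FS) != 0;
}

/* Reclaim that swaps to a block device (e.g. iSCSI over IPoIB-CM). */
bool may_recurse_via_block_io(unsigned int gfp)
{
	return (gfp & MODEL_IO) != 0;
}

/* A QP allocation in the send path is safe only if neither path can recurse. */
bool safe_in_ipoib_cm_tx(unsigned int gfp)
{
	return !may_recurse_via_fs(gfp) && !may_recurse_via_block_io(gfp);
}
```

In this model only MODEL_GFP_NOIO passes safe_in_ipoib_cm_tx(), which is
the argument for GFP_NOIO over GFP_NOFS.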

I don't think it makes any sense to have a "do_not_deadlock" module
parameter, especially one that defaults to "false."  If this is the
right thing to do, then we should just unconditionally do it.

It does seem that only using GFP_NOIO when we really need to would be
a very difficult problem--how can we carry information about whether a
particular packet is involved in freeing memory through all the layers
of, say, NFS, TCP, IPSEC, bonding, &c?


Re: linux rdma 3.14 merge plans

2014-03-05 Thread Roland Dreier
On Wed, Mar 5, 2014 at 1:54 AM, Nicholas A. Bellinger
 wrote:
> That all said, do you have an objection wrt taking this bits through
> target-pending..?  Given the dependencies involved, that would seem the
> most logical path to take.

Perhaps not surprisingly, I would prefer to get a chance to review a
major change to the core RDMA midlayer rather than having you merge it
through your tree.  So yes I do object.  Please give me a chance to
review and merge it.  I am working on that this week.

 - R.


Re: [PATCH] Revert "driver core: synchronize device shutdown"

2014-03-04 Thread Roland Dreier
> Hm, no one seems to have said anything for the past 5 years about this.

It definitely is hard to hit -- you have to do "shutdown" or "reboot"
right as something schedules async work.  In our case we have some
systems with a large and slightly flaky SAS fabric, so there's a
constant level of re-probing SCSI disks, and we occasionally see
reboots hanging due to waiting for never-finishing sd probe async
work.

AFAICT the synchronization does nothing useful and is just a remnant
of a patch series where the real meat didn't get applied.  But of
course it would be great if Shaohua could confirm my understanding.

Thanks,
  Roland


[PATCH] Revert "driver core: synchronize device shutdown"

2014-03-04 Thread Roland Dreier
From: Roland Dreier 

This reverts commit 401097ea4b89846d66ac78f7f108d49c2e922d9c.  The
original changelog said:

A patch series to make .shutdown execute asynchronously.  Some drivers's
shutdown can take a lot of time.  The patches can help save some shutdown
time.  The patches use Arjan's async API.

This patch:

synchronize all tasks submitted by .shutdown

However, I'm not able to find any evidence that any other patches from
this series were applied, nor am I able to find any async tasks that are
scheduled in a .shutdown context.

On the other hand, we see occasional hangs on shutdown that appear to be
caused by the async_synchronize_full() in device_shutdown() waiting
forever for the async probing in sd if a SCSI disk shows up at just the
wrong time — the system starts the probe, but begins shutting down and
tears down too much of the SCSI driver to finish the probe.

If we had any async shutdown tasks, I guess the right fix would be to
create a "shutdown" async domain and have device_shutdown() only wait
for that domain.  But since there apparently are no async shutdown
tasks, we can just revert the waiting.

Signed-off-by: Roland Dreier 
---
 drivers/base/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2b567177ef78..afea3697fa2e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -2003,7 +2002,6 @@ void device_shutdown(void)
 		spin_lock(&devices_kset->list_lock);
 	}
 	spin_unlock(&devices_kset->list_lock);
-	async_synchronize_full();
 }
 
 /*
-- 
1.9.0
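
A rough userspace model of the "shutdown" async domain idea from the
changelog above.  The model_* names are invented for this sketch and
nothing here is taken from the driver core; in the kernel the
corresponding per-domain calls would be async_schedule_domain() and
async_synchronize_full_domain().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct async_domain. */
struct model_domain {
	int pending;	/* scheduled work items that have not finished */
};

/* Schedule one async work item in a domain. */
void model_schedule(struct model_domain *d)
{
	d->pending++;
}

/* Would waiting for just this one domain block? */
bool model_sync_domain_blocks(const struct model_domain *d)
{
	return d->pending > 0;
}

/* Would waiting for every domain, as a full synchronize does, block? */
bool model_sync_full_blocks(const struct model_domain *domains, int n)
{
	for (int i = 0; i < n; i++)
		if (domains[i].pending > 0)
			return true;
	return false;
}
```

With a stuck sd probe pending in one domain and an empty "shutdown"
domain, the full wait blocks while the per-domain wait does not.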



[PATCH] NTB: Fix typo in setting one translation register

2014-02-21 Thread Roland Dreier
From: Roland Dreier 

In the code for Xeon devices in back-to-back mode with xeon_errata_workaround
disabled, the downstream device puts the wrong value in SNB_B2B_XLAT_OFFSETL
(SNB_MBAR01_DSD_ADDR vs. SNB_MBAR01_USD_ADDR).

This was spotted while reading code, since the typo has no practical effect,
at least for now: the low 32 bits of both constants are actually identical
anyway.  However, it's clearer and safer to use the right name.

Signed-off-by: Roland Dreier 
---
 drivers/ntb/ntb_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 170e8e60cdb7..2774d356b689 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -785,7 +785,7 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
/* B2B_XLAT_OFFSET is a 64bit register, but can
 * only take 32bit writes
 */
-   writel(SNB_MBAR01_DSD_ADDR & 0xffffffff,
+   writel(SNB_MBAR01_USD_ADDR & 0xffffffff,
   ndev->reg_base + SNB_B2B_XLAT_OFFSETL);
writel(SNB_MBAR01_USD_ADDR >> 32,
   ndev->reg_base + SNB_B2B_XLAT_OFFSETU);
-- 
1.9.0
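
As an illustration of why the typo was harmless, a small standalone
sketch of splitting a 64-bit value into the two 32-bit halves that get
written.  The EXAMPLE_* addresses are made up (they are not the real
SNB_MBAR01_* values); they only share the property described above,
identical low 32 bits.

```c
#include <stdint.h>

/* Made-up example addresses: they differ only in the upper 32 bits. */
#define EXAMPLE_USD_ADDR 0x0000002100000000ULL
#define EXAMPLE_DSD_ADDR 0x0000004100000000ULL

/* Value for the low-half register write. */
uint32_t lo32(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffULL);
}

/* Value for the high-half register write. */
uint32_t hi32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}
```

Here lo32() gives the same result for both constants, so using the wrong
name in the low-half write changes nothing until the low halves diverge.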



[GIT PULL] please pull infiniband.git

2014-02-14 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



RDMA/InfiniBand fixes for 3.14-rc3:
 - Fix some rough edges from the "IP addressing for IBoE" merge
 - Other misc fixes, mostly to hardware drivers


Dan Carpenter (1):
  IB/iser: Fix use after free in iser_snd_completion()

Devesh Sharma (2):
  RDMA/ocrdma: Fix traffic class shift
  RDMA/ocrdma: Fix load time panic during GID table init

Eli Cohen (4):
  IB/mlx5: Fix RC transport send queue overhead computation
  IB/mlx5: Fix binary compatibility with libmlx5
  IB/mlx5: Don't set "block multicast loopback" capability
  IB/mlx5: Remove dependency on X86

Julia Lawall (2):
  RDMA/nes: Fix error return code
  RDMA/amso1100: Fix error return code

Kumar Sanghvi (1):
  RDMA/cxgb4: Add missing neigh_release in LE-Workaround path

Matan Barak (1):
  IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device

Mike Marciniszyn (1):
  IB/qib: Add missing serdes init sequence

Moni Shoua (6):
  IB/mlx4: Make sure GID index 0 is always occupied
  IB/mlx4: Move rtnl locking to the right place
  IB/mlx4: Do IBoE locking earlier when initializing the GID table
  IB/mlx4: Do IBoE GID table resets per-port
  IB/mlx4: Build the port IBoE GID table properly under bonding
  IB: Report using RoCE IP based gids in port caps

Roi Dayan (1):
  IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to 
iser connection

Roland Dreier (2):
  mlx5: Add include of  because of kzalloc()/kfree() use
  Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', 
'ocrdma', 'qib' and 'usnic' into for-next

Upinder Malhi (1):
  IB/usnic: Fix smatch endianness error

 drivers/infiniband/hw/amso1100/c2.c             |   4 +-
 drivers/infiniband/hw/amso1100/c2_rnic.c        |   3 +-
 drivers/infiniband/hw/cxgb4/cm.c                |   1 +
 drivers/infiniband/hw/mlx4/main.c               | 185 +---
 drivers/infiniband/hw/mlx5/Kconfig              |   2 +-
 drivers/infiniband/hw/mlx5/main.c               |  22 ++-
 drivers/infiniband/hw/mlx5/qp.c                 |  18 ++-
 drivers/infiniband/hw/mlx5/user.h               |   7 +
 drivers/infiniband/hw/nes/nes.c                 |   5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c      |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c     |   4 +-
 drivers/infiniband/hw/qib/qib_iba7322.c         |   5 +
 drivers/infiniband/hw/usnic/usnic_ib_qp_grp.c   |   9 +-
 drivers/infiniband/ulp/iser/iser_initiator.c    |   3 +-
 drivers/infiniband/ulp/iser/iser_verbs.c        |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig |   2 +-
 include/linux/mlx5/driver.h                     |   3 +
 include/rdma/ib_verbs.h                         |   3 +-
 18 files changed, 214 insertions(+), 74 deletions(-)


Re: linux rdma 3.14 merge plans

2014-02-06 Thread Roland Dreier
On Thu, Feb 6, 2014 at 4:02 PM, Nicholas A. Bellinger
 wrote:
> Can you give us an estimate of when you'll have some time to give
> feedback on the outstanding patches..?

I hope to get to it in the next few weeks.

 - R.


[GIT PULL] please pull infiniband.git

2014-01-24 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.14:
 - Flow steering for InfiniBand UD traffic
 - IP-based addressing for IBoE aka RoCE
 - Pass SRP submaintainership from Dave to Bart
 - SRP transport fixes from Bart
 - Add the new Cisco usNIC low-level device driver
 - Various other fixes


Bart Van Assche (4):
  scsi_transport_srp: Block rport upon TL error even with fast_io_fail_tmo 
= off
  scsi_transport_srp: Fix a race condition
  scsi_transport_srp: Add rport state diagram
  scsi_transport_srp: Fix kernel-doc warnings

Dan Carpenter (2):
  mlx5_core: Remove dead code
  IB/usnic: Use GFP_ATOMIC under spinlock

David Dillow (1):
  MAINTAINERS: Pass the torch of SRP submaintainership

Devesh Sharma (2):
  RDMA/ocrdma: Fix AV_VALID bit position
  RDMA/ocrdma: Fix OCRDMA_GEN2_FAMILY macro definition

Ding Tianhong (1):
  RDMA/nes: Slight optimization of Ethernet address compare

Eli Cohen (13):
  IB/mlx5: Remove unused code in mr.c
  IB/mlx5: Fix micro UAR allocator
  IB/mlx5: Clear out struct before create QP command
  mlx5_core: Use mlx5 core style warning
  IB/mlx5: Make sure doorbell record is visible before doorbell
  IB/mlx5: Implement modify CQ
  IB/mlx5: Add support for resize CQ
  mlx5_core: Improve debugfs readability
  mlx5_core: Fix PowerPC support
  IB/mlx5: Allow creation of QPs with zero-length work queues
  IB/mlx5: Abort driver cleanup if teardown hca fails
  IB/mlx5: Remove old field for create mkey mailbox
  IB/mlx5: Verify reserved fields are cleared

Haggai Eran (1):
  mlx5_core: Fix out arg size in access_register command

Ira Weiny (1):
  IB/qib: Fix QP check when looping back to/from QP1

Julia Lawall (1):
  IB/mlx4: Fix error return code

Matan Barak (9):
  IB/core: Add flow steering support for IPoIB UD traffic
  IB/core: Add support for IB L2 device-managed steering
  mlx4_core: Add support for steerable IB UD QPs
  IB/mlx4: Enable device-managed steering support for IB ports too
  IB/mlx4: Add mechanism to support flow steering over IB links
  IB/mlx4: Add support for steerable IB UD QPs
  IB/core: Ethernet L2 attributes in verbs/cm structures
  IB/core: Make ib_addr a core IB module
  IB/mlx4: Add dependency INET

Michal Schmidt (1):
  IPoIB: Report operstate consistently when brought up without a link

Moni Shoua (5):
  IB/cma: IBoE (RoCE) IP-based GID addressing
  IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table
  IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing
  RDMA/ocrdma: Handle Ethernet L2 parameters for IP based GID addressing
  RDMA/ocrdma: Populate GID table with IP based gids

Or Gerlitz (2):
  IB/core: Resolve Ethernet L2 addresses when modifying QP
  IB/core: Fix unused variable warning

Paul Bolle (1):
  RDMA/cxgb4: Fix gcc warning on 32-bit arch

Roland Dreier (6):
  IB/usnic: Fix typo "Ignorning" -> "Ignoring"
  RDMA/ocrdma: Move ocrdma_inetaddr_event outside of "#if CONFIG_IPV6"
  RDMA/ocrdma: Add dependency on INET
  IB/mlx4: Use IS_ENABLED(CONFIG_IPV6)
  Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 
'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next
  Merge branch 'ip-roce' into for-next

Somnath Kotur (1):
  RDMA/cma: Handle global/non-linklocal IPv6 addresses in 
cma_check_linklocal()

Svetlana Mavrina (1):
  RDMA/amso1100: Add check if cache memory was allocated before freeing it

Upinder Malhi (22):
  IB/usnic: Add Cisco VIC low-level hardware driver
  IB/usnic: Change WARN_ON to lockdep_assert_held
  IB/usnic: Add struct usnic_transport_spec
  IB/usnic: Push all forwarding state to usnic_fwd.[hc]
  IB/usnic: Port over main.c and verbs.c to the usnic_fwd.h
  IB/usnic: Port over usnic_ib_qp_grp.[hc] to new usnic_fwd.h
  IB/usnic: Port over sysfs to new usnic_fwd.h
  IB/usnic: Update ABI and Version file for UDP support
  IB/usnic: Add UDP support to usnic_fwd.[hc]
  IB:usnic: Add UDP support to usnic_transport.[hc]
  IB/usnic: Add UDP support in u*verbs.c, u*main.c and u*util.h
  IB/usnic: Add UDP support in usnic_ib_qp_grp.[hc]
  IB/core: Add RDMA_TRANSPORT_USNIC_UDP
  IB/usnic: Remove superflous parentheses
  IB/usnic: Use for_each_sg instead of a for-loop
  IB/usnic: Expose flows via debugfs
  IB/usnic: Append documentation to usnic_transport.h and cleanup
  IB/usnic: Fix endianness-related warnings
  IB/usnic: Add d

Re: linux rdma 3.14 merge plans

2014-01-21 Thread Roland Dreier
On Tue, Jan 21, 2014 at 2:00 PM, Or Gerlitz  wrote:
> Roland, ping! the signature patches were posted > three months ago. We
> deserve a response from the maintainer that goes beyond "I need to
> think on that".
>
> Responsiveness was stated by Linus to be the #1 requirement from
> kernel maintainers.

Or, I'm not sure what response you're after from me.  Linus has also
said that maintainers should say "no" a lot more
(http://lwn.net/Articles/571995/) so maybe you want me to say, "No, I
won't merge this patch set, since it adds a bunch of complexity to
support a feature no one really cares about."  Is that it?  (And yes I
am skeptical about this stuff — I work at an enterprise storage
company and even here it's hard to find anyone who cares about
DIF/DIX, especially offload features that stop it from being
end-to-end)

I'm sure you're not expecting me to say, "Sure, I'll merge it without
understanding the problem it's solving or how it's doing that,"
especially given your recent history of pushing me to merge stuff
like the IP-RoCE patches back when they broke the userspace ABI.

I'd really rather spend my time on something actually useful like
cleaning up softroce.

 - R.


[GIT PULL] please pull infiniband.git

2013-12-23 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus

The following changes since commit 374b105797c3d4f29c685f3be535c35f5689b30e:

  Linux 3.13-rc3 (2013-12-06 09:34:04 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus

for you to fetch changes up to 22f12c60e12a4112fdca31582e66fe501600ee2b:

  Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus (2013-12-23 
09:19:02 -0800)



Last batch of InfiniBand/RDMA changes for 3.13 / 2014:
 - Additional checks for uverbs to ensure forward compatibility, handle
   malformed input better.
 - Fix potential use-after-free in iWARP connection manager.
 - Make a function static.
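
A sketch of the idea behind the INIT_UDATA() change listed below, written
as plain userspace C.  struct model_udata and model_init_udata() are
hypothetical stand-ins, not the kernel's definitions; the point is only
that a zero-length buffer should never leave a stale pointer behind.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct ib_udata. */
struct model_udata {
	const void *inbuf;
	void *outbuf;
	size_t inlen;
	size_t outlen;
};

/* Store NULL instead of a pointer whenever the matching length is 0. */
void model_init_udata(struct model_udata *u, const void *ibuf, void *obuf,
		      size_t ilen, size_t olen)
{
	u->inbuf  = ilen ? ibuf : NULL;
	u->outbuf = olen ? obuf : NULL;
	u->inlen  = ilen;
	u->outlen = olen;
}
```

A consumer can then test the pointer alone and cannot be handed a
non-NULL pointer with nothing behind it.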


Rashika (1):
  RDMA/cxgb4: Make _c4iw_write_mem_dma() static

Roland Dreier (2):
  IB/uverbs: New macro to set pointers to NULL if length is 0 in 
INIT_UDATA()
  Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus

Steve Wise (1):
  RDMA/iwcm: Don't touch cm_id after deref in rem_ref

Yann Droneaud (7):
  IB/core: const'ify inbuf in struct ib_udata
  IB/uverbs: Check reserved field in extended command header
  IB/uverbs: Check comp_mask in destroy_flow
  IB/uverbs: Check reserved fields in create_flow
  IB/uverbs: Set error code when fail to consume all flow_spec items
  IB/uverbs: Check input length in flow steering uverbs
  IB/uverbs: Check access to userspace response buffer in extended command

 drivers/infiniband/core/iwcm.c        | 11 +--
 drivers/infiniband/core/uverbs.h      | 10 +-
 drivers/infiniband/core/uverbs_cmd.c  | 17 +
 drivers/infiniband/core/uverbs_main.c | 27 ---
 drivers/infiniband/hw/cxgb4/mem.c     |  2 +-
 include/rdma/ib_verbs.h               |  2 +-
 6 files changed, 53 insertions(+), 16 deletions(-)


[GIT PULL] please pull infiniband.git

2013-11-18 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.13:

 - Re-enable flow steering verbs with new improved userspace ABI
 - Fixes for slow connection due to GID lookup scalability
 - IPoIB fixes
 - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
 - Further improvements to SRP error handling
 - Add new transport type for Cisco usNIC


Bart Van Assche (11):
  IB/srp: Keep rport as long as the IB transport layer
  scsi_transport_srp: Add transport layer error handling
  IB/srp: Use SRP transport layer error recovery
  IB/srp: Start timers if a transport layer error occurs
  scsi_transport_srp: Add periodic reconnect support
  IB/srp: Add periodic reconnect functionality
  IB/srp: Export sgid to sysfs
  IB/srp: Introduce srp_alloc_req_data()
  IB/srp: Make queue size configurable
  IB/srp: Avoid offlining operational SCSI devices
  IB/srp: Report receive errors correctly

Ben Hutchings (1):
  IB/cxgb4: Fix formatting of physical address

Dan Carpenter (1):
  RDMA/ocrdma: Silence an integer underflow warning

Dave Jones (1):
  RDMA/nes: Remove self-assignment from nes_query_qp()

Doug Ledford (2):
  IB/cma: Use cached gids
  IB/cma: Check for GID on listening device first

Eli Cohen (17):
  IB/mlx5: Fix check of number of entries in create CQ
  IB/mlx5: Multithreaded create MR
  IB/mlx5: Fix overflow check in IB_WR_FAST_REG_MR
  IB/mlx5: Simplify mlx5_ib_destroy_srq
  mlx5: Fix cleanup flow when DMA mapping fails
  mlx5: Support communicating arbitrary host page size to firmware
  mlx5: Clear reserved area in set_hca_cap()
  IB/mlx5: Remove dead code in mr.c
  IB/mlx5: Remove "Always false" comparison
  IB/mlx5: Update opt param mask for RTS2RTS
  mlx5: Use enum to indicate adapter page size
  IB/mlx4: Fix endless loop in resize CQ
  IB/core: Encorce MR access rights rules on kernel consumers
  IB/mlx5: Remove dead code
  IB/mlx5: Fix list_del of empty list
  IB/mlx4: Fix device max capabilities check
  IB/mlx5: Fix page shift in create CQ for userspace

Erez Shitrit (6):
  IPoIB: Fix crash in dev_open error flow
  IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush()
  IPoIB: Avoid flushing the driver workqueue on dev_down
  IPoIB: Fix usage of uninitialized multicast objects
  IPoIB: Add path query flushing in ipoib_ib_dev_cleanup
  IPoIB: Start multicast join process only on active ports

Jack Wang (1):
  IB/srp: Add change_queue_depth and change_queue_type support

Jan Kara (2):
  IB/ipath: Convert ipath_user_sdma_pin_pages() to use get_user_pages_fast()
  IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast()

Joe Perches (1):
  IB/ucma: Convert use of typedef ctl_table to struct ctl_table

Latchesar Ionkov (1):
  IB/core: Pass imm_data from ib_uverbs_send_wr to ib_send_wr correctly

Matan Barak (2):
  IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
  IB/core: Re-enable create_flow/destroy_flow uverbs

Mathias Krause (1):
  IB/netlink: Remove superfluous RDMA_NL_GET_OP() masking

Michal Nazarewicz (1):
  RDMA/cma: Remove unused argument and minor dead code

Michal Schmidt (1):
  IPoIB: lower NAPI weight

Mike Marciniszyn (2):
  IB/qib: Fix checkpatch __packed warnings
  IB/qib: Fix txselect regression

Moshe Lazer (2):
  IB/mlx5: Fix srq free in destroy qp
  mlx5_core: Change optimal_reclaimed_pages for better performance

Naresh Gottumukkala (2):
  RDMA/ocrdma: Fix a crash in rmmod
  RDMA/ocrdma: Remove redundant check in ocrdma_build_fr()

Roland Dreier (1):
  Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 
'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next

Sean Hefty (1):
  RDMA/ucma: Discard events for IDs not yet claimed by user space

Tal Alon (1):
  IPoIB: Change CM skb memory allocation to be non-atomic during init

Upinder Malhi \(umalhi\) (1):
  IB/core: Add Cisco usNIC rdma node and transport types

Vu Pham (2):
  IB/srp: Make transport layer retry count configurable
  IB/srp: Remove target from list before freeing Scsi_Host structure

Yann Droneaud (5):
  IB/core: Rename 'flow' structs to match other uverbs structs
  IB/core: Make uverbs flow structure use names like verbs ones
  IB/core: Use a common header for uverbs flow_specs
  IB/core: Remove ib_uverbs_flow_spec structure from userspace
  IB/core: extended command: an improved infrastructure for uverbs command

Re: linux-next: build warning after merge of the infiniband tree

2013-11-08 Thread Roland Dreier
Sorry about that, folded in the fix I missed and will push out the tree shortly.

On Mon, Nov 4, 2013 at 5:42 AM, Marciniszyn, Mike
 wrote:
> This issue was caught by Tetsuo Handa and Acked on 10/30: 
> http://marc.info/?t=13831336458&r=1&w=2.
>
> Roland, I noticed that the Tetsuo's original message didn't cc the linux-rdma 
> list?
>
> Mike
>
>> -----Original Message-----
>> From: Stephen Rothwell [mailto:s...@canb.auug.org.au]
>> Sent: Sunday, November 03, 2013 11:55 PM
>> To: Roland Dreier; linux-r...@vger.kernel.org
>> Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org; Jan Kara;
>> Marciniszyn, Mike
>> Subject: linux-next: build warning after merge of the infiniband tree
>>
>> Hi all,
>>
>> After merging the infiniband tree, today's linux-next build (x86_64
>> allmodconfig) produced this warning:
>>
>> drivers/infiniband/hw/ipath/ipath_user_sdma.c: In function
>> 'ipath_user_sdma_pin_pages':
>> drivers/infiniband/hw/ipath/ipath_user_sdma.c:283:6: warning: 'j' is used
>> uninitialized in this function [-Wuninitialized]
>>   ret = get_user_pages_fast(addr, j, 0, pages);
>>       ^
>>
>> Introduced by commit 18fec3c6bdcb ("IB/ipath: Convert
>> ipath_user_sdma_pin_pages() to use get_user_pages_fast()").  How did that
>> pass review or testing?
>>
>> --
>> Cheers,
>> Stephen Rothwell                    s...@canb.auug.org.au


[GIT PULL] please pull infiniband.git

2013-10-22 Thread Roland Dreier
Hi Linus,

This is the "disable ABI we don't want to freeze" pull I warned you about.

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus


Disable not-quite-ready userspace ABI for IB flow steering


Yann Droneaud (1):
  IB/core: Temporarily disable create_flow/destroy_flow uverbs

 drivers/infiniband/Kconfig            | 11 +++++++++++
 drivers/infiniband/core/uverbs.h      |  2 ++
 drivers/infiniband/core/uverbs_cmd.c  |  4 ++++
 drivers/infiniband/core/uverbs_main.c |  6 ++++++
 drivers/infiniband/hw/mlx4/main.c     |  2 ++
 include/uapi/rdma/ib_user_verbs.h     |  6 ++++++
 6 files changed, 31 insertions(+)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] please pull infiniband.git

2013-10-15 Thread Roland Dreier
On Mon, Oct 14, 2013 at 5:52 PM, Linus Torvalds
 wrote:
> So get your act together, and push back on the people you are supposed
> to manage. Because this is *not* acceptable for post-rc5, and I'm
> giving this single warning. Next time, I'll just ignore the sh*t you
> send me.
>
> Comprende?

Fair enough.  I've been AWOL for a month due to real life / non-kernel
stuff, and I didn't want the Mellanox guys to miss a kernel cycle just
because I couldn't get my act together.  So this one is totally on me
-- I know it's late in the cycle and I tried to sneak it in.

I do expect to send one more patch turning off a not-fully-baked new
feature for 3.12, but other than that everything else will wait for
3.13.

 - R.


[GIT PULL] please pull infiniband.git

2013-10-14 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



Last batch of IB changes for 3.12: many mlx5 hardware driver fixes plus
one trivial semicolon cleanup.


Eli Cohen (12):
  IB/mlx5: Fix send work queue size calculation
  mlx5: Remove checksum on command interface commands
  IB/mlx5: Decrease memory consumption of mr caches
  IB/mlx5: Avoid async events on invalid port number
  mlx5: Keep polling to reclaim pages while any returned
  mlx5: Fix layout of struct mlx5_init_seg
  IB/mlx5: Disable atomic operations
  mlx5: Fix opt param mask for sq err to rts transition
  IB/mlx5: Fix opt param mask according to firmware spec
  mlx5: Fix error code translation from firmware to driver
  IB/mlx5: Fix alignment of reg umr gather buffers
  IB/mlx5: Ensure proper synchronization accessing memory

Joe Perches (1):
  IB: Remove unnecessary semicolons

Moshe Lazer (2):
  IB/mlx5: Flush cache workqueue before destroying it
  IB/mlx5: Fix memory leak in mlx5_ib_create_srq

Roland Dreier (1):
  Merge branch 'misc' into for-next

Sagi Grimberg (1):
  IB/mlx5: Fix eq names to display nicely in /proc/interrupts

 drivers/infiniband/hw/amso1100/c2_ae.c             |  2 +-
 drivers/infiniband/hw/mlx5/main.c                  | 16 +++--
 drivers/infiniband/hw/mlx5/mr.c                    | 70 +--
 drivers/infiniband/hw/mlx5/qp.c                    | 80 --
 drivers/infiniband/hw/mlx5/srq.c                   |  4 +-
 drivers/infiniband/hw/mthca/mthca_eq.c             |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c           |  6 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c         |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c        |  6 +-
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      | 28
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 21 ++
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c    | 16 -
 include/linux/mlx5/device.h                        |  4 +-
 include/linux/mlx5/driver.h                        |  6 +-
 15 files changed, 126 insertions(+), 141 deletions(-)


[GIT PULL] please pull infiniband.git

2013-09-04 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.12 merge window:

 - Large ocrdma HW driver update: add "fast register" work requests,
   fixes, cleanups
 - Add receive flow steering support for raw QPs
 - Fix IPoIB neighbour race that leads to crash
 - iSER updates including support for using "fast register" memory
   registration
 - IPv6 support for iWARP
 - XRC transport fixes


CQ Tang (1):
  IB/qib: Improve SDMA performance

Hadar Hen Zion (3):
  IB/core: Add receive flow steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive flow steering support

Igor Ivanov (1):
  IB/core: Infrastructure for extensible uverbs commands

Ira Weiny (1):
  IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards

Jim Foraker (1):
  IPoIB: Fix race in deleting ipoib_neigh entries

Matan Barak (1):
  IB/core: Better checking of userspace values for receive flow steering

Naresh Gottumukkala (19):
  RDMA/ocrdma: Style and redundant code cleanup
  RDMA/ocrdma: Remove redundant dev reference
  RDMA/ocrdma: Don't allow zero/invalid sgid usage
  RDMA/ocrdma: Remove driver QP state machine
  RDMA/ocrdma: Remove __packed
  RDMA/ocrdma: Cache recv DB until QP moved to RTR
  RDMA/ocrdma: Create IRD queue fix
  RDMA/ocrdma: Add support for fast register work requests (FRWR)
  RDMA/ocrdma: Remove the MTU check based on Ethernet MTU
  RDMA/ocrdma: Fix to work with even a single MSI-X vector
  RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24
  RDMA/ocrdma: FRMA code cleanup
  RDMA/ocrdma: Dont use PD 0 for userpace CQ DB
  RDMA/ocrdma: Increase STAG array size
  RDMA/ocrdma: Fix for displaying proper link speed
  RDMA/ocrdma: Consider multiple SGES in case of DPP
  RDMA/ocrdma: Add ABI versioning support
  RDMA/ocrdma: Fill PVID in UMC case
  RDMA/ocrdma: Fix passing wrong opcode to modify_srq

Or Gerlitz (1):
  IB/iser: Use proper debug level value for info prints

Paul Bolle (1):
  IB/qib: Make qib_driver static

Roi Dayan (1):
  IB/iser: Fix possible memory leak in iser_create_frwr_pool()

Roland Dreier (2):
  RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch
  Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' 
and 'qib' into for-next

Sagi Grimberg (5):
  IB/iser: Generalize rdma memory registration
  IB/iser: Handle unaligned SG in separate function
  IB/iser: Place the fmr pool into a union in iser's IB conn struct
  IB/iser: Introduce fast memory registration model (FRWR)
  IB/iser: Fix redundant pointer check in dealloc flow

Shlomo Pongratz (2):
  IB/iser: Restructure allocation/deallocation of connection resources
  IB/iser: Accept session->cmds_max from user space

Steve Wise (9):
  RDMA/cma: Add IPv6 support for iWARP
  RDMA/cxgb4: Use correct bit shift macros for vlan filter tuples
  RDMA/cxgb4: Handle newer firmware changes
  RDMA/cxgb4: Fix QP flush logic
  RDMA/cxgb4: Fix accounting for unsignaled SQ WRs to deal with wrap
  RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages
  RDMA/cxgb4: Always do GTS write if cidx_inc == CIDXINC_MASK
  RDMA/cxgb4: Advertise ~0ULL as max MR size
  RDMA/cxgb4: Issue RI.FINI before closing when entering TERM

Vipul Pandya (3):
  cxgb4: Add routines to create and remove listening IPv6 servers
  cxgb4: Add CLIP support to store compressed IPv6 address
  RDMA/cxgb4: Add support for active and passive open connection with IPv6 
address

Yijing Wang (1):
  IB/qib: Clean up unnecessary MSI/MSI-X capability find

Yishai Hadas (3):
  mlx4_core: Fix XRC QPs detection in the resource tracker
  IB/core: Add locking around event dispatching on XRC target QPs
  IB/core: Fixes to XRC reference counting in uverbs

 drivers/infiniband/core/cma.c          |  44 +-
 drivers/infiniband/core/uverbs.h       |   4 +
 drivers/infiniband/core/uverbs_cmd.c   | 250 +-
 drivers/infiniband/core/uverbs_main.c  |  42 +-
 drivers/infiniband/core/verbs.c        |  30 +
 drivers/infiniband/hw/amso1100/c2_ae.c |  18 +-
 drivers/infiniband/hw/amso1100/c2_cm.c |  16 +-
 drivers/infiniband/hw/cxgb3/iwch_cm.c  |  46 +-
 drivers/infiniband/hw/cxgb4/Kconfig    |   2 +-
 drivers/infiniband/hw/cxgb4/cm.c       | 860 ---
 drivers/infiniband/hw/cxgb4/cq.c       | 329 +---
 drivers/infinib

Re: [PATCH 9/9] tcm_qla2xxx: Add special case for COMPARE_AND_WRITE data_direction

2013-08-21 Thread Roland Dreier
On Wed, Aug 21, 2013 at 7:38 AM, Roland Dreier  wrote:
> I don't understand this.  In fact the whole patch series looks quite
> confused.  COMPARE AND WRITE is a normal Data-Out command, with no
> requirement for special bidirectional handling or anything like that.
> The only slightly unusual thing is that a CAW command with a NUMBER OF
> LOGICAL BLOCKS equal to N will actually transfer 2*N worth of data --
> one set of data for the compare operation and a second set to write if
> the compare succeeds.  But just to be clear, the transfer of those 2*N
> blocks happens as a single transfer during the Data-Out phase.

OK, I understand the patch set a bit better.  You're using the bidi
infrastructure to have a place to stick the data that you internally
read to implement the compare, but then you end up having places like
this where you have to say, "oh it's not really a bidi command, it's
just a compare and write."

Shouldn't there be a way to confine the COMPARE AND WRITE handling to
the actual implementation of that command?  Or maybe make the bidi
handling more generic so that this becomes clearer?

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9] tcm_qla2xxx: Add special case for COMPARE_AND_WRITE data_direction

2013-08-21 Thread Roland Dreier
On Tue, Aug 20, 2013 at 1:08 PM, Nicholas A. Bellinger
 wrote:
> Add a special case for COMPARE_AND_WRITE for the reverse data direction
> mapping used for pci_map_sg() + friends.

I don't understand this.  In fact the whole patch series looks quite
confused.  COMPARE AND WRITE is a normal Data-Out command, with no
requirement for special bidirectional handling or anything like that.
The only slightly unusual thing is that a CAW command with a NUMBER OF
LOGICAL BLOCKS equal to N will actually transfer 2*N worth of data --
one set of data for the compare operation and a second set to write if
the compare succeeds.  But just to be clear, the transfer of those 2*N
blocks happens as a single transfer during the Data-Out phase.
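
To make the 2*N arithmetic concrete, here is a minimal sketch in C.  The
helper name and the overflow guard are illustrative assumptions, not code
from the target core or from any driver:

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes moved in the single Data-Out
 * phase of a COMPARE AND WRITE whose NUMBER OF LOGICAL BLOCKS is nolb.
 * The initiator sends nolb blocks of compare data followed by nolb
 * blocks of write data, so the total is 2 * nolb * block_size.
 * Returns 0 if the product would not fit in 64 bits.
 */
static uint64_t caw_data_out_bytes(uint32_t nolb, uint32_t block_size)
{
	uint64_t blocks = 2 * (uint64_t)nolb;	/* compare set + write set */

	if (block_size != 0 && blocks > UINT64_MAX / block_size)
		return 0;	/* would overflow */
	return blocks * block_size;
}
```

With 512-byte blocks, a CAW with NUMBER OF LOGICAL BLOCKS equal to 1
moves 1024 bytes, all of them in the Data-Out direction.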

 - R.


Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-15 Thread Roland Dreier
Jens / James, do you guys plan to send this to Linus for 3.11?
Triggering this bug is a bit esoteric but the impact is pretty nasty
(corrupting an unrelated process).

Thanks,
  Roland


Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-08 Thread Roland Dreier
On Wed, Aug 7, 2013 at 9:31 AM, Douglas Gilbert  wrote:
> So what kind of signal was leading to your "stomping on the memory"?
> Was it user generated or something like SIGIO, SIGPIPE or a RT signal?

It was sometimes SIGHUP (for reopening log files) and sometimes
SIGALRM (for various periodic things).

> To get around the SG_IO ioctl restart problem (for non idempotent
> SCSI commands) could we replace a -ERESTARTSYS return value
> with -EINTR ?
>
> As I noted in a previous post, for robust user space code using the
> SG_IO ioctl, masking signals during the IO may help.

Yes, absolutely.  But process A should be able to keep its memory
uncorrupted even if process B is coded wrong :)

> And what about bsg? Is it any better or worse than sg in the case
> of interrupted SG_IO ioctls? Apart from the interface (sg_io_hdr
> v3 versus v4) it should be a drop in replacement for sg.

As far as I can tell bsg looks much better w.r.t. signals -- I don't
see anywhere that it schedules work onto a workqueue or other kernel
thread, and it looks like the SG_IO ioctl there actually has nowhere
that checks for signals.  All sleeps will be uninterruptible, which I
guess may be better or worse depending on your perspective.

 - R.


Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-07 Thread Roland Dreier
On Wed, Aug 7, 2013 at 7:38 AM, David Milburn  wrote:
> I was able to successfully test this patch overnight.  I had been
> experimenting with the sg driver setting the BIO_NULL_MAPPED flag in
> sg_rq_end_io_usercontext for an orphan process, which prevented the
> corruption, but your solution seems much better.

Very cool, thanks for the testing.

I actually looked at using BIO_NULL_MAPPED as well, but it seemed a
bit too fragile to me -- it had the right effect of skipping
__bio_copy_iov(), and skipping the __free_pages() stuff in there is OK
because sg owns its pages rather than the bio layer, but all that
seemed vulnerable to being broken by an unrelated change.

Out of curiosity, were you already working on this bug?  Because if
you had fixed it a few weeks earlier we might not have spent so long
wondering WTF was stomping on the memory of one of our processes :)

 - R.


[PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Roland Dreier
From: Roland Dreier 

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

 - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
   underlying SCSI command will transfer data from the SCSI device to
   the buffer provided in the ioctl)

 - Before the command finishes, a signal is sent to the process waiting
   in the ioctl.  This will end up waking up the sg_ioctl() code:

        result = wait_event_interruptible(sfp->read_wait,
                (srp_done(sfp, srp) || sdp->detached));

   but neither srp_done() nor sdp->detached is true, so we end up just
   setting srp->orphan and returning to userspace:

        srp->orphan = 1;
        write_unlock_irq(&sfp->rq_list_lock);
        return result;  /* -ERESTARTSYS because signal hit process */

   At this point the original process is done with the ioctl and
   blithely goes ahead handling the signal, reissuing the ioctl, etc.

 - Eventually, the SCSI command issued by the first ioctl finishes and
   ends up in sg_rq_end_io().  At the end of that function, we run through:

        write_lock_irqsave(&sfp->rq_list_lock, iflags);
        if (unlikely(srp->orphan)) {
                if (sfp->keep_orphan)
                        srp->sg_io_owned = 0;
                else
                        done = 0;
        }
        srp->done = done;
        write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

        if (likely(done)) {
                /* Now wake up any sg_read() that is waiting for this
                 * packet.
                 */
                wake_up_interruptible(&sfp->read_wait);
                kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
                kref_put(&sfp->f_ref, sg_remove_sfp);
        } else {
                INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
                schedule_work(&srp->ew.work);
        }

   Since srp->orphan *is* set, we set done to 0 (assuming the
   userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
   ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
   to run in a workqueue.

 - In workqueue context we go through sg_rq_end_io_usercontext() ->
   sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
   bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

   The key point here is that we are doing copy_to_user() on a
   workqueue -- that is, we're on a kernel thread with current->mm
   equal to whatever random previous user process was scheduled before
   this kernel thread.  So we end up copying whatever data the SCSI
   command returned to the virtual address of the buffer passed into
   the original ioctl, but it's quite likely we do this copying into a
   different address space!

As suggested by James Bottomley,
add a check for current->mm (which is NULL if we're on a kernel thread
without a real userspace address space) in bio_uncopy_user(), and skip
the copy if we're on a kernel thread.

There's no reason that I can think of for any caller of bio_uncopy_user()
to want to do copying on a kernel thread with a random active userspace
address space.
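
As a rough illustration of why the check helps, here is a toy user space
model in C.  Every name in it is invented for the example; it only mimics
the fact that a kernel thread has no mm of its own while still running on
page tables borrowed from some earlier process, and it is not kernel code:

```c
#include <stddef.h>
#include <string.h>

#define MODEL_MM_SIZE 64

/* Toy stand-in for an address space; not the kernel's mm_struct. */
struct model_mm {
	unsigned char mem[MODEL_MM_SIZE];
};

/* Toy stand-in for a task; not the kernel's task_struct. */
struct model_task {
	struct model_mm *mm;		/* NULL for a kernel thread */
	struct model_mm *active_mm;	/* address space actually loaded */
};

/*
 * Toy copy_to_user(): the write lands in whatever address space is
 * currently loaded, at offset uaddr, no matter who owns the buffer.
 */
static void model_copy_to_user(struct model_task *cur, size_t uaddr,
			       const void *src, size_t len)
{
	if (uaddr <= MODEL_MM_SIZE && len <= MODEL_MM_SIZE - uaddr)
		memcpy(cur->active_mm->mem + uaddr, src, len);
}

/*
 * Models the guarded path: copy only when the task has a real user
 * address space.  Returns 1 if the copy was done, 0 if it was skipped.
 */
static int model_uncopy_user(struct model_task *cur, size_t uaddr,
			     const void *src, size_t len)
{
	if (!cur->mm)
		return 0;	/* kernel thread: skip the copy */
	model_copy_to_user(cur, uaddr, src, len);
	return 1;
}
```

In this model, calling model_copy_to_user() from a worker whose
active_mm belongs to process B scribbles on B's memory, while
model_uncopy_user() leaves B untouched.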

Huge thanks to Costa Sapuntzakis  for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier 
Cc: 
---
 fs/bio.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..c5eae72 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
 int bio_uncopy_user(struct bio *bio)
 {
 	struct bio_map_data *bmd = bio->bi_private;
-	int ret = 0;
+	struct bio_vec *bvec;
+	int ret = 0, i;
 
-	if (!bio_flagged(bio, BIO_NULL_MAPPED))
-		ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
-				     bmd->nr_sgvecs, bio_data_dir(bio) == READ,
-				     0, bmd->is_our_pages);
+	if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
+		/*
+		 * if we're in a workqueue, the request is orphaned, so
+		 * don't copy into a random user address space, just free.
+		 */
+		if (current->mm)
+			ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
+					     bmd->nr_sgvecs, bio_data_dir(bio) == READ,
+					     0, bmd->is_our_pages);
+		else if (bmd->is_our_pages)
+			bio_for_each_segment_all(bvec, bio, i)
+				__free_page(bvec->bv_page);
+	}
 

Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Roland Dreier
On Mon, Aug 5, 2013 at 4:31 PM, James Bottomley
 wrote:
> I agree with the analysis.  The fix is a bit draconian, though.  A
> workqueue actually runs in a kernel thread and there's a simple test for
> that (!current->mm), so how about this instead (which is much less
> intrusive)

> ---

> diff --git a/fs/bio.c b/fs/bio.c
> index 94bbc04..e2ab39c 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
>  int bio_uncopy_user(struct bio *bio)
>  {
>  	struct bio_map_data *bmd = bio->bi_private;
> -	int ret = 0;
> +	struct bio_vec *bvec;
> +	int ret = 0, i;
>
> -	if (!bio_flagged(bio, BIO_NULL_MAPPED))
> -		ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
> -				     bmd->nr_sgvecs, bio_data_dir(bio) == READ,
> -				     0, bmd->is_our_pages);
> +	if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
> +		/*
> +		 * if we're in a workqueue, the request is orphaned, so
> +		 * don't copy into the kernel address space, just free
> +		 */
> +		if (current->mm)
> +			ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
> +					     bmd->nr_sgvecs, bio_data_dir(bio) == READ,
> +					     0, bmd->is_our_pages);
> +		else if (bmd->is_our_pages)
> +			bio_for_each_segment_all(bvec, bio, i)
> +				__free_page(bvec->bv_page);
> +	}
>  	bio_free_map_data(bmd);
>  	bio_put(bio);
>  	return ret;

Yes, looks reasonable -- I can't think of any reason why anyone would
ever want the bio code to copy to a random userspace address space.

Acked-by: Roland Dreier 


[PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Roland Dreier
From: Roland Dreier 

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

 - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
   underlying SCSI command will transfer data from the SCSI device to
   the buffer provided in the ioctl)

 - Before the command finishes, a signal is sent to the process waiting
   in the ioctl.  This will end up waking up the sg_ioctl() code:

        result = wait_event_interruptible(sfp->read_wait,
                (srp_done(sfp, srp) || sdp->detached));

   but neither srp_done() nor sdp->detached is true, so we end up just
   setting srp->orphan and returning to userspace:

        srp->orphan = 1;
        write_unlock_irq(&sfp->rq_list_lock);
        return result;  /* -ERESTARTSYS because signal hit process */

   At this point the original process is done with the ioctl and
   blithely goes ahead handling the signal, reissuing the ioctl, etc.

 - Eventually, the SCSI command issued by the first ioctl finishes and
   ends up in sg_rq_end_io().  At the end of that function, we run through:

        write_lock_irqsave(&sfp->rq_list_lock, iflags);
        if (unlikely(srp->orphan)) {
                if (sfp->keep_orphan)
                        srp->sg_io_owned = 0;
                else
                        done = 0;
        }
        srp->done = done;
        write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

        if (likely(done)) {
                /* Now wake up any sg_read() that is waiting for this
                 * packet.
                 */
                wake_up_interruptible(&sfp->read_wait);
                kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
                kref_put(&sfp->f_ref, sg_remove_sfp);
        } else {
                INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
                schedule_work(&srp->ew.work);
        }

   Since srp->orphan *is* set, we set done to 0 (assuming the
   userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
   ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
   to run in a workqueue.

 - In workqueue context we go through sg_rq_end_io_usercontext() ->
   sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
   bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

   The key point here is that we are doing copy_to_user() on a
   workqueue -- that is, we're on a kernel thread with current->mm
   equal to whatever random previous user process was scheduled before
   this kernel thread.  So we end up copying whatever data the SCSI
   command returned to the virtual address of the buffer passed into
   the original ioctl, but it's quite likely we do this copying into a
   different address space!

Fix this by telling sg_finish_rem_req() whether we're on a workqueue
or not, and if we are, calling a new function blk_rq_unmap_user_nocopy()
that does everything the original blk_rq_unmap_user() does except
calling copy_{to,from}_user().  This requires a few levels of plumbing
through a "copy" flag in the bio layer.
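
The decision being plumbed through can be summarized by the small model
below.  It is only a sketch: the enum and function names are invented
here, and it assumes "done" starts out as 1 before the orphan check, as
in the excerpt above:

```c
#include <stdbool.h>

/* Hypothetical names for the two outcomes at the end of sg_rq_end_io(). */
enum model_completion {
	MODEL_WAKE_READER,	/* wake_up_interruptible() path */
	MODEL_DEFER_TO_WORK,	/* schedule_work() path */
};

/* Mirrors the orphan / keep_orphan test that decides the path. */
static enum model_completion model_end_io(bool orphan, bool keep_orphan)
{
	bool done = true;

	if (orphan && !keep_orphan)
		done = false;
	return done ? MODEL_WAKE_READER : MODEL_DEFER_TO_WORK;
}

/*
 * The "copy" flag: data may be copied back to user memory only when
 * cleanup does not run from the workqueue.
 */
static bool model_should_copy(enum model_completion path)
{
	return path != MODEL_DEFER_TO_WORK;
}
```

Only the orphaned request without keep_orphan ends up on the workqueue,
and that is the one case where the copy has to be skipped.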

I also considered fixing this by having the sg code just set
BIO_NULL_MAPPED for bios that are unmapped from a workqueue, which
happens to work because the __free_page() part of __bio_copy_iov()
isn't needed for sg (because sg handles its own pages).  However, this
seems coincidental and fragile, so I preferred making the fix
explicit, at the cost of minor tweaks to the bio code.

Huge thanks to Costa Sapuntzakis  for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier 
Cc: 
---
 block/blk-map.c        | 15 ---
 drivers/scsi/sg.c      | 19 ++-
 fs/bio.c               | 22 +++---
 include/linux/bio.h    |  2 +-
 include/linux/blkdev.h | 11 ++-
 5 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 623e1cd..bd63201 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -25,7 +25,7 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq,
 	return 0;
 }
 
-static int __blk_rq_unmap_user(struct bio *bio)
+static int __blk_rq_unmap_user(struct bio *bio, bool copy)
 {
 	int ret = 0;
 
@@ -33,7 +33,7 @@ static int __blk_rq_unmap_user(struct bio *bio)
 		if (bio_flagged(bio, BIO_USER_MAPPED))
 			bio_unmap_user(bio);
 		else
-			ret = bio_uncopy_user(bio);
+			ret = bio_uncopy_user(bio, copy);
 	}
 
 	return ret;
@@ -80,7 +80,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
 
/* if it was boucned we must

[GIT PULL] please pull infiniband.git

2013-08-02 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



InfiniBand/RDMA fixes for 3.11-rc:
 - Fixes for the newly merged mlx5 hardware driver
 - Stack info leak fixes from Dan Carpenter
 - Fixes for pkey table handling with SR-IOV
 - A few other small things


Andi Shyti (1):
  mlx5_core: Variable may be used uninitialized

Dan Carpenter (6):
  RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()
  RDMA/ocrdma: Fix several stack info leaks
  RDMA/nes: Fix info leaks in nes_create_qp() and nes_create_cq()
  RDMA/cxgb3: Fix stack info leak in iwch_create_cq()
  IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()
  mlx5_core: Fix use after free in mlx5_cmd_comp_handler()

Eli Cohen (1):
  mlx5_core: Implement new initialization sequence

Erez Shitrit (1):
  IPoIB: Fix pkey change flow for virtualization environments

Jack Morgenstein (2):
  IB/mlx4: Use default pkey when creating tunnel QPs
  IB/core: Create QP1 using the pkey index which contains the default pkey

Mike Marciniszyn (1):
  IB/qib: Add err_decode() call for ring dump

Or Gerlitz (1):
  IPoIB: Make sure child devices use valid/proper pkeys

Paul Bolle (1):
  RDMA/cma: Fix gcc warning

Roland Dreier (3):
  RDMA/ocrdma: Remove unused include
  Revert "RDMA/nes: Fix compilation error when nes_debug is enabled"
  Merge branches 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma' and 'qib' into for-next

Sean Hefty (2):
  RDMA/cma: Fix accessing invalid private data for UD
  RDMA/cma: Only call cma_save_ib_info() for CM REQs

Wei Yongjun (1):
  IB/mlx5: Fix error return code in init_one()

 drivers/infiniband/core/cma.c  | 29 +
 drivers/infiniband/core/mad.c  |  8 ++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c|  1 +
 drivers/infiniband/hw/cxgb4/qp.c   |  2 +
 drivers/infiniband/hw/mlx4/mad.c   | 10 ++-
 drivers/infiniband/hw/mlx5/main.c  | 11 ++--
 drivers/infiniband/hw/mlx5/qp.c|  2 +-
 drivers/infiniband/hw/nes/nes_hw.c |  4 +-
 drivers/infiniband/hw/nes/nes_verbs.c  |  3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c   |  1 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c|  5 +-
 drivers/infiniband/hw/qib/qib_iba7322.c|  2 +
 drivers/infiniband/hw/qib/qib_sdma.c   |  2 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c| 76 ++
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  2 +-
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c   |  9 +++
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 19 --
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 69 ++--
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c| 20 --
 include/linux/mlx5/device.h| 20 ++
 include/linux/mlx5/driver.h|  4 +-
 21 files changed, 239 insertions(+), 60 deletions(-)


[GIT PULL] please pull infiniband.git

2013-07-11 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



Main (updated) batch of InfiniBand/RDMA changes for 3.11 merge window:

 - AF_IB (native IB addressing) for CMA from Sean Hefty
 - New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes)
 - SRP fixes from Bart Van Assche (including fix to first merge request)
 - qib HW driver updates
 - Resurrection of ocrdma HW driver development
 - uverbs conversion to create fds with O_CLOEXEC set
 - Other small changes and fixes


Bart Van Assche (6):
  IB/srp: Avoid skipping srp_reset_host() after a transport error
  IB/srp: Skip host settle delay
  IB/srp: Fail I/O fast if target offline
  IB/srp: Maintain a single connection per I_T nexus
  IB/srp: Make HCA completion vector configurable
  IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline

Dan Carpenter (2):
  RDMA/cxgb3: Timeout condition is never true
  mlx5: Return -EFAULT instead of -EPERM

Dean Luick (1):
  IB/qib: Log all SDMA errors unconditionally

Dotan Barak (1):
  IB/srp: Fix remove_one crash due to resource exhaustion

Eli Cohen (1):
  mlx5: Add driver for Mellanox Connect-IB adapters

Gottumukkala, Naresh (1):
  RDMA/ocrdma: Remove use_cnt for queues

Jack Morgenstein (1):
  IB/core: Add reserved values to enums for low-level driver use

Mike Marciniszyn (7):
  IB/qib: Add DCA support
  IB/qib: Remove atomic_inc_not_zero() from QP RCU
  IB/qib: Optimize CQ callbacks
  IB/qib: Convert opcode counters to per-context
  IB/qib: Add per-context stats interface
  IB/qib: Add qp_stats debug file
  IB/qib: Fix module-level leak

Mitko Haralanov (1):
  IB/qib: New transmitter tunning settings for Dell 1.1 backplane

Moshe Lazer (1):
  mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec

Naresh Gottumukkala (5):
  RDMA/ocrdma: Use MCC_CREATE_EXT_V1 for MCC create
  RDMA/ocrdma: Replace ocrdma_err with pr_err
  RDMA/ocrdma: Set bad_wr in error case
  RDMA/ocrdma: Change macros to inline funtions
  RDMA/ocrdma: Reorg structures to avoid padding

Ramkrishna Vepa (2):
  IB/qib: Add optional NUMA affinity
  IB/qib: Add dual-rail NUMA awareness for PSM processes

Roland Dreier (6):
  mlx5: Fix parameter type of health_handler_t
  IB/mlx5: Make profile[] static in main.c
  mlx5_core: Fixes for sparse warnings
  IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
  Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for-next
  Merge branches 'mlx5', 'qib' and 'srp' into for-next

Sean Hefty (28):
  RDMA/cma: Define native IB address
  RDMA/cma: Allow enabling reuseaddr in any state
  RDMA/cma: Include AF_IB in loopback and any address checks
  IB/addr: Add AF_IB support to ip_addr_size
  RDMA/cma: Update port reservation to support AF_IB
  RDMA/cma: Allow user to specify AF_IB when binding
  RDMA/cma: Do not modify sa_family when setting loopback address
  RDMA/cma: Add helper functions to return id address information
  RDMA/cma: Restrict AF_IB loopback to binding to IB devices only
  RDMA/cma: Verify that source and dest sa_family are the same
  RDMA/cma: Add support for AF_IB to rdma_resolve_addr()
  RDMA/cma: Add support for AF_IB to rdma_resolve_route()
  RDMA/cma: Add support for AF_IB to cma_get_service_id()
  RDMA/cma: Remove unused SDP related code
  RDMA/cma: Merge cma_get/save_net_info
  RDMA/cma: Expose private data when using AF_IB
  RDMA/cma: Set qkey for AF_IB
  RDMA/cma: Only listen on IB devices when using AF_IB
  RDMA/ucma: Support querying for AF_IB addresses
  IB/sa: Export function to pack a path record into wire format
  RDMA/ucma: Support querying when IB paths are not reversible
  RDMA/cma: Export cma_get_service_id()
  RDMA/ucma: Add ability to query GID addresses
  RDMA/ucma: Name changes to indicate only IP addresses supported
  RDMA/ucma: Allow user space to bind to AF_IB
  RDMA/ucma: Allow user space to pass AF_IB into resolve
  RDMA/ucma: Allow user space to specify AF_IB when joining multicast
  RDMA/cma: Export AF_IB statistics

Vinit Agnihotri (1):
  IB/qib: Update minor version number

Vu Pham (1):
  IB/srp: Bump driver version and release date

Wei Yongjun (3):
  IB/ehca: Fix error return code in ehca_create_slab_caches()
  RDMA/ocrdma: Fix error return code in ocrdma_set_create_qp_rq_cmd()
  IB/core: Fix error return code in add_port()

 Documentation/ABI/stable/sysfs-driver-ib_srp   |7 +
 MAINTAINERS   

Re: [GIT PULL] please pull infiniband.git

2013-07-10 Thread Roland Dreier
On Wed, Jul 10, 2013 at 7:35 AM, Sebastian Riemer
 wrote:
>
> I've checked the commits on that tag and the following commit is not
> what we've agreed on:

Sorry about that.  The discussion was long and complex and I probably
made a mistake in applying the patches.  Please send me a patch to fix
the driver to what it should be, and I will merge it ASAP.

 - Roland


[GIT PULL] please pull infiniband.git

2013-07-09 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.11 merge window:

 - AF_IB (native IB addressing) for CMA from Sean Hefty
 - New mlx5 driver for Mellanox Connect-IB adapters
 - SRP fixes from Bart Van Assche
 - qib HW driver updates
 - Resurrection of ocrdma HW driver development
 - uverbs conversion to create fds with O_CLOEXEC set
 - Other small changes and fixes


Bart Van Assche (5):
  IB/srp: Avoid skipping srp_reset_host() after a transport error
  IB/srp: Skip host settle delay
  IB/srp: Fail I/O fast if target offline
  IB/srp: Maintain a single connection per I_T nexus
  IB/srp: Make HCA completion vector configurable

Dan Carpenter (1):
  RDMA/cxgb3: Timeout condition is never true

Dotan Barak (1):
  IB/srp: Fix remove_one crash due to resource exhaustion

Eli Cohen (1):
  mlx5: Add driver for Mellanox Connect-IB adapters

Gottumukkala, Naresh (1):
  RDMA/ocrdma: Remove use_cnt for queues

Jack Morgenstein (1):
  IB/core: Add reserved values to enums for low-level driver use

Mike Marciniszyn (6):
  IB/qib: Add DCA support
  IB/qib: Remove atomic_inc_not_zero() from QP RCU
  IB/qib: Optimize CQ callbacks
  IB/qib: Convert opcode counters to per-context
  IB/qib: Add per-context stats interface
  IB/qib: Add qp_stats debug file

Mitko Haralanov (1):
  IB/qib: New transmitter tunning settings for Dell 1.1 backplane

Naresh Gottumukkala (5):
  RDMA/ocrdma: Use MCC_CREATE_EXT_V1 for MCC create
  RDMA/ocrdma: Replace ocrdma_err with pr_err
  RDMA/ocrdma: Set bad_wr in error case
  RDMA/ocrdma: Change macros to inline funtions
  RDMA/ocrdma: Reorg structures to avoid padding

Ramkrishna Vepa (2):
  IB/qib: Add optional NUMA affinity
  IB/qib: Add dual-rail NUMA awareness for PSM processes

Roland Dreier (5):
  mlx5: Fix parameter type of health_handler_t
  IB/mlx5: Make profile[] static in main.c
  mlx5_core: Fixes for sparse warnings
  IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
  Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for-next

Sean Hefty (28):
  RDMA/cma: Define native IB address
  RDMA/cma: Allow enabling reuseaddr in any state
  RDMA/cma: Include AF_IB in loopback and any address checks
  IB/addr: Add AF_IB support to ip_addr_size
  RDMA/cma: Update port reservation to support AF_IB
  RDMA/cma: Allow user to specify AF_IB when binding
  RDMA/cma: Do not modify sa_family when setting loopback address
  RDMA/cma: Add helper functions to return id address information
  RDMA/cma: Restrict AF_IB loopback to binding to IB devices only
  RDMA/cma: Verify that source and dest sa_family are the same
  RDMA/cma: Add support for AF_IB to rdma_resolve_addr()
  RDMA/cma: Add support for AF_IB to rdma_resolve_route()
  RDMA/cma: Add support for AF_IB to cma_get_service_id()
  RDMA/cma: Remove unused SDP related code
  RDMA/cma: Merge cma_get/save_net_info
  RDMA/cma: Expose private data when using AF_IB
  RDMA/cma: Set qkey for AF_IB
  RDMA/cma: Only listen on IB devices when using AF_IB
  RDMA/ucma: Support querying for AF_IB addresses
  IB/sa: Export function to pack a path record into wire format
  RDMA/ucma: Support querying when IB paths are not reversible
  RDMA/cma: Export cma_get_service_id()
  RDMA/ucma: Add ability to query GID addresses
  RDMA/ucma: Name changes to indicate only IP addresses supported
  RDMA/ucma: Allow user space to bind to AF_IB
  RDMA/ucma: Allow user space to pass AF_IB into resolve
  RDMA/ucma: Allow user space to specify AF_IB when joining multicast
  RDMA/cma: Export AF_IB statistics

Vinit Agnihotri (1):
  IB/qib: Update minor version number

Vu Pham (1):
  IB/srp: Bump driver version and release date

Wei Yongjun (3):
  IB/ehca: Fix error return code in ehca_create_slab_caches()
  RDMA/ocrdma: Fix error return code in ocrdma_set_create_qp_rq_cmd()
  IB/core: Fix error return code in add_port()

 Documentation/ABI/stable/sysfs-driver-ib_srp|7 +
 MAINTAINERS |   22 ++
 drivers/infiniband/Kconfig  |1 +
 drivers/infiniband/Makefile |1 +
 drivers/infiniband/core/addr.c  |   20 +-
 drivers/infiniband/core/cma.c   |  906 ++-
 drivers/infiniband/core/sa_query.c  |6 +
 drivers/infiniband/core/sysfs.c

Re: [PATCH 03/13] infiniband: use get_unused_fd_flags(0) instead of get_unused_fd()

2013-07-08 Thread Roland Dreier
Thanks, I just applied a patch to convert to
get_unused_fd_flags(O_CLOEXEC) in uverbs, since there isn't anything
useful that can be done with uverbs fds across an exec.

 - R.


Re: [PATCH] mm: Revert pinned_vm braindamage

2013-06-20 Thread Roland Dreier
On Thu, Jun 20, 2013 at 7:48 AM, Christoph Lameter  wrote:
> There is no way that user space can initiate a page pin right now. Perf is
> pinning the page from the kernel. Similarly the IB subsystem pins memory
> needed for device I/O.

Christoph, your argument would be a lot more convincing if you stopped
repeating this nonsense.  Sure, in a strict sense, it might be true
that the IB subsystem in the kernel is the code that actually pins
memory, but given that unprivileged userspace can tell the kernel to
pin arbitrary parts of its memory for any amount of time, is that
relevant?  And in fact taking your "initiate" word choice above, I
don't even think your statement is true -- userspace initiates the
pinning by, for example, doing an IB memory registration (libibverbs
ibv_reg_mr() call), which turns into a system call, which leads to the
kernel trying to pin pages.  The pages aren't unpinned until userspace
unregisters the memory (or causes a cleanup by closing the context
fd).

Here's an argument by analogy.  Would it make any sense for me to say
userspace can't mlock memory, because only the kernel can set
VM_LOCKED on a vma?  Of course not.  Userspace has the mlock() system
call, and although the actual work happens in the kernel, we clearly
want to be able to limit the amount of memory locked by the kernel ON
BEHALF OF USERSPACE.
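
To spell out the mlock() side of the analogy, here is a small sketch in C
of user space asking the kernel to lock memory and of the limit that
bounds it.  The helper names are made up for the example:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Fetch the soft RLIMIT_MEMLOCK limit in bytes; returns 0 on success. */
static int memlock_soft_limit(rlim_t *limit)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) < 0)
		return -errno;
	*limit = rl.rlim_cur;
	return 0;
}

/*
 * Ask the kernel to lock [addr, addr + len) on our behalf.
 * Returns 0 on success or a negative errno, for example -ENOMEM or
 * -EPERM when the request is not allowed under RLIMIT_MEMLOCK.
 */
static int lock_range(const void *addr, size_t len)
{
	if (mlock(addr, len) < 0)
		return -errno;
	return 0;
}
```

The work is done by the kernel, but the request and the accounting
belong to the calling process, which is the point about pinned pages
as well.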

 - R.


Re: linux-next: just chaeking correctness of the infiniband tree

2013-06-20 Thread Roland Dreier
On Thu, Jun 20, 2013 at 5:09 PM, Stephen Rothwell  wrote:
> I noticed that the infiniband tree is now based on the net-next tree.  I
> assume that is deliberate?  I do have to question how much testing that
> tree has had since it is now based on a tree that Dave only released in
> the last 24 hours ...

That is intentional since there is work coming that relies on net-next
prerequisites.

The tree hasn't had much testing, but pushing it out a few weeks
before the merge window is the way it gets testing.


[GIT PULL] please pull infiniband.git

2013-06-07 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



InfiniBand fixes for 3.10-rc:
 - qib RCU/lockdep fix
 - iser device removal fix, plus doc fixes


Mike Marciniszyn (1):
  IB/qib: Fix lockdep splat in qib_alloc_lkey()

Or Gerlitz (2):
  IB/iser: Add Mellanox copyright
  MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator

Roi Dayan (1):
  IB/iser: Fix device removal flow

Roland Dreier (1):
  Merge branches 'iser' and 'qib' into for-next

 MAINTAINERS                                  | 10 ++
 drivers/infiniband/hw/qib/qib_keys.c         |  2 +-
 drivers/infiniband/ulp/iser/iscsi_iser.c     |  1 +
 drivers/infiniband/ulp/iser/iscsi_iser.h     |  1 +
 drivers/infiniband/ulp/iser/iser_initiator.c |  1 +
 drivers/infiniband/ulp/iser/iser_memory.c    |  1 +
 drivers/infiniband/ulp/iser/iser_verbs.c     | 16 +---
 7 files changed, 24 insertions(+), 8 deletions(-)


[GIT PULL] please pull infiniband.git

2013-05-08 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



InfiniBand/RDMA changes for the 3.10 merge window:
 - XRC transport fixes
 - Fix DHCP on IPoIB
 - mlx4 preparations for flow steering
 - iSER fixes
 - miscellaneous other fixes

Sorry for being so late on this -- I moved houses and the system with
all my private keys was offline for a week or so.


Akinobu Mita (1):
  RDMA: Rename random32() to prandom_u32()

Cong Ding (1):
  RDMA/cxgb3: Fix uninitialized variable

Dotan Barak (1):
  IB/mlx4: Disable VLAN stripping for RAW PACKET QPs

Doug Ledford (1):
  IPoIB: Fix ipoib_hard_header() return value

Eli Cohen (1):
  IB/mlx4: Set link type for RAW PACKET QPs in the QP context

Grant Grundler (1):
  SRPT: Fix odd use of WARN_ON()

Hadar Hen Zion (5):
  mlx4_core: Move DMFS HW structs to common header file
  mlx4: Match DMFS promiscuous field names to firmware spec
  mlx4_core: Change a few DMFS fields names to match firmare spec
  mlx4_core: Directly expose fields of DMFS HW rule control segment
  mlx4_core: Expose a few helpers to fill DMFS HW strucutures

Jack Morgenstein (1):
  mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level

Mike Marciniszyn (2):
  IB/ipath: Correct ipath_verbs_register_sysfs() error handling
  IB/qib: Correct qib_verbs_register_sysfs() error handling

Or Gerlitz (2):
  IB/iser: Return error to upper layers on EAGAIN registration failures
  IB/iser: Add support for iser CM REQ additional info

Roi Dayan (2):
  IB/iser: Add module version
  IB/iser: Move informational messages from error to info level

Roland Dreier (1):
  Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next

Shlomo Pongratz (3):
  IB/core: Verify that QP handler is valid before dispatching events
  mlx4_core: Implement SRQ object lookup from srqn
  IB/mlx4: Fetch XRC SRQ in the CQ polling code

Steve Wise (1):
  RDMA/iwcm: Don't touch cmid after dropping reference

Thadeu Lima de Souza Cascardo (1):
  RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled

 drivers/infiniband/core/iwcm.c  |   2 +
 drivers/infiniband/core/verbs.c |   3 +-
 drivers/infiniband/hw/cxgb3/cxio_resource.c |   4 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c |   2 +-
 drivers/infiniband/hw/cxgb4/id_table.c  |   4 +-
 drivers/infiniband/hw/cxgb4/qp.c|  25 ++---
 drivers/infiniband/hw/ipath/ipath_verbs.c   |  19 ++--
 drivers/infiniband/hw/mlx4/cq.c |  21 +
 drivers/infiniband/hw/mlx4/mad.c|   2 +-
 drivers/infiniband/hw/mlx4/qp.c |   6 ++
 drivers/infiniband/hw/qib/qib_sysfs.c   |   6 +-
 drivers/infiniband/hw/qib/qib_verbs.c   |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c   |   2 +-
 drivers/infiniband/ulp/iser/iscsi_iser.c|  24 ++---
 drivers/infiniband/ulp/iser/iscsi_iser.h|  24 -
 drivers/infiniband/ulp/iser/iser_memory.c   |   3 +-
 drivers/infiniband/ulp/iser/iser_verbs.c|  36 ---
 drivers/infiniband/ulp/srpt/ib_srpt.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |   2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  16 ++--
 drivers/net/ethernet/mellanox/mlx4/eq.c |   4 +-
 drivers/net/ethernet/mellanox/mlx4/mcg.c| 120 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |  79 
 drivers/net/ethernet/mellanox/mlx4/srq.c|  15 +++
 include/linux/mlx4/device.h | 104 ++--
 include/linux/mlx4/srq.h|   2 +
 27 files changed, 328 insertions(+), 204 deletions(-)


Re: linux-next: Tree for Apr 17 (infiniband/rdma)

2013-04-17 Thread Roland Dreier
On Wed, Apr 17, 2013 at 11:06 AM, Randy Dunlap  wrote:
> on x86_64:
>
> drivers/built-in.o: In function `isert_free_np':
> ib_isert.c:(.text+0x6e8a77): undefined reference to `rdma_destroy_id'
> drivers/built-in.o: In function `isert_conn_setup_qp':
> ib_isert.c:(.text+0x6e9038): undefined reference to `rdma_create_qp'

Nic, I think isert needs a "depends on INFINIBAND_ADDR_TRANS" to avoid this.
(this is coming from the SCSI target tree, not the InfiniBand/RDMA tree)

 - R.


Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-05 Thread Roland Dreier
On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
 wrote:
> Sorry, I was wrong. ignore the comments about cgroups. That's still broken.
> (i.e. trying to register RDMA memory while using a cgroup swap limit causes
> the process to get killed).
>
> But the GIFT flag patch works (my understanding is that GIFT flag allows the
> adapter to transmit stale memory information, it does not have anything to
> do with cgroups specifically).

The point of the GIFT patch is to avoid triggering copy-on-write so
that memory doesn't blow up during migration.  If that doesn't work
then there's no point to the patch.

 - R.


Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-05 Thread Roland Dreier
On Fri, Apr 5, 2013 at 1:17 PM, Michael R. Hines
 wrote:
> I also removed the IBV_*_WRITE flags on the sender-side and activated
> cgroups with the "memory.memsw.limit_in_bytes" activated and the migration
> with RDMA also succeeded without any problems (both with *and* without GIFT
> also worked).

Not sure I'm interpreting this correctly.  Are you saying that things
worked without actually setting the GIFT flag?  In which case why are
we adding this flag?

 - R.


Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-02 Thread Roland Dreier
On Tue, Apr 2, 2013 at 8:51 AM, Michael S. Tsirkin  wrote:
>> At the moment registering an MR breaks COW.  This breaks memory
>> overcommit for users such as KVM: we have a lot of COW pages, e.g.
>> instances of the zero page or pages shared using KSM.
>>
>> If the application does not care that adapter sees stale data (for
>> example, it tracks writes, reregisters and resends), it can use a new
>> IBV_ACCESS_GIFT flag to prevent registration from breaking COW.
>>
>> The semantics are similar to that of SPLICE_F_GIFT thus the name.
>>
>> Signed-off-by: Michael S. Tsirkin 
>
> Roland, Michael is yet to test this but could you please
> confirm whether this looks acceptable to you?

The patch itself is reasonable I guess, given the needs of this particular app.

I'm not particularly happy with the name of the flag.  The analogy
with SPLICE_F_GIFT doesn't seem particularly strong and I'm not
convinced even the splice flag name is very understandable.  But in
the RDMA case there's not really any sense in which we're "gifting"
memory to the adapter -- we're just telling the library "please don't
trigger copy-on-write" and it doesn't seem particularly easy for users
to understand that from the flag name.

 - R.


[GIT PULL] please pull infiniband.git

2013-03-25 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



Small batch of InfiniBand/RDMA fixes for 3.9:

 - Fix for TX lockup in IPoIB
 - QLogic -> Intel update for qib driver
 - Small static checker fix for qib
 - Fix error path return value in cxgb4


Dan Carpenter (1):
  IB/ipath: Silence a static checker warning

Mike Marciniszyn (1):
  IPoIB: Fix send lockup due to missed TX completion

Roland Dreier (1):
  Merge branches 'cxgb4', 'ipoib' and 'qib' into for-next

Vinit Agnihotri (1):
  IB/qib: change QLogic to Intel

Wei Yongjun (1):
  RDMA/cxgb4: Fix error return code in create_qp()

 drivers/infiniband/hw/cxgb4/qp.c          | 4 +++-
 drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +-
 drivers/infiniband/hw/qib/Kconfig         | 6 +++---
 drivers/infiniband/hw/qib/qib_driver.c    | 5 +++--
 drivers/infiniband/hw/qib/qib_iba6120.c   | 3 ++-
 drivers/infiniband/hw/qib/qib_init.c      | 8
 drivers/infiniband/hw/qib/qib_sd7220.c    | 4 ++--
 drivers/infiniband/hw/qib/qib_verbs.c     | 4 ++--
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   | 8 ++--
 firmware/Makefile                         | 2 +-
 firmware/{qlogic => intel}/sd7220.fw.ihex | 0
 11 files changed, 27 insertions(+), 19 deletions(-)
 rename firmware/{qlogic => intel}/sd7220.fw.ihex (100%)


Re: [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Roland Dreier
On Thu, Mar 21, 2013 at 1:51 AM, Michael S. Tsirkin  wrote:
>> In that case, no, I don't see any reason for LOCAL_WRITE, since the
>> only RDMA operations that will access this memory are remote reads.
>
> What is the meaning of LOCAL_WRITE then? There are no local
> RDMA writes as far as I can see.

Umm, it means you're giving the local adapter permission to write to
that memory.  So you can use it as a receive buffer or as the target
for remote data from an RDMA read operation.

> OK then what we need is a new flag saying "I really do not
> intend to write into this memory please do not break
> COW or do anything else just in case I do".

Isn't that a shared read-only mapping?

 - R.


Re: [PATCH] rdma: don't make pages writeable if not requiested

2013-03-21 Thread Roland Dreier
>> I think this change will break the case where userspace tries to
>> register an MR with read-only permission, but intends locally through
>> the CPU to write to the memory.

> Shouldn't it set LOCAL_WRITE then?

We're talking about the permissions for the register MR operation,
right?  (That's what the kernel RDMA driver code that does
get_user_pages() sees)

In that case, no, I don't see any reason for LOCAL_WRITE, since the
only RDMA operations that will access this memory are remote reads.
The writing (that triggers COW) is coming from normal process access
triggering a page fault, etc.  This is a pretty standard way of using
RDMA... For example, I allocate some memory and register it for RDMA
read (and pass the R_Key to the remote system) with only REMOTE_READ
permission.  Then I fill in the memory with the results of some
computation and the remote system does an RDMA read to get those
results.

 - R.
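
To make the "local write triggers COW" point concrete, here is a rough userspace analogy, not RDMA code. It assumes fork() as a stand-in: the parent plays the party that still references the original page (as an adapter holding a registration would), and the child plays the process doing an ordinary CPU write. The function name cow_demo() is made up for this sketch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in private memory, so after fork() it is shared copy-on-write. */
static volatile int value;

/*
 * Set value to 'before', fork, and let the child overwrite it with 'after'.
 * The child's write gives the child its own private copy of the page.
 * Returns what the parent still sees afterwards, or -1 on error;
 * *child_saw receives what the child observed (low 8 bits only).
 */
int cow_demo(int before, int after, int *child_saw)
{
	pid_t pid;
	int status;

	value = before;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		value = after;		/* write fault: COW copies the page */
		_exit(value & 0xff);	/* report the child's view */
	}

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	*child_saw = WEXITSTATUS(status);
	return value;			/* parent is left with the old contents */
}
```

After the write, the two sides no longer look at the same page. That is the situation to avoid when a remote reader is expected to see the results the process computes after registering the memory.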


Re: [PATCH] rdma: don't make pages writeable if not requiested

2013-03-20 Thread Roland Dreier
On Wed, Mar 20, 2013 at 11:18 PM, Michael S. Tsirkin  wrote:
> core/umem.c seems to get the arguments to get_user_pages
> in the reverse order: it sets writeable flag and
> breaks COW for MAP_SHARED if and only if hardware needs to
> write the page.
>
> This breaks memory overcommit for users such as KVM:
> each time we try to register a page to send it to remote, this
> breaks COW.  It seems that for applications that only have
> REMOTE_READ permission, there is no reason to break COW at all.

I proposed a similar (but not exactly the same, see below) patch a
while ago: https://lkml.org/lkml/2012/1/26/7 but read the thread,
especially https://lkml.org/lkml/2012/2/6/265

I think this change will break the case where userspace tries to
register an MR with read-only permission, but intends locally through
the CPU to write to the memory.  If the memory registration is done
while the memory is mapped read-only but has VM_MAYWRITE, then
userspace gets into trouble when COW happens.  In the case you're
describing (although I'm not sure where in KVM we're talking about
using RDMA), what happens if you register memory with only REMOTE_READ
and then COW is triggered because of a local write?  (I'm assuming you
don't want remote access to continue to get the old contents of the
page)

I have to confess that I still haven't had a chance to implement the
proposed FOLL_FOLLOW solution to all of this.

> If the page that is COW has lots of copies, this makes the user process
> quickly exceed the cgroups memory limit.  This makes RDMA mostly useless
> for virtualization, thus the stable tag.

The actual problem description here is a bit too terse for me to
understand.  How do we end up with lots of copies of a COW page?  Why
is RDMA registering the memory any more special than having everyone
who maps this page actually writing to it and triggering COW?

>                 ret = get_user_pages(current, current->mm, cur_base,
>                                      min_t(unsigned long, npages,
>                                            PAGE_SIZE / sizeof (struct page *)),
> -                                    1, !umem->writable, page_list, vma_list);
> +                                    !umem->writable, 1, page_list, vma_list);

The first two parameters in this line being changed are "write" and "force".

I think if we do change this, then we need to pass umem->writable (as
opposed to !umem->writable) for the "write" parameter.  Not sure
whether "force" makes sense or not.

 - R.
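
As a sketch of what "pass umem->writable for the write parameter" would mean, the decision can be modelled in plain C. The enum below is a local stand-in whose bit values are chosen to mirror the kernel's IB access flags as I understand them, and the rule "anything other than remote read needs writable pages" is my reading of how umem->writable is derived; treat both as assumptions rather than a quote of the kernel source. needs_writable_pages() is a hypothetical helper.

```c
#include <stdbool.h>

/* Local stand-ins; values assumed to mirror the kernel's IB access flags. */
enum access_flags {
	ACCESS_LOCAL_WRITE   = 1 << 0,
	ACCESS_REMOTE_WRITE  = 1 << 1,
	ACCESS_REMOTE_READ   = 1 << 2,
	ACCESS_REMOTE_ATOMIC = 1 << 3,
	ACCESS_MW_BIND       = 1 << 4,
};

/*
 * Would the adapter ever write to this memory?  Only a pure remote-read
 * (or empty) registration leaves the pages read-only from the adapter's
 * point of view; every other flag implies the hardware may modify them.
 */
bool needs_writable_pages(int access)
{
	return (access & ~ACCESS_REMOTE_READ) != 0;
}
```

Under that assumption, the value to hand to get_user_pages() as "write" is needs_writable_pages(access) itself, not its negation. Whether "force" should also be set is the open question raised above.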


[GIT PULL] please pull infiniband.git

2013-02-26 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.9:
 - SRP error handling fixes from Bart Van Assche
 - Implementation of memory windows for mlx4 from Shani Michaeli
 - Lots of cxgb4 HW driver fixes from Vipul Pandya
 - Make iSER work for virtual functions, other fixes from Or Gerlitz
 - Fix for bug in qib HW driver from Mike Marciniszyn
 - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
 - Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun


Bart Van Assche (4):
  IB/srp: Track connection state properly
  IB/srp: Avoid sending a task management function needlessly
  IB/srp: Avoid endless SCSI error handling loop
  IB/srp: Fail I/O requests if the transport is offline

Dan Carpenter (1):
  IB/mlx4: Fix bug unwinding on error in mlx4_ib_init_sriov()

Itai Garbi (1):
  IPoIB: Don't attempt to release resources on error flow

Julia Lawall (1):
  IB/mlx4: Adjust duplicate test

Mike Marciniszyn (1):
  IB/qib: Fix QP locate/remove race

Or Gerlitz (3):
  IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
  IB/iser: Avoid error prints on EAGAIN registration failures
  IB/iser: Enable iser when FMRs are not supported

Paul Bolle (2):
  RDMA/cxgb4: "cookie" can stay in host endianness
  IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable

Roland Dreier (3):
  IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
  IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
  Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next

Shani Michaeli (10):
  IB/mlx4_ib: Remove local invalidate segment unused fields
  mlx4_core: Rename MPT-related functions to have mpt_ prefix
  mlx4_core: Propagate MR deregistration failures to caller
  IB/core: Add "type 2" memory windows support
  IB/uverbs: Implement memory windows support in uverbs
  mlx4_core: Disable memory windows for virtual functions
  mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
  mlx4: Implement memory windows allocation and deallocation
  IB/mlx4: Support memory window binding
  IB/mlx4: Advertise MW support

Shlomo Pongratz (1):
  IPoIB: Fix ipoib_neigh hashing to use the correct daddr octets

Stefan Hasko (1):
  RDMA/cxgb4: Fix cast warning

Syam Sidhardhan (1):
  IB/mlx4: Remove redundant NULL check before kfree

Vipul Pandya (11):
  RDMA/cxgb4: Abort connections that receive unexpected streaming mode data
  RDMA/cxgb4: Abort connections when moving to ERROR state
  RDMA/cxgb4: Display streaming mode error only if detected in RTS
  RDMA/cxgb4: Keep QP referenced until TID released
  RDMA/cxgb4: Always log async errors
  RDMA/cxgb4: Only log rx_data warnings if cpl status is non-zero
  RDMA/cxgb4: Fix endpoint timeout race condition
  RDMA/cxgb4: Don't reconnect on abort for mpa_rev 1
  RDMA/cxgb4: Don't wakeup threads for MPAv2
  RDMA/cxgb4: Insert hwtid in pass_accept_req instead in pass_establish
  RDMA/cxgb4: Address sparse warnings

Wei Yongjun (1):
  RDMA/amso1100: Use module_pci_driver() to simplify the code

Yan Burman (1):
  IPoIB: Add version and firmware info to ethtool reporting

 drivers/infiniband/core/uverbs.h   |   2 +
 drivers/infiniband/core/uverbs_cmd.c   | 121 ++
 drivers/infiniband/core/uverbs_main.c  |  13 +-
 drivers/infiniband/core/verbs.c|   5 +-
 drivers/infiniband/hw/amso1100/c2.c|  13 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c|   5 +-
 drivers/infiniband/hw/cxgb3/iwch_qp.c  |  15 +-
 drivers/infiniband/hw/cxgb4/cm.c   | 170 +++
 drivers/infiniband/hw/cxgb4/device.c   |   5 +-
 drivers/infiniband/hw/cxgb4/ev.c   |   8 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |   4 +-
 drivers/infiniband/hw/cxgb4/mem.c  |   5 +-
 drivers/infiniband/hw/cxgb4/qp.c   |   1 +
 drivers/infiniband/hw/ehca/ehca_iverbs.h   |   2 +-
 drivers/infiniband/hw/ehca/ehca_mrmw.c |   5 +-
 drivers/infiniband/hw/mlx4/mad.c   |   7 +-
 drivers/infiniband/hw/mlx4/main.c  |  22 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |  18 +-
 drivers/infiniband/hw/mlx4/mr.c|  87 +-
 drivers/infiniband/hw/mlx4/qp.c|  49 --
 drivers/infiniband/hw/mlx4/sysfs

Re: [PATCH v2] IB/mlx4: silence GCC warning

2013-02-25 Thread Roland Dreier
On Mon, Feb 25, 2013 at 8:54 AM, Roland Dreier  wrote:
> I'm finally noticing that this is in the build_mlx_header() function,
> which is pretty much a slow path.  Certainly another compare isn't
> going to change performance given all the other stuff we do there.
>
> Let me look at the patches that have gone by and see what the cleanest
> way to handle this is.

OK, after playing around a bit, I see that just initializing vlan
doesn't really change the generated code (my gcc at least was already
in effect setting vlan in the generated assembly code), so I'll just
merge that.

 - R.


Re: [PATCH v2] IB/mlx4: silence GCC warning

2013-02-25 Thread Roland Dreier
On Sun, Feb 24, 2013 at 4:34 AM, Jack Morgenstein
 wrote:
> However, this approach does add the line below to processing for an IB port 
> (ETH/RoCE port stays same, more or less).
> Processing time is therefore increased (at least on the IB side) relative to 
> just living with the warning.
>
> Roland?

I'm finally noticing that this is in the build_mlx_header() function,
which is pretty much a slow path.  Certainly another compare isn't
going to change performance given all the other stuff we do there.

Let me look at the patches that have gone by and see what the cleanest
way to handle this is.

 - R.


[GIT PULL] please pull infiniband.git

2013-02-06 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



IB regression fixes for 3.8:
 - Fix mlx4 VFs not working on old guests because of 64B CQE changes
 - Fix ill-considered sparse fix for qib
 - Fix IPoIB crash due to skb double destruct introduced in 3.8-rc1


Mike Marciniszyn (1):
  IB/qib: Fix for broken sparse warning fix

Or Gerlitz (1):
  mlx4_core: Fix advertisement of wrong PF context behaviour

Roland Dreier (1):
  Merge branches 'ipoib', 'mlx4' and 'qib' into for-next

Shlomo Pongratz (1):
  IPoIB: Fix crash due to skb double destruct

 drivers/infiniband/hw/qib/qib_qp.c        | 11 +++
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   |  6 +++---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   |  6 +++---
 drivers/net/ethernet/mellanox/mlx4/main.c |  2 +-
 4 files changed, 10 insertions(+), 15 deletions(-)


Re: [PATCH] printk: Fix incorrect length from print_time() when seconds > 99999

2012-12-29 Thread Roland Dreier
On Sat, Dec 29, 2012 at 12:08 PM, Joe Perches  wrote:
> Sylvain Munaut did something similar
> https://lkml.org/lkml/2012/12/5/168

Missed that and duplicated the debugging :(
Sorry Sylvain.

I guess my patch may be preferable, since I happened to use the snprintf()
method that you suggest -- all the open-coded digit-counting seems a bit
verbose and perhaps hard to read and see the equivalence to the sprintf.

But certainly Sylvain fixed this quite a bit earlier and he should get credit.

 - R.


Re: [PATCH] printk: Fix incorrect length from print_time() when seconds > 99999

2012-12-29 Thread Roland Dreier
On Sat, Dec 29, 2012 at 9:56 AM, Greg Kroah-Hartman
 wrote:
> Nice work.  When did you start seeing this problem, 3.6 or so?  I ask as
> it's probably something that should go to stable as well if so.

We happened to see it when we rebased to the 3.6 kernel, but as far as I
can see, the bug has been there as long as print_time(), which comes from
084681d14e42 ("printk: flush continuation lines immediately to console")
in 3.5-rc5.  When I was doing a web search for info on the problem, I found
at least the following reports:

https://bbs.archlinux.org/viewtopic.php?id=148100
https://lkml.org/lkml/2012/10/17/81

so yeah this seems to be stable material.

> Andrew seems to be keeping the printk patches these days, so I'll let
> him pick this up with:
>
> Signed-off-by: Greg Kroah-Hartman 


[PATCH] printk: Fix incorrect length from print_time() when seconds > 99999

2012-12-28 Thread Roland Dreier
From: Roland Dreier 

print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value.  However, this is obviously incorrect when
the whole seconds part of the time gets beyond 5 digits (100000
seconds is a bit more than a day of uptime).

The simple fix is to use snprintf(NULL, 0, ...) to calculate the
actual length of the time prefix.  This could be micro-optimized but
it seems better to have simpler, more readable code here.

The bug leads to the syslog system call miscomputing which messages
fit into the userspace buffer.  If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:

# dmesg
klogctl: Bad address

When this happens, strace shows that the failure is indeed EFAULT due
to the kernel mistakenly accessing past the end of dmesg's buffer,
since dmesg asks the kernel how big a buffer it needs, allocates a bit
more, and then gets an error when it asks the kernel to fill it:

syslog(0xa, 0, 0)   = 1048576
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
syslog(0x3, 0x7fa4d25d2010, 0x100008)   = -1 EFAULT (Bad address)

Signed-off-by: Roland Dreier 
---
 kernel/printk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 19c0d7b..357f714 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -870,10 +870,11 @@ static size_t print_time(u64 ts, char *buf)
 	if (!printk_time)
 		return 0;
 
+	rem_nsec = do_div(ts, 1000000000);
+
 	if (!buf)
-		return 15;
+		return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
 
-	rem_nsec = do_div(ts, 1000000000);
 	return sprintf(buf, "[%5lu.%06lu] ",
 		       (unsigned long)ts, rem_nsec / 1000);
 }
-- 
1.8.0
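
The length calculation the patch relies on can be checked in userspace. This is only a sketch of the technique: time_prefix_len() is a name made up here, not a kernel function. It calls snprintf() with a NULL buffer and size 0, so nothing is written and the return value is the number of characters the prefix would need.

```c
#include <stdio.h>

/*
 * Length of the "[sssss.uuuuuu] " timestamp prefix for a given number of
 * whole seconds.  The microsecond field is always six digits wide, so a
 * literal "000000" gives the same length as "%06lu" would.
 */
int time_prefix_len(unsigned long secs)
{
	/* NULL buffer, size 0: snprintf() only reports the needed length. */
	return snprintf(NULL, 0, "[%5lu.000000] ", secs);
}
```

For any value up to 99999 seconds this returns 15, matching the old hard-coded constant; at 100000 seconds it returns 16, which is the case the constant got wrong.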



[GIT PULL] please pull infiniband.git

2012-12-21 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
tags/rdma-for-linus



Second batch of InfiniBand/RDMA changes for 3.8:
 - cxgb4 changes to fix lookup engine hash collisions
 - mlx4 changes to make flow steering usable
 - fix to IPoIB to avoid pinning dst reference for too long


Hadar Hen Zion (2):
  mlx4_core: Add QPN enforcement for flow steering rules set by VFs
  mlx4_core: Fix error flow in the flow steering wrapper

Jack Morgenstein (2):
  mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
  mlx4_core: Allow choosing flow steering mode

Roland Dreier (2):
  IPoIB: Call skb_dst_drop() once skb is enqueued for sending
  Merge branches 'cxgb4', 'ipoib' and 'mlx4' into for-next

Vipul Pandya (5):
  cxgb4: Add T4 filter support
  cxgb4: Add LE hash collision bug fix path in LLD driver
  RDMA/cxgb4: Fix LE hash collision bug for active open connection
  RDMA/cxgb4: Fix LE hash collision bug for passive open connection
  RDMA/cxgb4: Fix bug for active and passive LE hash collision path

 drivers/infiniband/hw/cxgb4/cm.c   | 791 ++---
 drivers/infiniband/hw/cxgb4/device.c   | 210 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |  33 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|   3 +
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|   3 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 136 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 459 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |  23 +-
 drivers/net/ethernet/chelsio/cxgb4/l2t.c   |  32 +
 drivers/net/ethernet/chelsio/cxgb4/l2t.h   |   3 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |  22 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h|  66 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h   |  37 +
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  | 418 +++
 drivers/net/ethernet/mellanox/mlx4/fw.c|  15 +-
 drivers/net/ethernet/mellanox/mlx4/fw.h|   1 +
 drivers/net/ethernet/mellanox/mlx4/main.c  | 115 ++-
 drivers/net/ethernet/mellanox/mlx4/mcg.c   |   7 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |   6 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  28 +-
 drivers/scsi/csiostor/t4fw_api_stor.h  |  39 -
 include/linux/mlx4/device.h|   1 +
 22 files changed, 2234 insertions(+), 214 deletions(-)


Re: linux-next: build failure after merge of the infiniband tree

2012-12-19 Thread Roland Dreier
On Wed, Dec 19, 2012 at 2:44 PM, Stephen Rothwell  wrote:
> Hi all,
>
> After merging the infiniband tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> In file included from drivers/scsi/csiostor/csio_wr.h:42:0,
>  from drivers/scsi/csiostor/csio_scsi.h:49,
>  from drivers/scsi/csiostor/csio_init.h:45,
>  from drivers/scsi/csiostor/csio_attr.c:45:
> drivers/scsi/csiostor/t4fw_api_stor.h:43:6: error: nested redefinition of 'enum fw_retval'
> drivers/scsi/csiostor/t4fw_api_stor.h:43:6: error: redeclaration of 'enum fw_retval'
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h:42:6: note: originally defined here
> drivers/scsi/csiostor/t4fw_api_stor.h:48:2: error: redeclaration of enumerator 'FW_ENOEXEC'
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h:39:2: note: previous definition of 'FW_ENOEXEC' was here
> drivers/scsi/csiostor/t4fw_api_stor.h:50:2: error: redeclaration of enumerator 'FW_ENOMEM'
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h:43:2: note: previous definition of 'FW_ENOMEM' was here
> drivers/scsi/csiostor/t4fw_api_stor.h:58:2: error: redeclaration of enumerator 'FW_EADDRINUSE'
> drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h:44:2: note: previous definition of 'FW_EADDRINUSE' was here
>
> And several others similar.
>
> Caused by commit f65b56b15931 ("RDMA/cxgb4: Fix LE hash collision bug for
> active open connection").


Vipul, is the right fix to pull the full list of FW return values from
t4fw_api_stor.h into t4fw_api.h as part of this patch (f65b56b15931)?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

