Re: [PATCH] iser-target: Handle errors from isert_put_datain and isert_get_dataout

2015-03-06 Thread Nicholas A. Bellinger
On Sat, 2015-03-07 at 04:16 +0200, Sagi Grimberg wrote:
> On 3/6/2015 7:56 PM, Chris Moore wrote:
> > isert_put_datain() always returns 1 and isert_get_dataout() always returns
> > 0, even if ib_post_send() fails. They should return an error in this case
> > so the caller can handle it.
> > Also, in the case of an ib_post_send() failure, use isert_err instead of
> > isert_warn.
> >
> > With these changes, these two functions handle errors from ib_post_send()
> > in the same way as other functions within ib_isert.c
> >
> 
> Hi Chris,
> 
> This is indeed needed, but I'm afraid this is not complete given the
> rc is completely ignored by the callers (see 
> lio_queue_data_in/lio_write_pending).
> 

So lio_write_pending() is propagating up the return back to
transport_generic_new_cmd().  When the return is -EAGAIN or -ENOMEM,
it triggers transport_handle_queue_full() to retry ->write_pending()
from se_device->qf_work_queue context.

It's lio_queue_data_in() + lio_queue_status() that aren't propagating up
failures to trigger queue_full in target_complete_ok_work().  Looking at
this code again for traditional iscsi-target, I don't see a reason why
iscsit_add_cmd_to_response_queue() failure should not be triggering
queue_full logic to kick in.

On the iser-target side, is it OK for isert_put_datain() +
isert_put_response() to be re-invoked from transport_complete_qf()
context after ib_post_send() failure?

--nab
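
For illustration, the retry behaviour described above (a callback that
returns -EAGAIN or -ENOMEM is retried later, any other error is final) can
be modelled in a few lines of userspace C. This is only a sketch under
stated assumptions: fabric_cb_t, dispatch_once(), dispatch_with_retry(),
flaky_post() and broken_post() are hypothetical names invented here, and the
bounded loop merely stands in for the deferred retry that the target core
performs from its queue-full work context. None of it is the actual LIO code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fabric callback such as ->write_pending(). */
typedef int (*fabric_cb_t)(void *cmd);

enum dispatch_result {
	DISPATCH_DONE,     /* callback succeeded */
	DISPATCH_REQUEUED, /* transient failure, worth retrying later */
	DISPATCH_FAILED,   /* hard failure, do not retry */
};

/* Only these two error codes are treated as "queue full" conditions. */
static bool is_queue_full_error(int rc)
{
	return rc == -EAGAIN || rc == -ENOMEM;
}

/* Call the callback once and classify its return code. */
static enum dispatch_result dispatch_once(fabric_cb_t cb, void *cmd)
{
	int rc;

	assert(cb != NULL);
	rc = cb(cmd);
	if (rc == 0)
		return DISPATCH_DONE;
	return is_queue_full_error(rc) ? DISPATCH_REQUEUED : DISPATCH_FAILED;
}

/* Bounded loop standing in for retries issued from a work queue. */
static enum dispatch_result dispatch_with_retry(fabric_cb_t cb, void *cmd,
						int max_attempts)
{
	enum dispatch_result res = DISPATCH_FAILED;

	for (int i = 0; i < max_attempts; i++) {
		res = dispatch_once(cb, cmd);
		if (res != DISPATCH_REQUEUED)
			break;
	}
	return res;
}

/* Example callback: returns -EAGAIN until the counter in *cmd hits zero. */
static int flaky_post(void *cmd)
{
	int *remaining_failures = cmd;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -EAGAIN;
	}
	return 0;
}

/* Example callback: always fails with an error that is not retried. */
static int broken_post(void *cmd)
{
	(void)cmd;
	return -EIO;
}
```

The point of the model is that the retry path can only work if the callback
actually returns the error; a callback that logs a failure and still returns
success would always be classified as DISPATCH_DONE.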

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFS over RDMA in SLinux

2015-03-06 Thread Sagi Grimberg

On 3/5/2015 9:54 PM, Francisco Manuel Cardoso wrote:

> Hello,
>
> Sorry, I'm a newcomer to the group; I have a brief question that I hope
> someone can at least point me in the right direction on.
>
> Are there any considerations regarding NFS over RDMA on Linux SL6?
>
> I've been setting up and using an HPC cluster. NFS over IPoIB works fine,
> but as soon as I start using RDMA, things go crazy.
>
> In the typical setup each machine can handle a maximum of 40 processes.
> When I use all of them for MPI I seem to have performance issues; if I
> scale down to 39 I get much better performance, but it still crashes.
>
> Anyone got any pointers?


I'm not sure whether you're asking about NFS over IPoIB or NFSoRDMA.

CC'ing Chuck, who is probably the best help you can get.


Re: [PATCH] iser-target: Handle errors from isert_put_datain and isert_get_dataout

2015-03-06 Thread Sagi Grimberg

On 3/6/2015 7:56 PM, Chris Moore wrote:

> isert_put_datain() always returns 1 and isert_get_dataout() always returns 0,
> even if ib_post_send() fails. They should return an error in this case so the
> caller can handle it.
> Also, in the case of an ib_post_send() failure, use isert_err instead of
> isert_warn.
>
> With these changes, these two functions handle errors from ib_post_send() in
> the same way as other functions within ib_isert.c



Hi Chris,

This is indeed needed, but I'm afraid this is not complete given the
rc is completely ignored by the callers (see 
lio_queue_data_in/lio_write_pending).


Did you really see any difference with this patch?


> Signed-off-by: Chris Moore
>
> ---
>
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index 075b19c..7394ba9 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -2860,8 +2860,10 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>  	}
>
>  	rc = ib_post_send(isert_conn->conn_qp, wr->send_wr, &wr_failed);
> -	if (rc)
> -		isert_warn("ib_post_send() failed for IB_WR_RDMA_WRITE\n");
> +	if (rc) {
> +		isert_err("ib_post_send() failed for IB_WR_RDMA_WRITE\n");
> +		return rc;
> +	}
>
>  	if (!isert_prot_cmd(isert_conn, se_cmd))
>  		isert_dbg("Cmd: %p posted RDMA_WRITE + Response for iSER Data "
> @@ -2894,8 +2896,10 @@ isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
>  	}
>
>  	rc = ib_post_send(isert_conn->conn_qp, wr->send_wr, &wr_failed);
> -	if (rc)
> -		isert_warn("ib_post_send() failed for IB_WR_RDMA_READ\n");
> +	if (rc) {
> +		isert_err("ib_post_send() failed for IB_WR_RDMA_READ\n");
> +		return rc;
> +	}
>
>  	isert_dbg("Cmd: %p posted RDMA_READ memory for ISER Data WRITE\n",
>  		 isert_cmd);
> ---



RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-03-06 Thread Weiny, Ira
> 
> On Wed, Mar 04, 2015 at 07:21:48AM +, Weiny, Ira wrote:
> 
> > I think this is going to break quite a bit.  I have prototyped setting
> > OPA devices to "OPA Link Layer" and the perftest tools just fall over.
> > Any changes to the Link layer or the transport types will require a
> > transition period for ULPs.
> 
> How do the perftest tools work with OPA in the first place? OPA seems to have
> 32 bit lids. Do you mean it 'works' as long as the lid is < 16 bits? Same
> general point about all of verbs, lots of 'uint16_t lid' in the interfaces?

The 32 bit LIDs in the SMP are designed for future expansion.  Currently OPA 
does not support > 16 bit LIDs.

Ira
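
The concern about 'uint16_t lid' fields can be made concrete with a small
userspace sketch. It assumes nothing about OPA or the verbs headers:
lid_fits_16bit(), lid_truncate() and lid_narrow() are hypothetical helpers
written for this note, and they only show what happens when a 32-bit value
is stored in a 16-bit field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when a LID passes through a 16-bit field unchanged. */
static bool lid_fits_16bit(uint32_t lid)
{
	return lid <= UINT16_MAX;
}

/* Models storing a LID into a 'uint16_t lid' field: high bits are lost. */
static uint16_t lid_truncate(uint32_t lid)
{
	return (uint16_t)lid;
}

/* Checked narrowing: returns 0 on success, -1 if the LID would not fit. */
static int lid_narrow(uint32_t lid, uint16_t *out)
{
	assert(out != NULL);
	if (!lid_fits_16bit(lid))
		return -1;
	*out = (uint16_t)lid;
	return 0;
}
```

With these helpers a LID of 0x12345 comes out of a 16-bit field as 0x2345.
That silent truncation is what code built around 16-bit LIDs would hit if
wider LIDs were ever used; as long as every LID stays within 16 bits, the
narrowing is lossless.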



[PATCH] iser-target: Handle errors from isert_put_datain and isert_get_dataout

2015-03-06 Thread Chris Moore
isert_put_datain() always returns 1 and isert_get_dataout() always returns 0,
even if ib_post_send() fails. They should return an error in this case so the
caller can handle it.
Also, in the case of an ib_post_send() failure, use isert_err instead of
isert_warn.

With these changes, these two functions handle errors from ib_post_send() in
the same way as other functions within ib_isert.c

Signed-off-by: Chris Moore 

---

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 075b19c..7394ba9 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2860,8 +2860,10 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 	}
 
 	rc = ib_post_send(isert_conn->conn_qp, wr->send_wr, &wr_failed);
-	if (rc)
-		isert_warn("ib_post_send() failed for IB_WR_RDMA_WRITE\n");
+	if (rc) {
+		isert_err("ib_post_send() failed for IB_WR_RDMA_WRITE\n");
+		return rc;
+	}
 
 	if (!isert_prot_cmd(isert_conn, se_cmd))
 		isert_dbg("Cmd: %p posted RDMA_WRITE + Response for iSER Data "
@@ -2894,8 +2896,10 @@ isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
 	}
 
 	rc = ib_post_send(isert_conn->conn_qp, wr->send_wr, &wr_failed);
-	if (rc)
-		isert_warn("ib_post_send() failed for IB_WR_RDMA_READ\n");
+	if (rc) {
+		isert_err("ib_post_send() failed for IB_WR_RDMA_READ\n");
+		return rc;
+	}
 
 	isert_dbg("Cmd: %p posted RDMA_READ memory for ISER Data WRITE\n",
 		 isert_cmd);
---



Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag

2015-03-06 Thread Jason Gunthorpe
On Wed, Mar 04, 2015 at 07:21:48AM +, Weiny, Ira wrote:

> I think this is going to break quite a bit.  I have prototyped
> setting OPA devices to "OPA Link Layer" and the perftest tools just
> fall over.  Any changes to the Link layer or the transport types
> will require a transition period for ULPs.

How do the perftest tools work with OPA in the first place? OPA seems
to have 32 bit lids. Do you mean it 'works' as long as the lid is < 16
bits? Same general point about all of verbs, lots of 'uint16_t lid' in
the interfaces?

Jason


[PATCH librdmacm] rstream.c: Add missing binding to source address in client_connect

2015-03-06 Thread Hal Rosenstock

This is needed for IPv6 connections.

Signed-off-by: Hal Rosenstock 
---
diff --git a/examples/rstream.c b/examples/rstream.c
index 05598a8..d93e9aa 100644
--- a/examples/rstream.c
+++ b/examples/rstream.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2011-2012 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2014-2015 Mellanox Technologies LTD. All rights reserved.
  *
  * This software is available to you under the OpenIB.org BSD license
  * below:
@@ -401,8 +402,8 @@ static int server_connect(void)
 
 static int client_connect(void)
 {
-	struct rdma_addrinfo *rai = NULL;
-	struct addrinfo *ai;
+	struct rdma_addrinfo *rai = NULL, *rai_src = NULL;
+	struct addrinfo *ai, *ai_src;
 	struct pollfd fds;
 	int ret, err;
 	socklen_t len;
@@ -415,6 +416,20 @@ static int client_connect(void)
 		return ret;
 	}
 
+	if (src_addr) {
+		if (use_rgai) {
+			rai_hints.ai_flags |= RAI_PASSIVE;
+			ret = rdma_getaddrinfo(src_addr, port, &rai_hints, &rai_src);
+		} else {
+			ai_hints.ai_flags |= RAI_PASSIVE;
+			ret = getaddrinfo(src_addr, port, &ai_hints, &ai_src);
+		}
+		if (ret) {
+			perror("getaddrinfo src_addr");
+			return ret;
+		}
+	}
+
 	rs = rai ? rs_socket(rai->ai_family, SOCK_STREAM, 0) :
 		   rs_socket(ai->ai_family, SOCK_STREAM, 0);
 	if (rs < 0) {
@@ -424,7 +439,15 @@ static int client_connect(void)
 	}
 
 	set_options(rs);
-	/* TODO: bind client to src_addr */
+
+	if (src_addr) {
+		ret = rai ? rs_bind(rs, rai_src->ai_src_addr, rai_src->ai_src_len) :
+			    rs_bind(rs, ai_src->ai_addr, ai_src->ai_addrlen);
+		if (ret) {
+			perror("rbind");
+			goto close;
+		}
+	}
 
 	if (rai && rai->ai_route) {
 		ret = rs_setsockopt(rs, SOL_RDMA, RDMA_ROUTE, rai->ai_route,
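
The idea behind the patch above, resolving the source address and binding
the socket to it before connecting, can be sketched with plain POSIX
sockets. This is an illustrative stand-in and not the rstream code:
socket_bound_to_source() is a hypothetical helper, ordinary socket() and
bind() are used where the patch goes through the rs_* wrappers, and
AI_NUMERICHOST is an assumption added here so that only literal addresses
are accepted.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve src_addr (an IPv4 or IPv6 literal) and bind a new stream socket
 * to it, letting the kernel pick the port.
 * Returns the bound descriptor, or -1 on any failure.
 */
static int socket_bound_to_source(const char *src_addr)
{
	struct addrinfo hints, *res = NULL;
	int fd;

	assert(src_addr != NULL);

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;	/* accept either address family */
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_PASSIVE | AI_NUMERICHOST;

	/* A NULL service leaves the port at 0, i.e. "any port". */
	if (getaddrinfo(src_addr, NULL, &hints, &res) != 0)
		return -1;

	/* Create the socket in the family of the resolved source address. */
	fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
	if (fd >= 0 && bind(fd, res->ai_addr, res->ai_addrlen) != 0) {
		close(fd);
		fd = -1;
	}

	freeaddrinfo(res);
	return fd;
}
```

A client would then call connect() on the returned descriptor, so that the
connection leaves from the chosen source address.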


Re: [PATCH v2] IB/srp: Add 64-bit LUN support

2015-03-06 Thread Yann Droneaud
Hi,

Le mercredi 04 mars 2015 à 16:58 +0100, Bart Van Assche a écrit :
> The SCSI standard defines 64-bit values for LUNs. Large arrays
> employing large or hierarchical LUN numbers become more and more
> common. So update the SRP initiator to use 64-bit LUN numbers.
> See also Hannes Reinecke, commit 9cb78c16f5da ("scsi: use 64-bit LUNs"),
> June 2014.
> 
> The largest LUN number that has been tested is 0xd2003fff.
> 
> The following structure sizes have been verified with gdb:
> * sizeof(struct srp_cmd) = 48
> * sizeof(struct srp_tsk_mgmt) = 48
> * sizeof(struct srp_aer_req) = 36
> 
> The ibmvscsi changes have been compile tested only (on a PPC system).
> 
> Signed-off-by: Bart Van Assche 
> Reviewed-by: Hannes Reinecke 
> Reviewed-by: Sagi Grimberg 
> Cc: Sebastian Parschauer 
> Cc: Brian King 
> Cc: Nathan Fontenot 
> Cc: Tyrel Datwyler 
> ---
> 
> Changes compared to v1:
> - Removed SRP_MAX_LUN definition from ib_srp.h
> 

Thanks.

>  drivers/infiniband/ulp/srp/ib_srp.c | 12 ++--
>  drivers/infiniband/ulp/srp/ib_srp.h |  1 -
>  drivers/scsi/ibmvscsi/ibmvscsi.c    |  6 +++---
>  include/scsi/srp.h                  |  7 ---
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index a0e24a8..e427454 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -1842,7 +1842,7 @@ static void srp_process_aer_req(struct srp_rdma_ch *ch,
>   s32 delta = be32_to_cpu(req->req_lim_delta);
>  
>   shost_printk(KERN_ERR, target->scsi_host, PFX
> -  "ignoring AER for LUN %llu\n", be64_to_cpu(req->lun));
> +  "ignoring AER for LUN %llu\n", scsilun_to_int(&req->lun));
>  
>   if (srp_response_common(ch, delta, &rsp, sizeof(rsp)))
>   shost_printk(KERN_ERR, target->scsi_host, PFX
> @@ -2034,7 +2034,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>   memset(cmd, 0, sizeof *cmd);
>  
>   cmd->opcode = SRP_CMD;
> - cmd->lun= cpu_to_be64((u64) scmnd->device->lun << 48);
> + int_to_scsilun(scmnd->device->lun, &cmd->lun);
>   cmd->tag= tag;
>   memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len);
>  
> @@ -2414,8 +2414,8 @@ srp_change_queue_depth(struct scsi_device *sdev, int qdepth)
>   return scsi_change_queue_depth(sdev, qdepth);
>  }
>  
> -static int srp_send_tsk_mgmt(struct srp_rdma_ch *ch, u64 req_tag,
> -  unsigned int lun, u8 func)
> +static int srp_send_tsk_mgmt(struct srp_rdma_ch *ch, u64 req_tag, u64 lun,
> +  u8 func)
>  {
>   struct srp_target_port *target = ch->target;
>   struct srp_rport *rport = target->rport;
> @@ -2449,7 +2449,7 @@ static int srp_send_tsk_mgmt(struct srp_rdma_ch *ch, u64 req_tag,
>   memset(tsk_mgmt, 0, sizeof *tsk_mgmt);
>  
>   tsk_mgmt->opcode= SRP_TSK_MGMT;
> - tsk_mgmt->lun   = cpu_to_be64((u64) lun << 48);
> + int_to_scsilun(lun, &tsk_mgmt->lun);
>   tsk_mgmt->tag   = req_tag | SRP_TAG_TSK_MGMT;
>   tsk_mgmt->tsk_mgmt_func = func;
>   tsk_mgmt->task_tag  = req_tag;
> @@ -3146,7 +3146,7 @@ static ssize_t srp_create_target(struct device *dev,
>   target_host->transportt  = ib_srp_transport_template;
>   target_host->max_channel = 0;
>   target_host->max_id  = 1;
> - target_host->max_lun = SRP_MAX_LUN;
> + target_host->max_lun = -1LL;
>   target_host->max_cmd_len = sizeof ((struct srp_cmd *) (void *) 0L)->cdb;
>  
>   target = host_to_target(target_host);
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index a611556..ce6dcf8 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -54,7 +54,6 @@ enum {
>   SRP_DLID_REDIRECT   = 2,
>   SRP_STALE_CONN  = 3,
>  
> - SRP_MAX_LUN = 512,
>   SRP_DEF_SG_TABLESIZE= 12,
>  
>   SRP_DEFAULT_QUEUE_SIZE  = 1 << 6,
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index acea5d6..6a41c36 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -1053,7 +1053,7 @@ static int ibmvscsi_queuecommand_lck(struct scsi_cmnd *cmnd,
>   memset(srp_cmd, 0x00, SRP_MAX_IU_LEN);
>   srp_cmd->opcode = SRP_CMD;
>   memcpy(srp_cmd->cdb, cmnd->cmnd, sizeof(srp_cmd->cdb));
> - srp_cmd->lun = cpu_to_be64(((u64)lun) << 48);
> + int_to_scsilun(lun, &srp_cmd->lun);
>  
>   if (!map_data_for_srp_cmd(cmnd, evt_struct, srp_cmd, hostdata->dev)) {
>   if (!firmware_has_feature(FW_FEATURE_CMO))
> @@ -1529,7 +1529,7 @@ static int ibmvscsi_eh_abort_handler(struct scsi_cmnd *cmd)
>   /* Set up an abort SRP command */
>   memset(tsk_mgmt, 0x00, sizeof(*tsk_mgmt));
> 
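
The quoted patch replaces cpu_to_be64((u64) lun << 48) with
int_to_scsilun(). The difference can be illustrated with a userspace
sketch. The packing below is modelled on the kernel helpers as commonly
described (four 16-bit levels, each stored big-endian, lowest level first)
and should be read as an approximation, not a copy: struct scsi_lun_bytes,
lun_pack(), lun_unpack() and lun_pack_legacy() are hypothetical names used
only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for an 8-byte SCSI LUN field. */
struct scsi_lun_bytes {
	uint8_t b[8];
};

/* Pack a 64-bit LUN as four 16-bit levels, lowest level first. */
static void lun_pack(uint64_t lun, struct scsi_lun_bytes *out)
{
	assert(out != NULL);
	memset(out->b, 0, sizeof(out->b));
	for (size_t i = 0; i < sizeof(out->b); i += 2) {
		out->b[i] = (lun >> 8) & 0xff;	/* high byte of this level */
		out->b[i + 1] = lun & 0xff;	/* low byte of this level */
		lun >>= 16;
	}
}

/* Inverse of lun_pack(). */
static uint64_t lun_unpack(const struct scsi_lun_bytes *in)
{
	uint64_t lun = 0;

	assert(in != NULL);
	for (size_t i = 0; i < sizeof(in->b); i += 2)
		lun |= ((uint64_t)in->b[i] << ((i + 1) * 8)) |
		       ((uint64_t)in->b[i + 1] << (i * 8));
	return lun;
}

/* Old scheme: (lun << 48) stored big-endian; only the low 16 bits survive. */
static void lun_pack_legacy(uint64_t lun, struct scsi_lun_bytes *out)
{
	uint64_t v = lun << 48;

	assert(out != NULL);
	for (size_t i = 0; i < sizeof(out->b); i++)
		out->b[i] = (v >> (56 - 8 * i)) & 0xff;
}
```

For small LUNs both schemes produce the same bytes. For the largest tested
value from the commit message, 0xd2003fff, the legacy packing keeps only
0x3fff, while lun_pack() followed by lun_unpack() returns the full value.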

Re: Mellanox Technologies MT23108 causes #MC exceptions under heavy load

2015-03-06 Thread Maxim Levitsky
False alarm, had exactly the same failure with infiniband disabled.

Best regards,
 Maxim Levitsky

On Fri, Mar 6, 2015 at 5:35 AM, Maxim Levitsky  wrote:
> We are running a CPU- and network-heavy test on the marmot.pdl.cmu.edu
> cluster. It has a Mellanox Technologies MT23108 InfiniHost controller.
>
> When we start using it for network communications, after just a few
> minutes some of the nodes of the cluster die with the following machine
> check exception.
> I repeated this test over Ethernet a few times and have not had a single
> failure so far (I thought I had one, but it turned out to be another,
> unrelated issue).
>
> It has already happened on most nodes of this 128-node cluster, so I
> expect this to be a kernel bug.
> Do you have any pointers on what we could try?
>
> I compiled and tested the current HEAD of the vanilla kernel
> (99aedde0869ce194539166ac5a4d2e1a20995348), 4.0.0-rc2,
> but this happens even on 2.6.38 (which was in one of
> their stock kernel images).
>
> Best regards,
>   Maxim Levitsky
>
> The kernel log of failure captured via serial console:
>
> [  297.575167] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  564.704428] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  951.619320] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  956.790789] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  957.301036] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  957.333938] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  957.924656] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  958.125879] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  958.147588] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  958.485607] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  959.050155] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  959.120109] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  960.048666] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  960.110928] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  960.754363] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  961.390093] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  972.199782] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  972.496511] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  983.078444] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  983.618178] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [  991.365565] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [ 1003.344498] ib0: can't use GFP_NOIO for QPs on device mthca0, using GFP_KERNEL
> [ 1013.748036] Disabling lock debugging due to kernel taint
> [ 1013.747903] [Hardware Error]: System Fatal error.
> [ 1013.747903] [Hardware Error]: CPU:0 (f:5:1) MC4_STATUS[-|UE|-|PCC|-]: 0xb2070f0f
> [ 1013.747903] [Hardware Error]: MC4 Error (node 0): Watchdog timeout due to lack of progress.
> [ 1013.747903] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)
> [ 1013.747903] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
> [ 1013.747903] mce: [Hardware Error]: TSC 1a2dcecb6b8
> [ 1013.747903] mce: [Hardware Error]: PROCESSOR 2:f51 TIME 1425610753 SOCKET 0 APIC 0 microcode 0
> [ 1013.747903] [Hardware Error]: System Fatal error.
> [ 1013.747903] [Hardware Error]: CPU:0 (f:5:1) MC4_STATUS[-|UE|-|PCC|-]: 0xb2070f0f
> [ 1013.747903] [Hardware Error]: MC4 Error (node 0): Watchdog timeout due to lack of progress.
> [ 1013.747903] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)
> [ 1013.747903] mce: [Hardware Error]: Machine check: Processor context corrupt
> [ 1013.747903] Kernel panic - not syncing: Fatal machine check on current CPU
> [ 1013.748036] [Hardware Error]: System Fatal error.
> [ 1013.748036] [Hardware Error]: CPU:1 (f:5:1) MC4_STATUS[-|UE|-|PCC|-]: 0xb2070f0f
> [ 1013.748036] [Hardware Error]: MC4 Error (node 1): Watchdog timeout due to lack of progress.
> [ 1013.748036] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)
> [ 1013.747903] Kernel Offset: disabled
> [ 1013.747903] ---[ end Kernel panic - not syncing: Fatal machine check on current CPU
> [ 1019.239423] ------------[ cut here ]------------
> [ 1019.244144] WARNING: CPU: 0 PID: 13875 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70()
> [ 1019.249416] Modules linked in: ib_ipoib ib_cm ib_sa nfsv2 nfs lockd
> sunrpc grace i2c_piix4 ib_mthca ib_mad ib_core ib_addr shpchp
> amd64_edac_mod i2c_amd756