Re: [lustre-discuss] server_bulk_callback errors until server reboots

2018-06-07 Thread Hebenstreit, Michael
Thanks – I do not have different types of IB within one fabric. With this info 
I found a few nodes that showed that error, but they do not match the 
errors I see on the server.

Btw, I got the problem resolved on one FS after upgrading to Lustre 2.11.

From: Raj [mailto:rajgau...@gmail.com]
Sent: Thursday, June 07, 2018 10:36 AM
To: Hebenstreit, Michael 
Cc: White, Cliff ; lustre-discuss 

Subject: Re: [lustre-discuss] server_bulk_callback errors until server reboots

I have seen this error when we had a mix of FDR (using mlx4) and EDR (using mlx5) 
devices in the Lustre network. server_bulk_callback should have a corresponding 
client_bulk_callback on the client.

http://wiki.lustre.org/Infiniband_Configuration_Howto
On Thu, Jun 7, 2018 at 11:24 AM Hebenstreit, Michael 
<michael.hebenstr...@intel.com> wrote:
No, clients do not show any issues.

-----Original Message-----
From: White, Cliff
Sent: Thursday, June 07, 2018 9:26 AM
To: Hebenstreit, Michael <michael.hebenstr...@intel.com>; 
lustre-discuss <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] server_bulk_callback errors until server reboots


On 6/7/18, 7:00 AM, "lustre-discuss on behalf of Hebenstreit, Michael" 
<lustre-discuss-boun...@lists.lustre.org on behalf of 
michael.hebenstr...@intel.com> wrote:

Hello

I have now 2 Lustre systems that suddenly show this error - on a single OST 
the kernel log is filling with messages

[58858.365663] LustreError: 123642:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880524f7e000
[58865.328317] LustreError: 123640:0:(events.c:447:server_bulk_callback()) event type 5, status -61, desc 880cab4ec800
[58865.340792] LustreError: 123641:0:(events.c:447:server_bulk_callback()) event type 5, status -61, desc 880524f7c600
[58865.353167] LustreError: 123640:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880cab4ec800
[58865.365503] LustreError: 123641:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880524f7c600

until the server reboots. Clients are on 2.11/RH7.5, servers are on 
2.7.19.10/RH7.4 . Has anyone experienced this before?
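When a kernel log fills with messages like these, a quick tally by event type and status can show whether a single failure mode dominates. The following is a minimal Python sketch, not part of the original report: the regular expression is an assumption derived only from the five lines quoted above, and the errno table is hard-coded on the assumption that the status is a negated Linux errno (61 is ENODATA on Linux). The names `PATTERN`, `ERRNO_NAMES` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Matches one server_bulk_callback error. \s+ also spans the line break that
# mail clients insert between the function name and "event type".
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+(?P<pid>\d+):\d+:"
    r"\(events\.c:\d+:server_bulk_callback\(\)\)\s+"
    r"event type (?P<type>\d+), status (?P<status>-?\d+), desc (?P<desc>\w+)"
)

# Assumption: status is a negated Linux errno. Hard-coded so the result does
# not depend on the platform the script runs on.
ERRNO_NAMES = {61: "ENODATA"}


def summarize(log_text):
    """Count server_bulk_callback errors by (event type, status, errno name)."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        status = int(match.group("status"))
        name = ERRNO_NAMES.get(-status, "unknown")
        counts[(int(match.group("type")), status, name)] += 1
    return counts


# The five messages quoted in this thread, wrapped as they appear in the mail.
SAMPLE = """\
[58858.365663] LustreError: 123642:0:(events.c:447:server_bulk_callback())
event type 3, status -61, desc 880524f7e000
[58865.328317] LustreError: 123640:0:(events.c:447:server_bulk_callback())
event type 5, status -61, desc 880cab4ec800
[58865.340792] LustreError: 123641:0:(events.c:447:server_bulk_callback())
event type 5, status -61, desc 880524f7c600
[58865.353167] LustreError: 123640:0:(events.c:447:server_bulk_callback())
event type 3, status -61, desc 880cab4ec800
[58865.365503] LustreError: 123641:0:(events.c:447:server_bulk_callback())
event type 3, status -61, desc 880524f7c600
"""

if __name__ == "__main__":
    for (event_type, status, name), count in sorted(summarize(SAMPLE).items()):
        print(f"event type {event_type}, status {status} ({name}): {count}")
```

On a server, the same function could be fed the output of `dmesg` or the relevant syslog file instead of `SAMPLE`.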

There should be some corresponding error messages on your clients, have you 
checked there?
cliffw

Thanks
Michael


Michael Hebenstreit                 Senior Cluster Architect
Intel Corporation, MS: RR1-105/H14  Core and Visual Compute Group (DCE)
4100 Sara Road                      Tel.:   +1 505-794-3144
Rio Rancho, NM 87124
UNITED STATES                       E-mail: michael.hebenstr...@intel.com



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




Re: [lustre-discuss] server_bulk_callback errors until server reboots

2018-06-07 Thread Raj
I have seen this error when we had a mix of FDR (using mlx4) and EDR (using mlx5)
devices in the Lustre network. server_bulk_callback should have a
corresponding client_bulk_callback on the client.

http://wiki.lustre.org/Infiniband_Configuration_Howto
On Thu, Jun 7, 2018 at 11:24 AM Hebenstreit, Michael <
michael.hebenstr...@intel.com> wrote:

> No, clients do not show any issues.
>
> -----Original Message-----
> From: White, Cliff
> Sent: Thursday, June 07, 2018 9:26 AM
> To: Hebenstreit, Michael ; lustre-discuss <
> lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] server_bulk_callback errors until server
> reboots
>
>
> On 6/7/18, 7:00 AM, "lustre-discuss on behalf of Hebenstreit, Michael" <
> lustre-discuss-boun...@lists.lustre.org on behalf of
> michael.hebenstr...@intel.com> wrote:
>
> Hello
>
> I have now 2 Lustre systems that suddenly show this error - on a
> single OST the kernel log is filling with messages
>
> [58858.365663] LustreError: 123642:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880524f7e000
> [58865.328317] LustreError: 123640:0:(events.c:447:server_bulk_callback()) event type 5, status -61, desc 880cab4ec800
> [58865.340792] LustreError: 123641:0:(events.c:447:server_bulk_callback()) event type 5, status -61, desc 880524f7c600
> [58865.353167] LustreError: 123640:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880cab4ec800
> [58865.365503] LustreError: 123641:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880524f7c600
>
> until the server reboots. Clients are on 2.11/RH7.5, servers are on
> 2.7.19.10/RH7.4 . Has anyone experienced this before?
>
> There should be some corresponding error messages on your clients, have
> you checked there?
> cliffw
>
> Thanks
> Michael
>
>
> Michael Hebenstreit                 Senior Cluster Architect
> Intel Corporation, MS: RR1-105/H14  Core and Visual Compute Group (DCE)
> 4100 Sara Road                      Tel.:   +1 505-794-3144
> Rio Rancho, NM 87124
> UNITED STATES                       E-mail: michael.hebenstr...@intel.com
>
>
>


Re: [lustre-discuss] server_bulk_callback errors until server reboots

2018-06-07 Thread Hebenstreit, Michael
No, clients do not show any issues. 

-----Original Message-----
From: White, Cliff 
Sent: Thursday, June 07, 2018 9:26 AM
To: Hebenstreit, Michael ; lustre-discuss 

Subject: Re: [lustre-discuss] server_bulk_callback errors until server reboots


On 6/7/18, 7:00 AM, "lustre-discuss on behalf of Hebenstreit, Michael" wrote:

Hello

I have now 2 Lustre systems that suddenly show this error - on a single OST 
the kernel log is filling with messages 

[58858.365663] LustreError: 123642:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880524f7e000
[58865.328317] LustreError: 123640:0:(events.c:447:server_bulk_callback()) event type 5, status -61, desc 880cab4ec800
[58865.340792] LustreError: 123641:0:(events.c:447:server_bulk_callback()) event type 5, status -61, desc 880524f7c600
[58865.353167] LustreError: 123640:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880cab4ec800
[58865.365503] LustreError: 123641:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc 880524f7c600

until the server reboots. Clients are on 2.11/RH7.5, servers are on 
2.7.19.10/RH7.4 . Has anyone experienced this before?

There should be some corresponding error messages on your clients, have you 
checked there? 
cliffw

Thanks
Michael


Michael Hebenstreit Senior Cluster Architect
Intel Corporation, MS: RR1-105/H14  Core and Visual Compute Group (DCE)
4100 Sara Road  Tel.:   +1 505-794-3144 
Rio Rancho, NM 87124
UNITED STATES   E-mail: michael.hebenstr...@intel.com





Re: [lustre-discuss] checking and turning on quota enforcement.

2018-06-07 Thread Guido Laubender



Hi Phill,

On Wed, 30 May 2018, Phill Harvey-Smith wrote:

However, it has just become apparent that the new servers are not enforcing 
quotas, though they are set for all users on two of our volumes. This has 
led to the situation where one of our volumes is almost full, which is 
negatively affecting performance.


The documentation for lfs quotaon / quotaoff suggests that they have been 
deprecated since Lustre version 2.4.x. We are currently running 2.9.0 on 
CentOS 7.


How can I check whether quota enforcement is turned on and, if it is not, turn 
it on once we have people back under quota again (or have adjusted their quota)?


You may have a look here for enabling disk quotas:
http://doc.lustre.org/lustre_manual.xhtml#enabling_disk_quotas

To verify that quota enforcement is configured, run the command 'lctl 
get_param osd-*.*.quota_slave.info' on the MDS (see section 24.2.2.1, 
Quota Verification).
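
The check above can be scripted. The following is a minimal Python sketch, not taken from the manual: it runs the command quoted above and extracts the "quota enabled" field for each target. The sample text, the file system name "testfs" and the exact field layout are assumptions used for illustration, so compare them with the real output on your servers. The helper names are hypothetical.

```python
import subprocess


def parse_quota_slave_info(text):
    """Map each target name to the value of its 'quota enabled' field."""
    enabled = {}
    target = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. the "osd-....quota_slave.info=" header line
        key, value = key.strip(), value.strip()
        if key == "target name":
            target = value
        elif key == "quota enabled" and target is not None:
            enabled[target] = value
    return enabled


def targets_without_enforcement(enabled):
    """Return the targets whose 'quota enabled' value is 'none'."""
    return sorted(name for name, value in enabled.items() if value == "none")


def read_quota_slave_info():
    """Run the command from the reply above; needs lctl on a Lustre server."""
    result = subprocess.run(
        ["lctl", "get_param", "osd-*.*.quota_slave.info"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Illustrative output only (assumed layout, hypothetical file system "testfs").
SAMPLE_INFO = """\
osd-ldiskfs.testfs-MDT0000.quota_slave.info=
target name:    testfs-MDT0000
pool ID:        0
type:           md
quota enabled:  none
conn to master: setup
user uptodate:  glb[0],slv[0],reint[0]
group uptodate: glb[0],slv[0],reint[0]
"""

if __name__ == "__main__":
    info = parse_quota_slave_info(SAMPLE_INFO)
    print(targets_without_enforcement(info))
```

On a real server, replace `SAMPLE_INFO` with the result of `read_quota_slave_info()`. A target listed by `targets_without_enforcement` would need enforcement enabled as described in the manual section linked above.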


Cheers,
Guido