Re: [lustre-discuss] lustre-discuss Digest, Vol 150, Issue 14

2018-09-11 Thread Andreas Dilger
s]
>> [1353498.970868] [] ? llog_init_handle+0xd5/0x9a0 [obdclass]
>> [1353498.970878] [] ? llog_open_create+0x78/0x320 [obdclass]
>> [1353498.970883] [] ? mdd_root_get+0xf0/0xf0 [mdd]
>> [1353498.970887] [] mdd_prepare+0x13ff/0x1c70 [mdd]
>> [1353498.970894] [] mdt_prepare+0x57/0x3b0 [mdt]
>> [1353498.970908] [] server_start_targets+0x234d/0x2bd0 [obdclass]
>> [1353498.970924] [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
>> [1353498.970938] [] server_fill_super+0x109d/0x185a [obdclass]
>> [1353498.970950] [] lustre_fill_super+0x328/0x950 [obdclass]
>> [1353498.970962] [] ? lustre_common_put_super+0x270/0x270 [obdclass]
>> [1353498.970964] [] mount_nodev+0x4f/0xb0
>> [1353498.970976] [] lustre_mount+0x38/0x60 [obdclass]
>> [1353498.970978] [] mount_fs+0x3e/0x1b0
>> [1353498.970980] [] vfs_kern_mount+0x67/0x110
>> [1353498.970982] [] do_mount+0x1ef/0xce0
>> [1353498.970984] [] ? kmem_cache_alloc_trace+0x3c/0x200
>> [1353498.970986] [] SyS_mount+0x83/0xd0
>> [1353498.970989] [] system_call_fastpath+0x1c/0x21
>> [1353498.970996] LustreError: 25582:0:(mdd_device.c:354:mdd_changelog_llog_init()) 36ca26b-MDD: changelog init failed: rc = -30
>> [1353498.972790] LustreError: 25582:0:(mdd_device.c:427:mdd_changelog_init()) 36ca26b-MDD: changelog setup during init failed: rc = -30
>> [1353498.974525] LustreError: 25582:0:(mdd_device.c:1061:mdd_prepare()) 36ca26b-MDD: failed to initialize changelog: rc = -30
>> [1353498.976229] LustreError: 25582:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -30
>> [1353499.072002] LustreError: 25582:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-30)
>> 
>> 
>> I'm hoping those traces mean something to someone - any ideas?
>> 
>> Thanks!
>> 
>> --
>> Benjamin S. Kirk
>> 
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org<mailto:lustre-discuss@lists.lustre.org>
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>> 
>> Cheers, Andreas
>> ---
>> Andreas Dilger
>> CTO Whamcloud

Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud








Re: [lustre-discuss] lustre error when trying to mount

2018-09-11 Thread Riccardo Veraldi

Here is the reason: it's a CentOS 7.5 kernel bug.

https://bugs.centos.org/view.php?id=15193

On 9/10/18 11:05 PM, Riccardo Veraldi wrote:


hello,

I installed a new Lustre system where the MDS and OSSes run version 2.10.5.

The Lustre clients are running 2.10.1 and 2.9.0.

When I try to mount the filesystem, it fails with these errors:

OSS:

Sep 10 22:39:46 psananehoss01 kernel: LNetError: 10055:0:(o2iblnd_cb.c:2513:kiblnd_passive_connect()) Can't accept 172.21.52.33@o2ib2: -22
Sep 10 22:39:46 psananehoss01 kernel: LNet: 10055:0:(o2iblnd_cb.c:2212:kiblnd_reject()) Error -22 sending reject


Client:

Sep 10 22:41:26 psana101 kernel: LNetError: 336:0:(o2iblnd_cb.c:2726:kiblnd_rejected()) 172.21.52.90@o2ib2 rejected: consumer defined fatal error



I am afraid this is the consequence of a mixed configuration.

On the client side, Lustre is configured in /etc/modprobe.d/lustre.conf:

options lnet networks=o2ib2(ib0),tcp0(enp6s0),tcp1(enp6s0),tcp2(enp6s0)

On the OSS side I am using lnet.conf:

ip2nets:
 - net-spec: o2ib2
   interfaces:
      0: ib0
 - net-spec: tcp2
   interfaces:
      0: enp8s0f0

I assumed that peers would be discovered and added to LNet
automatically.


Should I revert to a static lustre.conf on the OSS side too?

I have several Lustre clients, and I cannot add all of them to a peers
section inside lnet.conf on the OSS side.


Any hints are very welcome.

thank you


Rick




Re: [lustre-discuss] lustre-discuss Digest, Vol 150, Issue 14

2018-09-11 Thread Kirk, Benjamin (JSC-EG311)
elog setup during init failed: rc = -30
> [1353498.974525] LustreError: 25582:0:(mdd_device.c:1061:mdd_prepare()) 36ca26b-MDD: failed to initialize changelog: rc = -30
> [1353498.976229] LustreError: 25582:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -30
> [1353499.072002] LustreError: 25582:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-30)
> 
> 
> I'm hoping those traces mean something to someone - any ideas?
> 
> Thanks!
> 
> --
> Benjamin S. Kirk
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org<mailto:lustre-discuss@lists.lustre.org>
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> Cheers, Andreas
> ---
> Andreas Dilger
> CTO Whamcloud


Re: [lustre-discuss] Lustre/ZFS snapshots mount error

2018-09-11 Thread Yong, Fan
Changelog is just one of the users of llog; there are many others. That
means that even without changelogs it is still possible to hit this problem.
So running robinhood while a snapshot is being taken may increase the
likelihood of the race, but disabling robinhood does not resolve the issue.
The final solution should be an enhancement of the snapshot-mount logic.

I did not find a related LU ticket for this issue.

--
Cheers,
Nasf

From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Robert Redl
Sent: Tuesday, September 11, 2018 6:54 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error


Thanks for the fast reply! If I understood correctly, it is currently not 
possible to use the changelog feature together with the snapshot feature, right?

Is there already an LU ticket about that?

Cheers,
Robert

On 09/10/2018 02:57 PM, Yong, Fan wrote:
It is suspected that some llogs were still waiting to be handled when the
snapshot was taken. When such a snapshot is then mounted, certain conditions
trigger llog cleanup/modification automatically. So it is not related to your
actions when mounting the snapshot. Since we cannot control the system state
at the time the snapshot is taken, we have to skip llog-related
cleanup/modification against the snapshot when mounting it. Such “skip” logic
is just what we need.

Cheers,
Nasf
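
The "skip" idea described in the message above can be illustrated with a small toy model. This is only a sketch of the control flow, not Lustre code: the `Backend` class, `start_transaction`, `cleanup_empty_llogs`, and the `skip_on_snapshot` flag are all hypothetical names invented for illustration.

```python
class ReadOnlyBackendError(RuntimeError):
    """Raised when a write is attempted against a read-only (snapshot) backend."""


class Backend:
    """Toy stand-in for a storage backend; not a real Lustre or ZFS object."""

    def __init__(self, read_only):
        self.read_only = read_only
        # Hypothetical content: llog name -> list of pending records.
        self.llogs = {"old-empty-llog": []}

    def start_transaction(self):
        # Stand-in for the assertion mentioned in the thread:
        # no modification is allowed on a mounted snapshot.
        if self.read_only:
            raise ReadOnlyBackendError("modification attempted on snapshot")


def cleanup_empty_llogs(backend, skip_on_snapshot):
    """Remove empty llogs; optionally skip the cleanup entirely on a snapshot."""
    if skip_on_snapshot and backend.read_only:
        return 0  # the proposed "skip" path: leave the snapshot untouched
    removed = 0
    for name in [n for n, records in backend.llogs.items() if not records]:
        backend.start_transaction()  # raises on a read-only backend
        del backend.llogs[name]
        removed += 1
    return removed


snapshot = Backend(read_only=True)
print(cleanup_empty_llogs(snapshot, skip_on_snapshot=True))
```

With `skip_on_snapshot=False` the same call raises `ReadOnlyBackendError`, which mirrors the failure mode discussed in the thread; with the flag set, the cleanup is bypassed and the snapshot stays unmodified.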
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Robert Redl
Sent: Saturday, September 8, 2018 9:04 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error


Dear All,

we have a similar setup with Lustre on ZFS and we make regular use of snapshots 
for the purpose of backups (backups on tape use snapshots as source). We would 
like to use robinhood in future and the question is now how to do it.

Would it be a workaround to disable the robinhood daemon temporarily during the
mount process?
Does the problem only occur when changelogs are consumed during the process of 
mounting a snapshot? Or is it also a problem when changelogs are consumed while 
the snapshot remains mounted (which is for us typically several hours)?
Is there already an LU-ticket about this issue?

Thanks!
Robert
--
Dr. Robert Redl
Scientific Programmer, "Waves to Weather" (SFB/TRR165)
Meteorologisches Institut
Ludwig-Maximilians-Universität München
Theresienstr. 37, 80333 München, Germany
On 03.09.2018 at 08:16, Yong, Fan wrote:
I would say that it was not the order of your operations that caused the
trouble. Instead, it is related to the snapshot mount logic. As mentioned in my
earlier reply, we need a patch for the llog logic to avoid modifying llogs in
snapshot mode.


--
Cheers,
Nasf
From: Kirk, Benjamin (JSC-EG311) [mailto:benjamin.k...@nasa.gov]
Sent: Tuesday, August 28, 2018 7:53 PM
To: lustre-discuss@lists.lustre.org
Cc: Andreas Dilger; Yong, Fan
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error

The MDS situation is very basic: active/passive mds0/mds1 for both fsA & fsB.
fsA has the combined mgs/mdt in a single zfs filesystem, and fsB has its own
mdt in a separate zfs filesystem.  mds0 is primary for all.

fsA & fsB DO both have changelogs enabled to feed robinhood databases.

What’s the recommended procedure here we should follow before mounting the 
snapshots?

1) disable changelogs on the active mdt’s (this will compromise robinhood, 
requiring a rescan…), or
2) temporarily halt changelog consumption / cleanup (e.g. stop robinhood in our 
case) and then mount the snapshot?

Thanks for the help!

--
Benjamin S. Kirk, Ph.D.
NASA Lyndon B. Johnson Space Center
Acting Chief, Aeroscience & Flight Mechanics Division

On Aug 27, 2018, at 7:33 PM, Yong, Fan <fan.y...@intel.com> wrote:

According to the stack trace, something was trying to clean up old empty llogs
while the snapshot was being mounted. We do NOT allow any modification while
mounting a snapshot; otherwise, it will trigger a BUG() in the ZFS backend.
That is why we added an LASSERT() at transaction start. One possible solution
is to add a check in the llog logic to avoid modifying llogs in snapshot mode.


--
Cheers,
Nasf
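
As a side note on the log quoted in this thread: the `rc = -30` values are negated Linux errno codes, and 30 is EROFS (read-only file system), which is consistent with the explanation above that a write was attempted against a read-only snapshot. The sketch below decodes such codes with the Python standard library; `describe_rc` is a helper name made up for this example, and the message text returned by `os.strerror` depends on the platform.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Map a negative kernel-style return code to its errno name and message."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"rc = {rc}: {name} ({os.strerror(code)})"


# -30 appears in the snapshot mount log, -22 in the LNet connection errors.
for rc in (-30, -22):
    print(describe_rc(rc))
```

On Linux this should report EROFS for -30 and EINVAL for -22.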

-----Original Message-----
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Andreas Dilger
Sent: Tuesday, August 28, 2018 5:57 AM
To: Kirk, Benjamin (JSC-EG311) <benjamin.k...@nasa.gov>
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error

It's probably best to file an LU ticket for this issue.

It looks like there is something with the log processing at mount that is 
trying to modify the configuration files.  I'm not sure whether that should be 
allowed or not.

Does fsB have the same MGS as fsA?  Does 

Re: [lustre-discuss] Lustre/ZFS snapshots mount error

2018-09-11 Thread Robert Redl
Thanks for the fast reply! If I understood correctly, it is currently
not possible to use the changelog feature together with the snapshot
feature, right?

Is there already an LU ticket about that?

Cheers,
Robert


On 09/10/2018 02:57 PM, Yong, Fan wrote:
>
> It is suspected that some llogs were still waiting to be handled when
> the snapshot was taken. When such a snapshot is then mounted, certain
> conditions trigger llog cleanup/modification automatically. So it is
> not related to your actions when mounting the snapshot. Since we cannot
> control the system state at the time the snapshot is taken, we have to
> skip llog-related cleanup/modification against the snapshot when
> mounting it. Such “skip” logic is just what we need.
>
>  
>
> Cheers,
>
> Nasf
>
> *From:*lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org]
> *On Behalf Of * Robert Redl
> *Sent:* Saturday, September 8, 2018 9:04 PM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>
>  
>
> Dear All,
>
> we have a similar setup with Lustre on ZFS and we make regular use of
> snapshots for the purpose of backups (backups on tape use snapshots as
> source). We would like to use robinhood in future and the question is
> now how to do it.
>
> Would it be a workaround to disable the robinhood daemon temporarily
> during the mount process?
> Does the problem only occur when changelogs are consumed during the
> process of mounting a snapshot? Or is it also a problem when
> changelogs are consumed while the snapshot remains mounted (which is
> for us typically several hours)?
> Is there already an LU-ticket about this issue?
>
> Thanks!
> Robert
>
> -- 
> Dr. Robert Redl
> Scientific Programmer, "Waves to Weather" (SFB/TRR165)
> Meteorologisches Institut
> Ludwig-Maximilians-Universität München
> Theresienstr. 37, 80333 München, Germany
>
> On 03.09.2018 at 08:16, Yong, Fan wrote:
>
> I would say that it was not the order of your operations that caused
> the trouble. Instead, it is related to the snapshot mount logic. As
> mentioned in my earlier reply, we need a patch for the llog logic to
> avoid modifying llogs in snapshot mode.
>
>  
>
>  
>
> --
>
> Cheers,
>
> Nasf
>
> *From:*Kirk, Benjamin (JSC-EG311) [mailto:benjamin.k...@nasa.gov]
> *Sent:* Tuesday, August 28, 2018 7:53 PM
> *To:* lustre-discuss@lists.lustre.org
> *Cc:* Andreas Dilger; Yong, Fan
> *Subject:* Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>
>  
>
> The MDS situation is very basic: active/passive mds0/mds1 for both
> fsA & fsB.  fsA has the combined mgs/mdt in a single zfs
> filesystem, and fsB has its own mdt in a separate zfs filesystem.
>  mds0 is primary for all.
>
>  
>
> fsA & fsB DO both have changelogs enabled to feed robinhood databases.
>
>  
>
> What’s the recommended procedure here we should follow before
> mounting the snapshots?
>
>  
>
> 1) disable changelogs on the active mdt’s (this will compromise
> robinhood, requiring a rescan…), or  
>
> 2) temporarily halt changelog consumption / cleanup (e.g. stop
> robinhood in our case) and then mount the snapshot?
>
>  
>
> Thanks for the help!
>
>  
>
> --
>
> Benjamin S. Kirk, Ph.D.
>
> NASA Lyndon B. Johnson Space Center
>
> Acting Chief, Aeroscience & Flight Mechanics Division
>
>  
>
> On Aug 27, 2018, at 7:33 PM, Yong, Fan wrote:
>
>  
>
> According to the stack trace, something was trying to clean up
> old empty llogs while the snapshot was being mounted. We do NOT
> allow any modification while mounting a snapshot; otherwise, it
> will trigger a BUG() in the ZFS backend. That is why we added an
> LASSERT() at transaction start. One possible solution is to add
> a check in the llog logic to avoid modifying llogs in snapshot
> mode.
>
>
> --
> Cheers,
> Nasf
>
> -----Original Message-----
> From: lustre-discuss
> [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of
> Andreas Dilger
> Sent: Tuesday, August 28, 2018 5:57 AM
> To: Kirk, Benjamin (JSC-EG311)
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>
> It's probably best to file an LU ticket for this issue.
>
> It looks like there is something with the log processing at
> mount that is trying to modify the configuration files.  I'm
> not sure whether that should be allowed or not.
>
> Does fsB have the same MGS as fsA?  Does it have the same MDS
> node 

[lustre-discuss] lustre error when trying to mount

2018-09-11 Thread Riccardo Veraldi


hello,

I installed a new Lustre system where the MDS and OSSes run version 2.10.5.

The Lustre clients are running 2.10.1 and 2.9.0.

When I try to mount the filesystem, it fails with these errors:

OSS:

Sep 10 22:39:46 psananehoss01 kernel: LNetError: 10055:0:(o2iblnd_cb.c:2513:kiblnd_passive_connect()) Can't accept 172.21.52.33@o2ib2: -22
Sep 10 22:39:46 psananehoss01 kernel: LNet: 10055:0:(o2iblnd_cb.c:2212:kiblnd_reject()) Error -22 sending reject


Client:

Sep 10 22:41:26 psana101 kernel: LNetError: 336:0:(o2iblnd_cb.c:2726:kiblnd_rejected()) 172.21.52.90@o2ib2 rejected: consumer defined fatal error



I am afraid this is the consequence of a mixed configuration.

On the client side, Lustre is configured in /etc/modprobe.d/lustre.conf:

options lnet networks=o2ib2(ib0),tcp0(enp6s0),tcp1(enp6s0),tcp2(enp6s0)

On the OSS side I am using lnet.conf:

ip2nets:
 - net-spec: o2ib2
   interfaces:
      0: ib0
 - net-spec: tcp2
   interfaces:
      0: enp8s0f0
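
To make the comparison between the two configuration styles concrete, the sketch below renders the OSS entries above in the same `networks=` notation the clients use. It is plain string manipulation, not an LNet tool: `networks_option` is a made-up helper, the dictionary layout simply mirrors the YAML keys shown above, and it assumes no IP-range filtering is involved.

```python
def networks_option(ip2nets):
    """Render net-spec/interface pairs as a 'networks=' module option string."""
    parts = []
    for entry in ip2nets:
        # Interfaces are keyed by index (0, 1, ...) as in the YAML above.
        ifaces = [entry["interfaces"][k] for k in sorted(entry["interfaces"])]
        parts.append(f'{entry["net-spec"]}({",".join(ifaces)})')
    return "options lnet networks=" + ",".join(parts)


# The OSS-side entries from lnet.conf, transcribed by hand.
oss_ip2nets = [
    {"net-spec": "o2ib2", "interfaces": {0: "ib0"}},
    {"net-spec": "tcp2", "interfaces": {0: "enp8s0f0"}},
]
print(networks_option(oss_ip2nets))
# prints: options lnet networks=o2ib2(ib0),tcp2(enp8s0f0)
```

Seen side by side, both ends declare an o2ib2 network on ib0, so the two notations describe the same InfiniBand network for these entries.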

I assumed that peers would be discovered and added to LNet
automatically.


Should I revert to a static lustre.conf on the OSS side too?

I have several Lustre clients, and I cannot add all of them to a peers
section inside lnet.conf on the OSS side.


Any hints are very welcome.

thank you


Rick
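
The LNet messages quoted in this post share a common layout, `pid:N:(file:line:function()) message`, which makes them easy to pick apart when comparing logs from several nodes. The following is a minimal sketch of such a parser; the regular expressions and the field names are assumptions based only on the lines shown here, not on any documented format.

```python
import re

# Layout inferred from the quoted lines: "pid:N:(file:line:function()) message".
LOG_RE = re.compile(
    r"(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*"
    r"(?P<msg>.*)"
)
# A trailing negative number in the message is treated as a return code.
RC_RE = re.compile(r"(-\d+)\s*$")


def parse_lustre_line(text):
    """Return a dict of fields from one console line, or None if no match."""
    m = LOG_RE.search(text)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    rc = RC_RE.search(fields["msg"])
    fields["rc"] = int(rc.group(1)) if rc else None
    return fields


sample = (
    "Sep 10 22:39:46 psananehoss01 kernel: LNetError: "
    "10055:0:(o2iblnd_cb.c:2513:kiblnd_passive_connect()) "
    "Can't accept 172.21.52.33@o2ib2: -22"
)
print(parse_lustre_line(sample))
```

For the sample line this yields the source file `o2iblnd_cb.c`, line 2513, function `kiblnd_passive_connect`, and a return code of -22; messages without a trailing code, such as the client-side "rejected" line, get `rc` set to `None`.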

