[lustre-discuss] user/group squashing

2020-03-12 Thread Steve Brasier
The manual says that:

"(Required) Use the same user IDs (UID) and group IDs (GID) on all clients.
If use of supplemental groups is required, see Section 41.1, “User/Group
Upcall” for information about supplementary user and group cache upcall
(identity_upcall)."

but is that actually true if using user/group squashing? I'd have imagined
that if I have a nodemap with `squash_gid/uid=4000` then any client user
would be able to access the filesystem, requiring only that uid/gid 4000
exists on the server (and has the right permissions in the filesystem,
obviously).
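
For concreteness, the sort of thing I have configured on the MGS looks
roughly like this (the nodemap name and NID range below are just
placeholders, not my actual setup):

    # create a nodemap for the clients and squash everything to 4000
    lctl nodemap_add squashed_clients
    lctl nodemap_add_range --name squashed_clients --range 192.168.1.[2-254]@tcp
    lctl nodemap_modify --name squashed_clients --property admin --value 0
    lctl nodemap_modify --name squashed_clients --property trusted --value 0
    lctl nodemap_modify --name squashed_clients --property squash_uid --value 4000
    lctl nodemap_modify --name squashed_clients --property squash_gid --value 4000
    lctl nodemap_activate 1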

The reason I'm asking is that I can't get squashing to work the way I'd
expect, so I'm wondering if my mental model of what Lustre is doing is
wrong. I note the docs about identity upcalls, but it seems that only
relates to looking up supplementary groups for client users, so with
squashing that shouldn't be relevant either?
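
If it is relevant after all, I assume the thing to look at is the MDS-side
upcall setting, i.e. something like:

    # show the current identity upcall (normally l_getidentity)
    lctl get_param mdt.*.identity_upcall
    # disable supplementary-group lookups entirely
    lctl set_param mdt.*.identity_upcall=NONE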

thanks for any clarifications
Steve Brasier

http://stackhpc.com/
Please note I work Tuesday to Friday.


Re: [lustre-discuss] old Lustre 2.8.0 panicking continuously

2020-03-12 Thread Torsten Harenberg
Dear all,

On 10.03.20 at 08:18, Torsten Harenberg wrote:
> Over the last few days (since Thursday), our Lustre instance was
> surprisingly stable. We lowered the load a bit by limiting the number of
> running jobs, which might also have helped to stabilize the system.
> 
> We enabled kdump, so if another crash happens anytime soon, we hope to
> get at least a dump for a hint about where the problem is.
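
(For the record, kdump on these EL6 servers was set up roughly as follows;
the crashkernel size and dump path below are just the usual defaults, not
checked against our exact config:

    # reserve memory for the capture kernel: append to the kernel line in /boot/grub/grub.conf
    crashkernel=128M
    # dump target: default local path in /etc/kdump.conf
    path /var/crash
    # enable and start the kdump service
    chkconfig kdump on
    service kdump start
)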

Now it has crashed again, but this time we got a backtrace and a dump.

The backtrace is:

<4>general protection fault:  [#1] SMP
<4>last sysfs file:
/sys/devices/pci:00/:00:01.0/:04:00.1/host4/rport-4:0-1/target4:0:1/4:0:1:14/state
<4>CPU 13
<4>Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U)
osd_ldiskfs(U) ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U)
lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic
crc32c_intel libcfs(U) autofs4 bonding ipt_REJECT nf_conntrack_ipv4
nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
iTCO_wdt iTCO_vendor_support hpilo hpwdt serio_raw lpc_ich mfd_core
ioatdma dca ses enclosure sg bnx2x ptp pps_core libcrc32c mdio
power_meter acpi_ipmi ipmi_si ipmi_msghandler shpchp ext4 jbd2 mbcache
dm_round_robin sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt
pata_acpi ata_generic ata_piix dm_multipath dm_mirror dm_region_hash
dm_log dm_mod hpsa [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 18657, comm: ll_ost_io02_077 Not tainted
2.6.32-573.12.1.el6_lustre.x86_64 #1 HP ProLiant DL360p Gen8
<4>RIP: 0010:[]  []
ldiskfs_ext_insert_extent+0xb3/0x10c0 [ldiskfs]
<4>RSP: 0018:8806fa8136c0  EFLAGS: 00010246
<4>RAX:  RBX: 0002 RCX: dead00200200
<4>RDX: 8806fa813800 RSI: 88196f62a2c0 RDI: 880106fc3901
<4>RBP: 8806fa813790 R08:  R09: 8807ff69f3c0
<4>R10: 0009 R11: 0002 R12: 88196f62a240
<4>R13:  R14: 0002 R15: 88196f62a2c0
<4>FS:  () GS:88009a5a()
knlGS:
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
<4>CR2: 00426820 CR3: 01a8d000 CR4: 000407e0
<4>DR0:  DR1:  DR2: 
<4>DR3:  DR6: 0ff0 DR7: 0400
<4>Process ll_ost_io02_077 (pid: 18657, threadinfo 8806fa81,
task 8806faffaab0)
<4>Stack:
<4> 8806fa8136f0 811beb8b 0002 dead00200200
<4> 881f80925c00 880106fc39b0 8806fa813790 a0be3dd1
<4> 88196f62a240 880106fc39b0 8806fa8137e8 fa8137d4
<4>Call Trace:
<4> [] ? __mark_inode_dirty+0x3b/0x160
<4> [] ? ldiskfs_mb_new_blocks+0x241/0x640 [ldiskfs]
<4> [] ldiskfs_ext_new_extent_cb+0x5d9/0x6d0 [osd_ldiskfs]
<4> [] ? call_rwsem_wake+0x18/0x30
<4> [] ldiskfs_ext_walk_space+0x142/0x310 [ldiskfs]
<4> [] ? ldiskfs_ext_new_extent_cb+0x0/0x6d0 [osd_ldiskfs]
<4> [] osd_ldiskfs_map_nblocks+0x7d/0x110 [osd_ldiskfs]
<4> [] osd_ldiskfs_map_inode_pages+0x278/0x2e0
[osd_ldiskfs]
<4> [] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
<4> [] osd_write_commit+0x39b/0x9a0 [osd_ldiskfs]
<4> [] ofd_commitrw_write+0x664/0xfa0 [ofd]
<4> [] ofd_commitrw+0x5bf/0xb10 [ofd]
<4> [] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
<4> [] obd_commitrw+0x114/0x380 [ptlrpc]
<4> [] tgt_brw_write+0xc70/0x1540 [ptlrpc]
<4> [] ? enqueue_task+0x66/0x80
<4> [] ? check_preempt_curr+0x6d/0x90
<4> [] ? try_to_wake_up+0x24e/0x3e0
<4> [] ? lustre_swab_niobuf_remote+0x0/0x30 [ptlrpc]
<4> [] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
<4> [] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
<4> [] ptlrpc_main+0xd21/0x1800 [ptlrpc]
<4> [] ? pick_next_task_fair+0xd0/0x130
<4> [] ? schedule+0x176/0x3a0
<4> [] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
<4> [] kthread+0x9e/0xc0
<4> [] child_rip+0xa/0x20
<4> [] ? kthread+0x0/0xc0
<4> [] ? child_rip+0x0/0x20
<4>Code: 48 85 c9 0f 84 05 10 00 00 4d 85 ff 74 0a f6 45 8c 08 0f 84 33
07 00 00 45 31 ed 4c 89 e8 66 2e 0f 1f 84 00 00 00 00 00 49 63 de <44>
0f b7 49 02 48 8d 14 dd 00 00 00 00 49 89 df 49 c1 e7 06 49
<1>RIP  [] ldiskfs_ext_insert_extent+0xb3/0x10c0 [ldiskfs]
<4> RSP 
[root@lustre3 127.0.0.1-2020-03-11-19:14:18]#


I still have to read up (I am not experienced with kernel debugging) on
how to load the vmcore into a debugger.
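
From what I have read so far, the usual tool for this seems to be crash
rather than plain gdb; something like the following, assuming the
kernel-debuginfo package matching our Lustre-patched kernel is installed
(the vmlinux path is just where I expect it to be):

    # run from the dump directory shown in the prompt above
    crash /usr/lib/debug/lib/modules/2.6.32-573.12.1.el6_lustre.x86_64/vmlinux vmcore
    # then inside crash:
    #   bt   - backtrace of the crashing task
    #   log  - kernel ring buffer at the time of the crash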

But if you already have a guess from reading the trace, I would be very
happy to take any advice.

By the way: we have now mounted the OSTs exactly the other way round from
usual, and this time the other machine crashed, so it seems to have
something to do with the content on the LUNs rather than being a server
hardware problem.
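
If it really is the on-disk content, I guess the next step would be a
read-only check of the suspect OST with the Lustre e2fsprogs while it is
unmounted; something like this (the device path is just an example, not
our real one):

    # read-only ldiskfs check: -n makes no changes, -f forces a check even if marked clean
    e2fsck -fn /dev/mapper/ost0001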

Thanks again

  Torsten


-- 
Dr. Torsten Harenberg harenb...@physik.uni-wuppertal.de
Bergische Universitaet
Fakultät 4 - Physik   Tel.: +49 (0)202 439-3521
Gaussstr. 20  Fax : +49 (0)202 439-2811
42097 Wuppertal


