Re: [lustre-discuss] infiniband mlx5_0: dump_cqe:286:(pid 25761): dump error cqe

2020-07-30 Thread 肖正刚
Hi,
Thanks for your suggestion.
However, rebooting the OSSs in production under heavy I/O load would be
another long story.
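
A reboot-free first check, sketched in Python: the snippet below only reports whether the running kernel was already booted with the "iommu=pt" parameter suggested in the quoted message. It assumes a Linux host where /proc/cmdline is readable. The helper names has_kernel_param and read_cmdline are made up for this illustration and are not part of any Lustre or Mellanox tooling.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token in a kernel command line."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # to avoid matching substrings such as "intel_iommu=pt".
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    return Path(path).read_text().strip()


if __name__ == "__main__":
    try:
        cmdline = read_cmdline()
    except OSError as exc:
        print(f"could not read kernel command line: {exc}")
    else:
        state = "present" if has_kernel_param(cmdline, "iommu=pt") else "absent"
        print(f"iommu=pt is {state}")
```

If the parameter is absent, adding it still needs a bootloader change and a reboot, so this only helps decide which OSSs would need a maintenance window.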

Regards.


Weiss, Karsten wrote on Thursday, July 30, 2020 at 11:31 PM:

> Hi!
>
> (Caveat: I ran into this issue not on Lustre but on HPC MPI jobs on CentOS
> 7.7. They only run stably with the workaround.)
>
> I’ve opened a bug with Red Hat at
> https://bugzilla.redhat.com/show_bug.cgi?id=1796825 but unfortunately
> it is no longer public (or fixed/closed), i.e. you probably won’t be able
> to read it.
>
> To make a long story short: You may try to boot with the kernel parameter
> “iommu=pt” as a workaround(!).
>
> Please let me know if this “fixes” the problem for you. YMMV.
>
> Best regards,
> Karsten
>
>
> --
>
> *Dipl.-Inf. Karsten Weiss *s+c / Atos
>
> T +49 7071 9457 452
>
> karsten.we...@atos.net
>
> https://atos.net/de/deutschland/sc-en
>
>
>
> *From:* lustre-discuss *On Behalf Of* 肖正刚
> *Sent:* Thursday, July 30, 2020 16:05
> *To:* lustre-discuss 
> *Subject:* [lustre-discuss] infiniband mlx5_0: dump_cqe:286:(pid 25761):
> dump error cqe
>
>
>
> Hi, all
>
>
>
> We installed lustre-2.12.2 on both the servers and the clients. Recently,
> our OSS syslog has been flooded with messages like the ones below:
>
> “
>
> infiniband mlx5_0: dump_cqe:286:(pid 25761): dump error cqe
> 0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0030: 00 00 00 00 00 00 88 13 08 00 84 79 01 04 4c d0
> LustreError: 25762:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc 9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc 9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc 9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc 9ffdf58c0a00
>
> ”
>
> Has anyone hit this before, or does anyone have any suggestions?
>
>
>
> Thanks!
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] infiniband mlx5_0: dump_cqe:286:(pid 25761): dump error cqe

2020-07-30 Thread 肖正刚
Hi, all

We installed lustre-2.12.2 on both the servers and the clients. Recently,
our OSS syslog has been flooded with messages like the ones below:
“
infiniband mlx5_0: dump_cqe:286:(pid 25761): dump error cqe
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0030: 00 00 00 00 00 00 88 13 08 00 84 79 01 04 4c d0
LustreError: 25762:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc 9ffdf58c0a00
LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc 9ffdf58c0a00
LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc 9ffdf58c0a00
LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc 9ffdf58c0a00
”
Has anyone hit this before, or does anyone have any suggestions?

Thanks!
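
For anyone triaging similar logs, here is a small Python sketch that pulls the event type and status out of a server_bulk_callback() line and names the status. It assumes the status is a negative Linux errno value, and the lookup table is deliberately limited to the two codes seen in this thread (5 is EIO and 103 is ECONNABORTED on Linux). decode_status and LINUX_ERRNO_NAMES are hypothetical names for this illustration, not Lustre APIs.

```python
import re

# Assumption: status values are negative Linux errno codes. Only the two
# codes that appear in this thread are listed here.
LINUX_ERRNO_NAMES = {5: "EIO", 103: "ECONNABORTED"}

# Matches the "event type N, status M" portion of the LustreError line.
# \s* tolerates the line wrap that mail clients insert after the comma.
STATUS_RE = re.compile(r"event type (\d+),\s*status (-?\d+)")


def decode_status(line: str):
    """Return (event_type, status, errno_name), or None if the line does not match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    event_type = int(match.group(1))
    status = int(match.group(2))
    name = LINUX_ERRNO_NAMES.get(abs(status), "unknown")
    return event_type, status, name


if __name__ == "__main__":
    sample = (
        "LustreError: 25762:0:(events.c:450:server_bulk_callback()) "
        "event type 5, status -5, desc 9ffdf58c0a00"
    )
    print(decode_status(sample))
```

Under that assumption, the first LustreError line in the log decodes to EIO and the repeated ones to ECONNABORTED.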