Re: [ceph-users] [Ceph-maintainers] Debian buster information

2019-05-31 Thread Dan Mick
Péter:

I'm forwarding this to ceph-users for a better answer/discussion


On 5/29/19 6:52 AM, Erdősi Péter wrote:
> Dear CEPH maintainers,
> 
> I would like to ask for some information about Ceph and Debian 10 (Buster).
> We would like to install Ceph on the Buster RC. As far as I can see, the
> ceph packages in Buster are currently 12.2.11; however, I cannot find the
> ceph-deploy package in the repository.
> 
> I've tried to add the repository from the install guide, but there is
> no Buster repo there.
> 
> My questions are:
>  - Is there any non-Debian repository for Buster now?
>  - Will the Buster version from the Debian repository work properly?
> (We are interested in RBD and the libvirt driver; no CephFS or object store
> will be used.)
>  - Is any testing done on the packages in the Debian repository
> (quality and/or functional)?
>  - Why does no ceph-deploy package exist in Debian Buster if the OSD and
> MON packages are there?
>  - When will we be able to use the Buster repository at
> download.ceph.com? (After Buster becomes stable, or maybe sooner?)
>  - Could you guess a time range for when the Buster repo will be available
> (weeks, months)?
> 
> Thanks,
>  Peter ERDOSI - KIFU
> ___
> Ceph-maintainers mailing list
> ceph-maintain...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-maintainers-ceph.com


-- 
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Object read error - enough copies available

2019-05-31 Thread Oliver Freyermuth
Hi,

On 31.05.19 at 12:07, Burkhard Linke wrote:
> Hi,
> 
> 
> see my post in the recent 'CephFS object mapping.' thread. It describes the 
> necessary commands to look up a file based on its RADOS object name.

many thanks! I somehow missed the important part of that thread earlier and 
only picked up the functional, but not really scalable, "find . -xdev -inum 
xxx" approach before I stopped reading.
Now I have followed it in full - very enlightening indeed: one needs to 
look at the xattrs of the RADOS objects! 
Very logical once you know it. 
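
For anyone who wants to retrace this, here is a minimal sketch of the lookup
(untested here; the pool name, object name, and mount point are placeholders,
and the object naming assumes default CephFS striping):

# A CephFS data object is named <inode-in-hex>.<stripe-index>, e.g. 10000000001.00000000
OBJ=10000000001.00000000
POOL=cephfs_data
# the 'parent' xattr on the object holds the encoded backtrace (path back to the file)
rados -p "$POOL" getxattr "$OBJ" parent > /tmp/parent
ceph-dencoder type inode_backtrace_t import /tmp/parent decode dump_json
# alternatively, convert the hex inode to decimal and search the mounted filesystem
find /mnt/cephfs -xdev -inum "$(printf '%d' 0x10000000001)"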

Thanks again!
Oliver

> 
> 
> Regards,
> 
> Burkhard
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSDs fail to start with RDMA

2019-05-31 Thread Lazuardi Nasution
Hi Orlando,

Thank you for your confirmation. I hope somebody else can help with this
issue.

Best regards,

On Sat, Jun 1, 2019, 03:19 Moreno, Orlando  wrote:

> Hi,
>
>
>
> I have not received any response to this and I haven’t worked on this
> lately. I hope to revisit RDMA messenger on Nautilus in the future.
>
>
>
> Thanks,
>
> Orlando
>
>
>
>
>
> *From:* Lazuardi Nasution [mailto:mrxlazuar...@gmail.com]
> *Sent:* Saturday, May 25, 2019 9:14 PM
> *To:* Moreno, Orlando ; Tang, Haodong <
> haodong.t...@intel.com>
> *Cc:* Ceph Users 
> *Subject:* Re: ceph-users Digest, Vol 60, Issue 26
>
>
>
> Hi Orlando and Haodong,
>
>
>
> Is there any response to this thread? I'm interested in this too.
>
>
>
> Best regards,
>
>
>
> Date: Fri, 26 Jan 2018 21:53:59 +
> From: "Moreno, Orlando" 
> To: "ceph-users@lists.ceph.com" , Ceph
> Development 
> Cc: "Tang, Haodong" 
> Subject: [ceph-users] Ceph OSDs fail to start with RDMA
> Message-ID:
> <
> 034aad465c6cbe4f96d9fb98573a79a63719e...@fmsmsx108.amr.corp.intel.com>
>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi all,
>
> I am trying to bring up a Ceph cluster where the private network is
> communicating via RoCEv2. The storage nodes have 2 dual-port 25Gb Mellanox
> ConnectX-4 NICs, with each NIC's ports bonded (2x25Gb mode 4). I have set
> memory limits to unlimited, can rping to each node, and
> ms_async_rdma_device_name set to the ibdev (mlx5_bond_1). Everything goes
> smoothly until I start bringing up OSDs. Nothing appears in stderr, but
> upon further inspection of the OSD log, I see the following error:
>
> RDMAConnectedSocketImpl activate failed to transition to RTR state: (19)
> No such device
> /build/ceph-12.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In
> function 'void RDMAConnectedSocketImpl::handle_connection()' thread
> 7f908633c700 time 2018-01-26 10:47:51.607573
> /build/ceph-12.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 221:
> FAILED assert(!r)
>
> ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous
> (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x564a2ccf7892]
> 2: (RDMAConnectedSocketImpl::handle_connection()+0xb4a) [0x564a2d007fba]
> 3: (EventCenter::process_events(int, std::chrono::duration std::ratio<1l, 10l> >*)+0xa08) [0x564a2cd9a418]
> 4: (()+0xb4f3a8) [0x564a2cd9e3a8]
> 5: (()+0xb8c80) [0x7f9088c04c80]
> 6: (()+0x76ba) [0x7f90892f36ba]
> 7: (clone()+0x6d) [0x7f908836a41d]
> NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> Anyone see this before or have any suggestions?
>
> Thanks,
> Orlando
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-users Digest, Vol 60, Issue 26

2019-05-31 Thread Moreno, Orlando
Hi,

I have not received any response to this and I haven’t worked on this lately. I 
hope to revisit RDMA messenger on Nautilus in the future.

Thanks,
Orlando


From: Lazuardi Nasution [mailto:mrxlazuar...@gmail.com]
Sent: Saturday, May 25, 2019 9:14 PM
To: Moreno, Orlando ; Tang, Haodong 

Cc: Ceph Users 
Subject: Re: ceph-users Digest, Vol 60, Issue 26

Hi Orlando and Haodong,

Is there any response to this thread? I'm interested in this too.

Best regards,

Date: Fri, 26 Jan 2018 21:53:59 +
From: "Moreno, Orlando" <orlando.mor...@intel.com>
To: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>, Ceph
Development <ceph-de...@vger.kernel.org>
Cc: "Tang, Haodong" <haodong.t...@intel.com>
Subject: [ceph-users] Ceph OSDs fail to start with RDMA
Message-ID:
<034aad465c6cbe4f96d9fb98573a79a63719e...@fmsmsx108.amr.corp.intel.com>

Content-Type: text/plain; charset="us-ascii"

Hi all,

I am trying to bring up a Ceph cluster where the private network is 
communicating via RoCEv2. The storage nodes have 2 dual-port 25Gb Mellanox 
ConnectX-4 NICs, with each NIC's ports bonded (2x25Gb mode 4). I have set 
memory limits to unlimited, can rping to each node, and 
ms_async_rdma_device_name set to the ibdev (mlx5_bond_1). Everything goes 
smoothly until I start bringing up OSDs. Nothing appears in stderr, but upon 
further inspection of the OSD log, I see the following error:

RDMAConnectedSocketImpl activate failed to transition to RTR state: (19) No 
such device
/build/ceph-12.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 
'void RDMAConnectedSocketImpl::handle_connection()' thread 7f908633c700 time 
2018-01-26 10:47:51.607573
/build/ceph-12.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 221: FAILED 
assert(!r)

ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) 
[0x564a2ccf7892]
2: (RDMAConnectedSocketImpl::handle_connection()+0xb4a) [0x564a2d007fba]
3: (EventCenter::process_events(int, std::chrono::duration >*)+0xa08) [0x564a2cd9a418]
4: (()+0xb4f3a8) [0x564a2cd9e3a8]
5: (()+0xb8c80) [0x7f9088c04c80]
6: (()+0x76ba) [0x7f90892f36ba]
7: (clone()+0x6d) [0x7f908836a41d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.
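
For context, the RDMA messenger setup described above is usually expressed in
ceph.conf along these lines (a rough sketch only; the option names are the
luminous-era async+rdma settings, the device name is taken from the setup
described above, and everything else is an assumption):

[global]
# switch the async messenger to its RDMA-capable transport
ms_type = async+rdma
# ibdev backing the bonded ConnectX-4 ports, as mentioned above
ms_async_rdma_device_name = mlx5_bond_1
# RDMA also needs unlimited locked memory for the daemons, e.g.
# LimitMEMLOCK=infinity in the systemd units (assumption: systemd deployment)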

Anyone see this before or have any suggestions?

Thanks,
Orlando
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2019-05-31 Thread Patrick Donnelly
Hi Stefan,

Sorry I couldn't get back to you sooner.

On Mon, May 27, 2019 at 5:02 AM Stefan Kooman  wrote:
>
> Quoting Stefan Kooman (ste...@bit.nl):
> > Hi Patrick,
> >
> > Quoting Stefan Kooman (ste...@bit.nl):
> > > Quoting Stefan Kooman (ste...@bit.nl):
> > > > Quoting Patrick Donnelly (pdonn...@redhat.com):
> > > > > Thanks for the detailed notes. It looks like the MDS is stuck
> > > > > somewhere it's not even outputting any log messages. If possible, it'd
> > > > > be helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or,
> > > > > if you're comfortable with gdb, a backtrace of any threads that look
> > > > > suspicious (e.g. not waiting on a futex) including `info threads`.
> > >
> > > Today the issue reappeared (after being absent for ~ 3 weeks). This time
> > > the standby MDS could take over and would not get into a deadlock
> > > itself. We made gdb traces again, which you can find over here:
> > >
> > > https://8n1.org/14011/d444
> >
> > We are still seeing these crashes occur ~ every 3 weeks or so. Have you
> > found the time to look into the backtraces / gdb dumps?
>
> We have not seen this issue anymore for the past three months. We have
> updated the cluster to 12.2.11 in the meantime, but we are not sure if that
> is related. Hopefully it stays away.

Looks like you hit the infinite loop bug in OpTracker. It was fixed in
12.2.11: https://tracker.ceph.com/issues/37977

The problem was introduced in 12.2.8.
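
For reference, the kind of debugging data requested earlier in this thread can
be gathered roughly as follows (a sketch only; the PID lookup and batch-mode
gdb invocation are assumptions, not the exact commands used here):

# dump all thread backtraces from the running MDS without stopping it for long
gdb -p "$(pidof ceph-mds)" -batch \
    -ex 'set pagination off' -ex 'info threads' -ex 'thread apply all bt'
# or ask the MDS to dump core for offline analysis (core limits must allow it)
kill -QUIT "$(pidof ceph-mds)"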

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance in a small cluster

2019-05-31 Thread Reed Dier
Is there any other evidence of this?

I have 20 5100 MAX (MTFDDAK1T9TCC) and have not experienced any real issues 
with them.
I would pick my Samsung SM863a drives or any of my Intels over the Microns, 
but I haven't seen the Microns cause any issues for me.
For what it's worth, they are all on FW D0MU027, which is likely out of date, 
but it is working for me.

However, I would steer people away from the Micron 9100 MAX 
(MTFDHAX1T2MCF-1AN1ZABYY) as an NVMe disk to use for WAL/DB, as I have seen 
both performance and reliability issues with those.
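
For anyone evaluating drives for WAL/DB duty, the usual quick sanity check is a
single-job sync-write test with fio, something like the sketch below (the
device path is a placeholder and the test overwrites it, so only run it on a
scratch device):

# DESTRUCTIVE: overwrites /dev/sdX (placeholder) - use a scratch device only
fio --name=wal-test --filename=/dev/sdX \
    --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting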

Just my 2¢

Reed

> On May 29, 2019, at 12:52 PM, Paul Emmerich  wrote:
> 
> 
> 
> On Wed, May 29, 2019 at 9:36 AM Robert Sander wrote:
> On 24.05.19 at 14:43, Paul Emmerich wrote:
> > * SSD model? Lots of cheap SSDs simply can't handle more than that
> 
> The customer currently has 12 Micron 5100 1.92TB (Micron_5100_MTFDDAK1)
> SSDs and will get a batch of Micron 5200 in the next few days.
> 
> And there's your bottleneck ;)
> The Micron 5100 performs horribly in Ceph; I've seen similar performance in 
> another cluster with these disks.
> Basically they max out at around 1000 IOPS and report 100% utilization and 
> feel slow.
> 
> Haven't seen the 5200 yet.
> 
> 
> Paul
>  
> 
> We have identified the performance settings in the BIOS as a major
> factor. Ramping those up, we got a remarkable performance increase.
> 
> Regards
> -- 
> Robert Sander
> Heinlein Support GmbH
> Linux: Akademie - Support - Hosting
> http://www.heinlein-support.de 
> 
> Tel: 030-405051-43
> Fax: 030-405051-19
> 
> Mandatory disclosures per §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Managing Director: Peer Heinlein -- Registered office: Berlin
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Object read error - enough copies available

2019-05-31 Thread Burkhard Linke

Hi,


see my post in the recent 'CephFS object mapping.' thread. It describes 
the necessary commands to look up a file based on its RADOS object name.



Regards,

Burkhard


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] auth: could not find secret_id=6403

2019-05-31 Thread 解决
Hi all,
we use Ceph (Hammer) + OpenStack (Mitaka) in our datacenter, with 300 
OSDs and 3 MONs. Due to an accident the datacenter lost power and all the 
servers were shut down. When power returned to normal, we started the 3 MON 
services first; about two hours later we started the 500 OSD services, and 
the cluster was OK.
But one day later, "auth: could not find secret_id=6403" errors appeared on 
several OSD hosts, and some ops are blocked.


2019-05-27 19:23:44.316416 7fb75451e700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.210:6816/1004678 pipe(0x128f5000 sd=586 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f4f8c0).accept connect_seq 5 vs existing 5 state standby
2019-05-27 19:23:44.316519 7fb755029700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.203:6834/1004655 pipe(0x173dd000 sd=584 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27472ec0).accept connect_seq 8 vs existing 7 state standby
2019-05-27 19:23:44.316561 7fb788d5c700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.211:6808/5206 pipe(0x144db000 sd=587 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f51700).accept connect_seq 7 vs existing 7 state standby
2019-05-27 19:23:44.316656 7fb77c198700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.219:6818/8946 pipe(0x1f575000 sd=588 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f4e580).accept connect_seq 7 vs existing 7 state standby
2019-05-27 19:23:44.316719 7fb78aa79700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.213:6810/5387 pipe(0x1f57 sd=93 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f512e0).accept connect_seq 5 vs existing 5 state standby
2019-05-27 19:23:44.316852 7fb754922700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.203:6828/1004835 pipe(0x128f sd=585 :6832 s=0 pgs=0 cs=0 l=0 
c=0x21d65b20).accept connect_seq 4 vs existing 3 state standby
2019-05-27 19:23:44.316929 7fb788d5c700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.211:6808/5206 pipe(0x144db000 sd=587 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f51700).accept connect_seq 8 vs existing 7 state standby
2019-05-27 19:23:44.317004 7fb75451e700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.210:6816/1004678 pipe(0x128f5000 sd=586 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f4f8c0).accept connect_seq 6 vs existing 5 state standby
2019-05-27 19:23:44.317148 7fb78aa79700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.213:6810/5387 pipe(0x1f57 sd=93 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f512e0).accept connect_seq 6 vs existing 5 state standby
2019-05-27 19:23:44.317207 7fb77c198700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.219:6818/8946 pipe(0x1f575000 sd=588 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f4e580).accept connect_seq 8 vs existing 7 state standby
2019-05-27 19:28:51.828430 7fb756e47700  0 auth: could not find secret_id=6403
2019-05-27 19:28:51.828446 7fb756e47700  0 cephx: verify_authorizer could not 
get service secret for service osd secret_id=6403
2019-05-27 19:28:51.828453 7fb756e47700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.216:6816/28337 pipe(0x27616000 sd=145 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f50aa0).accept: got bad authorizer
2019-05-27 19:28:51.829282 7fb756e47700  0 auth: could not find secret_id=6403
2019-05-27 19:28:51.829296 7fb756e47700  0 cephx: verify_authorizer could not 
get service secret for service osd secret_id=6403
2019-05-27 19:28:51.829303 7fb756e47700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.216:6816/28337 pipe(0x21ba7000 sd=145 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f4f4a0).accept: got bad authorizer
2019-05-27 19:28:52.030139 7fb756e47700  0 auth: could not find secret_id=6403
2019-05-27 19:28:52.030153 7fb756e47700  0 cephx: verify_authorizer could not 
get service secret for service osd secret_id=6403
2019-05-27 19:28:52.030161 7fb756e47700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.216:6816/28337 pipe(0x20be9000 sd=145 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f51860).accept: got bad authorizer
2019-05-27 19:28:52.431002 7fb756e47700  0 auth: could not find secret_id=6403
2019-05-27 19:28:52.431017 7fb756e47700  0 cephx: verify_authorizer could not 
get service secret for service osd secret_id=6403
2019-05-27 19:28:52.431024 7fb756e47700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.216:6816/28337 pipe(0x27616000 sd=145 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f4f600).accept: got bad authorizer
2019-05-27 19:28:53.231883 7fb756e47700  0 auth: could not find secret_id=6403
2019-05-27 19:28:53.231896 7fb756e47700  0 cephx: verify_authorizer could not 
get service secret for service osd secret_id=6403
2019-05-27 19:28:53.231903 7fb756e47700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.216:6816/28337 pipe(0x21ba7000 sd=145 :6832 s=0 pgs=0 cs=0 l=0 
c=0x27f4e9a0).accept: got bad authorizer
2019-05-27 19:28:54.832790 7fb756e47700  0 auth: could not find secret_id=6403
2019-05-27 19:28:54.832805 7fb756e47700  0 cephx: verify_authorizer could not 
get service secret for service osd secret_id=6403
2019-05-27 19:28:54.832812 7fb756e47700  0 -- 10.22.9.197:6832/1005404 >> 
10.22.9.216:6816/28337 pipe(0x27616000 sd=145 :6832 s=0 pgs=0 cs=0 l=0 
c=0x16f579c0).accept: got bad authorizer
2019-05-27 19:28:58.033720 7fb756e47700  0 auth: could not find secret_id=6403