[ceph-users] Exporting

2020-03-23 Thread Rhian Resnick
Evening,

We are running into issues exporting a disk image from Ceph RBD when we
attempt to export an RBD image from a cache-tiered erasure-coded pool on Luminous.

All the other disks are working fine, but this one is acting up. We have a fair
bit of important data on the other disks, so we obviously want to make sure this
doesn't happen to those.

[root@ceph-p-mon1 home]# rbd export one/one-177-588-0 one-177-588-0
Exporting image: 8% complete...rbd: error reading from source image at offset 
5456789504: (5) Input/output error
2020-03-23 20:11:29.210718 7f2f3effd700 -1 librbd::io::ObjectRequest: 
0x7f2f2c128f90 handle_read_object: failed to read from object: (5) Input/output 
error
2020-03-23 20:11:29.565184 7f2f3e7fc700 -1 librbd::io::ObjectRequest: 
0x7f2f280c84d0 handle_read_cache: failed to read from cache: (5) Input/output 
error
Exporting image: 8% complete...failed.
rbd: export error: (5) Input/output error


Any thoughts would be appreciated.


Some info:

[root@ceph-p-mon1 home]# rbd info one/one-177-588-0
rbd image 'one-177-588-0':
size 58.6GiB in 15000 objects
order 22 (4MiB objects)
block_name_prefix: rbd_data.84a01279e2a9e3
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Fri Apr 20 17:06:09 2018
parent: one/one-177@snap
overlap: 2.20GiB
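
For reference, one way to narrow down which backing RADOS object is returning the I/O
error is to derive the object name from the failing offset and read it directly with
rados. A rough sketch, assuming the 4 MiB object size and block_name_prefix shown
above, and assuming the image data lives in the 'one' pool (with a cache tier, the
relevant base pool may be named differently):

```
# Derive the RADOS object behind the failing offset and check it directly.
OFFSET=5456789504
OBJ_SIZE=$((4 * 1024 * 1024))          # order 22 => 4 MiB objects
IDX=$((OFFSET / OBJ_SIZE))             # object index within the image
OBJ=$(printf 'rbd_data.84a01279e2a9e3.%016x' "$IDX")
echo "object: $OBJ"

# Where does it map, and does a plain RADOS read fail the same way?
ceph osd map one "$OBJ"
rados -p one stat "$OBJ"
rados -p one get "$OBJ" /dev/null
```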

[root@ceph-p-mon1 home]# ceph status
  cluster:
id: 6a2e8f21-bca2-492b-8869-eecc995216cc
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph-p-mon2,ceph-p-mon1,ceph-p-mon3
mgr: ceph-p-mon2(active)
mds: cephfsec-1/1/1 up  {0=ceph-p-mon2=up:active}, 6 up:standby
osd: 155 osds: 154 up, 154 in

  data:
pools:   6 pools, 5904 pgs
objects: 145.53M objects, 192TiB
usage:   253TiB used, 290TiB / 543TiB avail
pgs: 5896 active+clean
     8    active+clean+scrubbing+deep

  io:
client:   921KiB/s rd, 5.68MiB/s wr, 110op/s rd, 29op/s wr
cache:    5.65MiB/s flush, 0op/s promote


[root@ceph-p-mon1 home]# rpm -qa | grep ceph
ceph-common-12.2.9-0.el7.x86_64
ceph-mds-12.2.9-0.el7.x86_64
ceph-radosgw-12.2.9-0.el7.x86_64
ceph-mgr-12.2.9-0.el7.x86_64
ceph-12.2.9-0.el7.x86_64
collectd-ceph-5.8.1-1.el7.x86_64
ceph-deploy-2.0.1-0.noarch
libcephfs2-12.2.9-0.el7.x86_64
python-cephfs-12.2.9-0.el7.x86_64
ceph-selinux-12.2.9-0.el7.x86_64
ceph-osd-12.2.9-0.el7.x86_64
ceph-base-12.2.9-0.el7.x86_64
ceph-mon-12.2.9-0.el7.x86_64
ceph-release-1-1.el7.noarch


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Docker deploy osd

2020-03-23 Thread Oscar Segarra
Hi,

I'm not able to bootstrap an OSD container for a physical device or LVM.

Has anyone been able to bootstrap it?

Sorry if this is not the correct place to post this question. If it is not, I
apologize and would be grateful if anyone could redirect me to the correct
place.

Thanks in advance
Oscar
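
For what it's worth, a minimal sketch of one common approach is to use the ceph/daemon
image purely as a vehicle for ceph-volume. Image tag, device name and paths below are
placeholders, and it assumes /etc/ceph already holds ceph.conf plus the bootstrap-osd
keyring:

```
# Prepare a raw device as an OSD from inside a container (sketch only).
docker run --rm --privileged --net=host \
    -v /etc/ceph:/etc/ceph \
    -v /var/lib/ceph:/var/lib/ceph \
    -v /dev:/dev \
    --entrypoint ceph-volume \
    ceph/daemon:latest-nautilus \
    lvm prepare --data /dev/sdb

# The prepared OSD still has to be activated and run as a long-lived container
# afterwards; on Octopus, cephadm automates this whole sequence.
```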
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Docker deploy osd

2020-03-23 Thread Oscar Segarra
Hi,

I'm not able to bootstrap an OSD container for a physical device or LVM.

Has anyone been able to bootstrap it?

Sorry if this is not the correct place to post this question. If it is not, I
apologize and would be grateful if anyone could redirect me to the correct
place.

Thanks in advance
Oscar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-23 Thread Maged Mokhtar


On 23/03/2020 20:50, Jeff Layton wrote:

On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:

Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs write 
caching on, or should it be configured off for failover ?


You can do libcephfs write caching, as the caps would need to be
recalled for any competing access. What you really want to avoid is any
sort of caching at the ganesha daemon layer.


Hi Jeff,

Thanks for your reply. I meant the caching done by libcephfs as used within the
Ganesha Ceph FSAL plugin; I am not sure from your reply whether that is what
you refer to as the "ganesha daemon layer", or whether the latter means the
internal mdcache in Ganesha. I would really appreciate it if you could clarify
this point.


I have real doubts that it is safe to leave write caching on in the
plugin and still have safe failover, yet I see comments in the conf file such as:

# The libcephfs client will aggressively cache information while it
# can, so there is little benefit to ganesha actively caching the same
# objects.

Or is it up to the NFS client to issue cache syncs and re-submit writes
if it detects a failover?


Appreciate your help.  /Maged
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [15.1.1-rc] - "Module 'dashboard' has failed: ('pwdUpdateRequired',)"

2020-03-23 Thread Gencer W . Genç
Hi Volker,

Thank you so much for your quick fix for me. It worked. I got my dashboard back 
and ceph is in HEALTH_OK state.

Thank you so much again and stay safe!

Regards,
Gencer.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [15.1.1-rc] - "Module 'dashboard' has failed: ('pwdUpdateRequired',)"

2020-03-23 Thread Volker Theile
Hi Gencer,

you can fix the Dashboard user database with the following command:

# ceph config-key get "mgr/dashboard/accessdb_v2" | jq -cM
".users[].pwdUpdateRequired = false" | ceph config-key set
"mgr/dashboard/accessdb_v2" -i -

Regards
Volker
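
A quick way to confirm the change took effect (same key as above):

```
# Every user should now report "false".
ceph config-key get "mgr/dashboard/accessdb_v2" | jq ".users[].pwdUpdateRequired"
```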

Am 23.03.20 um 16:22 schrieb gen...@gencgiyen.com:
> Hi Volker,
>
> Sure, here you go:
>
> 
> {"users": {"gencer": {"username": "gencer", "password": "", 
> "roles": ["administrator"], "name": "Gencer Gen\u00e7", "email": 
> "gencer@xxx", "lastUpdate": 1580029921, "enabled": true, "pwdExpirationDate": 
> null}}, "roles": {}, "version": 2}
> 
>
> Gencer.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Volker Theile
Software Engineer | Ceph | openATTIC
Phone: +49 173 5876879
E-Mail: vthe...@suse.com

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany

(HRB 36809, AG Nürnberg)
Managing Director: Felix Imendörffer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs mount error 1 = Operation not permitted

2020-03-23 Thread Dungan, Scott A.
That was it! I am not sure how I got confused with the client name syntax. When 
I issued the command to create a client key, I used:

ceph fs authorize cephfs client.1 / r / rw

I assumed from the syntax that my client name is "client.1"

I suppose the correct syntax is that anything after "client." is the name? So:

ceph fs authorize cephfs client.bob / r / rw

Would authorize a client named bob?

-Scott
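
That is indeed how the naming works: the id is whatever follows "client.", and that
bare id is what goes into the mount options. A minimal sketch with a hypothetical
client "bob" and mount point:

```
# Authorize a CephFS client whose id is "bob" (everything after "client.").
ceph fs authorize cephfs client.bob / rw

# Store just the key for the kernel client.
ceph auth get-key client.bob > /etc/ceph/bob.secret

# Mount using the bare id, not "client.bob".
mount -t ceph ceph-n4:6789:/ /ceph -o name=bob,secretfile=/etc/ceph/bob.secret
```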

From: Eugen Block 
Sent: Monday, March 23, 2020 11:30 AM
To: Dungan, Scott A. 
Cc: Yan, Zheng ; ceph-users@ceph.io 
Subject: Re: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted

Wait, your client name is just "1"? In that case you need to specify
that in your mount command:

mount ... -o name=1,secret=...

It has to match your ceph auth settings, where "client" is only a
prefix and is followed by the client's name

[client.1]


Zitat von "Dungan, Scott A." :

> Tried that:
>
> [client.1]
> key = ***
> caps mds = "allow rw path=/"
> caps mon = "allow r"
> caps osd = "allow rw tag cephfs pool=meta_data, allow rw pool=data"
>
> No change.
>
>
> 
> From: Yan, Zheng 
> Sent: Sunday, March 22, 2020 9:28 PM
> To: Dungan, Scott A. 
> Cc: Eugen Block ; ceph-users@ceph.io 
> Subject: Re: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted
>
> On Sun, Mar 22, 2020 at 8:21 AM Dungan, Scott A.  wrote:
>>
>> Zitat, thanks for the tips.
>>
>> I tried appending the key directly in the mount command
>> (secret=) and that produced the same error.
>>
>> I took a look at the thread you suggested and I ran the commands
>> that Paul at Croit suggested even though I the ceph dashboard
>> showed "cephs" as already set as the application on both my data
>> and metadata pools:
>>
>> [root@ceph-n4 ~]# ceph osd pool application set data cephfs data cephfs
>> set application 'cephfs' key 'data' to 'cephfs' on pool 'data'
>> [root@ceph-n4 ~]# ceph osd pool application set meta_data cephfs
>> metadata cephfs
>> set application 'cephfs' key 'metadata' to 'cephfs' on pool 'meta_data'
>>
>> No change. I get the "mount error 1 = Operation not permitted"
>> error the same as before.
>>
>> I also tried manually editing the caps osd pool tags for my
>> client.1, to allow rw to both the data pool as well as the metadata
>> pool, as suggested further in the thread:
>>
>> [client.1]
>> key = ***
>> caps mds = "allow rw path=all"
>
>
> try replacing this with  "allow rw path=/"
>
>> caps mon = "allow r"
>> caps osd = "allow rw tag cephfs pool=meta_data, allow rw pool=data"
>>
>> No change.
>>
>> 
>> From: Eugen Block 
>> Sent: Saturday, March 21, 2020 1:16 PM
>> To: ceph-users@ceph.io 
>> Subject: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted
>>
>> I just remembered there was a thread [1] about that a couple of weeks
>> ago. Seems like you need to add the capabilities to the client.
>>
>> [1]
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/23FDDSYBCDVMYGCUTALACPFAJYITLOHJ/#I6LJR72AJGOCGINVOVEVSCKRIWV5TTZ2
>>
>>
>> Zitat von Eugen Block :
>>
>> > Hi,
>> >
>> > have you tried to mount with the secret only instead of a secret file?
>> >
>> > mount -t ceph ceph-n4:6789:/ /ceph -o name=client.1,secret=
>> >
>> > If that works your secret file is not right. If not you should check
>> > if the client actually has access to the cephfs pools ('ceph auth
>> > list').
>> >
>> >
>> >
>> > Zitat von "Dungan, Scott A." :
>> >
>> >> I am still very new to ceph and I have just set up my first small
>> >> test cluster. I have Cephfs enabled (named cephfs) and everything
>> >> is good in the dashboard. I added an authorized user key for cephfs
>> >> with:
>> >>
>> >> ceph fs authorize cephfs client.1 / r / rw
>> >>
>> >> I then copied the key to a file with:
>> >>
>> >> ceph auth get-key client.1 > /tmp/client.1.secret
>> >>
>> >> Copied the file over to the client and then attempt mount with the
>> >> kernel driver:
>> >>
>> >> mount -t ceph ceph-n4:6789:/ /ceph -o
>> >> name=client.1,secretfile=/root/client.1.secret
>> >> mount error 1 = Operation not permitted
>> >>
>> >> I looked in the logs on the mds (which is also the mgr and mon for
>> >> the cluster) and I don't see any events logged for this. I also
>> >> tried the mount command with verbose and I didn't get any further
>> >> detail. Any tips would be most appreciated.
>> >>
>> >> --
>> >>
>> >> Scott Dungan
>> >> California Institute of Technology
>> >> Office: (626) 395-3170
>> >> sdun...@caltech.edu
>> >>
>> >> ___
>> >> ceph-users mailing list -- ceph-users@ceph.io
>> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To 

[ceph-users] Re: multi-node NFS Ganesha + libcephfs caching

2020-03-23 Thread Jeff Layton
On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:
> Hello all,
> 
> For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs write 
> caching on, or should it be configured off for failover ?
> 

You can do libcephfs write caching, as the caps would need to be
recalled for any competing access. What you really want to avoid is any
sort of caching at the ganesha daemon layer.
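
In ganesha.conf terms that usually means keeping Ganesha's own caching minimal and
letting libcephfs do the work. A rough sketch of the kind of fragment often used with
FSAL_CEPH; export id and paths are placeholders, and the cache block name differs
between Ganesha versions (e.g. CACHEINODE on older releases):

```
# Sketch: a minimal FSAL_CEPH export that leaves caching to libcephfs.
cat > /etc/ganesha/ganesha.conf <<'EOF'
MDCACHE {
    # Keep Ganesha's metadata cache tiny; libcephfs already caches aggressively.
    Dir_Chunk = 0;
}

EXPORT {
    Export_ID = 100;
    Path = /;
    Pseudo = /cephfs;
    Access_Type = RW;
    FSAL {
        Name = CEPH;
    }
}
EOF
```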

-- 
Jeff Layton 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs mount error 1 = Operation not permitted

2020-03-23 Thread Eugen Block
Wait, your client name is just "1"? In that case you need to specify  
that in your mount command:


mount ... -o name=1,secret=...

It has to match your ceph auth settings, where "client" is only a  
prefix and is followed by the client's name


[client.1]


Zitat von "Dungan, Scott A." :


Tried that:

[client.1]
key = ***
caps mds = "allow rw path=/"
caps mon = "allow r"
caps osd = "allow rw tag cephfs pool=meta_data, allow rw pool=data"

No change.



From: Yan, Zheng 
Sent: Sunday, March 22, 2020 9:28 PM
To: Dungan, Scott A. 
Cc: Eugen Block ; ceph-users@ceph.io 
Subject: Re: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted

On Sun, Mar 22, 2020 at 8:21 AM Dungan, Scott A.  wrote:


Zitat, thanks for the tips.

I tried appending the key directly in the mount command  
(secret=) and that produced the same error.


I took a look at the thread you suggested and I ran the commands  
that Paul at Croit suggested even though the ceph dashboard
showed "cephs" as already set as the application on both my data  
and metadata pools:


[root@ceph-n4 ~]# ceph osd pool application set data cephfs data cephfs
set application 'cephfs' key 'data' to 'cephfs' on pool 'data'
[root@ceph-n4 ~]# ceph osd pool application set meta_data cephfs  
metadata cephfs

set application 'cephfs' key 'metadata' to 'cephfs' on pool 'meta_data'

No change. I get the "mount error 1 = Operation not permitted"  
error the same as before.


I also tried manually editing the caps osd pool tags for my  
client.1, to allow rw to both the data pool as well as the metadata  
pool, as suggested further in the thread:


[client.1]
key = ***
caps mds = "allow rw path=all"



try replacing this with  "allow rw path=/"


caps mon = "allow r"
caps osd = "allow rw tag cephfs pool=meta_data, allow rw pool=data"

No change.


From: Eugen Block 
Sent: Saturday, March 21, 2020 1:16 PM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted

I just remembered there was a thread [1] about that a couple of weeks
ago. Seems like you need to add the capabilities to the client.

[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/23FDDSYBCDVMYGCUTALACPFAJYITLOHJ/#I6LJR72AJGOCGINVOVEVSCKRIWV5TTZ2


Zitat von Eugen Block :

> Hi,
>
> have you tried to mount with the secret only instead of a secret file?
>
> mount -t ceph ceph-n4:6789:/ /ceph -o name=client.1,secret=
>
> If that works your secret file is not right. If not you should check
> if the client actually has access to the cephfs pools ('ceph auth
> list').
>
>
>
> Zitat von "Dungan, Scott A." :
>
>> I am still very new to ceph and I have just set up my first small
>> test cluster. I have Cephfs enabled (named cephfs) and everything
>> is good in the dashboard. I added an authorized user key for cephfs
>> with:
>>
>> ceph fs authorize cephfs client.1 / r / rw
>>
>> I then copied the key to a file with:
>>
>> ceph auth get-key client.1 > /tmp/client.1.secret
>>
>> Copied the file over to the client and then attempt mount with the
>> kernel driver:
>>
>> mount -t ceph ceph-n4:6789:/ /ceph -o
>> name=client.1,secretfile=/root/client.1.secret
>> mount error 1 = Operation not permitted
>>
>> I looked in the logs on the mds (which is also the mgr and mon for
>> the cluster) and I don't see any events logged for this. I also
>> tried the mount command with verbose and I didn't get any further
>> detail. Any tips would be most appreciated.
>>
>> --
>>
>> Scott Dungan
>> California Institute of Technology
>> Office: (626) 395-3170
>> sdun...@caltech.edu
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs mount error 1 = Operation not permitted

2020-03-23 Thread Dungan, Scott A.
Tried that:

[client.1]
key = ***
caps mds = "allow rw path=/"
caps mon = "allow r"
caps osd = "allow rw tag cephfs pool=meta_data, allow rw pool=data"

No change.



From: Yan, Zheng 
Sent: Sunday, March 22, 2020 9:28 PM
To: Dungan, Scott A. 
Cc: Eugen Block ; ceph-users@ceph.io 
Subject: Re: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted

On Sun, Mar 22, 2020 at 8:21 AM Dungan, Scott A.  wrote:
>
> Zitat, thanks for the tips.
>
> I tried appending the key directly in the mount command 
> (secret=) and that produced the same error.
>
> I took a look at the thread you suggested and I ran the commands that Paul at 
> Croit suggested even though the ceph dashboard showed "cephs" as already
> set as the application on both my data and metadata pools:
>
> [root@ceph-n4 ~]# ceph osd pool application set data cephfs data cephfs
> set application 'cephfs' key 'data' to 'cephfs' on pool 'data'
> [root@ceph-n4 ~]# ceph osd pool application set meta_data cephfs metadata 
> cephfs
> set application 'cephfs' key 'metadata' to 'cephfs' on pool 'meta_data'
>
> No change. I get the "mount error 1 = Operation not permitted" error the same 
> as before.
>
> I also tried manually editing the caps osd pool tags for my client.1, to 
> allow rw to both the data pool as well as the metadata pool, as suggested 
> further in the thread:
>
> [client.1]
> key = ***
> caps mds = "allow rw path=all"


try replacing this with  "allow rw path=/"

> caps mon = "allow r"
> caps osd = "allow rw tag cephfs pool=meta_data, allow rw pool=data"
>
> No change.
>
> 
> From: Eugen Block 
> Sent: Saturday, March 21, 2020 1:16 PM
> To: ceph-users@ceph.io 
> Subject: [ceph-users] Re: Cephfs mount error 1 = Operation not permitted
>
> I just remembered there was a thread [1] about that a couple of weeks
> ago. Seems like you need to add the capabilities to the client.
>
> [1]
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/23FDDSYBCDVMYGCUTALACPFAJYITLOHJ/#I6LJR72AJGOCGINVOVEVSCKRIWV5TTZ2
>
>
> Zitat von Eugen Block :
>
> > Hi,
> >
> > have you tried to mount with the secret only instead of a secret file?
> >
> > mount -t ceph ceph-n4:6789:/ /ceph -o name=client.1,secret=
> >
> > If that works your secret file is not right. If not you should check
> > if the client actually has access to the cephfs pools ('ceph auth
> > list').
> >
> >
> >
> > Zitat von "Dungan, Scott A." :
> >
> >> I am still very new to ceph and I have just set up my first small
> >> test cluster. I have Cephfs enabled (named cephfs) and everything
> >> is good in the dashboard. I added an authorized user key for cephfs
> >> with:
> >>
> >> ceph fs authorize cephfs client.1 / r / rw
> >>
> >> I then copied the key to a file with:
> >>
> >> ceph auth get-key client.1 > /tmp/client.1.secret
> >>
> >> Copied the file over to the client and then attempt mount with the
> >> kernel driver:
> >>
> >> mount -t ceph ceph-n4:6789:/ /ceph -o
> >> name=client.1,secretfile=/root/client.1.secret
> >> mount error 1 = Operation not permitted
> >>
> >> I looked in the logs on the mds (which is also the mgr and mon for
> >> the cluster) and I don't see any events logged for this. I also
> >> tried the mount command with verbose and I didn't get any further
> >> detail. Any tips would be most appreciated.
> >>
> >> --
> >>
> >> Scott Dungan
> >> California Institute of Technology
> >> Office: (626) 395-3170
> >> sdun...@caltech.edu
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW failing to create bucket

2020-03-23 Thread Abhinav Singh
Please, someone help me.

On Mon, 23 Mar 2020, 19:44 Abhinav Singh, 
wrote:

>
>
> -- Forwarded message -
> From: Abhinav Singh 
> Date: Mon, Mar 23, 2020 at 7:43 PM
> Subject: RGW failing to create bucket
> To: 
>
>
> ceph : octopus
> JaegerTracing : master
> ubuntu : 18.04
>
> When I implement Jaeger tracing, it is unable to create a bucket.
> (I am using Swift to perform the testing.)
> /src/librados/IoCtxImpl.cc
>
> ```
> void librados::IoCtxImpl::queue_aio_write(AioCompletionImpl *c)
> {
>   std::cout << "yes" << std::endl;
>   JTracer tracer;
>   tracer.initTracer("Writing Started",
>     "/home/abhinav/Desktop/GSOC/deepika/ceph/src/librados/tracerConfig.yaml");
>   Span span = tracer.newSpan("writing started");
>   span->Finish();
>   try {
>     auto yaml = YAML::LoadFile("tracerConfig.yaml");
>   } catch (const YAML::ParserException& pe) {
>     std::cout << pe.what() << std::endl;
>     ofstream f;
>     f.open("/home/abhinav/Desktop/err.txt");
>     f << pe.what();
>     f.close();
>   }
>   // auto config = jaegertracing::Config::parse(yaml);
>   // auto tracer = jaegertracing::Tracer::make(
>   //     "Writing",
>   //     config,
>   //     jaegertracing::logging::consoleLogger()
>   // );
>   // opentracing::Tracer::InitGlobal(
>   //     static_pointer_cast<opentracing::Tracer>(tracer)
>   // );
>   // auto span = opentracing::Tracer::Global()->StartSpan("Span1");
>   get();
>   ofstream file;
>   file.open("/home/abhinav/Desktop/write.txt", std::ios::out | std::ios::app);
>   file << "Writing /src/librados/IoCtxImpl.cc 310.\n";
>   file.close();
>   std::scoped_lock l{aio_write_list_lock};
>   ceph_assert(c->io == this);
>   c->aio_write_seq = ++aio_write_seq;
>   ldout(client->cct, 20) << "queue_aio_write " << this << " completion " << c
>                          << " write_seq " << aio_write_seq << dendl;
>   aio_write_list.push_back(&c->aio_write_list_item);
>   // opentracing::Tracer::Global()->Close();
> }
> ```
>  /include/tracer.h
> ```
> typedef std::unique_ptr<opentracing::Span> Span;
>
> class JTracer {
> public:
>     JTracer() {}
>     ~JTracer() {
>         opentracing::Tracer::Global()->Close();
>     }
>     void static inline loadYamlConfigFile(const char* path) {
>         return;
>     }
>     void initTracer(const char* tracerName, const char* filePath) {
>         auto yaml = YAML::LoadFile(filePath);
>         auto configuration = jaegertracing::Config::parse(yaml);
>         auto tracer = jaegertracing::Tracer::make(
>             tracerName,
>             configuration,
>             jaegertracing::logging::consoleLogger());
>         opentracing::Tracer::InitGlobal(
>             std::static_pointer_cast<opentracing::Tracer>(tracer));
>         Span s = opentracing::Tracer::Global()->StartSpan("Testing");
>         s->Finish();
>     }
>     Span newSpan(const char* spanName) {
>         Span span = opentracing::Tracer::Global()->StartSpan(spanName);
>         return std::move(span);
>     }
>     Span childSpan(const char* spanName, const Span& parentSpan) {
>         Span span = opentracing::Tracer::Global()->StartSpan(spanName, {
>             opentracing::ChildOf(&parentSpan->context())});
>         return std::move(span);
>     }
>     Span followUpSpan(const char* spanName, const Span& parentSpan) {
>         Span span = opentracing::Tracer::Global()->StartSpan(spanName, {
>             opentracing::FollowsFrom(&parentSpan->context())});
>         return std::move(span);
>     }
> };
> ```
>
> Output when trying to create a new container:
>
> ```
> errno 111 connection refused
> ```
> But when I remove the tracer part in IoCtxImpl.cc, it is working fine.
>
> I am new to Ceph and don't know what information to share to correctly
> track down the problem; if any extra information is needed, I will share it
> right away.
>
> I have been stuck on this issue for a week.
> Please, someone help me!
>
> Thank you.
>
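
Since errno 111 simply means the connection was refused, a first check is whether
radosgw is still running and listening at all once the modified librados code is in
place; a small sketch (the log path glob is only a guess at the default location):

```
# Is a radosgw process up and listening?
pgrep -a radosgw
ss -tlnp | grep radosgw

# If it exited or crashed at startup, the tail of its log usually says why.
tail -n 50 /var/log/ceph/*rgw*.log
```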
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph ignoring cluster/public_network when initiating TCP connections

2020-03-23 Thread DHilsbos
Liviu;

First: what version of Ceph are you running?

Second: I don't see a cluster network option in your configuration file.

At least for us, running Nautilus, there are no underscores (_) in the options, 
so our configuration files look like this:

[global]
auth cluster required = cephx
public network = /
cluster network = /

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Liviu Sas [mailto:droop...@gmail.com] 
Sent: Sunday, March 22, 2020 5:03 PM
To: ceph-users@ceph.io
Subject: [ceph-users] ceph ignoring cluster/public_network when initiating TCP 
connections

Hello,

While testing our ceph cluster setup, I noticed a possible issue with the
cluster/public network configuration being ignored for TCP session
initiation.

Looks like the daemons (mon/mgr/mds/osd) are all listening on the right IP
address but are initiating TCP sessions from the wrong interfaces.
Would it be possible to force ceph daemons to use the cluster/public IP
addresses to initiate new TCP connections instead of letting the kernel
choose?
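
(The public/cluster settings appear to control only what the daemons listen on, while
the source address of outgoing connections is left to the kernel's routing decision,
so one practical workaround is a source hint on the route; a sketch with placeholder
interface and addresses:)

```
# What source address does the kernel currently pick toward another node?
ip route get 10.2.1.2

# Steer it by adding a source hint to the route for the ceph network.
ip route change 10.2.1.0/24 dev eth1 src 10.2.1.1

# Verify what the daemons' established sessions now use.
ss -tnp | grep ceph-osd
```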

Some details below:

We set everything up to use our "10.2.1.0/24" network:
10.2.1.x (x=node number 1,2,3)
But we can see TCP sessions being initiated from "10.2.0.0/24" network.

So the daemons are listening to the right IP addresses.
root@nbs-vp-01:~# lsof -nPK i | grep ceph | grep LISTE
ceph-mds  1541648 ceph   16u IPv48169344
 0t0TCP 10.2.1.1:6800 (LISTEN)
ceph-mds  1541648 ceph   17u IPv48169346
 0t0TCP 10.2.1.1:6801 (LISTEN)
ceph-mgr  1541654 ceph   25u IPv48163039
 0t0TCP 10.2.1.1:6810 (LISTEN)
ceph-mgr  1541654 ceph   27u IPv48163051
 0t0TCP 10.2.1.1:6811 (LISTEN)
ceph-mon  1541703 ceph   27u IPv48170914
 0t0TCP 10.2.1.1:3300 (LISTEN)
ceph-mon  1541703 ceph   28u IPv48170915
 0t0TCP 10.2.1.1:6789 (LISTEN)
ceph-osd  1541711 ceph   16u IPv48169353
 0t0TCP 10.2.1.1:6802 (LISTEN)
ceph-osd  1541711 ceph   17u IPv48169357
 0t0TCP 10.2.1.1:6803 (LISTEN)
ceph-osd  1541711 ceph   18u IPv48169362
 0t0TCP 10.2.1.1:6804 (LISTEN)
ceph-osd  1541711 ceph   19u IPv48169368
 0t0TCP 10.2.1.1:6805 (LISTEN)
ceph-osd  1541711 ceph   20u IPv48169375
 0t0TCP 10.2.1.1:6806 (LISTEN)
ceph-osd  1541711 ceph   21u IPv48169383
 0t0TCP 10.2.1.1:6807 (LISTEN)
ceph-osd  1541711 ceph   22u IPv48169392
 0t0TCP 10.2.1.1:6808 (LISTEN)
ceph-osd  1541711 ceph   23u IPv48169402
 0t0TCP 10.2.1.1:6809 (LISTEN)

Sessions to the other nodes use the wrong IP address:

@nbs-vp-01:~# lsof -nPK i | grep ceph | grep 10.2.1.2
ceph-mds  1541648 ceph   28u IPv48279520
 0t0TCP 10.2.0.2:44180->10.2.1.2:6800 (ESTABLISHED)
ceph-mgr  1541654 ceph   41u IPv48289842
 0t0TCP 10.2.0.2:44146->10.2.1.2:6800 (ESTABLISHED)
ceph-mon  1541703 ceph   40u IPv48174827
 0t0TCP 10.2.0.2:40864->10.2.1.2:3300 (ESTABLISHED)
ceph-osd  1541711 ceph   65u IPv48171035
 0t0TCP 10.2.0.2:58716->10.2.1.2:6804 (ESTABLISHED)
ceph-osd  1541711 ceph   66u IPv48172960
 0t0TCP 10.2.0.2:54586->10.2.1.2:6806 (ESTABLISHED)
root@nbs-vp-01:~# lsof -nPK i | grep ceph | grep 10.2.1.3
ceph-mds  1541648 ceph   30u IPv48292421
 0t0TCP 10.2.0.2:45710->10.2.1.3:6802 (ESTABLISHED)
ceph-mon  1541703 ceph   46u IPv48173025
 0t0TCP 10.2.0.2:40164->10.2.1.3:3300 (ESTABLISHED)
ceph-osd  1541711 ceph   67u IPv48173043
 0t0TCP 10.2.0.2:56920->10.2.1.3:6804 (ESTABLISHED)
ceph-osd  1541711 ceph   68u IPv48171063
 0t0TCP 10.2.0.2:41952->10.2.1.3:6806 (ESTABLISHED)
ceph-osd  1541711 ceph   69u IPv48178891
 0t0TCP 10.2.0.2:57890->10.2.1.3:6808 (ESTABLISHED)


See below our cluster config:

[global]
 auth_client_required = cephx
 auth_cluster_required = cephx
 auth_service_required = cephx
 cluster_network = 10.2.1.0/24
 fsid = 0f19b6ff-0432-4c3f-b0cb-730e8302dc2c
 mon_allow_pool_delete = true
 mon_host = 10.2.1.1 10.2.1.2 10.2.1.3
 osd_pool_default_min_size = 2
 osd_pool_default_size = 3
 public_network = 10.2.1.0/24

[client]
 keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
 keyring = 

[ceph-users] Re: Q release name

2020-03-23 Thread Gencer W . Genç
What about Quasar? (https://www.google.com/search?q=quasar)

It belongs to the universe.

It's true that there are not many options for Q.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Q release name

2020-03-23 Thread Yuri Weinstein
+1 Quincy

On Mon, Mar 23, 2020 at 10:11 AM Sage Weil  wrote:
>
> Hi everyone,
>
> As we wrap up Octopus and kick of development for Pacific, now it seems
> like a good idea to sort out what to call the Q release.
> Traditionally/historically, these have always been names of cephalopod
> species--usually the "common name", but occasionally a latin name
> (infernalis).
>
> Q is a bit of a challenge since there aren't many of either that start
> with Q.  Nick Barcet found one: quebecoceras, an extinct genus of nautilus
> (https://en.wikipedia.org/wiki/Quebecoceras).
>
> The only other Q cephalopod reference I could find was Squidward Q
> Tentacles, a character (octopus, strangely) from Spongebob Squarepants,
> and Yehuda figured out that the Q stands for Quincy.
>
> So far that's it.  If you can find any other options, please catalog them
> on the etherpad:
>
> https://pad.ceph.com/p/q
>
> (or even get a head start on future releases.. they're always the
> single-letter pads, e.g., https://pad.ceph.com/p/r).
>
> sage
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Q release name

2020-03-23 Thread Wesley Dillingham
Checking the word "Octopus" in different languages, the only one starting
with a "Q" is in Maltese: "Qarnit".

For good measure, here is a Maltese Qarnit stew recipe:
http://littlerock.com.mt/food/maltese-traditional-recipe-stuffat-tal-qarnit-octopus-stew/

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Mar 23, 2020 at 1:32 PM Brian Topping 
wrote:

> I liked the first one a lot. Until I read the second one.
>
> > On Mar 23, 2020, at 11:29 AM, Anthony D'Atri 
> wrote:
> >
> > That has potential.  Another, albeit suboptimal idea would be simply
> >
> > Quid
> >
> > as in
> >
> > ’S quid
> >
> > as in “it’s squid”.  cf. https://en.wikipedia.org/wiki/%27S_Wonderful
> >
> > Alternately just skip to R and when someone tasks about Q, we say “The
> first rule of Ceph is that we don’t talk about Q”.
> >
> > — aad
> >
> >>
> >> How about the squid-headed alien species from Star Wars?
> >>
> >>
> https://en.wikipedia.org/wiki/List_of_Star_Wars_species_(P%E2%80%93T)#Quarren
> >>
> >>
> >>
> >>
> >> On Mon, Mar 23, 2020 at 6:11 PM Sage Weil  wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> As we wrap up Octopus and kick of development for Pacific, now it seems
> >>> like a good idea to sort out what to call the Q release.
> >>> Traditionally/historically, these have always been names of cephalopod
> >>> species--usually the "common name", but occasionally a latin name
> >>> (infernalis).
> >>>
> >>> Q is a bit of a challenge since there aren't many of either that start
> >>> with Q.  Nick Barcet found one: quebecoceras, an extinct genus of
> nautilus
> >>> (https://en.wikipedia.org/wiki/Quebecoceras).
> >>>
> >>> The only other Q cephalopod reference I could find was Squidward Q
> >>> Tentacles, a character (octopus, strangely) from Spongebob Squarepants,
> >>> and Yehuda figured out that the Q stands for Quincy.
> >>>
> >>> So far that's it.  If you can find any other options, please catalog
> them
> >>> on the etherpad:
> >>>
> >>>   https://pad.ceph.com/p/q
> >>>
> >>> (or even get a head start on future releases.. they're always the
> >>> single-letter pads, e.g., https://pad.ceph.com/p/r).
> >>>
> >>> sage
> >>> ___
> >>> Dev mailing list -- d...@ceph.io
> >>> To unsubscribe send an email to dev-le...@ceph.io
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Q release name

2020-03-23 Thread Brian Topping
I liked the first one a lot. Until I read the second one.

> On Mar 23, 2020, at 11:29 AM, Anthony D'Atri  wrote:
> 
> That has potential.  Another, albeit suboptimal idea would be simply 
> 
> Quid
> 
> as in
> 
> ’S quid
> 
> as in “it’s squid”.  cf. https://en.wikipedia.org/wiki/%27S_Wonderful
> 
> Alternately just skip to R and when someone tasks about Q, we say “The first 
> rule of Ceph is that we don’t talk about Q”.
> 
> — aad
> 
>> 
>> How about the squid-headed alien species from Star Wars?
>> 
>> https://en.wikipedia.org/wiki/List_of_Star_Wars_species_(P%E2%80%93T)#Quarren
>> 
>> 
>> 
>> 
>> On Mon, Mar 23, 2020 at 6:11 PM Sage Weil  wrote:
>>> 
>>> Hi everyone,
>>> 
>>> As we wrap up Octopus and kick of development for Pacific, now it seems
>>> like a good idea to sort out what to call the Q release.
>>> Traditionally/historically, these have always been names of cephalopod
>>> species--usually the "common name", but occasionally a latin name
>>> (infernalis).
>>> 
>>> Q is a bit of a challenge since there aren't many of either that start
>>> with Q.  Nick Barcet found one: quebecoceras, an extinct genus of nautilus
>>> (https://en.wikipedia.org/wiki/Quebecoceras).
>>> 
>>> The only other Q cephalopod reference I could find was Squidward Q
>>> Tentacles, a character (octopus, strangely) from Spongebob Squarepants,
>>> and Yehuda figured out that the Q stands for Quincy.
>>> 
>>> So far that's it.  If you can find any other options, please catalog them
>>> on the etherpad:
>>> 
>>>   https://pad.ceph.com/p/q
>>> 
>>> (or even get a head start on future releases.. they're always the
>>> single-letter pads, e.g., https://pad.ceph.com/p/r).
>>> 
>>> sage
>>> ___
>>> Dev mailing list -- d...@ceph.io
>>> To unsubscribe send an email to dev-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Q release name

2020-03-23 Thread Anthony D'Atri
That has potential.  Another, albeit suboptimal idea would be simply 

Quid

as in

’S quid

as in “it’s squid”.  cf. https://en.wikipedia.org/wiki/%27S_Wonderful

Alternately, just skip to R and when someone asks about Q, we say “The first
rule of Ceph is that we don’t talk about Q”.

— aad

> 
> How about the squid-headed alien species from Star Wars?
> 
> https://en.wikipedia.org/wiki/List_of_Star_Wars_species_(P%E2%80%93T)#Quarren
> 
> 
> 
> 
> On Mon, Mar 23, 2020 at 6:11 PM Sage Weil  wrote:
>> 
>> Hi everyone,
>> 
>> As we wrap up Octopus and kick of development for Pacific, now it seems
>> like a good idea to sort out what to call the Q release.
>> Traditionally/historically, these have always been names of cephalopod
>> species--usually the "common name", but occasionally a latin name
>> (infernalis).
>> 
>> Q is a bit of a challenge since there aren't many of either that start
>> with Q.  Nick Barcet found one: quebecoceras, an extinct genus of nautilus
>> (https://en.wikipedia.org/wiki/Quebecoceras).
>> 
>> The only other Q cephalopod reference I could find was Squidward Q
>> Tentacles, a character (octopus, strangely) from Spongebob Squarepants,
>> and Yehuda figured out that the Q stands for Quincy.
>> 
>> So far that's it.  If you can find any other options, please catalog them
>> on the etherpad:
>> 
>>https://pad.ceph.com/p/q
>> 
>> (or even get a head start on future releases.. they're always the
>> single-letter pads, e.g., https://pad.ceph.com/p/r).
>> 
>> sage
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Q release name

2020-03-23 Thread Andrew Bruce
Quincy - Should be in the context of easily-solved storage failures that can 
occur only on Thursday nights between the hours of 2000 and 2100 with a strong 
emphasis on random chance for corrective actions and an incompetent local 
security group. Possibly not the best associations for a technology solution, 
but I'm perhaps dating myself a tad too much. What do I know, anyway?

> On Mar 23, 2020, at 1:19 PM, Brett Niver  wrote:
> 
> there's always Quahog here in New England, but I like Quincy.
> 
> 
> On Mon, Mar 23, 2020 at 1:13 PM Brian Topping 
> wrote:
> 
>> Maybe just call it Quincy and have a backstory? Might be fun...
>> 
>>> On Mar 23, 2020, at 11:11 AM, Sage Weil  wrote:
>>> 
>>> Hi everyone,
>>> 
>>> As we wrap up Octopus and kick of development for Pacific, now it seems
>>> like a good idea to sort out what to call the Q release.
>>> Traditionally/historically, these have always been names of cephalopod
>>> species--usually the "common name", but occasionally a latin name
>>> (infernalis).
>>> 
>>> Q is a bit of a challenge since there aren't many of either that start
>>> with Q.  Nick Barcet found one: quebecoceras, an extinct genus of
>> nautilus
>>> (https://en.wikipedia.org/wiki/Quebecoceras).
>>> 
>>> The only other Q cephalopod reference I could find was Squidward Q
>>> Tentacles, a character (octopus, strangely) from Spongebob Squarepants,
>>> and Yehuda figured out that the Q stands for Quincy.
>>> 
>>> So far that's it.  If you can find any other options, please catalog
>> them
>>> on the etherpad:
>>> 
>>>  https://pad.ceph.com/p/q
>>> 
>>> (or even get a head start on future releases.. they're always the
>>> single-letter pads, e.g., https://pad.ceph.com/p/r).
>>> 
>>> sage
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Q release name

2020-03-23 Thread Dan van der Ster
How about the squid-headed alien species from Star Wars?

https://en.wikipedia.org/wiki/List_of_Star_Wars_species_(P%E2%80%93T)#Quarren




On Mon, Mar 23, 2020 at 6:11 PM Sage Weil  wrote:
>
> Hi everyone,
>
> As we wrap up Octopus and kick of development for Pacific, now it seems
> like a good idea to sort out what to call the Q release.
> Traditionally/historically, these have always been names of cephalopod
> species--usually the "common name", but occasionally a latin name
> (infernalis).
>
> Q is a bit of a challenge since there aren't many of either that start
> with Q.  Nick Barcet found one: quebecoceras, an extinct genus of nautilus
> (https://en.wikipedia.org/wiki/Quebecoceras).
>
> The only other Q cephalopod reference I could find was Squidward Q
> Tentacles, a character (octopus, strangely) from Spongebob Squarepants,
> and Yehuda figured out that the Q stands for Quincy.
>
> So far that's it.  If you can find any other options, please catalog them
> on the etherpad:
>
> https://pad.ceph.com/p/q
>
> (or even get a head start on future releases.. they're always the
> single-letter pads, e.g., https://pad.ceph.com/p/r).
>
> sage
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Q release name

2020-03-23 Thread Brett Niver
there's always Quahog here in New England, but I like Quincy.


On Mon, Mar 23, 2020 at 1:13 PM Brian Topping 
wrote:

> Maybe just call it Quincy and have a backstory? Might be fun...
>
> > On Mar 23, 2020, at 11:11 AM, Sage Weil  wrote:
> >
> > Hi everyone,
> >
> > As we wrap up Octopus and kick of development for Pacific, now it seems
> > like a good idea to sort out what to call the Q release.
> > Traditionally/historically, these have always been names of cephalopod
> > species--usually the "common name", but occasionally a latin name
> > (infernalis).
> >
> > Q is a bit of a challenge since there aren't many of either that start
> > with Q.  Nick Barcet found one: quebecoceras, an extinct genus of
> nautilus
> > (https://en.wikipedia.org/wiki/Quebecoceras).
> >
> > The only other Q cephalopod reference I could find was Squidward Q
> > Tentacles, a character (octopus, strangely) from Spongebob Squarepants,
> > and Yehuda figured out that the Q stands for Quincy.
> >
> > So far that's it.  If you can find any other options, please catalog
> them
> > on the etherpad:
> >
> >   https://pad.ceph.com/p/q
> >
> > (or even get a head start on future releases.. they're always the
> > single-letter pads, e.g., https://pad.ceph.com/p/r).
> >
> > sage
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Q release name

2020-03-23 Thread Brian Topping
Maybe just call it Quincy and have a backstory? Might be fun...

> On Mar 23, 2020, at 11:11 AM, Sage Weil  wrote:
> 
> Hi everyone,
> 
> As we wrap up Octopus and kick of development for Pacific, now it seems 
> like a good idea to sort out what to call the Q release.  
> Traditionally/historically, these have always been names of cephalopod 
> species--usually the "common name", but occasionally a latin name 
> (infernalis).
> 
> Q is a bit of a challenge since there aren't many of either that start 
> with Q.  Nick Barcet found one: quebecoceras, an extinct genus of nautilus 
> (https://en.wikipedia.org/wiki/Quebecoceras).
> 
> The only other Q cephalopod reference I could find was Squidward Q 
> Tentacles, a character (octopus, strangely) from Spongebob Squarepants, 
> and Yehuda figured out that the Q stands for Quincy.
> 
> So far that's it.  If you can find any other options, please catalog them 
> on the etherpad:
> 
>   https://pad.ceph.com/p/q
> 
> (or even get a head start on future releases.. they're always the 
> single-letter pads, e.g., https://pad.ceph.com/p/r).
> 
> sage
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Q release name

2020-03-23 Thread Sage Weil
Hi everyone,

As we wrap up Octopus and kick off development for Pacific, now it seems
like a good idea to sort out what to call the Q release.
Traditionally/historically, these have always been names of cephalopod 
species--usually the "common name", but occasionally a latin name 
(infernalis).

Q is a bit of a challenge since there aren't many of either that start 
with Q.  Nick Barcet found one: quebecoceras, an extinct genus of nautilus 
(https://en.wikipedia.org/wiki/Quebecoceras).

The only other Q cephalopod reference I could find was Squidward Q 
Tentacles, a character (octopus, strangely) from Spongebob Squarepants, 
and Yehuda figured out that the Q stands for Quincy.

So far that's it.  If you can find any other options, please catalog them 
on the etherpad:

https://pad.ceph.com/p/q

(or even get a head start on future releases.. they're always the 
single-letter pads, e.g., https://pad.ceph.com/p/r).

sage
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: No reply or very slow reply from Prometheus plugin - ceph-mgr 13.2.8 mimic

2020-03-23 Thread Janek Bevendorff
I dug up this issue report, where the problem has been reported before:
https://tracker.ceph.com/issues/39264

Unfortunately, the issue hasn't received much (or any) attention yet. So
let's get this fixed; the prometheus module is unusable in its current
state.
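
In the meantime, the practical workaround described below is to take the module out of
the picture, and it is easy to measure how long the endpoint actually takes to answer
when retesting; a sketch:

```
# Disable the prometheus module (the MGR itself keeps running):
ceph mgr module disable prometheus

# Re-enable it later to retest:
ceph mgr module enable prometheus

# Quick responsiveness check against the active MGR (default port 9283):
curl --max-time 10 -s -o /dev/null -w '%{http_code} %{time_total}s\n' \
    http://localhost:9283/metrics
```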


On 23/03/2020 17:50, Janek Bevendorff wrote:
> I haven't seen any MGR hangs so far since I disabled the prometheus
> module. It seems like the module is not only slow, but kills the whole
> MGR when the cluster is sufficiently large, so these two issues are most
> likely connected. The issue has become much, much worse with 14.2.8.
>
>
> On 23/03/2020 09:00, Janek Bevendorff wrote:
>> I am running the very latest version of Nautilus. I will try setting up
>> an external exporter today and see if that fixes anything. Our cluster
>> is somewhat large-ish with 1248 OSDs, so I expect stat collection to
>> take "some" time, but it definitely shouldn't crush the MGRs all the time.
>>
>> On 21/03/2020 02:33, Paul Choi wrote:
>>> Hi Janek,
>>>
>>> What version of Ceph are you using?
>>> We also have a much smaller cluster running Nautilus, with no MDS. No
>>> Prometheus issues there.
>>> I won't speculate further than this but perhaps Nautilus doesn't have
>>> the same issue as Mimic?
>>>
>>> On Fri, Mar 20, 2020 at 12:23 PM Janek Bevendorff
>>> >> > wrote:
>>>
>>> I think this is related to my previous post to this list about MGRs
>>> failing regularly and being overall quite slow to respond. The problem
>>> has existed before, but the new version has made it way worse. My MGRs
> keep dying every few hours and need to be restarted. The Prometheus
>>> plugin works, but it's pretty slow and so is the dashboard.
>>> Unfortunately, nobody seems to have a solution for this and I
>>> wonder why
>>> not more people are complaining about this problem.
>>>
>>>
>>> On 20/03/2020 19:30, Paul Choi wrote:
>>> > If I "curl http://localhost:9283/metrics; and wait sufficiently long
>>> > enough, I get this - says "No MON connection". But the mons are
>>> health and
>>> > the cluster is functioning fine.
>>> > That said, the mons' rocksdb sizes are fairly big because
>>> there's lots of
>>> > rebalancing going on. The Prometheus endpoint hanging seems to
>>> happen
>>> > regardless of the mon size anyhow.
>>> >
>>> >     mon.woodenbox0 is 41 GiB >= mon_data_size_warn (15 GiB)
>>> >     mon.woodenbox2 is 26 GiB >= mon_data_size_warn (15 GiB)
>>> >     mon.woodenbox4 is 42 GiB >= mon_data_size_warn (15 GiB)
>>> >     mon.woodenbox3 is 43 GiB >= mon_data_size_warn (15 GiB)
>>> >     mon.woodenbox1 is 38 GiB >= mon_data_size_warn (15 GiB)
>>> >
>>> > # fg
>>> > curl -H "Connection: close" http://localhost:9283/metrics
>>> > >> > "-//W3C//DTD XHTML 1.0 Transitional//EN"
>>> > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
>>> > 
>>> > 
>>> >     
>>> >     503 Service Unavailable
>>> >     
>>> >     #powered_by {
>>> >         margin-top: 20px;
>>> >         border-top: 2px solid black;
>>> >         font-style: italic;
>>> >     }
>>> >
>>> >     #traceback {
>>> >         color: red;
>>> >     }
>>> >     
>>> > 
>>> >     
>>> >         503 Service Unavailable
>>> >         No MON connection
>>> >         Traceback (most recent call last):
>>> >   File
>>> "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670,
>>> > in respond
>>> >     response.body = self.handler()
>>> >   File
>>> "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line
>>> > 217, in __call__
>>> >     self.body = self.oldhandler(*args, **kwargs)
>>> >   File
>>> "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61,
>>> > in __call__
>>> >     return self.callable(*self.args, **self.kwargs)
>>> >   File "/usr/lib/ceph/mgr/prometheus/module.py", line 704, in
>>> metrics
>>> >     return self._metrics(instance)
>>> >   File "/usr/lib/ceph/mgr/prometheus/module.py", line 721, in
>>> _metrics
>>> >     raise cherrypy.HTTPError(503, 'No MON connection')
>>> > HTTPError: (503, 'No MON connection')
>>> > 
>>> >     
>>> >       
>>> >         Powered by http://www.cherrypy.org;>CherryPy
>>> 3.5.0
>>> >       
>>> >     
>>> >     
>>> > 
>>> >
>>> > On Fri, Mar 20, 2020 at 6:33 AM Paul Choi >> > wrote:
>>> >
>>> >> Hello,
>>> >>
>>> >> We are running Mimic 13.2.8 with our cluster, and since
>>> upgrading to
>>> >> 13.2.8 the Prometheus plugin seems to hang a lot. It used to
>>> respond under
>>> >> 10s but now it often hangs. Restarting the mgr processes helps
>>> temporarily
>>> >> but within minutes it gets stuck again.
>>> >>
>>> >> 

[ceph-users] Re: No reply or very slow reply from Prometheus plugin - ceph-mgr 13.2.8 mimic

2020-03-23 Thread Janek Bevendorff
I haven't seen any MGR hangs so far since I disabled the prometheus
module. It seems like the module is not only slow, but kills the whole
MGR when the cluster is sufficiently large, so these two issues are most
likely connected. The issue has become much, much worse with 14.2.8.


On 23/03/2020 09:00, Janek Bevendorff wrote:
> I am running the very latest version of Nautilus. I will try setting up
> an external exporter today and see if that fixes anything. Our cluster
> is somewhat large-ish with 1248 OSDs, so I expect stat collection to
> take "some" time, but it definitely shouldn't crush the MGRs all the time.
>
> On 21/03/2020 02:33, Paul Choi wrote:
>> Hi Janek,
>>
>> What version of Ceph are you using?
>> We also have a much smaller cluster running Nautilus, with no MDS. No
>> Prometheus issues there.
>> I won't speculate further than this but perhaps Nautilus doesn't have
>> the same issue as Mimic?
>>
>> On Fri, Mar 20, 2020 at 12:23 PM Janek Bevendorff
>> > > wrote:
>>
>> I think this is related to my previous post to this list about MGRs
>> failing regularly and being overall quite slow to respond. The problem
>> has existed before, but the new version has made it way worse. My MGRs
>> keep dying every few hours and need to be restarted. The Prometheus
>> plugin works, but it's pretty slow and so is the dashboard.
>> Unfortunately, nobody seems to have a solution for this and I
>> wonder why
>> not more people are complaining about this problem.
>>
>>
>> On 20/03/2020 19:30, Paul Choi wrote:
>> > If I "curl http://localhost:9283/metrics; and wait sufficiently long
>> > enough, I get this - says "No MON connection". But the mons are
>> health and
>> > the cluster is functioning fine.
>> > That said, the mons' rocksdb sizes are fairly big because
>> there's lots of
>> > rebalancing going on. The Prometheus endpoint hanging seems to
>> happen
>> > regardless of the mon size anyhow.
>> >
>> >     mon.woodenbox0 is 41 GiB >= mon_data_size_warn (15 GiB)
>> >     mon.woodenbox2 is 26 GiB >= mon_data_size_warn (15 GiB)
>> >     mon.woodenbox4 is 42 GiB >= mon_data_size_warn (15 GiB)
>> >     mon.woodenbox3 is 43 GiB >= mon_data_size_warn (15 GiB)
>> >     mon.woodenbox1 is 38 GiB >= mon_data_size_warn (15 GiB)
>> >
>> > # fg
>> > curl -H "Connection: close" http://localhost:9283/metrics
>> > > > "-//W3C//DTD XHTML 1.0 Transitional//EN"
>> > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
>> > 
>> > 
>> >     
>> >     503 Service Unavailable
>> >     
>> >     #powered_by {
>> >         margin-top: 20px;
>> >         border-top: 2px solid black;
>> >         font-style: italic;
>> >     }
>> >
>> >     #traceback {
>> >         color: red;
>> >     }
>> >     
>> > 
>> >     
>> >         503 Service Unavailable
>> >         No MON connection
>> >         Traceback (most recent call last):
>> >   File
>> "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670,
>> > in respond
>> >     response.body = self.handler()
>> >   File
>> "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line
>> > 217, in __call__
>> >     self.body = self.oldhandler(*args, **kwargs)
>> >   File
>> "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61,
>> > in __call__
>> >     return self.callable(*self.args, **self.kwargs)
>> >   File "/usr/lib/ceph/mgr/prometheus/module.py", line 704, in
>> metrics
>> >     return self._metrics(instance)
>> >   File "/usr/lib/ceph/mgr/prometheus/module.py", line 721, in
>> _metrics
>> >     raise cherrypy.HTTPError(503, 'No MON connection')
>> > HTTPError: (503, 'No MON connection')
>> > 
>> >     
>> >       
>> >         Powered by http://www.cherrypy.org;>CherryPy
>> 3.5.0
>> >       
>> >     
>> >     
>> > 
>> >
>> > On Fri, Mar 20, 2020 at 6:33 AM Paul Choi > > wrote:
>> >
>> >> Hello,
>> >>
>> >> We are running Mimic 13.2.8 with our cluster, and since
>> upgrading to
>> >> 13.2.8 the Prometheus plugin seems to hang a lot. It used to
>> respond under
>> >> 10s but now it often hangs. Restarting the mgr processes helps
>> temporarily
>> >> but within minutes it gets stuck again.
>> >>
>> >> The active mgr doesn't exit when doing `systemctl stop
>> ceph-mgr.target"
>> >> and needs to
>> >>  be kill -9'ed.
>> >>
>> >> Is there anything I can do to address this issue, or at least
>> get better
>> >> visibility into the issue?
>> >>
>> >> We only have a few plugins enabled:
>> >> $ ceph mgr module ls
>> >> {
>> >>     "enabled_modules": [
>> >>         "balancer",

[ceph-users] OSD: FAILED ceph_assert(clone_size.count(clone))

2020-03-23 Thread Jake Grimmett

Dear All,

We are having problems with a critical osd crashing on a Nautilus 
(14.2.8) cluster.


This is a critical failure, as the osd is part of a pg that is otherwise 
"down+remapped" due to other OSDs crashing; we were hoping the pg was 
going to repair itself, as there are plenty of free osds, but for some 
reason this pg never managed to get out of an undersized state.


The osd starts OK, runs for a few minutes, then crashes with an assert, 
immediately after trying to backfill the pg that is "down+remapped".


    -7> 2020-03-23 15:28:15.368 7f15aeea8700  5 osd.287 pg_epoch: 35531 
pg[5.750s2( v 35398'3381328 (35288'3378238,35398'3381328] 
local-lis/les=35530/35531 n=190408 ec=1821/1818 lis/c 35530/22903 
les/c/f 35531/22917/0 35486/35530/35530) 
[234,354,304,388,125,25,427,226,77,154]/[2147483647,2147483647,287,388,125,25,427,226,77,154]p287(2) 
backfill=[234(0),304(2),354(1)] r=2 lpr=35530 pi=[22903,35530)/9 rops=1 
crt=35398'3381328 lcod 0'0 mlcod 0'0 
active+undersized+degraded+remapped+backfilling mbc={} trimq=112 ps=121] 
backfill_pos is 5:0ae00653:::1000e49a8c6.00d3:head
    -6> 2020-03-23 15:28:15.381 7f15cc9ec700 10 monclient: 
get_auth_request con 0x555b2f229800 auth_method 0
    -5> 2020-03-23 15:28:15.381 7f15b86bb700  2 osd.287 35531 
ms_handle_reset con 0x555b2fef7400 session 0x555b2f363600
    -4> 2020-03-23 15:28:15.391 7f15c04c5700  5 prioritycache 
tune_memory target: 4294967296 mapped: 805339136 unmapped: 1032192 heap: 
806371328 old mem: 2845415832 new mem: 2845415832
    -3> 2020-03-23 15:28:15.420 7f15cc9ec700 10 monclient: 
get_auth_request con 0x555b2fef7800 auth_method 0
    -2> 2020-03-23 15:28:15.420 7f15b86bb700  2 osd.287 35531 
ms_handle_reset con 0x555b2fef7c00 session 0x555b2f363c00
    -1> 2020-03-23 15:28:15.476 7f15aeea8700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/osd/osd_types.cc: 
In function 'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread 
7f15aeea8700 time 2020-03-23 15:28:15.470166
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/osd/osd_types.cc: 
5443: FAILED ceph_assert(clone_size.count(clone))


osd log (127KB) is here: 



/var/log/ceph/ceph-osd.287.log.gz

When the osd was running, the pg state was as follows:

[root@ceph7 ~]# ceph pg dump | grep ^5.750
5.750    190408  0  804    190119   0 
569643615603   0  0 3090 3090 
active+undersized+degraded+remapped+backfill_wait 2020-03-23 
14:37:57.582509  35398'3381328  35491:3265627 
[234,354,304,388,125,25,427,226,77,154]    234 
[NONE,NONE,287,388,125,25,427,226,77,154]    287 24471'3200829 
2020-01-28 15:48:35.574934   24471'3200829 2020-01-28 
15:48:35.574934   112


with the osd down:

[root@ceph7 ~]#  ceph pg dump | grep ^5.750
dumped all
5.750    190408  0    0 0   0 
569643615603   0  0 3090 
3090 down+remapped 2020-03-23 
15:28:28.345176  35398'3381328  35532:3265613 
[234,354,304,388,125,25,427,226,77,154]    234 
[NONE,NONE,NONE,388,125,25,427,226,77,154]    388 24471'3200829 
2020-01-28 15:48:35.574934   24471'3200829 2020-01-28 15:48:35.574934


This cluster is being used to back up a live CephFS cluster and has 1.8PB 
of data, including 30 days of snapshots. We are using 8+2 EC.
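
Would it make sense to export the affected shard from osd.287 before
experimenting further? I'm thinking of something along these lines
(untested, paths are just examples), so the data is at least saved off:

    systemctl stop ceph-osd@287
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-287 \
        --pgid 5.750s2 --op export --file /root/pg-5.750s2.export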


Any help appreciated,

Jake


Note: I am working from home until further notice.
For help, contact unixad...@mrc-lmb.cam.ac.uk
--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
Phone 01223 267019
Mobile 0776 9886539
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [15.1.1-rc] - "Module 'dashboard' has failed: ('pwdUpdateRequired',)"

2020-03-23 Thread gencer
Hi Volker,

Sure, here you go:


{"users": {"gencer": {"username": "gencer", "password": "", 
"roles": ["administrator"], "name": "Gencer Gen\u00e7", "email": "gencer@xxx", 
"lastUpdate": 1580029921, "enabled": true, "pwdExpirationDate": null}}, 
"roles": {}, "version": 2}


Gencer.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: can't get healthy cluster to trim osdmaps (13.2.8)

2020-03-23 Thread Nikola Ciprich
Hi Jan,

yes, I'm watching this tracker ticket as well, I'll post an update there
(together with a quick & dirty patch to get more debugging info)

BR

nik


On Mon, Mar 23, 2020 at 12:12:43PM +0100, Jan Fajerski wrote:
> https://tracker.ceph.com/issues/44184
> Looks similar, maybe you're also seeing other symptoms listed there?
> In any case would be good to track this in one place.
> 
> On Mon, Mar 23, 2020 at 11:29:53AM +0100, Nikola Ciprich wrote:
> >OK, so after some debugging, I've pinned the problem down to
> >OSDMonitor::get_trim_to:
> >
> >   std::lock_guard l(creating_pgs_lock);
> >   if (!creating_pgs.pgs.empty()) {
> > return 0;
> >   }
> >
> >apparently creating_pgs.pgs.empty() is not true, do I understand it
> >correctly that cluster thinks the list of creating pgs is not empty?
> >
> >all pgs are in clean+active state, so maybe there's something malformed
> >in the db? How can I check?
> >
> >I tried dumping list of creating_pgs according to
> >http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030297.html
> >but to no avail
> >
> >On Tue, Mar 17, 2020 at 12:25:29PM +0100, Nikola Ciprich wrote:
> >>Hello dear cephers,
> >>
> >>lately, there's been some discussion about slow requests hanging
> >>in "wait for new map" status. At least in my case, it's being caused
> >>by osdmaps not being properly trimmed. I tried all possible steps
> >>to force osdmap pruning (restarting mons, restarting everyging,
> >>poking crushmap), to no avail. Still all OSDs keep min osdmap version
> >>1, while newest is 4734. Otherwise cluster is healthy, with no down
> >>OSDs, network communication works flawlessly, all seems to be fine.
> >>Just can't get old osdmaps to go away.. I's very small cluster and I've
> >>moved all production traffic elsewhere, so I'm free to investigate
> >>and debug, however I'm out of ideas on what to try or where to look.
> >>
> >>Any ideas somebody please?
> >>
> >>The cluster is running 13.2.8
> >>
> >>I'd be very grateful for any tips
> >>
> >>with best regards
> >>
> >>nikola ciprich
> >>
> >>--
> >>-
> >>Ing. Nikola CIPRICH
> >>LinuxBox.cz, s.r.o.
> >>28.rijna 168, 709 00 Ostrava
> >>
> >>tel.:   +420 591 166 214
> >>fax:+420 596 621 273
> >>mobil:  +420 777 093 799
> >>www.linuxbox.cz
> >>
> >>mobil servis: +420 737 238 656
> >>email servis: ser...@linuxbox.cz
> >>-
> >>
> >
> >-- 
> >-
> >Ing. Nikola CIPRICH
> >LinuxBox.cz, s.r.o.
> >28.rijna 168, 709 00 Ostrava
> >
> >tel.:   +420 591 166 214
> >fax:+420 596 621 273
> >mobil:  +420 777 093 799
> >www.linuxbox.cz
> >
> >mobil servis: +420 737 238 656
> >email servis: ser...@linuxbox.cz
> >-
> >___
> >ceph-users mailing list -- ceph-users@ceph.io
> >To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> -- 
> Jan Fajerski
> Senior Software Engineer Enterprise Storage
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Fwd: RGW failing to create bucket

2020-03-23 Thread Abhinav Singh
-- Forwarded message -
From: Abhinav Singh 
Date: Mon, Mar 23, 2020 at 7:43 PM
Subject: RGW failing to create bucket
To: 


ceph : octopus
JaegerTracing : master
ubuntu : 18.04

When I implement jaeger tracing, it is unable to create a bucket.
(I'm using Swift to perform the testing.)
/src/librados/IoCtxImpl.cc

```
void librados::IoCtxImpl::queue_aio_write(AioCompletionImpl *c)
{
  std::cout << "yes" << std::endl;
  Span s = opentracing::Tracer::Global()->StartSpan("Span1");
  get();
  ofstream file;
  file.open("/home/abhinav/Desktop/write.txt", std::ios::out | std::ios::app);
  file << "Writing /src/librados/IoCtxImpl.cc 310.\n";
  file.close();
  std::scoped_lock l{aio_write_list_lock};
  ceph_assert(c->io == this);
  c->aio_write_seq = ++aio_write_seq;
  ldout(client->cct, 20) << "queue_aio_write " << this << " completion " << c
                         << " write_seq " << aio_write_seq << dendl;
  aio_write_list.push_back(&c->aio_write_list_item);
  // opentracing::Tracer::Global()->Close();
}
```
 /include/tracer.h
```
typedef std::unique_ptr<opentracing::Span> Span;

class JTracer {
 public:
  JTracer() {}
  ~JTracer() {
    opentracing::Tracer::Global()->Close();
  }
  static inline void loadYamlConfigFile(const char* path) {
    return;
  }
  void initTracer(const char* tracerName, const char* filePath) {
    auto yaml = YAML::LoadFile(filePath);
    auto configuration = jaegertracing::Config::parse(yaml);
    auto tracer = jaegertracing::Tracer::make(
        tracerName,
        configuration,
        jaegertracing::logging::consoleLogger());
    opentracing::Tracer::InitGlobal(
        std::static_pointer_cast<opentracing::Tracer>(tracer));
    Span s = opentracing::Tracer::Global()->StartSpan("Testing");
    s->Finish();
  }
  Span newSpan(const char* spanName) {
    Span span = opentracing::Tracer::Global()->StartSpan(spanName);
    return std::move(span);
  }
  Span childSpan(const char* spanName, const Span& parentSpan) {
    Span span = opentracing::Tracer::Global()->StartSpan(
        spanName, {opentracing::ChildOf(&parentSpan->context())});
    return std::move(span);
  }
  Span followUpSpan(const char* spanName, const Span& parentSpan) {
    Span span = opentracing::Tracer::Global()->StartSpan(
        spanName, {opentracing::FollowsFrom(&parentSpan->context())});
    return std::move(span);
  }
};
```

Output when trying to create a new container:

```
errno 111 connection refused
```
But when I remove the tracer part in IoCtxImpl.cc it is working fine.

I'm new to ceph and don't know what information to share to correctly track
down the problem; if any extra information is needed I will share it
immediately.
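
In case the config file matters: my understanding is that the YAML passed to
initTracer() / jaegertracing::Config::parse() should look roughly like the
snippet below (values are only illustrative), and that localAgentHostPort has
to point at a running jaeger agent, otherwise connections get refused:

```
disabled: false
sampler:
  type: const
  param: 1
reporter:
  logSpans: true
  localAgentHostPort: 127.0.0.1:6831
```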

I've been stuck on this issue for one week.
Please, someone help me!

Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: can't get healthy cluster to trim osdmaps (13.2.8)

2020-03-23 Thread Nikola Ciprich
OK, to reply myself :-)

I wasn't very smart about decoding the output of "ceph-kvstore-tool get ...",
so I added a dump of creating_pgs.pgs into the get_trim_to function.

now I have the list of PGs which seem to be stuck in the creating state
in the monitors' DB. If I query them, they're active+clean, as I wrote.

I suppose I could remove them using ceph-kvstore-tool, right?

however I'd rather ask before I proceed:

is it safe to remove them from the DB, if they all seem to be already created?

how do I do it? Stop all monitors, use the tool and start them again?
(I've moved all the services to the other cluster, so this won't cause any outage)
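
Concretely, I was thinking of something along these lines (the store path and
the osd_pg_creating/creating key name are my assumptions based on the thread
linked above, and I would of course back up the whole store.db first):

    systemctl stop ceph-mon@$(hostname -s)
    ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db \
        get osd_pg_creating creating out /root/creating.bin
    # only after verifying the record really lists already-created PGs:
    # ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db \
    #     rm osd_pg_creating creating
    systemctl start ceph-mon@$(hostname -s)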

I'd be very grateful for guidance here..

thanks in advance

BR

nik


On Mon, Mar 23, 2020 at 11:29:53AM +0100, Nikola Ciprich wrote:
> OK, so after some debugging, I've pinned the problem down to
> OSDMonitor::get_trim_to:
> 
> std::lock_guard l(creating_pgs_lock);
> if (!creating_pgs.pgs.empty()) {
>   return 0;
> }
> 
> apparently creating_pgs.pgs.empty() is not true, do I understand it
> correctly that cluster thinks the list of creating pgs is not empty?
> 
> all pgs are in clean+active state, so maybe there's something malformed
> in the db? How can I check?
> 
> I tried dumping list of creating_pgs according to
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030297.html
> but to no avail
> 
> On Tue, Mar 17, 2020 at 12:25:29PM +0100, Nikola Ciprich wrote:
> > Hello dear cephers,
> > 
> > lately, there's been some discussion about slow requests hanging
> > in "wait for new map" status. At least in my case, it's being caused
> > by osdmaps not being properly trimmed. I tried all possible steps
> > to force osdmap pruning (restarting mons, restarting everyging,
> > poking crushmap), to no avail. Still all OSDs keep min osdmap version
> > 1, while newest is 4734. Otherwise cluster is healthy, with no down
> > OSDs, network communication works flawlessly, all seems to be fine.
> > Just can't get old osdmaps to go away.. I's very small cluster and I've
> > moved all production traffic elsewhere, so I'm free to investigate
> > and debug, however I'm out of ideas on what to try or where to look.
> > 
> > Any ideas somebody please?
> > 
> > The cluster is running 13.2.8
> > 
> > I'd be very grateful for any tips
> > 
> > with best regards
> > 
> > nikola ciprich
> > 
> > -- 
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> > 
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> > 
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> > 
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] multi-node NFS Ganesha + libcephfs caching

2020-03-23 Thread Maged Mokhtar



Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs write 
caching on, or should it be configured off for failover?

Cheers /Maged

 
___

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to recover/mount mirrored rbd image for file recovery

2020-03-23 Thread Eugen Block

Sorry, to clarify, you also need to restrict the clients to mimic or
later to use RBD clone v2 in the default "auto" version selection
mode:

$ ceph osd set-require-min-compat-client mimic


Ah, of course, thanks for the clarification.


Zitat von Jason Dillaman :


On Mon, Mar 23, 2020 at 5:02 AM Eugen Block  wrote:


> To be able to mount the mirrored rbd image (without a protected snapshot):
>   rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> --cluster backup
>
> I just need to upgrade my backup cluster?

No, that only works with snapshots. Although I'm not sure if you can
really skip the protection. I have two Octopus lab clusters and this
procedure only works with a protected snapshot. If I try to clone the
unprotected snapshot in the remote cluster I get an error:

remote:~ # rbd clone pool1/image1@snap1 pool2/image1
2020-03-23T09:55:54.813+0100 7f7cc6ffd700 -1
librbd::image::CloneRequest: 0x558af5fb9f00 validate_parent: parent
snapshot must be protected
rbd: clone error: (22) Invalid argument


Sorry, to clarify, you also need to restrict the clients to mimic or
later to use RBD clone v2 in the default "auto" version selection
mode:

$ ceph osd set-require-min-compat-client mimic




Zitat von Ml Ml :

> okay, so i have ceph version 14.2.6 nautilus on my source cluster and
> ceph version 12.2.13 luminous on my backup clouster.
>
> To be able to mount the mirrored rbd image (without a protected snapshot):
>   rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> --cluster backup
>
> I just need to upgrade my backup cluster?
>
>
> Thanks,
> Michael
>
> On Thu, Mar 19, 2020 at 1:06 PM Jason Dillaman  
 wrote:

>>
>> On Thu, Mar 19, 2020 at 6:19 AM Eugen Block  wrote:
>> >
>> > Hi,
>> >
>> > one workaround would be to create a protected snapshot on the primary
>> > image which is also mirrored, and then clone that snapshot on the
>> > remote site. That clone can be accessed as required.
>>
>> +1. This is the correct approach. If you are using a Mimic+ cluster
>> (i.e. require OSD release >= Mimic), you can use skip the protect
>> step.
>>
>> > I'm not sure if there's a way to directly access the remote image
>> > since it's read-only.
>> >
>> > Regards,
>> > Eugen
>> >
>> >
>> > Zitat von Ml Ml :
>> >
>> > > Hello,
>> > >
>> > > my goal is to back up a proxmox cluster with rbd-mirror for desaster
>> > > recovery. Promoting/Demoting, etc.. works great.
>> > >
>> > > But how can i access a single file on the mirrored cluster? I tried:
>> > >
>> > >root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
>> > > --cluster backup
>> > >/dev/nbd1
>> > >
>> > > But i get:
>> > >root@ceph01:~# fdisk -l /dev/nbd1
>> > >fdisk: cannot open /dev/nbd1: Input/output error
>> > >
>> > > dmesg shows stuff like:
>> > >[Thu Mar 19 09:29:55 2020]  nbd1: unable to read partition table
>> > >[Thu Mar 19 09:29:55 2020] block nbd1: Other side  
returned error (30)
>> > >[Thu Mar 19 09:29:55 2020] block nbd1: Other side  
returned error (30)

>> > >
>> > > Here is my state:
>> > >
>> > > root@ceph01:~# rbd --cluster backup mirror pool status
>> cluster5-rbd --verbose
>> > > health: OK
>> > > images: 3 total
>> > > 3 replaying
>> > >
>> > > vm-106-disk-0:
>> > >   global_id:   0bc18ee1-1749-4787-a45d-01c7e946ff06
>> > >   state:   up+replaying
>> > >   description: replaying, master_position=[object_number=3,  
tag_tid=2,

>> > > entry_tid=3], mirror_position=[object_number=3, tag_tid=2,
>> > > entry_tid=3], entries_behind_master=0
>> > >   last_update: 2020-03-19 09:29:17
>> > >
>> > > vm-114-disk-1:
>> > >   global_id:   2219ffa9-a4e0-4f89-b352-ff30b1ffe9b9
>> > >   state:   up+replaying
>> > >   description: replaying, master_position=[object_number=390,
>> > > tag_tid=6, entry_tid=334290], mirror_position=[object_number=382,
>> > > tag_tid=6, entry_tid=328526], entries_behind_master=5764
>> > >   last_update: 2020-03-19 09:29:17
>> > >
>> > > vm-115-disk-0:
>> > >   global_id:   2b0af493-14c1-4b10-b557-84928dc37dd1
>> > >   state:   up+replaying
>> > >   description: replaying, master_position=[object_number=72,
>> > > tag_tid=1, entry_tid=67796], mirror_position=[object_number=72,
>> > > tag_tid=1, entry_tid=67796], entries_behind_master=0
>> > >   last_update: 2020-03-19 09:29:17
>> > >
>> > > More dmesg stuff:
>> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
>> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
>> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
>> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
>> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
>> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
>> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
>> > > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
>> > > [Thu Mar 19 09:30:02 2020] 

[ceph-users] Re: How to recover/mount mirrored rbd image for file recovery

2020-03-23 Thread Jason Dillaman
On Mon, Mar 23, 2020 at 5:02 AM Eugen Block  wrote:
>
> > To be able to mount the mirrored rbd image (without a protected snapshot):
> >   rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> > --cluster backup
> >
> > I just need to upgrade my backup cluster?
>
> No, that only works with snapshots. Although I'm not sure if you can
> really skip the protection. I have two Octopus lab clusters and this
> procedure only works with a protected snapshot. If I try to clone the
> unprotected snapshot in the remote cluster I get an error:
>
> remote:~ # rbd clone pool1/image1@snap1 pool2/image1
> 2020-03-23T09:55:54.813+0100 7f7cc6ffd700 -1
> librbd::image::CloneRequest: 0x558af5fb9f00 validate_parent: parent
> snapshot must be protected
> rbd: clone error: (22) Invalid argument

Sorry, to clarify, you also need to restrict the clients to mimic or
later to use RBD clone v2 in the default "auto" version selection
mode:

$ ceph osd set-require-min-compat-client mimic
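
For example, with made-up snapshot/clone names, the workflow would look
roughly like this:

    # on the primary cluster
    rbd snap create cluster5-rbd/vm-114-disk-1@restore-snap
    rbd snap protect cluster5-rbd/vm-114-disk-1@restore-snap   # skippable with clone v2
    # on the backup cluster, once the snapshot has been mirrored
    rbd --cluster backup clone cluster5-rbd/vm-114-disk-1@restore-snap cluster5-rbd/vm-114-restore
    rbd-nbd --cluster backup map cluster5-rbd/vm-114-restore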

>
>
> Zitat von Ml Ml :
>
> > okay, so i have ceph version 14.2.6 nautilus on my source cluster and
> > ceph version 12.2.13 luminous on my backup clouster.
> >
> > To be able to mount the mirrored rbd image (without a protected snapshot):
> >   rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> > --cluster backup
> >
> > I just need to upgrade my backup cluster?
> >
> >
> > Thanks,
> > Michael
> >
> > On Thu, Mar 19, 2020 at 1:06 PM Jason Dillaman  wrote:
> >>
> >> On Thu, Mar 19, 2020 at 6:19 AM Eugen Block  wrote:
> >> >
> >> > Hi,
> >> >
> >> > one workaround would be to create a protected snapshot on the primary
> >> > image which is also mirrored, and then clone that snapshot on the
> >> > remote site. That clone can be accessed as required.
> >>
> >> +1. This is the correct approach. If you are using a Mimic+ cluster
> >> (i.e. require OSD release >= Mimic), you can use skip the protect
> >> step.
> >>
> >> > I'm not sure if there's a way to directly access the remote image
> >> > since it's read-only.
> >> >
> >> > Regards,
> >> > Eugen
> >> >
> >> >
> >> > Zitat von Ml Ml :
> >> >
> >> > > Hello,
> >> > >
> >> > > my goal is to back up a proxmox cluster with rbd-mirror for desaster
> >> > > recovery. Promoting/Demoting, etc.. works great.
> >> > >
> >> > > But how can i access a single file on the mirrored cluster? I tried:
> >> > >
> >> > >root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> >> > > --cluster backup
> >> > >/dev/nbd1
> >> > >
> >> > > But i get:
> >> > >root@ceph01:~# fdisk -l /dev/nbd1
> >> > >fdisk: cannot open /dev/nbd1: Input/output error
> >> > >
> >> > > dmesg shows stuff like:
> >> > >[Thu Mar 19 09:29:55 2020]  nbd1: unable to read partition table
> >> > >[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error 
> >> > > (30)
> >> > >[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error 
> >> > > (30)
> >> > >
> >> > > Here is my state:
> >> > >
> >> > > root@ceph01:~# rbd --cluster backup mirror pool status
> >> cluster5-rbd --verbose
> >> > > health: OK
> >> > > images: 3 total
> >> > > 3 replaying
> >> > >
> >> > > vm-106-disk-0:
> >> > >   global_id:   0bc18ee1-1749-4787-a45d-01c7e946ff06
> >> > >   state:   up+replaying
> >> > >   description: replaying, master_position=[object_number=3, tag_tid=2,
> >> > > entry_tid=3], mirror_position=[object_number=3, tag_tid=2,
> >> > > entry_tid=3], entries_behind_master=0
> >> > >   last_update: 2020-03-19 09:29:17
> >> > >
> >> > > vm-114-disk-1:
> >> > >   global_id:   2219ffa9-a4e0-4f89-b352-ff30b1ffe9b9
> >> > >   state:   up+replaying
> >> > >   description: replaying, master_position=[object_number=390,
> >> > > tag_tid=6, entry_tid=334290], mirror_position=[object_number=382,
> >> > > tag_tid=6, entry_tid=328526], entries_behind_master=5764
> >> > >   last_update: 2020-03-19 09:29:17
> >> > >
> >> > > vm-115-disk-0:
> >> > >   global_id:   2b0af493-14c1-4b10-b557-84928dc37dd1
> >> > >   state:   up+replaying
> >> > >   description: replaying, master_position=[object_number=72,
> >> > > tag_tid=1, entry_tid=67796], mirror_position=[object_number=72,
> >> > > tag_tid=1, entry_tid=67796], entries_behind_master=0
> >> > >   last_update: 2020-03-19 09:29:17
> >> > >
> >> > > More dmesg stuff:
> >> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >> > > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> >> > > [Thu Mar 19 09:30:02 2020] blk_update_request: 95 callbacks suppressed
> 

[ceph-users] Re: [15.1.1-rc] - "Module 'dashboard' has failed: ('pwdUpdateRequired',)"

2020-03-23 Thread Volker Theile
Hi Gencer,

could you please post the output of

$ ceph config-key get "mgr/dashboard/accessdb_v2"

Regards
Volker

Am 22.03.20 um 09:37 schrieb Gencer W. Genç:
> After upgrading from 15.1.0 to 15.1.1 of Octopus im seeing this error for
> dashboard:
>
>  
>
>   cluster:
>
> id: c5233cbc-e9c2-4db3-85e1-423737a95a8c
>
> health: HEALTH_ERR
>
> Module 'dashboard' has failed: ('pwdUpdateRequired',)
>
>  
>
> Also executing any  command result in:
>
>  
>
> Error EIO: Module 'dashboard' has experienced an error and cannot handle
> commands:
>
> ('pwdUpdateRequired',)
>
>  
>
> How can i fix this problem? By disabling dashboard, Ceph status is back to
> OK but if i enable ceph dashboard then ERR given.
>
>  
>
> Thanks,
>
> Gencer.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Volker Theile
Software Engineer | Ceph | openATTIC
Phone: +49 173 5876879
E-Mail: vthe...@suse.com

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany

(HRB 36809, AG Nürnberg)
Managing Director: Felix Imendörffer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph ignoring cluster/public_network when initiating TCP connections

2020-03-23 Thread Dave Hall

Liviu,

With all due respect, the settings I suggested should cause the kernel to 
always pick the right source IP for a given destination IP, even when 
both NICs are connected to the same physical subnet. Except maybe if you 
have a default route on your private interface - you should only have 
one default route, assigned to your public interface.


Would you be willing to post the output of 'ip route' for one of your 
nodes and maybe one of your clients?


Another note:  the last time I used NAT on a server with a lot of TCP 
connections I ran into performance problems due to the CONNTRACK table.  
While that was many kernels ago, the principle is that once any NAT rule 
is added the kernel has to add an entry in the CONNTRACK table for 
*every* TCP connection and then has to do a lookup for every packet.  At 
that time the CONNTRACK table size was fixed and needed to be expanded, 
but then the lookups got even slower.
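
If you do keep the SNAT rules, it's worth keeping an eye on the conntrack
table usage on those nodes, e.g.:

    cat /proc/sys/net/netfilter/nf_conntrack_count
    cat /proc/sys/net/netfilter/nf_conntrack_max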


-Dave

Dave Hall
Binghamton University
kdh...@binghamton.edu

On 3/23/2020 12:21 AM, Liviu Sas wrote:

Hi Dave,

Thank you for the answer.

Unfortunately the issue is that ceph uses the wrong source IP address, 
and sends the traffic on the wrong interface anyway.
Would be good if ceph could actually set the source IP address to the 
cluster/public IP when initiating a TCP connection.


I managed to come up with a workaround by source nating the ceph 
traffic to the desired IP address in the POSTROUTING table.


eg: node1:
iptables -t nat -A POSTROUTING -s 10.2.0.2 -d 10.2.1.0/24 -j SNAT --to 10.2.1.1
iptables -t nat -A POSTROUTING -s 10.2.0.5 -d 10.2.1.0/24 -j SNAT --to 10.2.1.1

node2:
iptables -t nat -A POSTROUTING -s 10.2.0.6 -d 10.2.1.0/24 -j SNAT --to 10.2.1.2
iptables -t nat -A POSTROUTING -s 10.2.0.9 -d 10.2.1.0/24 -j SNAT --to 10.2.1.2

node3:
iptables -t nat -A POSTROUTING -s 10.2.0.10 -d 10.2.1.0/24 -j SNAT --to 10.2.1.3
iptables -t nat -A POSTROUTING -s 10.2.0.1 -d 10.2.1.0/24 -j SNAT --to 10.2.1.3


Where 10.2.0.x is the IP address of the interfaces that should not be 
used.


I still need to thoroughly test it tho.


On Mon, Mar 23, 2020 at 4:59 PM Dave Hall > wrote:


Liviu,

I've found that for Linux systems with multiple NICs the default
kernel
settings allow the behavior you're seeing. To prevent this I
always add
the following to my /etc/sysctl settings, usually in
/etc/sysctl.d/rp_filter.conf:

    net.ipv4.conf.default.rp_filter=1
    net.ipv4.conf.all.rp_filter=1

    net.ipv4.conf.all.arp_ignore=1
    net.ipv4.conf.all.arp_announce=2

The rp_filter lines have to do with keeping packets going in and
out of
the interface that matches the IP.  The two ARP lines have to do with
making sure that only the correct interface responds to ARP requests.

-Dave

Dave Hall
Binghamton University
kdh...@binghamton.edu 

On 3/22/2020 8:03 PM, Liviu Sas wrote:
> Hello,
>
> While testing our ceph cluster setup, I noticed a possible issue
with the
> cluster/public network configuration being ignored for TCP session
> initiation.
>
> Looks like the daemons (mon/mgr/mds/osd) are all listening on
the right IP
> address but are initiating TCP sessions from the wrong interfaces.
> Would it be possible to force ceph daemons to use the
cluster/public IP
> addresses to initiate new TCP connections instead of letting the
kernel
> chose?
>
> Some details below:
>
> We set everything up to use our "10.2.1.0/24" network:
> 10.2.1.x (x=node number 1,2,3)
> But we can see TCP sessions being initiated from "10.2.0.0/24" network.
>
> So the daemons are listening to the right IP addresses.
> root@nbs-vp-01:~# lsof -nPK i | grep ceph | grep LISTE
> ceph-mds  1541648             ceph   16u     IPv4     8169344
>   0t0        TCP 10.2.1.1:6800  (LISTEN)
> ceph-mds  1541648             ceph   17u     IPv4     8169346
>   0t0        TCP 10.2.1.1:6801  (LISTEN)
> ceph-mgr  1541654             ceph   25u     IPv4     8163039
>   0t0        TCP 10.2.1.1:6810  (LISTEN)
> ceph-mgr  1541654             ceph   27u     IPv4     8163051
>   0t0        TCP 10.2.1.1:6811  (LISTEN)
> ceph-mon  1541703             ceph   27u     IPv4     8170914
>   0t0        TCP 10.2.1.1:3300  (LISTEN)
> ceph-mon  1541703             ceph   28u     IPv4     8170915
>   0t0        TCP 10.2.1.1:6789  (LISTEN)
> ceph-osd  1541711             ceph   16u     IPv4     8169353
>   0t0        TCP 10.2.1.1:6802  

[ceph-users] Re: [15.1.1-rc] - "Module 'dashboard' has failed: ('pwdUpdateRequired',)"

2020-03-23 Thread Lenz Grimmer
On 2020-03-23 12:23, Gencer W. Genç wrote:

> Yeah, I saw the PR and i still hit the issue today. In the meantime
> while @Volker investigates, is there a workaround to bring dashboard
> back? Or should I wait for @Volker's investigation?

The workaround is likely to remove the user account, so it can be
re-created with the required pwdUpdateRequired field included in the
record. However, the command to remove dashboard user accounts needs a
running dashboard module... We probably have a chicken-and-egg situation
here. We're looking into a way to remove the record from the MON
database directly, without using the dashboard built-in commands.
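
Something along these lines might work (untested, so please keep a copy of
the original value before changing anything):

    ceph config-key get mgr/dashboard/accessdb_v2 > accessdb.json
    # add "pwdUpdateRequired": false to the user entry in accessdb.json
    ceph config-key set mgr/dashboard/accessdb_v2 -i accessdb.json
    ceph mgr module disable dashboard && ceph mgr module enable dashboard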

> P.S.: I fopund this too: https://tracker.ceph.com/issues/44271

Yep, that also related to this topic. The dashboard code that performs
an upgrade of the user database to a new structure only takes care of
changes between nautilus and octopus, but did not kick in for updates
from one octopus RC to the next.

Lenz

-- 
SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
GF: Felix Imendörffer, HRB 36809 (AG Nürnberg)



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How can I recover PGs in state 'unknown', where OSD location seems to be lost?

2020-03-23 Thread Mark S. Holliman
Hi all,

I have a large distributed ceph cluster that recently broke with all PGs housed 
at a single site getting marked as 'unknown' after a run of the Ceph Ansible 
playbook (which was being used to expand the cluster at a third site).  Is 
there a way to recover the location of PGs in this state, or a way to fall back 
to a previous config where things were working?  Or a way to scan the OSDs to 
determine which PGs are housed there?  All the OSDs are still in place and 
reporting as healthy, it's just the PG locations that are missing.  For info: 
the ceph cluster is used to provide a single shared CephFS mount for a 
distributed batch cluster, and it includes workers and pools of OSDs from three 
different OpenStack clouds.
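
Would ceph-objectstore-tool be the right way to scan an OSD for the PGs it
holds? I.e. something like the following with the OSD stopped (OSD id and
path are just examples):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 --op list-pgs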

Ceph version: 13.2.8

Here is the system health:

[root@euclid-edi-ctrl-0 ~]# ceph -s
  cluster:
id: 0fe7e967-ecd6-46d4-9f6b-224539073d3b
health: HEALTH_WARN
insufficient standby MDS daemons available
1 MDSs report slow metadata IOs
Reduced data availability: 1024 pgs inactive
6 slow ops, oldest one blocked for 244669 sec, 
mon.euclid-edi-ctrl-0 has slow ops
too few PGs per OSD (26 < min 30)

  services:
mon: 4 daemons, quorum 
euclid-edi-ctrl-0,euclid-cam-proxy-0,euclid-imp-proxy-0,euclid-ral-proxy-0
mgr: euclid-edi-ctrl-0(active), standbys: euclid-imp-proxy-0, 
euclid-cam-proxy-0, euclid-ral-proxy-0
mds: cephfs-2/2/2 up  
{0=euclid-ral-proxy-0=up:active,1=euclid-cam-proxy-0=up:active}
osd: 269 osds: 269 up, 269 in

  data:
pools:   5 pools, 5120 pgs
objects: 30.54 M objects, 771 GiB
usage:   3.8 TiB used, 41 TiB / 45 TiB avail
pgs: 20.000% pgs unknown
 4095 active+clean
 1024 unknown
 1active+clean+scrubbing

OSD Pools:
[root@euclid-edi-ctrl-0 ~]# ceph osd lspools
1 cephfs_data
2 cephfs_metadata
3 euclid_cam
4 euclid_ral
5 euclid_imp
[root@euclid-edi-ctrl-0 ~]# ceph pg dump_pools_json
dumped pools
POOLID OBJECTS  MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG
5 0  00 0   00  
 0  0   00
1  16975540  00 0   0  79165311663  
 0  0 6243475  6243475
2   5171099  00 0   0551991405   
126879876 270829 3122183  3122183
3   8393436  00 0   0 748466429315  
 0  0 1556647  1556647
4 0  00 0   00  
 0  0   00

[root@euclid-edi-ctrl-0 ~]# ceph health detail
...
PG_AVAILABILITY Reduced data availability: 1024 pgs inactive
pg 4.3c8 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3ca is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3cb is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d0 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d1 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d2 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d3 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d4 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d5 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d6 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d7 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d8 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3d9 is stuck inactive for 246794.767182, current state unknown, last 
acting []
pg 4.3da is stuck inactive for 246794.767182, current state unknown, last 
acting []
...
[root@euclid-edi-ctrl-0 ~]# ceph pg map 4.3c8
osdmap e284992 pg 4.3c8 (4.3c8) -> up [] acting []

Cheers,
  Mark


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [15.1.1-rc] - "Module 'dashboard' has failed: ('pwdUpdateRequired',)"

2020-03-23 Thread Gencer W . Genç
Hi Lenz,

 

Yeah, I saw the PR and I still hit the issue today. In the meantime while
@Volker investigates, is there a workaround to bring the dashboard back? Or
should I wait for @Volker's investigation?

P.S.: I found this too: https://tracker.ceph.com/issues/44271

 

Thanks,

Gencer.

 

 

=== REPLY ABOVE ===

Hi Gencer,

 

On 2020-03-22 09:37, Gencer W. Genç wrote:

 

...

 

Hmm, I thought we had fixed that bug by merging the following fix:

 

https://github.com/ceph/ceph/pull/33513

 

@Volker, would you mind taking a look at this? Thanks in advance!

 

Lenz

 

--

SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg

GF: Felix Imendörffer, HRB 36809 (AG Nürnberg)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: can't get healthy cluster to trim osdmaps (13.2.8)

2020-03-23 Thread Jan Fajerski

https://tracker.ceph.com/issues/44184
Looks similar, maybe you're also seeing other symptoms listed there? In any case 
would be good to track this in one place.


On Mon, Mar 23, 2020 at 11:29:53AM +0100, Nikola Ciprich wrote:

OK, so after some debugging, I've pinned the problem down to
OSDMonitor::get_trim_to:

   std::lock_guard l(creating_pgs_lock);
   if (!creating_pgs.pgs.empty()) {
 return 0;
   }

apparently creating_pgs.pgs.empty() is not true, do I understand it
correctly that cluster thinks the list of creating pgs is not empty?

all pgs are in clean+active state, so maybe there's something malformed
in the db? How can I check?

I tried dumping list of creating_pgs according to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030297.html
but to no avail

On Tue, Mar 17, 2020 at 12:25:29PM +0100, Nikola Ciprich wrote:

Hello dear cephers,

lately, there's been some discussion about slow requests hanging
in "wait for new map" status. At least in my case, it's being caused
by osdmaps not being properly trimmed. I tried all possible steps
to force osdmap pruning (restarting mons, restarting everyging,
poking crushmap), to no avail. Still all OSDs keep min osdmap version
1, while newest is 4734. Otherwise cluster is healthy, with no down
OSDs, network communication works flawlessly, all seems to be fine.
Just can't get old osdmaps to go away.. I's very small cluster and I've
moved all production traffic elsewhere, so I'm free to investigate
and debug, however I'm out of ideas on what to try or where to look.

Any ideas somebody please?

The cluster is running 13.2.8

I'd be very grateful for any tips

with best regards

nikola ciprich

--
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-



--
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Jan Fajerski
Senior Software Engineer Enterprise Storage
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: can't get healthy cluster to trim osdmaps (13.2.8)

2020-03-23 Thread Nikola Ciprich
OK, so after some debugging, I've pinned the problem down to
OSDMonitor::get_trim_to:

std::lock_guard l(creating_pgs_lock);
if (!creating_pgs.pgs.empty()) {
  return 0;
}

apparently creating_pgs.pgs.empty() is not true; do I understand it
correctly that the cluster thinks the list of creating PGs is not empty?

all pgs are in clean+active state, so maybe there's something malformed
in the db? How can I check?

I tried dumping list of creating_pgs according to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030297.html
but to no avail

On Tue, Mar 17, 2020 at 12:25:29PM +0100, Nikola Ciprich wrote:
> Hello dear cephers,
> 
> lately, there's been some discussion about slow requests hanging
> in "wait for new map" status. At least in my case, it's being caused
> by osdmaps not being properly trimmed. I tried all possible steps
> to force osdmap pruning (restarting mons, restarting everyging,
> poking crushmap), to no avail. Still all OSDs keep min osdmap version
> 1, while newest is 4734. Otherwise cluster is healthy, with no down
> OSDs, network communication works flawlessly, all seems to be fine.
> Just can't get old osdmaps to go away.. I's very small cluster and I've
> moved all production traffic elsewhere, so I'm free to investigate
> and debug, however I'm out of ideas on what to try or where to look.
> 
> Any ideas somebody please?
> 
> The cluster is running 13.2.8
> 
> I'd be very grateful for any tips
> 
> with best regards
> 
> nikola ciprich
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph pool quotas

2020-03-23 Thread Robert Sander
Am 21.03.20 um 05:51 schrieb Konstantin Shalygin:
> On 3/18/20 10:09 PM, Stolte, Felix wrote:
>> a short question about pool quotas. Do they apply to stats attributes
>> “stored” or “bytes_used” (Is replication counted for or not)?
> 
> Quotas is for total used space for this pool on OSD's. So this is
> threshold for bytes_used.

Could the documentation be changed to express this more clearly, please?

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to recover/mount mirrored rbd image for file recovery

2020-03-23 Thread Eugen Block

To be able to mount the mirrored rbd image (without a protected snapshot):
  rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
--cluster backup

I just need to upgrade my backup cluster?


No, that only works with snapshots. Although I'm not sure if you can  
really skip the protection. I have two Octopus lab clusters and this  
procedure only works with a protected snapshot. If I try to clone the  
unprotected snapshot in the remote cluster I get an error:


remote:~ # rbd clone pool1/image1@snap1 pool2/image1
2020-03-23T09:55:54.813+0100 7f7cc6ffd700 -1  
librbd::image::CloneRequest: 0x558af5fb9f00 validate_parent: parent  
snapshot must be protected

rbd: clone error: (22) Invalid argument



Zitat von Ml Ml :


okay, so i have ceph version 14.2.6 nautilus on my source cluster and
ceph version 12.2.13 luminous on my backup clouster.

To be able to mount the mirrored rbd image (without a protected snapshot):
  rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
--cluster backup

I just need to upgrade my backup cluster?


Thanks,
Michael

On Thu, Mar 19, 2020 at 1:06 PM Jason Dillaman  wrote:


On Thu, Mar 19, 2020 at 6:19 AM Eugen Block  wrote:
>
> Hi,
>
> one workaround would be to create a protected snapshot on the primary
> image which is also mirrored, and then clone that snapshot on the
> remote site. That clone can be accessed as required.

+1. This is the correct approach. If you are using a Mimic+ cluster
(i.e. require OSD release >= Mimic), you can use skip the protect
step.

> I'm not sure if there's a way to directly access the remote image
> since it's read-only.
>
> Regards,
> Eugen
>
>
> Zitat von Ml Ml :
>
> > Hello,
> >
> > my goal is to back up a proxmox cluster with rbd-mirror for desaster
> > recovery. Promoting/Demoting, etc.. works great.
> >
> > But how can i access a single file on the mirrored cluster? I tried:
> >
> >root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> > --cluster backup
> >/dev/nbd1
> >
> > But i get:
> >root@ceph01:~# fdisk -l /dev/nbd1
> >fdisk: cannot open /dev/nbd1: Input/output error
> >
> > dmesg shows stuff like:
> >[Thu Mar 19 09:29:55 2020]  nbd1: unable to read partition table
> >[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >
> > Here is my state:
> >
> > root@ceph01:~# rbd --cluster backup mirror pool status  
cluster5-rbd --verbose

> > health: OK
> > images: 3 total
> > 3 replaying
> >
> > vm-106-disk-0:
> >   global_id:   0bc18ee1-1749-4787-a45d-01c7e946ff06
> >   state:   up+replaying
> >   description: replaying, master_position=[object_number=3, tag_tid=2,
> > entry_tid=3], mirror_position=[object_number=3, tag_tid=2,
> > entry_tid=3], entries_behind_master=0
> >   last_update: 2020-03-19 09:29:17
> >
> > vm-114-disk-1:
> >   global_id:   2219ffa9-a4e0-4f89-b352-ff30b1ffe9b9
> >   state:   up+replaying
> >   description: replaying, master_position=[object_number=390,
> > tag_tid=6, entry_tid=334290], mirror_position=[object_number=382,
> > tag_tid=6, entry_tid=328526], entries_behind_master=5764
> >   last_update: 2020-03-19 09:29:17
> >
> > vm-115-disk-0:
> >   global_id:   2b0af493-14c1-4b10-b557-84928dc37dd1
> >   state:   up+replaying
> >   description: replaying, master_position=[object_number=72,
> > tag_tid=1, entry_tid=67796], mirror_position=[object_number=72,
> > tag_tid=1, entry_tid=67796], entries_behind_master=0
> >   last_update: 2020-03-19 09:29:17
> >
> > More dmesg stuff:
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: 95 callbacks suppressed
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev  
nbd1, sector 0

> > [Thu Mar 19 09:30:02 2020] buffer_io_error: 94 callbacks suppressed
> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 0, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev  
nbd1, sector 1

> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 1, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev  
nbd1, sector 2

> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 2, async 

[ceph-users] Re: Questions on Ceph cluster without OS disks

2020-03-23 Thread Thomas Schneider
Hello Martin,

that is much less than the amount of disk space I have seen allocated when
something is wrong with the cluster.
I have defined at least 10GB, and there were situations (in the past)
when this space was quickly filled up by
syslog
user.log
messages
daemon.log

Regards
Thomas

Am 23.03.2020 um 09:39 schrieb Martin Verges:
> Hello Thomas,
>
> by default we allocate 1GB per Host on the Management Node, nothing on
> the PXE booted server.
>
> This value can be changed in the management container config file
> (/config/config.yml):
> > ...
> > logFilesPerServerGB: 1
> > ...
> After changing the config, you need to restart the mgmt container.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io 
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Mo., 23. März 2020 um 09:30 Uhr schrieb Thomas Schneider
> <74cmo...@gmail.com >:
>
> Hello Martin,
>
> how much disk space do you reserve for log in the PXE setup?
>
> Regards
> Thomas
>
> Am 22.03.2020 um 20:50 schrieb Martin Verges:
> > Hello Samuel,
> >
> > we from croit.io  don't use NFS to boot up
> Servers. We copy the OS directly
> > into the RAM (approximately 0.5-1GB). Think of it like a
> container, you
> > start it and throw it away when you no longer need it.
> > This way we can save the slots of OS harddisks to add more
> storage per node
> > and reduce overall costs as 1GB ram is cheaper then an OS disk
> and consumes
> > less power.
> >
> > If our management node is down, nothing will happen to the
> cluster. No
> > impact, no downtime. However, you do need the mgmt node to boot
> up the
> > cluster. So after a very rare total power outage, your first
> system would
> > be the mgmt node and then the cluster itself. But again, if you
> configure
> > your systems correct, no manual work is required to recover from
> that. For
> > everything else, it is possible (but definitely not needed) to
> deploy our
> > mgmt node in active/passive HA.
> >
> > We have multiple hundred installations worldwide in production
> > environments. Our strong PXE knowledge comes from more than 20
> years of
> > datacenter hosting experience and it never ever failed us in the
> last >10
> > years.
> >
> > The main benefits out of that:
> >  - Immutable OS freshly booted: Every host has exactly the same
> version,
> > same library, kernel, Ceph versions,...
> >  - OS is heavily tested by us: Every croit deployment has
> exactly the same
> > image. We can find errors much faster and hit much fewer errors.
> >  - Easy Update: Updating OS, Ceph or anything else is just a
> node reboot.
> > No cluster downtime, No service Impact, full automatic handling
> by our mgmt
> > Software.
> >  - No need to install OS: No maintenance costs, no labor
> required, no other
> > OS management required.
> >  - Centralized Logs/Stats: As it is booted in memory, all logs and
> > statistics are collected on a central place for easy access.
> >  - Easy to scale: It doesn't matter if you boot 3 oder 300
> nodes, all
> > boot the exact same image in a few seconds.
> >  .. lots more
> >
> > Please do not hesitate to contact us directly. We always try to
> offer an
> > excellent service and are strongly customer oriented.
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695
> > E-Mail: martin.ver...@croit.io 
> > Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> >
> > Web: https://croit.io
> > YouTube: https://goo.gl/PGE1Bx
> >
> >
> > Am Sa., 21. März 2020 um 13:53 Uhr schrieb huxia...@horebdata.cn
>  <
> > huxia...@horebdata.cn >:
> >
> >> Hello, Martin,
> >>
> >> I notice that Croit advocate the use of ceph cluster without OS
> disks, but
> >> with PXE boot.
> >>
> >> Do you use a NFS server to serve the root file system for each
> node? such
> >> as hosting configuration files, user and password, log files,
> etc. My
> >> question is, will the NFS server be a single point of failure?
> If the NFS
> >> server goes down, the network experience any outage, ceph nodes
> may not be
> >> able to write to the local file systems, possibly leading to
> 

[ceph-users] Re: Identify slow ops

2020-03-23 Thread Thomas Schneider
Hi,

I have upgraded to 14.2.8 and rebooted all nodes sequentially including
all 3 MON services.
However the slow ops are still displayed with increasing block time.
# ceph -s
  cluster:
    id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_WARN
    17 daemons have recently crashed
    2263 slow ops, oldest one blocked for 183885 sec, mon.ld5505
has slow ops

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 2d)
    mgr: ld5505(active, since 2d), standbys: ld5506, ld5507
    mds: cephfs:2 {0=ld4257=up:active,1=ld5508=up:active} 2
up:standby-replay 3 up:standby
    osd: 442 osds: 441 up (since 38h), 441 in (since 38h)

  data:
    pools:   7 pools, 19628 pgs
    objects: 68.65M objects, 262 TiB
    usage:   786 TiB used, 744 TiB / 1.5 PiB avail
    pgs: 19628 active+clean

  io:
    client:   3.3 KiB/s rd, 3.1 MiB/s wr, 7 op/s rd, 25 op/s wr

I have the impression that this is not a harmless bug anymore.

Please advise how to proceed.
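
Would dumping the blocked ops from the mon's admin socket help to identify
what they actually are? I.e. something like:

    ceph daemon mon.ld5505 ops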

THX


Am 17.02.2020 um 18:31 schrieb Paul Emmerich:
> that's probably just https://tracker.ceph.com/issues/43893
> (a harmless bug)
>
> Restart the mons to get rid of the message
>
> Paul
>
> -- Paul Emmerich Looking for help with your Ceph cluster? Contact us
> at https://croit.io croit GmbH Freseniusstr. 31h 81247 München
> www.croit.io Tel: +49 89 1896585 90 On Mon, Feb 17, 2020 at 2:59 PM
> Thomas Schneider <74cmo...@gmail.com> wrote:
>> Hi,
>>
>> the current output of ceph -s reports a warning:
>> 2 slow ops, oldest one blocked for 347335 sec, mon.ld5505 has slow ops
>> This time is increasing.
>>
>> root@ld3955:~# ceph -s
>>   cluster:
>> id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>> health: HEALTH_WARN
>> 9 daemons have recently crashed
>> 2 slow ops, oldest one blocked for 347335 sec, mon.ld5505
>> has slow ops
>>
>>   services:
>> mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 3d)
>> mgr: ld5507(active, since 8m), standbys: ld5506, ld5505
>> mds: cephfs:2 {0=ld5507=up:active,1=ld5505=up:active} 2
>> up:standby-replay 3 up:standby
>> osd: 442 osds: 442 up (since 8d), 442 in (since 9d)
>>
>>   data:
>> pools:   7 pools, 19628 pgs
>> objects: 65.78M objects, 251 TiB
>> usage:   753 TiB used, 779 TiB / 1.5 PiB avail
>> pgs: 19628 active+clean
>>
>>   io:
>> client:   427 KiB/s rd, 22 MiB/s wr, 851 op/s rd, 647 op/s wr
>>
>> The details are as follows:
>> root@ld3955:~# ceph health detail
>> HEALTH_WARN 9 daemons have recently crashed; 2 slow ops, oldest one
>> blocked for 347755 sec, mon.ld5505 has slow ops
>> RECENT_CRASH 9 daemons have recently crashed
>> mds.ld4464 crashed on host ld4464 at 2020-02-09 07:33:59.131171Z
>> mds.ld5506 crashed on host ld5506 at 2020-02-09 07:42:52.036592Z
>> mds.ld4257 crashed on host ld4257 at 2020-02-09 07:47:44.369505Z
>> mds.ld4464 crashed on host ld4464 at 2020-02-09 06:10:24.515912Z
>> mds.ld5507 crashed on host ld5507 at 2020-02-09 07:13:22.400268Z
>> mds.ld4257 crashed on host ld4257 at 2020-02-09 06:48:34.742475Z
>> mds.ld5506 crashed on host ld5506 at 2020-02-09 06:10:24.680648Z
>> mds.ld4465 crashed on host ld4465 at 2020-02-09 06:52:33.204855Z
>> mds.ld5506 crashed on host ld5506 at 2020-02-06 07:59:37.089007Z
>> SLOW_OPS 2 slow ops, oldest one blocked for 347755 sec, mon.ld5505 has
>> slow ops
>>
>> There's no error on services (mgr, mon, osd).
>>
>> Can you please advise how to identify the root cause of this slow ops?
>>
>> THX
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions on Ceph cluster without OS disks

2020-03-23 Thread Martin Verges
Hello Thomas,

by default we allocate 1GB per Host on the Management Node, nothing on the
PXE booted server.

This value can be changed in the management container config file
(/config/config.yml):
> ...
> logFilesPerServerGB: 1
> ...
After changing the config, you need to restart the mgmt container.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


Am Mo., 23. März 2020 um 09:30 Uhr schrieb Thomas Schneider <
74cmo...@gmail.com>:

> Hello Martin,
>
> how much disk space do you reserve for log in the PXE setup?
>
> Regards
> Thomas
>
> Am 22.03.2020 um 20:50 schrieb Martin Verges:
> > Hello Samuel,
> >
> > we from croit.io don't use NFS to boot up Servers. We copy the OS
> directly
> > into the RAM (approximately 0.5-1GB). Think of it like a container, you
> > start it and throw it away when you no longer need it.
> > This way we can save the slots of OS harddisks to add more storage per
> node
> > and reduce overall costs as 1GB ram is cheaper then an OS disk and
> consumes
> > less power.
> >
> > If our management node is down, nothing will happen to the cluster. No
> > impact, no downtime. However, you do need the mgmt node to boot up the
> > cluster. So after a very rare total power outage, your first system would
> > be the mgmt node and then the cluster itself. But again, if you configure
> > your systems correct, no manual work is required to recover from that.
> For
> > everything else, it is possible (but definitely not needed) to deploy our
> > mgmt node in active/passive HA.
> >
> > We have multiple hundred installations worldwide in production
> > environments. Our strong PXE knowledge comes from more than 20 years of
> > datacenter hosting experience and it never ever failed us in the last >10
> > years.
> >
> > The main benefits out of that:
> >  - Immutable OS freshly booted: Every host has exactly the same version,
> > same library, kernel, Ceph versions,...
> >  - OS is heavily tested by us: Every croit deployment has exactly the
> same
> > image. We can find errors much faster and hit much fewer errors.
> >  - Easy Update: Updating OS, Ceph or anything else is just a node reboot.
> > No cluster downtime, No service Impact, full automatic handling by our
> mgmt
> > Software.
> >  - No need to install OS: No maintenance costs, no labor required, no
> other
> > OS management required.
> >  - Centralized Logs/Stats: As it is booted in memory, all logs and
> > statistics are collected on a central place for easy access.
> >  - Easy to scale: It doesn't matter if you boot 3 oder 300 nodes, all
> > boot the exact same image in a few seconds.
> >  .. lots more
> >
> > Please do not hesitate to contact us directly. We always try to offer an
> > excellent service and are strongly customer oriented.
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695
> > E-Mail: martin.ver...@croit.io
> > Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> >
> > Web: https://croit.io
> > YouTube: https://goo.gl/PGE1Bx
> >
> >
> > Am Sa., 21. März 2020 um 13:53 Uhr schrieb huxia...@horebdata.cn <
> > huxia...@horebdata.cn>:
> >
> >> Hello, Martin,
> >>
> >> I notice that Croit advocate the use of ceph cluster without OS disks,
> but
> >> with PXE boot.
> >>
> >> Do you use a NFS server to serve the root file system for each node?
> such
> >> as hosting configuration files, user and password, log files, etc. My
> >> question is, will the NFS server be a single point of failure? If the
> NFS
> >> server goes down, the network experience any outage, ceph nodes may not
> be
> >> able to write to the local file systems, possibly leading to service
> outage.
> >>
> >> How do you deal with the above potential issues in production? I am a
> bit
> >> worried...
> >>
> >> best regards,
> >>
> >> samuel
> >>
> >>
> >>
> >>
> >> --
> >> huxia...@horebdata.cn
> >>
> >>
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [External Email] ceph ignoring cluster/public_network when initiating TCP connections

2020-03-23 Thread Max Krasilnikov
Good day!

 Mon, Mar 23, 2020 at 05:21:37PM +1300, droopanu wrote: 

> Hi Dave,
> 
> Thank you for the answer.
> 
> Unfortunately, the issue is that Ceph uses the wrong source IP address and
> sends the traffic out the wrong interface anyway.
> It would be good if Ceph could actually set the source IP address to the
> cluster/public IP when initiating a TCP connection.

Maybe this thread will help you:
https://www.spinics.net/lists/ceph-users/msg50499.html
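
Until Ceph itself binds to the configured networks when initiating connections,
here is a minimal sketch of the two usual workarounds: pinning the daemon
addresses in ceph.conf, or forcing source-address selection in the routing
table. All addresses, networks and interface names below are made up for
illustration; substitute your own.

# Workaround 1: pin the addresses per daemon in ceph.conf (example values):
#   [osd.12]
#   public_addr  = 192.168.10.5
#   cluster_addr = 192.168.20.5

# Workaround 2: tell the kernel which source address to prefer per network:
ip route replace 192.168.10.0/24 dev eth0 src 192.168.10.5
ip route replace 192.168.20.0/24 dev eth1 src 192.168.20.5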
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions on Ceph cluster without OS disks

2020-03-23 Thread Thomas Schneider
Hello Martin,

how much disk space do you reserve for logs in the PXE setup?

Regards
Thomas
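
A generic sketch of how a RAM-booted, diskless node can get by without
reserving local log space at all - not croit's actual implementation, purely
an illustration with an assumed central log host and an assumed 64M journal
cap: keep the journal volatile and size-limited, and forward everything off
the node.

# Keep the systemd journal in RAM only and cap its size:
mkdir -p /etc/systemd/journald.conf.d
cat > /etc/systemd/journald.conf.d/volatile.conf <<'EOF'
[Journal]
Storage=volatile
RuntimeMaxUse=64M
EOF
systemctl restart systemd-journald

# Ship all syslog traffic to a central host (hostname is a placeholder):
cat > /etc/rsyslog.d/90-forward.conf <<'EOF'
*.* @@loghost.example.com:514
EOF
systemctl restart rsyslog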

On 22.03.2020 at 20:50, Martin Verges wrote:
> Hello Samuel,
>
> we from croit.io don't use NFS to boot up Servers. We copy the OS directly
> into the RAM (approximately 0.5-1GB). Think of it like a container, you
> start it and throw it away when you no longer need it.
> This way we can save the slots of OS harddisks to add more storage per node
> and reduce overall costs as 1GB of RAM is cheaper than an OS disk and consumes
> less power.
>
> If our management node is down, nothing will happen to the cluster. No
> impact, no downtime. However, you do need the mgmt node to boot up the
> cluster. So after a very rare total power outage, your first system would
> be the mgmt node and then the cluster itself. But again, if you configure
> your systems correctly, no manual work is required to recover from that. For
> everything else, it is possible (but definitely not needed) to deploy our
> mgmt node in active/passive HA.
>
> We have multiple hundred installations worldwide in production
> environments. Our strong PXE knowledge comes from more than 20 years of
> datacenter hosting experience and it never ever failed us in the last >10
> years.
>
> The main benefits out of that:
>  - Immutable OS freshly booted: Every host has exactly the same version,
> same library, kernel, Ceph versions,...
>  - OS is heavily tested by us: Every croit deployment has exactly the same
> image. We can find errors much faster and hit much fewer errors.
>  - Easy Update: Updating OS, Ceph or anything else is just a node reboot.
> No cluster downtime, no service impact, fully automatic handling by our mgmt
> software.
>  - No need to install OS: No maintenance costs, no labor required, no other
> OS management required.
>  - Centralized Logs/Stats: As it is booted in memory, all logs and
> statistics are collected on a central place for easy access.
>  - Easy to scale: It doesn't matter if you boot 3 or 300 nodes, all
> boot the exact same image in a few seconds.
>  .. lots more
>
> Please do not hesitate to contact us directly. We always try to offer an
> excellent service and are strongly customer oriented.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> On Sat, 21 March 2020 at 13:53, huxia...@horebdata.cn <huxia...@horebdata.cn> wrote:
>
>> Hello, Martin,
>>
>> I notice that Croit advocates the use of Ceph clusters without OS disks, but
>> with PXE boot.
>>
>> Do you use an NFS server to serve the root file system for each node, such
>> as for hosting configuration files, users and passwords, log files, etc.? My
>> question is, will the NFS server be a single point of failure? If the NFS
>> server goes down or the network experiences an outage, Ceph nodes may not be
>> able to write to their local file systems, possibly leading to a service outage.
>>
>> How do you deal with the above potential issues in production? I am a bit
>> worried...
>>
>> best regards,
>>
>> samuel
>>
>>
>>
>>
>> --
>> huxia...@horebdata.cn
>>
>>
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: No reply or very slow reply from Prometheus plugin - ceph-mgr 13.2.8 mimic

2020-03-23 Thread Janek Bevendorff
I am running the very latest version of Nautilus. I will try setting up
an external exporter today and see if that fixes anything. Our cluster
is somewhat large-ish with 1248 OSDs, so I expect stat collection to
take "some" time, but it definitely shouldn't crush the MGRs all the time.

On 21/03/2020 02:33, Paul Choi wrote:
> Hi Janek,
>
> What version of Ceph are you using?
> We also have a much smaller cluster running Nautilus, with no MDS. No
> Prometheus issues there.
> I won't speculate further than this but perhaps Nautilus doesn't have
> the same issue as Mimic?
>
> On Fri, Mar 20, 2020 at 12:23 PM Janek Bevendorff wrote:
>
> I think this is related to my previous post to this list about MGRs
> failing regularly and being overall quite slow to respond. The problem
> has existed before, but the new version has made it way worse. My MGRs
> keep dying every few hours and need to be restarted. The Prometheus
> plugin works, but it's pretty slow and so is the dashboard.
> Unfortunately, nobody seems to have a solution for this and I wonder why
> not more people are complaining about this problem.
>
>
> On 20/03/2020 19:30, Paul Choi wrote:
> > If I "curl http://localhost:9283/metrics; and wait sufficiently long
> > enough, I get this - says "No MON connection". But the mons are
> health and
> > the cluster is functioning fine.
> > That said, the mons' rocksdb sizes are fairly big because there's lots of
> > rebalancing going on. The Prometheus endpoint hanging seems to happen
> > regardless of the mon size anyhow.
> >
> >     mon.woodenbox0 is 41 GiB >= mon_data_size_warn (15 GiB)
> >     mon.woodenbox2 is 26 GiB >= mon_data_size_warn (15 GiB)
> >     mon.woodenbox4 is 42 GiB >= mon_data_size_warn (15 GiB)
> >     mon.woodenbox3 is 43 GiB >= mon_data_size_warn (15 GiB)
> >     mon.woodenbox1 is 38 GiB >= mon_data_size_warn (15 GiB)
> >
> > # fg
> > curl -H "Connection: close" http://localhost:9283/metrics
> >  > "-//W3C//DTD XHTML 1.0 Transitional//EN"
> > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
> > 
> > 
> >     
> >     503 Service Unavailable
> >     
> >     #powered_by {
> >         margin-top: 20px;
> >         border-top: 2px solid black;
> >         font-style: italic;
> >     }
> >
> >     #traceback {
> >         color: red;
> >     }
> >     
> > 
> >     
> >         503 Service Unavailable
> >         No MON connection
> >         Traceback (most recent call last):
> >   File
> "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670,
> > in respond
> >     response.body = self.handler()
> >   File
> "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line
> > 217, in __call__
> >     self.body = self.oldhandler(*args, **kwargs)
> >   File
> "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61,
> > in __call__
> >     return self.callable(*self.args, **self.kwargs)
> >   File "/usr/lib/ceph/mgr/prometheus/module.py", line 704, in
> metrics
> >     return self._metrics(instance)
> >   File "/usr/lib/ceph/mgr/prometheus/module.py", line 721, in
> _metrics
> >     raise cherrypy.HTTPError(503, 'No MON connection')
> > HTTPError: (503, 'No MON connection')
> > 
> >     
> >       
> >         Powered by http://www.cherrypy.org;>CherryPy
> 3.5.0
> >       
> >     
> >     
> > 
> >
> > On Fri, Mar 20, 2020 at 6:33 AM Paul Choi wrote:
> >
> >> Hello,
> >>
> >> We are running Mimic 13.2.8 with our cluster, and since
> upgrading to
> >> 13.2.8 the Prometheus plugin seems to hang a lot. It used to
> respond under
> >> 10s but now it often hangs. Restarting the mgr processes helps
> temporarily
> >> but within minutes it gets stuck again.
> >>
> >> The active mgr doesn't exit when doing `systemctl stop ceph-mgr.target`
> >> and needs to be kill -9'ed.
> >>
> >> Is there anything I can do to address this issue, or at least get better
> >> visibility into the issue?
> >>
> >> We only have a few plugins enabled:
> >> $ ceph mgr module ls
> >> {
> >>     "enabled_modules": [
> >>         "balancer",
> >>         "prometheus",
> >>         "zabbix"
> >>     ],
> >>
> >> 3 mgr processes, but it's a pretty large cluster (nearly 4000 OSDs) and it's
> >> a busy one with lots of rebalancing. (I don't know if a busy cluster would
> >> seriously affect the mgr's performance, but just throwing it out there.)
> >>
> >>   services:
> >>     mon: 5 daemons, quorum
> >> 

[ceph-users] Re: How to recover/mount mirrored rbd image for file recovery

2020-03-23 Thread Ml Ml
Okay, so I have Ceph version 14.2.6 (Nautilus) on my source cluster and
Ceph version 12.2.13 (Luminous) on my backup cluster.

To be able to mount the mirrored rbd image (without a protected snapshot):
  rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
--cluster backup

So I just need to upgrade my backup cluster?


Thanks,
Michael
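
A minimal sketch of the snapshot/clone approach Eugen and Jason describe below,
using the pool/image names from this thread. The snapshot name, the clone name
and the /dev/nbd device are assumptions; since the backup cluster here is
Luminous, the protect step is still required.

# On the primary cluster: snapshot the image; the snapshot is mirrored to the
# backup site along with the rest of the journal:
rbd snap create cluster5-rbd/vm-114-disk-1@restore-snap
rbd snap protect cluster5-rbd/vm-114-disk-1@restore-snap   # can be skipped on Mimic+

# On the backup cluster: once the snapshot has replicated, clone it and map
# the clone (the mirrored image itself remains read-only):
rbd --cluster backup clone cluster5-rbd/vm-114-disk-1@restore-snap \
    cluster5-rbd/vm-114-disk-1-restore
rbd-nbd --cluster backup map cluster5-rbd/vm-114-disk-1-restore
# ...mount the device that was printed, copy the files out, then clean up:
rbd-nbd unmap /dev/nbd1                                    # device as printed by "map"
rbd --cluster backup rm cluster5-rbd/vm-114-disk-1-restore
rbd snap unprotect cluster5-rbd/vm-114-disk-1@restore-snap # back on the primary
rbd snap rm cluster5-rbd/vm-114-disk-1@restore-snap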

On Thu, Mar 19, 2020 at 1:06 PM Jason Dillaman  wrote:
>
> On Thu, Mar 19, 2020 at 6:19 AM Eugen Block  wrote:
> >
> > Hi,
> >
> > one workaround would be to create a protected snapshot on the primary
> > image which is also mirrored, and then clone that snapshot on the
> > remote site. That clone can be accessed as required.
>
> +1. This is the correct approach. If you are using a Mimic+ cluster
> (i.e. require OSD release >= Mimic), you can skip the protect
> step.
>
> > I'm not sure if there's a way to directly access the remote image
> > since it's read-only.
> >
> > Regards,
> > Eugen
> >
> >
> > Zitat von Ml Ml :
> >
> > > Hello,
> > >
> > > my goal is to back up a proxmox cluster with rbd-mirror for desaster
> > > recovery. Promoting/Demoting, etc.. works great.
> > >
> > > But how can i access a single file on the mirrored cluster? I tried:
> > >
> > >root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> > > --cluster backup
> > >/dev/nbd1
> > >
> > > But i get:
> > >root@ceph01:~# fdisk -l /dev/nbd1
> > >fdisk: cannot open /dev/nbd1: Input/output error
> > >
> > > dmesg shows stuff like:
> > >[Thu Mar 19 09:29:55 2020]  nbd1: unable to read partition table
> > >[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > >[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > >
> > > Here is my state:
> > >
> > > root@ceph01:~# rbd --cluster backup mirror pool status cluster5-rbd 
> > > --verbose
> > > health: OK
> > > images: 3 total
> > > 3 replaying
> > >
> > > vm-106-disk-0:
> > >   global_id:   0bc18ee1-1749-4787-a45d-01c7e946ff06
> > >   state:   up+replaying
> > >   description: replaying, master_position=[object_number=3, tag_tid=2,
> > > entry_tid=3], mirror_position=[object_number=3, tag_tid=2,
> > > entry_tid=3], entries_behind_master=0
> > >   last_update: 2020-03-19 09:29:17
> > >
> > > vm-114-disk-1:
> > >   global_id:   2219ffa9-a4e0-4f89-b352-ff30b1ffe9b9
> > >   state:   up+replaying
> > >   description: replaying, master_position=[object_number=390,
> > > tag_tid=6, entry_tid=334290], mirror_position=[object_number=382,
> > > tag_tid=6, entry_tid=328526], entries_behind_master=5764
> > >   last_update: 2020-03-19 09:29:17
> > >
> > > vm-115-disk-0:
> > >   global_id:   2b0af493-14c1-4b10-b557-84928dc37dd1
> > >   state:   up+replaying
> > >   description: replaying, master_position=[object_number=72,
> > > tag_tid=1, entry_tid=67796], mirror_position=[object_number=72,
> > > tag_tid=1, entry_tid=67796], entries_behind_master=0
> > >   last_update: 2020-03-19 09:29:17
> > >
> > > More dmesg stuff:
> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:30:02 2020] blk_update_request: 95 callbacks suppressed
> > > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, 
> > > sector 0
> > > [Thu Mar 19 09:30:02 2020] buffer_io_error: 94 callbacks suppressed
> > > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > > 0, async page read
> > > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, 
> > > sector 1
> > > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > > 1, async page read
> > > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, 
> > > sector 2
> > > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > > 2, async page read
> > > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, 
> > > sector 3
> > > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > > 3, async page read
> > > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, 
> > > sector 4
> > > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > > 

[ceph-users] Re: Questions on Ceph cluster without OS disks

2020-03-23 Thread huxia...@horebdata.cn
Martin,

Thanks a lot for the information. This is very interesting, and I will contact
you again if we decide to go this way.

best regards,

samuel



huxia...@horebdata.cn
 
From: Martin Verges
Date: 2020-03-22 20:50
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: Questions on Ceph cluster without OS disks
Hello Samuel,

we from croit.io don't use NFS to boot up Servers. We copy the OS directly into 
the RAM (approximately 0.5-1GB). Think of it like a container, you start it and 
throw it away when you no longer need it.
This way we can save the slots of OS harddisks to add more storage per node and 
reduce overall costs as 1GB of RAM is cheaper than an OS disk and consumes less 
power.

If our management node is down, nothing will happen to the cluster. No impact, 
no downtime. However, you do need the mgmt node to boot up the cluster. So 
after a very rare total power outage, your first system would be the mgmt node 
and then the cluster itself. But again, if you configure your systems correctly, 
no manual work is required to recover from that. For everything else, it is 
possible (but definitely not needed) to deploy our mgmt node in active/passive 
HA.

We have multiple hundred installations worldwide in production environments. 
Our strong PXE knowledge comes from more than 20 years of datacenter hosting 
experience and it never ever failed us in the last >10 years.

The main benefits out of that:
 - Immutable OS freshly booted: Every host has exactly the same version, same 
library, kernel, Ceph versions,...
 - OS is heavily tested by us: Every croit deployment has exactly the same 
image. We can find errors much faster and hit much fewer errors.
 - Easy Update: Updating OS, Ceph or anything else is just a node reboot. No 
cluster downtime, no service impact, fully automatic handling by our mgmt 
software.
 - No need to install OS: No maintenance costs, no labor required, no other OS 
management required.
 - Centralized Logs/Stats: As it is booted in memory, all logs and statistics 
are collected on a central place for easy access.
 - Easy to scale: It doesn't matter if you boot 3 or 300 nodes, all boot the 
exact same image in a few seconds.
 .. lots more

Please do not hesitate to contact us directly. We always try to offer an 
excellent service and are strongly customer oriented.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Sat, 21 March 2020 at 13:53, huxia...@horebdata.cn wrote:
Hello, Martin,

I notice that Croit advocates the use of Ceph clusters without OS disks, but with
PXE boot.

Do you use an NFS server to serve the root file system for each node, such as
for hosting configuration files, users and passwords, log files, etc.? My question
is, will the NFS server be a single point of failure? If the NFS server goes down,
or the network experiences an outage, Ceph nodes may not be able to write to their
local file systems, possibly leading to a service outage.

How do you deal with the above potential issues in production? I am a bit 
worried...

best regards,

samuel






huxia...@horebdata.cn

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io