[ceph-users] radosgw bucket stats vs s3cmd du

2018-09-18 Thread Luis Periquito
Hi all,

I have a couple of very big s3 buckets that store temporary data. We
keep writing to the buckets some files which are then read and
deleted. They serve as a temporary storage.

We're writing (and deleting) circa 1TB of data daily in each of those
buckets, and their size has been mostly stable over time.

The issue has arisen that radosgw-admin bucket stats says one bucket
is 10T and the other is 4T; but s3cmd du (and I did a sync which
agrees) says 3.5T and 2.3T respectively.

The bigger bucket suffered from the orphaned objects bug
(http://tracker.ceph.com/issues/18331). The smaller one was created on
10.2.3, so it may also have suffered from the same bug.

Any ideas what could be at play here? How can we reduce actual usage?

Trimmed part of the radosgw-admin bucket stats output:
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 18446744073709551572
},
"rgw.main": {
"size": 10870197197183,
"size_actual": 10873866362880,
"size_utilized": 18446743601253967400,
"size_kb": 10615426951,
"size_kb_actual": 10619010120,
"size_kb_utilized": 18014398048099578,
"num_objects": 1702444
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 406462
}
},
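A possible way to chase the orphaned-object suspicion (a sketch only; the pool
name and job id below are placeholders, using the Luminous-era radosgw-admin
orphans tooling):

radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphan-scan-1
radosgw-admin orphans list-jobs
radosgw-admin orphans finish --job-id=orphan-scan-1

The find step only reports RADOS objects that are no longer referenced by any
bucket index; nothing is removed until the output has been reviewed.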


[ceph-users] backup ceph

2018-09-18 Thread ST Wong (ITSC)
Hi,

We're newbies to Ceph.  Besides using incremental snapshots with RBD to back up
data from one Ceph cluster to another running Ceph cluster, or using backup tools
like backy2, is there any recommended way to back up Ceph data?  Someone
here suggested taking a daily RBD snapshot and keeping 30 days' worth to replace
backups.  I wonder if this is practical and whether performance will be impacted...

Thanks a lot.
Regards
/st wong


Re: [ceph-users] mount cephfs without tiering

2018-09-18 Thread Konstantin Shalygin

I have cephfs with tiering.
Does anyone know if it's possible to mount the file system so that the tiering is
not used?

I.e. I want to mount cephfs on a backup server without tiering and on a samba
server with tiering.

Is it possible?


https://ceph.com/community/new-luminous-erasure-coding-rbd-cephfs/



k



Re: [ceph-users] Is luminous ceph rgw can only run with the civetweb ?

2018-09-18 Thread Konstantin Shalygin

In Jewel I used the config below and rgw worked well with nginx. But with
Luminous, nginx does not seem to work with rgw.


In your case, putting a proxy in front of rgw is pure overhead IMHO (no
balancing, no HA).


This is a working configuration for your case:


upstream rados {
  server  10.11.3.57:7480 max_conns=512 max_fails=2;
  least_conn;
}

location / {
# AWS v4 SignedHeaders. Nginx will set this header
# only if it passed from client.
  proxy_set_header  Expect $http_expect;

  access_log    off;
  log_not_found off;
  proxy_pass    http://rados;
  proxy_set_header  Host $host;
  proxy_http_version    1.1;
  proxy_redirect    off;
  proxy_buffering   off;
  client_max_body_size  0;
}

server {
  server_name s3.yourdomain.com;
  root    /srv/http;
}



And ceph.conf

[global]
rgw_dns_name = s3.yourdomain.com
rgw_print_continue = false
rgw_s3_auth_aws4_force_boto2_compat = false
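A quick way to sanity-check the chain once nginx and rgw are both up (a sketch;
s3.yourdomain.com is the placeholder name from the config above):

curl -i http://s3.yourdomain.com/

An XML response from rgw (an empty bucket listing or an AccessDenied error,
depending on the authentication setup) means nginx is proxying correctly; a 502
from nginx would point at the upstream block instead.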




k



[ceph-users] Dashboard Object Gateway

2018-09-18 Thread Hendrik Peyerl

Hello all,

we just deployed an Object Gateway to our Ceph cluster via ceph-deploy
in an IPv6-only Mimic cluster. To make sure the RGW listens on IPv6 we
set the following config:

rgw_frontends = civetweb port=[::]:7480

We now tried to enable the dashboard functionality for said gateway, but
we are running into an error 500 when trying to access it via the
dashboard. The mgr log shows the following:


{"status": "500 Internal Server Error", "version": "3.2.2", "detail": 
"The server encountered an unexpected condition which prevented it from 
fulfilling the request.", "traceback": "Traceback (most recent call 
last):\\n  File 
\\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\", line 656, 
in respond\\nresponse.body = self.handler()\\n  File 
\\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line 
188, in __call__\\nself.body = self.oldhandler(*args, **kwargs)\\n 
File \\"/usr/lib/python2.7/site-packages/cherrypy/lib/jsontools.py\\", 
line 61, in json_handler\\nvalue = 
cherrypy.serving.request._json_inner_handler(*args, **kwargs)\\n  File 
\\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\", line 34, 
in __call__\\nreturn self.callable(*self.args, **self.kwargs)\\n 
File \\"/usr/lib64/ceph/mgr/dashboard/controllers/rgw.py\\", line 23, in 
status\\ninstance = RgwClient.admin_instance()\\n  File 
\\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 138, in 
admin_instance\\nreturn 
RgwClient.instance(RgwClient._SYSTEM_USERID)\\n  File 
\\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 121, in 
instance\\nRgwClient._load_settings()\\n  File 
\\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 102, in 
_load_settings\\nhost, port = _determine_rgw_addr()\\n  File 
\\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 78, in 
_determine_rgw_addr\\nraise LookupError(\'Failed to determine RGW 
port\')\\nLookupError: Failed to determine RGW port\\n"}']



Any help would be greatly appreciated.

Thanks,

Hendrik



[ceph-users] bluestore compression enabled but no data compressed

2018-09-18 Thread Frank Schilder
I seem to have a problem getting bluestore compression to do anything. I 
followed the documentation and enabled bluestore compression on various pools 
by executing "ceph osd pool set  compression_mode aggressive". 
Unfortunately, it seems like no data is compressed at all. As an example, below 
is some diagnostic output for a data pool used by a cephfs:

[root@ceph-01 ~]# ceph --version
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)

All defaults are OK:

[root@ceph-01 ~]# ceph --show-config | grep compression
[...]
bluestore_compression_algorithm = snappy
bluestore_compression_max_blob_size = 0
bluestore_compression_max_blob_size_hdd = 524288
bluestore_compression_max_blob_size_ssd = 65536
bluestore_compression_min_blob_size = 0
bluestore_compression_min_blob_size_hdd = 131072
bluestore_compression_min_blob_size_ssd = 8192
bluestore_compression_mode = none
bluestore_compression_required_ratio = 0.875000
[...]

Compression is reported as enabled:

[root@ceph-01 ~]# ceph osd pool ls detail
[...]
pool 24 'sr-fs-data-test' erasure size 8 min_size 7 crush_rule 10 object_hash 
rjenkins pg_num 50 pgp_num 50 last_change 7726 flags hashpspool,ec_overwrites 
stripe_width 24576 compression_algorithm snappy compression_mode aggressive 
application cephfs
[...]

[root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_mode
compression_mode: aggressive
[root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_algorithm
compression_algorithm: snappy

We dumped a 4 GiB file with dd from /dev/zero, which should be easy to compress
with an excellent ratio. Searching for a PG:

[root@ceph-01 ~]# ceph pg ls-by-pool sr-fs-data-test
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG 
DISK_LOG STATESTATE_STAMPVERSION  REPORTED UP   
UP_PRIMARY ACTING   ACTING_PRIMARY LAST_SCRUB 
SCRUB_STAMPLAST_DEEP_SCRUB DEEP_SCRUB_STAMP   
24.0 15  00 0   0  62914560  77 
  77 active+clean 2018-09-14 01:07:14.593007  7698'77 7735:142 
[53,47,36,30,14,55,57,5] 53 [53,47,36,30,14,55,57,5] 53
7698'77 2018-09-14 01:07:14.592966 0'0 2018-09-11 08:06:29.309010 

There is about 250 MB of data on the primary OSD, but nothing seems to be compressed:

[root@ceph-07 ~]# ceph daemon osd.53 perf dump | grep blue
[...]
"bluestore_allocated": 313917440,
"bluestore_stored": 264362803,
"bluestore_compressed": 0,
"bluestore_compressed_allocated": 0,
"bluestore_compressed_original": 0,
[...]

Just to make sure, I checked one of the objects' contents:

[root@ceph-01 ~]# rados ls -p sr-fs-data-test
104.039c
[...]
104.039f

The objects are 4M chunks ...
[root@ceph-01 ~]# rados -p sr-fs-data-test stat 104.039f
sr-fs-data-test/104.039f mtime 2018-09-11 14:39:38.00, size 
4194304

... with all zeros:

[root@ceph-01 ~]# rados -p sr-fs-data-test get 104.039f obj

[root@ceph-01 ~]# hexdump -C obj
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000

All as it should be, except for compression. Am I overlooking something?
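One further check that may be worth running (a sketch; osd.53 is the primary
from the PG listing above) is to ask the OSD itself, via the admin socket, what
compression settings it is applying:

ceph daemon osd.53 config get bluestore_compression_mode
ceph daemon osd.53 config get bluestore_compression_min_blob_size_hdd

With the pool set to aggressive the per-OSD mode should not need changing, but
this rules out an override injected at the OSD level, and the min blob size
shows the threshold below which writes are not compressed.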

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


Re: [ceph-users] Dashboard Object Gateway

2018-09-18 Thread Lenz Grimmer
Hi Hendrik,

On 09/18/2018 12:57 PM, Hendrik Peyerl wrote:

> we just deployed an Object Gateway to our CEPH Cluster via ceph-deploy
> in an IPv6 only Mimic Cluster. To make sure the RGW listens on IPv6 we
> set the following config:
> rgw_frontends = civetweb port=[::]:7480
> 
> We now tried to enable the dashboard functionality for said gateway but
> we are running into an error 500 after trying to access it via the
> dashboard, the mgr log shows the following:
> 
> {"status": "500 Internal Server Error", "version": "3.2.2", "detail":
> "The server encountered an unexpected condition which prevented it from
> fulfilling the request.", "traceback": "Traceback (most recent call
> last):\\n  File
> \\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\", line 656,
> in respond\\n    response.body = self.handler()\\n  File
> \\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line
> 188, in __call__\\n    self.body = self.oldhandler(*args, **kwargs)\\n
> File \\"/usr/lib/python2.7/site-packages/cherrypy/lib/jsontools.py\\",
> line 61, in json_handler\\n    value =
> cherrypy.serving.request._json_inner_handler(*args, **kwargs)\\n  File
> \\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\", line 34,
> in __call__\\n    return self.callable(*self.args, **self.kwargs)\\n
> File \\"/usr/lib64/ceph/mgr/dashboard/controllers/rgw.py\\", line 23, in
> status\\n    instance = RgwClient.admin_instance()\\n  File
> \\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 138, in
> admin_instance\\n    return
> RgwClient.instance(RgwClient._SYSTEM_USERID)\\n  File
> \\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 121, in
> instance\\n    RgwClient._load_settings()\\n  File
> \\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 102, in
> _load_settings\\n    host, port = _determine_rgw_addr()\\n  File
> \\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 78, in
> _determine_rgw_addr\\n    raise LookupError(\'Failed to determine RGW
> port\')\\nLookupError: Failed to determine RGW port\\n"}']
> 
> 
> Any help would be greatly appreciated.

Would you mind sharing the commands that you used to configure the RGW
connection details?

The host and port of the Object Gateway should be determined
automatically; I wonder whether the IPv6 notation gets mangled somewhere here.

Have you tried setting them explicitly using "ceph dashboard
set-rgw-api-host <host>" and "ceph dashboard set-rgw-api-port <port>"?

Thanks,

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)





Re: [ceph-users] Dashboard Object Gateway

2018-09-18 Thread Hendrik Peyerl

Hi Lenz,



Would you mind sharing the commands that you used to configure the RGW
connection details?


The RGW node was installed as documented:

ceph-deploy install --rgw $SERVERNAME
ceph-deploy rgw create $SERVERNAME

After the installation the server was only listening on 0.0.0.0. I then 
added the following to ceph.conf:


[client.rgw.$SERVERNAME]
rgw_frontends = civetweb port=[::]:7480



Have you tried setting them explicitly using "ceph dashboard
set-rgw-api-host <host>" and "ceph dashboard set-rgw-api-port <port>"?


I did try that now and it solved my problem. I thought this was only
necessary for multiple zones, therefore I didn't try it with only my one
server.
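For anyone following along, the explicit settings look roughly like this (a
sketch; the hostname is a placeholder that resolves to the gateway's IPv6
address, and the port matches the civetweb frontend above):

ceph dashboard set-rgw-api-host rgw1.example.com
ceph dashboard set-rgw-api-port 7480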


If there is anything I can assist with to debug a potential issue, please
let me know; the setup is still in testing and I can delete and recreate
the server as we go.


Thanks a lot for your help,

Hendrik




Re: [ceph-users] backup ceph

2018-09-18 Thread ceph
Hi,

I assume that you are speaking of rbd only

Taking snapshots of rbd volumes and keeping all of them on the cluster is
fine.
However, this is not a backup.
A snapshot is only a backup if it is exported off-site.
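As a minimal sketch of what "exported off-site" can look like with the stock
rbd tooling (pool, image and snapshot names and the backup host are placeholders):

rbd snap create rbd/vm-disk@2018-09-18
rbd export-diff --from-snap 2018-09-17 rbd/vm-disk@2018-09-18 - \
  | ssh backup-host 'cat > vm-disk.2018-09-17.2018-09-18.diff'

The very first snapshot is exported in full (rbd export-diff without
--from-snap), and the diffs can later be replayed in order with rbd import-diff
on the destination.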

On 09/18/2018 11:54 AM, ST Wong (ITSC) wrote:
> Hi,
> 
> We're newbie to Ceph.  Besides using incremental snapshots with RDB to backup 
> data on one Ceph cluster to another running Ceph cluster, or using backup 
> tools like backy2, will there be any recommended way to backup Ceph data  ?   
> Someone here suggested taking snapshot of RDB daily and keeps 30 days to 
> replace backup.  I wonder if this is practical and if performance will be 
> impact...
> 
> Thanks a lot.
> Regards
> /st wong
> 
> 
> 


[ceph-users] Re: backup ceph

2018-09-18 Thread Tomasz Kuzemko
Hello,

a colleague of mine gave a presentation at FOSDEM about how we (OVH) do RBD
backups. You might find it interesting:

https://archive.fosdem.org/2018/schedule/event/backup_ceph_at_scale/

--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com


From: ceph-users on behalf of ST
Wong (ITSC)
Sent: Tuesday, 18 September 2018 11:54
To: ceph-users@lists.ceph.com
Subject: [ceph-users] backup ceph

Hi,

We’re newbie to Ceph.  Besides using incremental snapshots with RDB to backup 
data on one Ceph cluster to another running Ceph cluster, or using backup tools 
like backy2, will there be any recommended way to backup Ceph data  ?   Someone 
here suggested taking snapshot of RDB daily and keeps 30 days to replace 
backup.  I wonder if this is practical and if performance will be impact…

Thanks a lot.
Regards
/st wong


Re: [ceph-users] Slow requests from bluestore osds

2018-09-18 Thread Augusto Rodrigues
I solved my slow requests by increasing the size of block.db. Figure roughly 4%
of the stored capacity (4% per stored TB) and preferably host the DB on NVMe.
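As a worked example of that rule of thumb (the 10 TB figure is only an
illustration): a 10 TB data device would call for 4% of 10 TB, i.e. about
400 GB, of block.db.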


[ceph-users] No fix for 0x6706be76 CRCs ?

2018-09-18 Thread Alfredo Daniel Rezinovsky
Changed all my hardware. Now I have plenty of free RAM, swap is never
needed, iowait is low, and still:


7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad 
crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, expected 
0x85a3fefe, device location [0x25ac04be000~1000], logical extent 
0x1e000~1000, object #2:fd955b81:::1729cdb.0006


It happens sometimes, in all my OSDs.

Bluestore OSDs with data in HDD and block.db in SSD

After running pg repair the pgs were always repaired.

Running Ceph 13.2.1-1bionic on Ubuntu.

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo



[ceph-users] https://ceph-storage.slack.com

2018-09-18 Thread Alfredo Daniel Rezinovsky

Can anyone add me to this slack?

with my email alfrenov...@gmail.com

Thanks.

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo



Re: [ceph-users] No fix for 0x6706be76 CRCs ?

2018-09-18 Thread Paul Emmerich
We built a work-around here: https://github.com/ceph/ceph/pull/23273
It hasn't been backported, but we'll ship it in our 13.2.2 Debian
packages for the croit OS image.


Paul


2018-09-18 21:10 GMT+02:00 Alfredo Daniel Rezinovsky
:
> Changed all my hardware. Now I have plenty of free ram. swap never needed,
> low iowait and still
>
> 7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad
> crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, expected
> 0x85a3fefe, device location [0x25ac04be000~1000], logical extent
> 0x1e000~1000, object #2:fd955b81:::1729cdb.0006
>
> It happens sometimes, in all my OSDs.
>
> Bluestore OSDs with data in HDD and block.db in SSD
>
> After running pg repair the pgs were always repaired.
>
> running ceph in ubuntu 13.2.1-1bionic
>
> --
> Alfredo Daniel Rezinovsky
> Director de Tecnologías de Información y Comunicaciones
> Facultad de Ingeniería - Universidad Nacional de Cuyo
>



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


Re: [ceph-users] No fix for 0x6706be76 CRCs ?

2018-09-18 Thread Alfredo Daniel Rezinovsky

MOMENT !!!

"Some kernels (4.9+) sometime fail to return data when reading from a
block device under memory pressure."

I didn't know that was the problem. Can't I just downgrade the kernel?

Are there known working versions, or do they just need to be prior to 4.9?

On 18/09/18 16:19, Paul Emmerich wrote:

We built a work-around here: https://github.com/ceph/ceph/pull/23273
Which hasn't been backported, but we'll ship 13.2.2 in our Debian
packages for the croit OS image.


Paul


2018-09-18 21:10 GMT+02:00 Alfredo Daniel Rezinovsky
:

Changed all my hardware. Now I have plenty of free ram. swap never needed,
low iowait and still

7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, expected
0x85a3fefe, device location [0x25ac04be000~1000], logical extent
0x1e000~1000, object #2:fd955b81:::1729cdb.0006

It happens sometimes, in all my OSDs.

Bluestore OSDs with data in HDD and block.db in SSD

After running pg repair the pgs were always repaired.

running ceph in ubuntu 13.2.1-1bionic

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo






--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo



Re: [ceph-users] No fix for 0x6706be76 CRCs ?

2018-09-18 Thread Paul Emmerich
Yeah, it's very likely a kernel bug (that no one managed to reduce to
a simpler test case or even to reproduce it reliably with reasonable
effort on a test system).

4.9 and earlier aren't affected as far as we can tell; we only
encountered this after upgrading. But I think Bionic ships with a
broken kernel.
Try raising the issue with the Ubuntu guys if you are using a
distribution kernel.


Paul

2018-09-18 21:23 GMT+02:00 Alfredo Daniel Rezinovsky
:
> MOMENT !!!
>
> "Some kernels (4.9+) sometime fail to return data when reading from a block
> device under memory pressure."
>
> I dind't knew that was the problem. Can't I just dowgrade the kernel?
>
> There are known working versions o just need to be prior 4.9?
>
>
> On 18/09/18 16:19, Paul Emmerich wrote:
>
> We built a work-around here: https://github.com/ceph/ceph/pull/23273
> Which hasn't been backported, but we'll ship 13.2.2 in our Debian
> packages for the croit OS image.
>
>
> Paul
>
>
> 2018-09-18 21:10 GMT+02:00 Alfredo Daniel Rezinovsky
> :
>
> Changed all my hardware. Now I have plenty of free ram. swap never needed,
> low iowait and still
>
> 7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad
> crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, expected
> 0x85a3fefe, device location [0x25ac04be000~1000], logical extent
> 0x1e000~1000, object #2:fd955b81:::1729cdb.0006
>
> It happens sometimes, in all my OSDs.
>
> Bluestore OSDs with data in HDD and block.db in SSD
>
> After running pg repair the pgs were always repaired.
>
> running ceph in ubuntu 13.2.1-1bionic
>
> --
> Alfredo Daniel Rezinovsky
> Director de Tecnologías de Información y Comunicaciones
> Facultad de Ingeniería - Universidad Nacional de Cuyo
>
>
>
>
> --
> Alfredo Daniel Rezinovsky
> Director de Tecnologías de Información y Comunicaciones
> Facultad de Ingeniería - Universidad Nacional de Cuyo



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


Re: [ceph-users] No fix for 0x6706be76 CRCs ?

2018-09-18 Thread Alfredo Daniel Rezinovsky
I started seeing this after upgrading to Bionic. I had Xenial with LTS
kernels (4.13) without problems.

I will try to change to Ubuntu's 4.13 kernel and wait for the logs.

Thanks


On 18/09/18 16:27, Paul Emmerich wrote:

Yeah, it's very likely a kernel bug (that no one managed to reduce to
a simpler test case or even to reproduce it reliably with reasonable
effort on a test system).

4.9 and earlier aren't affected as far as we can tell, we only
encountered this after upgrading. But I think Bionic ships with a
broken kernel.
Try raising the issue with the ubuntu guys if you are using a
distribution kernel.


Paul

2018-09-18 21:23 GMT+02:00 Alfredo Daniel Rezinovsky
:

MOMENT !!!

"Some kernels (4.9+) sometime fail to return data when reading from a block
device under memory pressure."

I dind't knew that was the problem. Can't I just dowgrade the kernel?

There are known working versions o just need to be prior 4.9?


On 18/09/18 16:19, Paul Emmerich wrote:

We built a work-around here: https://github.com/ceph/ceph/pull/23273
Which hasn't been backported, but we'll ship 13.2.2 in our Debian
packages for the croit OS image.


Paul


2018-09-18 21:10 GMT+02:00 Alfredo Daniel Rezinovsky
:

Changed all my hardware. Now I have plenty of free ram. swap never needed,
low iowait and still

7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, expected
0x85a3fefe, device location [0x25ac04be000~1000], logical extent
0x1e000~1000, object #2:fd955b81:::1729cdb.0006

It happens sometimes, in all my OSDs.

Bluestore OSDs with data in HDD and block.db in SSD

After running pg repair the pgs were always repaired.

running ceph in ubuntu 13.2.1-1bionic

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo




--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo





--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo



Re: [ceph-users] network architecture questions

2018-09-18 Thread solarflow99
Hi, anyone able to answer these few questions?



On Mon, Sep 17, 2018 at 4:13 PM solarflow99  wrote:

> Hi, I read through the various documentation and had a few questions:
>
> - From what I understand cephFS clients reach the OSDs directly, does the
> cluster network need to be opened up as a public network?
>
> - Is it still necessary to have a public and cluster network when the
> using cephFS since the clients all reach the OSD's directly?
>
> - Simplest way to do HA on the mons for providing NFS, etc?
>


Re: [ceph-users] network architecture questions

2018-09-18 Thread Jean-Charles Lopez
> On Sep 17, 2018, at 16:13, solarflow99  wrote:
> 
> Hi, I read through the various documentation and had a few questions:
> 
> - From what I understand cephFS clients reach the OSDs directly, does the 
> cluster network need to be opened up as a public network? 
Client traffic only goes over the public network. Only OSD-to-OSD traffic
(replication, rebalancing, recovery) goes over the cluster network.
> 
> - Is it still necessary to have a public and cluster network when the using 
> cephFS since the clients all reach the OSD's directly?  
Separating the networks is a plus for troubleshooting and for sizing bandwidth.
> 
> - Simplest way to do HA on the mons for providing NFS, etc?  
Don’t really understand the question (NFS vs CephFS).



[ceph-users] (no subject)

2018-09-18 Thread Kevin Olbrich
Hi!

Is the compressible hint / incompressible hint supported on qemu+kvm?

http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/

If not, only aggressive would work in this case for rbd, right?
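(If the hints turn out not to survive the virtualization stack, forcing
compression at the pool level would look roughly like this; the pool name is a
placeholder:)

ceph osd pool set rbd-pool compression_mode aggressive
ceph osd pool set rbd-pool compression_algorithm snappy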

Kind regards
Kevin


Re: [ceph-users] network architecture questions

2018-09-18 Thread Paul Emmerich
I would almost never separate the public and cluster networks; it
usually creates more problems than it solves.


Paul

2018-09-18 21:37 GMT+02:00 Jean-Charles Lopez :
>> On Sep 17, 2018, at 16:13, solarflow99  wrote:
>>
>> Hi, I read through the various documentation and had a few questions:
>>
>> - From what I understand cephFS clients reach the OSDs directly, does the 
>> cluster network need to be opened up as a public network?
> Client traffic only goes over the public network. Only OSD to OSD traffic 
> (replication, rebalancing, recovery go over the cluster network)
>>
>> - Is it still necessary to have a public and cluster network when the using 
>> cephFS since the clients all reach the OSD's directly?
> Separating the network is a plus for troubleshooting and sizing for bandwidth
>>
>> - Simplest way to do HA on the mons for providing NFS, etc?
> Don’t really understand the question (NFS vs CephFS).



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


Re: [ceph-users] network architecture questions

2018-09-18 Thread Jonathan D. Proulx
On Tue, Sep 18, 2018 at 12:33:21PM -0700, solarflow99 wrote:
:Hi, anyone able to answer these few questions?

I'm not using CephFS but for RBD (my primary use case) clients also
access OSDs directly.

I use separate cluster and public networks mainly so replication
bandwidth and client bandwidth don't compete. Though I wouldn't call
this necessary.

-Jon
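For reference, the split Jon describes is declared in ceph.conf roughly like
this (a sketch; the subnets are placeholders):

[global]
public_network = 192.168.10.0/24
cluster_network = 192.168.20.0/24

Clients, MONs and MDS daemons only ever need to reach the public_network; the
cluster_network carries replication, rebalancing and recovery traffic between
OSDs.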

:
:
:On Mon, Sep 17, 2018 at 4:13 PM solarflow99  wrote:
:
:> Hi, I read through the various documentation and had a few questions:
:>
:> - From what I understand cephFS clients reach the OSDs directly, does the
:> cluster network need to be opened up as a public network?
:>
:> - Is it still necessary to have a public and cluster network when the
:> using cephFS since the clients all reach the OSD's directly?
:>
:> - Simplest way to do HA on the mons for providing NFS, etc?
:>





[ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread David Turner
I've finally learned enough about the OSD backend to track this issue down to
what I believe is the root cause.  LevelDB compaction is the common thread
every time we move data around our cluster.  I've ruled out PG subfolder
splitting, EC doesn't seem to be the root cause of this, and it is cluster
wide as opposed to specific hardware.

One of the first things I found after digging into leveldb omap compaction
was [1] this article with a heading "RocksDB instead of LevelDB"
which mentions that leveldb was replaced with rocksdb as the default db
backend for filestore OSDs and was even backported to Jewel because of the
performance improvements.

I figured there must be a way to be able to upgrade an OSD to use rocksdb
from leveldb without needing to fully backfill the entire OSD.  There is
[2] this article, but you need to have an active service account with
RedHat to access it.  I eventually came across [3] this article about
optimizing Ceph Object Storage which mentions a resolution to OSDs flapping
due to omap compaction to migrate to using rocksdb.  It links to the RedHat
article, but also has [4] these steps outlined in it.  I tried to follow
the steps, but the OSD I tested this on was unable to start with [5] this
segfault.  And then trying to move the OSD back to the original LevelDB
omap folder resulted in [6] this in the log.  I apologize that all of my
logging is with log level 1.  If needed I can get some higher log levels.

My Ceph version is 12.2.4.  Does anyone have any suggestions for how I can
update my filestore backend from leveldb to rocksdb?  Or if that's the
wrong direction and I should be looking elsewhere?  Thank you.


[1] https://ceph.com/community/new-luminous-rados-improvements/
[2] https://access.redhat.com/solutions/3210951
[3]
https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize%20Ceph%20object%20storage%20for%20production%20in%20multisite%20clouds.pdf

[4] ■ Stop the OSD
■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
■ ulimit -n 65535
■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy
/var/lib/ceph/osd/ceph-/current/omap 1 rocksdb
■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap
--command check
■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
■ Start the OSD

[5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy
not supported or corrupted Snappy compressed block contents
2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **

[6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to
mount object store
2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ESC[0;31m ** ERROR: osd init
failed: (1) Operation not permittedESC[0m
2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167
(ceph:ceph)
2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4
(52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
(unknown), pid 361535
2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore empty
--pid-file
2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load: lrc load:
isa
2018-09-17 19:27:54.260520 7f7f03308d80  0
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2018-09-17 19:27:54.261135 7f7f03308d80  0
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2018-09-17 19:27:54.261750 7f7f03308d80  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2018-09-17 19:27:54.261757 7f7f03308d80  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2018-09-17 19:27:54.261758 7f7f03308d80  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice()
is disabled via 'filestore splice' config option
2018-09-17 19:27:54.286454 7f7f03308d80  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2018-09-17 19:27:54.286572 7f7f03308d80  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is
disabled by conf
2018-09-17 19:27:54.287119 7f7f03308d80  0
filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
2018-09-17 19:27:54.287527 7f7f03308d80 -1
filestore(/var/lib/ceph/osd/ceph-0) mount(1723): Error initializing leveldb
: Corruption: VersionEdit: unknown tag


Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread Pavan Rallabhandi
The steps that were outlined for conversion are correct; have you tried setting 
some of the relevant ceph conf values too:

filestore_rocksdb_options = 
"max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"

filestore_omap_backend = rocksdb

Thanks,
-Pavan.

From: ceph-users  on behalf of David Turner 

Date: Tuesday, September 18, 2018 at 4:09 PM
To: ceph-users 
Subject: EXT: [ceph-users] Any backfill in our cluster makes the cluster 
unusable and takes forever

I've finally learned enough about the OSD backend track down this issue to what 
I believe is the root cause.  LevelDB compaction is the common thread every 
time we move data around our cluster.  I've ruled out PG subfolder splitting, 
EC doesn't seem to be the root cause of this, and it is cluster wide as opposed 
to specific hardware. 

One of the first things I found after digging into leveldb omap compaction was 
[1] this article with a heading "RocksDB instead of LevelDB" which mentions 
that leveldb was replaced with rocksdb as the default db backend for filestore 
OSDs and was even backported to Jewel because of the performance improvements.

I figured there must be a way to be able to upgrade an OSD to use rocksdb from 
leveldb without needing to fully backfill the entire OSD.  There is [2] this 
article, but you need to have an active service account with RedHat to access 
it.  I eventually came across [3] this article about optimizing Ceph Object 
Storage which mentions a resolution to OSDs flapping due to omap compaction to 
migrate to using rocksdb.  It links to the RedHat article, but also has [4] 
these steps outlined in it.  I tried to follow the steps, but the OSD I tested 
this on was unable to start with [5] this segfault.  And then trying to move 
the OSD back to the original LevelDB omap folder resulted in [6] this in the 
log.  I apologize that all of my logging is with log level 1.  If needed I can 
get some higher log levels.

My Ceph version is 12.2.4.  Does anyone have any suggestions for how I can 
update my filestore backend from leveldb to rocksdb?  Or if that's the wrong 
direction and I should be looking elsewhere?  Thank you.


[1] https://ceph.com/community/new-luminous-rados-improvements/
[2] https://access.redhat.com/solutions/3210951
[3] 
https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize
 Ceph object storage for production in multisite clouds.pdf

[4] ■ Stop the OSD
■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
■ ulimit -n 65535
■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy 
/var/lib/ceph/osd/ceph-/current/omap 1 rocksdb
■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap --command 
check
■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
■ Start the OSD

[5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy not 
supported or corrupted Snappy compressed block contents
2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **

[6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to 
mount object store
2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ESC[0;31m ** ERROR: osd init failed: 
(1) Operation not permittedESC[0m
2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167 (ceph:ceph)
2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4 
(52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process 
(unknown), pid 361535
2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore empty 
--pid-file
2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load: lrc load: isa
2018-09-17 19:27:54.260520 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) 
backend xfs (magic 0x58465342)
2018-09-17 19:27:54.261135 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) 
backend xfs (magic 0x58465342)
2018-09-17 19:27:54.261750 7f7f03308d80  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl 
is disabled via 'filestore fiemap' config option
2018-09-17 19:27:54.261757 7f7f03308d80  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: 
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2018-09-17 19:27:54.261758 7f7f03308d80  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is 
disabled via 'filestore splice' config option
2018-09-17 19:27:54.286454 7f7f03308d80  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) 
syscall fully supported (by glibc and kernel)
2018-09-17 19:27:54.286572 7f7f03308d80  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is 
disabled by conf
2018-09-17 19:27:54.287119 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) 
start omap initiation
2018-09-17 19:27:54.287527 7f7f03308d80 -1 fil

Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread David Turner
Are those settings fine to have as globals even if not all OSDs on a node
have rocksdb as the backend?  Or will I need to convert all OSDs on a node
at the same time?

On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> The steps that were outlined for conversion are correct, have you tried
> setting some the relevant ceph conf values too:
>
> filestore_rocksdb_options =
> "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
>
> filestore_omap_backend = rocksdb
>
> Thanks,
> -Pavan.
>
> From: ceph-users  on behalf of David
> Turner 
> Date: Tuesday, September 18, 2018 at 4:09 PM
> To: ceph-users 
> Subject: EXT: [ceph-users] Any backfill in our cluster makes the cluster
> unusable and takes forever
>
> I've finally learned enough about the OSD backend track down this issue to
> what I believe is the root cause.  LevelDB compaction is the common thread
> every time we move data around our cluster.  I've ruled out PG subfolder
> splitting, EC doesn't seem to be the root cause of this, and it is cluster
> wide as opposed to specific hardware.
>
> One of the first things I found after digging into leveldb omap compaction
> was [1] this article with a heading "RocksDB instead of LevelDB"
> which mentions that leveldb was replaced with rocksdb as the default db
> backend for filestore OSDs and was even backported to Jewel because of the
> performance improvements.
>
> I figured there must be a way to be able to upgrade an OSD to use rocksdb
> from leveldb without needing to fully backfill the entire OSD.  There is
> [2] this article, but you need to have an active service account with
> RedHat to access it.  I eventually came across [3] this article about
> optimizing Ceph Object Storage which mentions a resolution to OSDs flapping
> due to omap compaction to migrate to using rocksdb.  It links to the RedHat
> article, but also has [4] these steps outlined in it.  I tried to follow
> the steps, but the OSD I tested this on was unable to start with [5] this
> segfault.  And then trying to move the OSD back to the original LevelDB
> omap folder resulted in [6] this in the log.  I apologize that all of my
> logging is with log level 1.  If needed I can get some higher log levels.
>
> My Ceph version is 12.2.4.  Does anyone have any suggestions for how I can
> update my filestore backend from leveldb to rocksdb?  Or if that's the
> wrong direction and I should be looking elsewhere?  Thank you.
>
>
> [1] https://ceph.com/community/new-luminous-rados-improvements/
> [2] https://access.redhat.com/solutions/3210951
> [3]
> https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize
> Ceph object storage for production in multisite clouds.pdf
>
> [4] ■ Stop the OSD
> ■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
> ■ ulimit -n 65535
> ■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy
> /var/lib/ceph/osd/ceph-/current/omap 1 rocksdb
> ■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap
> --command check
> ■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
> ■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
> ■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
> ■ Start the OSD
>
> [5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy
> not supported or corrupted Snappy compressed block contents
> 2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **
>
> [6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to
> mount object store
> 2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ESC[0;31m ** ERROR: osd init
> failed: (1) Operation not permittedESC[0m
> 2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167
> (ceph:ceph)
> 2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4
> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
> (unknown), pid 361535
> 2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore empty
> --pid-file
> 2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load: lrc load:
> isa
> 2018-09-17 19:27:54.260520 7f7f03308d80  0
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2018-09-17 19:27:54.261135 7f7f03308d80  0
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2018-09-17 19:27:54.261750 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
> 2018-09-17 19:27:54.261757 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2018-09-17 19:27:54.261758 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice()
> is disabled via 'filestore splice' config option
> 2018-09-17 19:27:54.286454 7f7f03308d80  0
> genericfiles

Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread Pavan Rallabhandi
You should be able to set them under the global section. And that reminds me:
since you are on Luminous already, I guess those values are already the
default; you can verify from the admin socket of any OSD.
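A quick way to check from the admin socket, as a sketch with osd.0 standing in
for any OSD on the node:

ceph daemon osd.0 config get filestore_omap_backend
ceph daemon osd.0 config get filestore_rocksdb_options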

But the stack trace didn’t hint as if the superblock on the OSD is still 
considering the omap backend to be leveldb and to do with the compression.

Thanks,
-Pavan.

From: David Turner 
Date: Tuesday, September 18, 2018 at 5:07 PM
To: Pavan Rallabhandi 
Cc: ceph-users 
Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the cluster 
unusable and takes forever

Are those settings fine to have be global even if not all OSDs on a node have 
rocksdb as the backend?  Or will I need to convert all OSDs on a node at the 
same time?

On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi 
 wrote:
The steps that were outlined for conversion are correct, have you tried setting 
some the relevant ceph conf values too:

filestore_rocksdb_options = 
"max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"

filestore_omap_backend = rocksdb

Thanks,
-Pavan.

From: ceph-users  on behalf of David 
Turner 
Date: Tuesday, September 18, 2018 at 4:09 PM
To: ceph-users 
Subject: EXT: [ceph-users] Any backfill in our cluster makes the cluster 
unusable and takes forever

I've finally learned enough about the OSD backend track down this issue to what 
I believe is the root cause.  LevelDB compaction is the common thread every 
time we move data around our cluster.  I've ruled out PG subfolder splitting, 
EC doesn't seem to be the root cause of this, and it is cluster wide as opposed 
to specific hardware. 

One of the first things I found after digging into leveldb omap compaction was 
[1] this article with a heading "RocksDB instead of LevelDB" which mentions 
that leveldb was replaced with rocksdb as the default db backend for filestore 
OSDs and was even backported to Jewel because of the performance improvements.

I figured there must be a way to be able to upgrade an OSD to use rocksdb from 
leveldb without needing to fully backfill the entire OSD.  There is [2] this 
article, but you need to have an active service account with RedHat to access 
it.  I eventually came across [3] this article about optimizing Ceph Object 
Storage which mentions a resolution to OSDs flapping due to omap compaction to 
migrate to using rocksdb.  It links to the RedHat article, but also has [4] 
these steps outlined in it.  I tried to follow the steps, but the OSD I tested 
this on was unable to start with [5] this segfault.  And then trying to move 
the OSD back to the original LevelDB omap folder resulted in [6] this in the 
log.  I apologize that all of my logging is with log level 1.  If needed I can 
get some higher log levels.

My Ceph version is 12.2.4.  Does anyone have any suggestions for how I can 
update my filestore backend from leveldb to rocksdb?  Or if that's the wrong 
direction and I should be looking elsewhere?  Thank you.


[1] https://ceph.com/community/new-luminous-rados-improvements/
[2] https://access.redhat.com/solutions/3210951
[3] 
https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize
 Ceph object storage for production in multisite clouds.pdf

[4] ■ Stop the OSD
■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
■ ulimit -n 65535
■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy 
/var/lib/ceph/osd/ceph-/current/omap 1 rocksdb
■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap --command 
check
■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
■ Start the OSD

[5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy not 
supported or corrupted Snappy compressed block contents
2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **

[6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to 
mount object store
2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ESC[0;31m ** ERROR: osd init failed: 
(1) Operation not permittedESC[0m
2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167 (ceph:ceph)
2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4 
(52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process 
(unknown), pid 361535
2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore empty 
--pid-file
2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load: lrc load: isa
2018-09-17 19:27:54.260520 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) 
backend xfs (magic 0x58465342)
2018-09-17 19:27:54.261135 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) 
backend xfs (magic 0x58465342)
2018-09-17 19:27:54.261750

Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread Pavan Rallabhandi
I meant that the stack trace hints that the superblock still has leveldb in it;
have you verified that already?

On 9/18/18, 5:27 PM, "Pavan Rallabhandi"  wrote:

You should be able to set them under the global section and that reminds 
me, since you are on Luminous already, I guess those values are already the 
default, you can verify from the admin socket of any OSD.

But the stack trace didn’t hint as if the superblock on the OSD is still 
considering the omap backend to be leveldb and to do with the compression.

Thanks,
-Pavan.

From: David Turner 
Date: Tuesday, September 18, 2018 at 5:07 PM
To: Pavan Rallabhandi 
Cc: ceph-users 
Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the 
cluster unusable and takes forever

Are those settings fine to have be global even if not all OSDs on a node 
have rocksdb as the backend?  Or will I need to convert all OSDs on a node at 
the same time?

On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi 
 wrote:
The steps that were outlined for conversion are correct, have you tried 
setting some the relevant ceph conf values too:

filestore_rocksdb_options = 
"max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"

filestore_omap_backend = rocksdb

Thanks,
-Pavan.

From: ceph-users  on behalf of 
David Turner 
Date: Tuesday, September 18, 2018 at 4:09 PM
To: ceph-users 
Subject: EXT: [ceph-users] Any backfill in our cluster makes the cluster 
unusable and takes forever

I've finally learned enough about the OSD backend track down this issue to 
what I believe is the root cause.  LevelDB compaction is the common thread 
every time we move data around our cluster.  I've ruled out PG subfolder 
splitting, EC doesn't seem to be the root cause of this, and it is cluster wide 
as opposed to specific hardware. 

One of the first things I found after digging into leveldb omap compaction 
was [1] this article with a heading "RocksDB instead of LevelDB" which mentions 
that leveldb was replaced with rocksdb as the default db backend for filestore 
OSDs and was even backported to Jewel because of the performance improvements.

I figured there must be a way to be able to upgrade an OSD to use rocksdb 
from leveldb without needing to fully backfill the entire OSD.  There is [2] 
this article, but you need to have an active service account with RedHat to 
access it.  I eventually came across [3] this article about optimizing Ceph 
Object Storage which mentions a resolution to OSDs flapping due to omap 
compaction to migrate to using rocksdb.  It links to the RedHat article, but 
also has [4] these steps outlined in it.  I tried to follow the steps, but the 
OSD I tested this on was unable to start with [5] this segfault.  And then 
trying to move the OSD back to the original LevelDB omap folder resulted in [6] 
this in the log.  I apologize that all of my logging is with log level 1.  If 
needed I can get some higher log levels.

My Ceph version is 12.2.4.  Does anyone have any suggestions for how I can 
update my filestore backend from leveldb to rocksdb?  Or if that's the wrong 
direction and I should be looking elsewhere?  Thank you.


[1] https://ceph.com/community/new-luminous-rados-improvements/
[2] https://access.redhat.com/solutions/3210951
[3] 
https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize
 Ceph object storage for production in multisite clouds.pdf

[4] ■ Stop the OSD
■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
■ ulimit -n 65535
■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy 
/var/lib/ceph/osd/ceph-/current/omap 1 rocksdb
■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap 
--command check
■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
■ Start the OSD

[5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy 
not supported or corrupted Snappy compressed block contents
2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **

[6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to 
mount object store
2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ESC[0;31m ** ERROR: osd init 
failed: (1) Operation not permittedESC[0m
2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167 
(ceph:ceph)
2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4 
(52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process 
(unknown), pid 361535
2018

Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread David Turner
Here's the [1] full log from the time the OSD was started to the end of the
crash dump.  These logs are so hard to parse.  Is there anything useful in
them?

I did confirm that all perms were set correctly and that the superblock was
changed to rocksdb before the first time I attempted to start the OSD with
it's new DB.  This is on a fully Luminous cluster with [2] the defaults you
mentioned.

[1] https://gist.github.com/drakonstein/fa3ac0ad9b2ec1389c957f95e05b79ed
[2] "filestore_omap_backend": "rocksdb",
"filestore_rocksdb_options":
"max_background_compactions=8,compaction_readahead_size=2097152,compression=kNoCompression",

On Tue, Sep 18, 2018 at 5:29 PM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> I meant the stack trace hints that the superblock still has leveldb in it,
> have you verified that already?
>
> On 9/18/18, 5:27 PM, "Pavan Rallabhandi" 
> wrote:
>
> You should be able to set them under the global section and that
> reminds me, since you are on Luminous already, I guess those values are
> already the default, you can verify from the admin socket of any OSD.
>
> But the stack trace didn’t hint as if the superblock on the OSD is
> still considering the omap backend to be leveldb and to do with the
> compression.
>
> Thanks,
> -Pavan.
>
> From: David Turner 
> Date: Tuesday, September 18, 2018 at 5:07 PM
> To: Pavan Rallabhandi 
> Cc: ceph-users 
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> Are those settings fine to have be global even if not all OSDs on a
> node have rocksdb as the backend?  Or will I need to convert all OSDs on a
> node at the same time?
>
> On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi  prallabha...@walmartlabs.com> wrote:
> The steps that were outlined for conversion are correct, have you
> tried setting some the relevant ceph conf values too:
>
> filestore_rocksdb_options =
> "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
>
> filestore_omap_backend = rocksdb
>
> Thanks,
> -Pavan.
>
> From: ceph-users  on behalf
> of David Turner 
> Date: Tuesday, September 18, 2018 at 4:09 PM
> To: ceph-users 
> Subject: EXT: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I've finally learned enough about the OSD backend track down this
> issue to what I believe is the root cause.  LevelDB compaction is the
> common thread every time we move data around our cluster.  I've ruled out
> PG subfolder splitting, EC doesn't seem to be the root cause of this, and
> it is cluster wide as opposed to specific hardware.
>
> One of the first things I found after digging into leveldb omap
> compaction was [1] this article with a heading "RocksDB instead of LevelDB"
> which mentions that leveldb was replaced with rocksdb as the default db
> backend for filestore OSDs and was even backported to Jewel because of the
> performance improvements.
>
> I figured there must be a way to be able to upgrade an OSD to use
> rocksdb from leveldb without needing to fully backfill the entire OSD.
> There is [2] this article, but you need to have an active service account
> with RedHat to access it.  I eventually came across [3] this article about
> optimizing Ceph Object Storage which mentions a resolution to OSDs flapping
> due to omap compaction to migrate to using rocksdb.  It links to the RedHat
> article, but also has [4] these steps outlined in it.  I tried to follow
> the steps, but the OSD I tested this on was unable to start with [5] this
> segfault.  And then trying to move the OSD back to the original LevelDB
> omap folder resulted in [6] this in the log.  I apologize that all of my
> logging is with log level 1.  If needed I can get some higher log levels.
>
> My Ceph version is 12.2.4.  Does anyone have any suggestions for how I
> can update my filestore backend from leveldb to rocksdb?  Or if that's the
> wrong direction and I should be looking elsewhere?  Thank you.
>
>
> [1] https://ceph.com/community/new-luminous-rados-improvements/
> [2] https://access.redhat.com/solutions/3210951
> [3]
> https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize
> Ceph object storage for production in multisite clouds.pdf
>
> [4] ■ Stop the OSD
> ■ mv /var/lib/ceph/osd/ceph-/current/omap
> /var/lib/ceph/osd/ceph-/omap.orig
> ■ ulimit -n 65535
> ■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig
> store-copy /var/lib/ceph/osd/ceph-/current/omap 1 rocksdb
> ■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap
> --command check
> ■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
> ■ chown ceph.ceph /var/lib/ceph/osd/ceph-/c

Re: [ceph-users] lost osd while migrating EC pool to device-class crush rules

2018-09-18 Thread Graham Allan

On 09/17/2018 04:33 PM, Gregory Farnum wrote:
On Mon, Sep 17, 2018 at 8:21 AM Graham Allan > wrote:


Looking back through history it seems that I *did* override the
min_size
for this pool; however, I didn't reduce it - it used to have min_size 2!
That made no sense to me - I think it must be an artifact of a very
early (hammer?) ec pool creation, but it pre-dates me.

I found the documentation on what min_size should be a bit confusing,
which is how I arrived at 4. Fully agree that k+1=5 makes way more
sense.

I don't think I was the only one confused by this though, eg
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026445.html

I suppose the safest thing to do is update min_size->5 right away to
force any size-4 pgs down until they can perform recovery. I can set
force-recovery on these as well...


Mmm, this is embarrassing but that actually doesn't quite work due to 
https://github.com/ceph/ceph/pull/24095, which has been on my task list 
but at the bottom for a while. :( So if your cluster is stable now I'd 
let it clean up and then change the min_size once everything is repaired.


Thanks for your feedback, Greg. Since declaring the dead osd as lost, 
the downed pg became active again, and is successfully serving data. The 
cluster is considerably more stable now; I've set force-backfill or 
force-recovery on any size=4 pgs and can wait for that to complete 
before changing anything else.
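For anyone following the thread, the commands involved are roughly the following (a sketch; the pool name and pg ids are placeholders, and force-recovery/force-backfill exist from Luminous on):

  # raise min_size to k+1 (k=4 here) once recovery has finished
  ceph osd pool set <ec-pool> min_size 5

  # nudge the degraded pgs to the front of the recovery/backfill queues
  ceph pg force-recovery <pgid>
  ceph pg force-backfill <pgid>

  # keep an eye on what is still not clean
  ceph pg dump_stuck unclean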


Thanks again,

Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] total_used statistic incorrect

2018-09-18 Thread Mike Cave
Greetings,

I’ve recently run into an issue with my new Mimic deploy.

I created some pools and created volumes and did some general testing. In 
total, there was about 21 TiB used. Once testing was completed, I deleted the 
pools and thus thought I deleted the data.

However, the ‘total_used’ statistic given from running ‘ceph  -s’ shows that 
the space is still consumed. I have confirmed that the pools are deleted (rados 
df) but I cannot get the total_used to reflect the actual state of usage on the 
system.

Have I missed a step in deleting a pool? Is there some other step I need to 
perform other than what I found in the docs?
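For reference, a few read-only commands that help narrow down where the reported space is sitting (a sketch; nothing beyond a stock Mimic install is assumed):

  ceph df detail     # per-pool usage vs. the global raw totals
  rados df           # should list nothing if the pools are really gone
  ceph osd df tree   # raw utilization per OSD / per host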

Please let me know if I can provide any additional data.

Cheers,
Mike

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] network architecture questions

2018-09-18 Thread solarflow99
thanks for the replies, I don't know that cephFS clients go through the
MONs; they reach the OSDs directly.  When I mentioned NFS, I meant NFS
clients (i.e. not cephFS clients). This should have been pretty
straightforward.
Anyone doing HA on the MONs?  How do you mount the cephFS shares, surely
you'd have a vip?



On Tue, Sep 18, 2018 at 12:37 PM Jean-Charles Lopez 
wrote:

> > On Sep 17, 2018, at 16:13, solarflow99  wrote:
> >
> > Hi, I read through the various documentation and had a few questions:
> >
> > - From what I understand cephFS clients reach the OSDs directly, does
> the cluster network need to be opened up as a public network?
> Client traffic only goes over the public network. Only OSD to OSD traffic
> (replication, rebalancing, recovery go over the cluster network)
> >
> > - Is it still necessary to have a public and cluster network when the
> using cephFS since the clients all reach the OSD's directly?
> Separating the network is a plus for troubleshooting and sizing for
> bandwidth
> >
> > - Simplest way to do HA on the mons for providing NFS, etc?
> Don’t really understand the question (NFS vs CephFS).
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] network architecture questions

2018-09-18 Thread Jean-Charles Lopez
They don’t go through the MONs for IOs but they need access to the MONs over the
public network for authentication and to receive the cluster map. 

JC

While moving. Excuse unintended typos.

> On Sep 18, 2018, at 17:51, Jean-Charles Lopez  wrote:
> 
> Hi
> 
> You deploy 3 MONs on a production cluster for HA. 
> 
> CephFS clients talk to MONs MDSs and OSDs over the public network. 
> 
> CephFS is not NFS and you’ll need ganesha to enable NFS access into your Ceph 
> File system. See http://docs.ceph.com/docs/master/cephfs/nfs/ Ganesha will 
> access your ceph cluster over the public network like any regular ceph 
> client. 
> 
> JC
> 
> While moving. Excuse unintended typos.
> 
>> On Sep 18, 2018, at 16:56, solarflow99  wrote:
>> 
>> thanks for the replies, I don't know that cephFS clients go through the 
>> MONs; they reach the OSDs directly.  When I mentioned NFS, I meant NFS 
>> clients (i.e. not cephFS clients). This should have been pretty 
>> straightforward.
>> Anyone doing HA on the MONs?  How do you mount the cephFS shares, surely 
>> you'd have a vip?
>> 
>> 
>> 
>>> On Tue, Sep 18, 2018 at 12:37 PM Jean-Charles Lopez  
>>> wrote:
>>> > On Sep 17, 2018, at 16:13, solarflow99  wrote:
>>> > 
>>> > Hi, I read through the various documentation and had a few questions:
>>> > 
>>> > - From what I understand cephFS clients reach the OSDs directly, does the 
>>> > cluster network need to be opened up as a public network? 
>>> Client traffic only goes over the public network. Only OSD to OSD traffic 
>>> (replication, rebalancing, recovery go over the cluster network)
>>> > 
>>> > - Is it still necessary to have a public and cluster network when the 
>>> > using cephFS since the clients all reach the OSD's directly?  
>>> Separating the network is a plus for troubleshooting and sizing for 
>>> bandwidth
>>> > 
>>> > - Simplest way to do HA on the mons for providing NFS, etc?  
>>> Don’t really understand the question (NFS vs CephFS).
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backup ceph

2018-09-18 Thread ST Wong (ITSC)
Hi,

Thanks for your help.

> I assume that you are speaking of rbd only
Yes, as we just started studying Ceph, we are only aware of backup of RBD.   Will 
there be other areas that need backup?   Sorry for my ignorance.

> Taking snapshots of rbd volumes and keeping all of them on the cluster is fine.
> However, this is not a backup. A snapshot is only a backup if it is exported
> off-site.
Will this scheme (e.g. keeping 30 daily snapshots) impact performance?  
Besides, can we somehow "mount" the snapshot of the nth day to retrieve a 
particular file?  Sorry, we're still thinking in traditional SAN snapshot 
concepts.
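On the second question, a snapshot can be mapped read-only and mounted to pull out individual files. A rough sketch, assuming the kernel rbd client, an XFS filesystem inside the image, and made-up pool/image/snapshot names:

  # map the snapshot read-only and mount it somewhere to copy a single file out
  rbd map rbd/myvolume@daily-2018-09-10 --read-only
  mount -o ro,norecovery,nouuid /dev/rbd0 /mnt/restore   # rbd map prints the actual device
  cp /mnt/restore/path/to/file /some/restore/target/
  umount /mnt/restore
  rbd unmap /dev/rbd0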

Sorry to bother, and thanks a lot.

Rgds,
/st wong

-Original Message-
From: ceph-users  On Behalf Of 
c...@jack.fr.eu.org
Sent: Tuesday, September 18, 2018 8:04 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] backup ceph

Hi,

I assume that you are speaking of rbd only

Taking snapshots of rbd volumes and keeping all of them on the cluster is fine.
However, this is not a backup. A snapshot is only a backup if it is exported
off-site.
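A minimal sketch of what "exported off-site" can look like with rbd export-diff/import-diff; the pool, image and host names are made up, the size is an example, and backuphost is assumed to have client access to a second cluster:

  # one-time: create an empty destination image of the same size on the backup cluster
  ssh backuphost rbd create backup/myvolume --size 102400

  # first run: snapshot and ship everything up to that snapshot
  rbd snap create rbd/myvolume@day1
  rbd export-diff rbd/myvolume@day1 - | ssh backuphost rbd import-diff - backup/myvolume

  # following days: ship only the delta between consecutive snapshots
  rbd snap create rbd/myvolume@day2
  rbd export-diff --from-snap day1 rbd/myvolume@day2 - | \
      ssh backuphost rbd import-diff - backup/myvolume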

On 09/18/2018 11:54 AM, ST Wong (ITSC) wrote:
> Hi,
> 
> We're newbie to Ceph.  Besides using incremental snapshots with RDB to backup 
> data on one Ceph cluster to another running Ceph cluster, or using backup 
> tools like backy2, will there be any recommended way to backup Ceph data  ?   
> Someone here suggested taking snapshot of RDB daily and keeps 30 days to 
> replace backup.  I wonder if this is practical and if performance will be 
> impact...
> 
> Thanks a lot.
> Regards
> /st wong
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [RGWRados]librados: Objecter returned from getxattrs r=-36

2018-09-18 Thread fatkun chan
ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous
(stable)

I have a file with a long name; when I cat the file through the minio client,
this error shows:
librados: Objecter returned from getxattrs r=-36


the log below comes from radosgw:

2018-09-15 03:38:24.763109 7f833c0ed700  2 req 20:0.000272:s3:GET
/hand-gesture/:list_bucket:verifying op params
2018-09-15 03:38:24.763111 7f833c0ed700  2 req 20:0.000273:s3:GET
/hand-gesture/:list_bucket:pre-executing
2018-09-15 03:38:24.763112 7f833c0ed700  2 req 20:0.000274:s3:GET
/hand-gesture/:list_bucket:executing
2018-09-15 03:38:24.763115 7f833c0ed700 10 cls_bucket_list
hand-gesture[7f3000c9-66f8-4598-9811-df3800e4469a.804194.12]) start []
num_entries 1001
2018-09-15 03:38:24.763822 7f833c0ed700 20 get_obj_state:
rctx=0x7f833c0e5790
obj=hand-gesture:train_result/mobilenetv2_160_0.35_feature16_pyramid3_minside160_lr0.01_batchsize32_steps2000_limitratio0.5625_slot_blankdata201809041612_bluedata201808302300composite_background_201809111827/201809111827/logs/events.out.tfevents.1536672273.tf-hand-gesture-58-worker-s7uc-0-jsuf7
state=0x7f837553c0a0 s->prefetch_data=0
2018-09-15 03:38:24.763841 7f833c0ed700 10 librados: getxattrs
oid=7f3000c9-66f8-4598-9811-df3800e4469a.804194.12_train_result/mobilenetv2_160_0.35_feature16_pyramid3_minside160_lr0.01_batchsize32_steps2000_limitratio0.5625_slot_blankdata201809041612_bluedata201808302300composite_background_201809111827/201809111827/logs/events.out.tfevents.1536672273.tf-hand-gesture-58-worker-s7uc-0-jsuf7
nspace=
2018-09-15 03:38:24.764283 7f833c0ed700 10 librados: Objecter returned from
getxattrs r=-36
2018-09-15 03:38:24.764304 7f833c0ed700  2 req 20:0.001466:s3:GET
/hand-gesture/:list_bucket:completing
2018-09-15 03:38:24.764308 7f833c0ed700  0 WARNING: set_req_state_err
err_no=36 resorting to 500
2018-09-15 03:38:24.764355 7f833c0ed700  2 req 20:0.001517:s3:GET
/hand-gesture/:list_bucket:op status=-36
2018-09-15 03:38:24.764362 7f833c0ed700  2 req 20:0.001524:s3:GET
/hand-gesture/:list_bucket:http status=500
2018-09-15 03:38:24.764364 7f833c0ed700  1 == req done
req=0x7f833c0e7110 op status=-36 http_status=500 ==
2018-09-15 03:38:24.764371 7f833c0ed700 20 process_request() returned -36
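For reference, r=-36 is a negated errno; a quick sketch of decoding it (any Python on the radosgw host will do):

  python -c 'import errno, os; print(errno.errorcode[36]); print(os.strerror(36))'
  # ENAMETOOLONG
  # File name too long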
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] network architecture questions

2018-09-18 Thread Erik McCormick
On Tue, Sep 18, 2018, 7:56 PM solarflow99  wrote:

> thanks for the replies, I don't know that cephFS clients go through the
> MONs; they reach the OSDs directly.  When I mentioned NFS, I meant NFS
> clients (i.e. not cephFS clients). This should have been pretty
> straightforward.
> Anyone doing HA on the MONs?  How do you mount the cephFS shares, surely
> you'd have a vip?
>

When you mount cephfs or an RBD (for your NFS case) you provide a list of
monitors. They are, by nature, highly available. They do not rely on any
sort of VIP failover like keepalived or pacemaker.
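A minimal sketch of that, assuming three mons named mon1-3 on the default port and a cephx user called admin:

  # kernel client: list all three mons, the client handles failover itself
  mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret

  # ceph-fuse equivalent
  ceph-fuse -m mon1:6789,mon2:6789,mon3:6789 /mnt/cephfs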

-Erik

>
>
>
> On Tue, Sep 18, 2018 at 12:37 PM Jean-Charles Lopez 
> wrote:
>
>> > On Sep 17, 2018, at 16:13, solarflow99  wrote:
>> >
>> > Hi, I read through the various documentation and had a few questions:
>> >
>> > - From what I understand cephFS clients reach the OSDs directly, does
>> the cluster network need to be opened up as a public network?
>> Client traffic only goes over the public network. Only OSD to OSD traffic
>> (replication, rebalancing, recovery go over the cluster network)
>> >
>> > - Is it still necessary to have a public and cluster network when the
>> using cephFS since the clients all reach the OSD's directly?
>> Separating the network is a plus for troubleshooting and sizing for
>> bandwidth
>> >
>> > - Simplest way to do HA on the mons for providing NFS, etc?
>> Don’t really understand the question (NFS vs CephFS).
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mimic upgrade failure

2018-09-18 Thread KEVIN MICHAEL HRPCEK
Sage,

Unfortunately the mon election problem came back yesterday and it makes it 
really hard to get a cluster to stay healthy. A brief unexpected network outage 
occurred and sent the cluster into a frenzy and when I had it 95% healthy the 
mons started their nonstop reelections. In the previous logs I sent were you 
able to identify why the mons are constantly electing? The elections seem to be 
triggered by the below paxos message but do you know which lease timeout is 
being reached or why the lease isn't renewed instead of calling for an election?
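For what it's worth, that lease is presumably the paxos lease the leader grants the peons; a sketch of inspecting the related tunables and mon state over the admin socket (option names assume the stock defaults):

  ceph daemon mon.sephmon2 config get mon_lease
  ceph daemon mon.sephmon2 config show | grep -E 'mon_lease|mon_election'
  ceph daemon mon.sephmon2 mon_status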

One thing I tried was to shut down the entire cluster and bring up only the mons 
and mgr. The mons weren't able to hold their quorum with no osds running, and 
the ceph-mon ms_dispatch thread ran at 100% for > 60s at a time.

2018-09-19 03:56:21.729 7f4344ec1700  1 mon.sephmon2@1(peon).paxos(paxos active 
c 133382665..133383355) lease_timeout -- calling new election

Thanks
Kevin

On 09/10/2018 07:06 AM, Sage Weil wrote:

I took a look at the mon log you sent.  A few things I noticed:

- The frequent mon elections seem to get only 2/3 mons about half of the
time.
- The messages coming in are mostly osd_failure, and half of those seem to
be recoveries (cancellation of the failure message).

It does smell a bit like a networking issue, or some tunable that relates
to the messaging layer.  It might be worth looking at an OSD log for an
osd that reported a failure and seeing what error code is coming up on the
failed ping connection?  That might provide a useful hint (e.g.,
ECONNREFUSED vs EMFILE or something).

I'd also confirm that with nodown set the mon quorum stabilizes...
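A sketch of those two checks (the osd id, log path and grep patterns are only examples):

  # keep the osdmap from churning while investigating
  ceph osd set nodown

  # example: look at what a reporting osd actually hit on its ping connections
  grep -iE 'heartbeat_check|connection refused|ECONNREFUSED|EMFILE' /var/log/ceph/ceph-osd.49.log | tail -n 50

  # see whether the mon quorum stays stable in the meantime
  ceph quorum_status
  ceph mon stat

  # undo afterwards
  ceph osd unset nodown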

sage




On Mon, 10 Sep 2018, Kevin Hrpcek wrote:



Update for the list archive.

I went ahead and finished the mimic upgrade with the osds in a fluctuating
state of up and down. The cluster did start to normalize a lot easier after
everything was on mimic since the random mass OSD heartbeat failures stopped
and the constant mon election problem went away. I'm still battling with the
cluster reacting poorly to host reboots or small map changes, but I feel like
my current pg:osd ratio may be playing a factor in that since we are 2x normal
pg count while migrating data to new EC pools.

I'm not sure of the root cause but it seems like the mix of luminous and mimic
did not play well together for some reason. Maybe it has to do with the scale
of my cluster, 871 osds, or maybe I've missed some tuning as my cluster
has scaled to this size.

Kevin


On 09/09/2018 12:49 PM, Kevin Hrpcek wrote:


Nothing too crazy for non default settings. Some of those osd settings were
in place while I was testing recovery speeds and need to be brought back
closer to defaults. I was setting nodown before but it seems to mask the
problem. While it's good to stop the osdmap changes, OSDs would come up, get
marked up, but at some point go down again (but the process is still
running) and still stay up in the map. Then when I'd unset nodown the
cluster would immediately mark 250+ osds down again and I'd be back where I
started.

This morning I went ahead and finished the osd upgrades to mimic to remove
that variable. I've looked for networking problems but haven't found any. 2
of the mons are on the same switch. I've also tried combinations of shutting
down a mon to see if a single one was the problem, but they keep electing no
matter the mix of them that are up. Part of it feels like a networking
problem but I haven't been able to find a culprit yet as everything was
working normally before starting the upgrade. Other than the constant mon
elections, yesterday I had the cluster 95% healthy 3 or 4 times, but it
doesn't last long since at some point the OSDs start trying to fail each
other through their heartbeats.
2018-09-09 17:37:29.079 7eff774f5700  1 mon.sephmon1@0(leader).osd e991282
prepare_failure osd.39 10.1.9.2:6802/168438 from osd.49 10.1.9.3:6884/317908
is reporting failure:1
2018-09-09 17:37:29.079 7eff774f5700  0 log_channel(cluster) log [DBG] :
osd.39 10.1.9.2:6802/168438 reported failed by osd.49 10.1.9.3:6884/317908
2018-09-09 17:37:29.083 7eff774f5700  1 mon.sephmon1@0(leader).osd e991282
prepare_failure osd.93 10.1.9.9:6853/287469 from osd.372
10.1.9.13:6801/275806 is reporting failure:1

I'm working on getting things mostly good again with everything on mimic and
will see if it behaves better.

Thanks for your input on this David.


[global]
mon_initial_members = sephmon1, sephmon2, sephmon3
mon_host = 10.1.9.201,10.1.9.202,10.1.9.203
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 10.1.0.0/16
osd backfill full ratio = 0.92
osd failsafe nearfull ratio = 0.90
osd max object size = 21474836480
mon max pg per osd = 350

[mon]
mon warn on legacy crush tunables = false
mon pg warn max per osd = 300
mon osd down out subtree limit = host
mon osd nearfull ratio = 0.90
mon osd full ratio = 0.97
mon health preluminou