Re: [ceph-users] jewel: bug? forgotten rbd files?

2017-08-07 Thread Stefan Priebe - Profihost AG
Hello Greg,

If I remove the files manually from the primary, it does not help
either: the primary OSD then crashes because trim_object can't find the
files.

Is there any way I can manually correct the omap digest so that it
just matches the files?

Greets,
Stefan


Am 05.08.2017 um 21:43 schrieb Gregory Farnum:
> is OSD 20 actually a member of the pg right now? It could be stray data
> that is slowly getting cleaned up.
> 
> Also, you've got "snapdir" listings there. Those indicate the object is
> snapshotted but the "head" got deleted. So it may just be delayed
> cleanup of snapshots.
> 
> On Sat, Aug 5, 2017 at 12:34 PM Stefan Priebe - Profihost AG
> <s.pri...@profihost.ag> wrote:
> 
> Hello,
> 
> today i deleted an rbd image which had the following
> prefix:
> 
> block_name_prefix: rbd_data.106dd406b8b4567
> 
> the rm command went fine.
> 
> also the rados list command does not show any objects with that string:
> # rados -p rbd ls | grep 106dd406b8b4567
> 
> But find on an osd still has them?
> 
> osd.20]#  find . -name "*106dd406b8b4567*" -exec ls -la "{}" \;
> -rw-r--r-- 1 ceph ceph 4194304 Aug  5 09:32
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_8/rbd\udata.106dd406b8b4567.2315__9d5e4_9E65861A__3
> -rw-r--r-- 1 ceph ceph 4194304 Aug  5 09:36
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_8/rbd\udata.106dd406b8b4567.2315__9d84a_9E65861A__3
> -rw-r--r-- 1 ceph ceph 0 Aug  5 11:47
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_8/rbd\udata.106dd406b8b4567.2315__snapdir_9E65861A__3
> -rw-r--r-- 1 ceph ceph 4194304 Aug  5 09:49
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_A/rbd\udata.106dd406b8b4567.018c__9d455_BCB2A61A__3
> -rw-r--r-- 1 ceph ceph 1400832 Aug  5 09:32
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_A/rbd\udata.106dd406b8b4567.018c__9d5e4_BCB2A61A__3
> -rw-r--r-- 1 ceph ceph 1400832 Aug  5 09:32
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_A/rbd\udata.106dd406b8b4567.018c__9d84a_BCB2A61A__3
> -rw-r--r-- 1 ceph ceph 0 Aug  5 11:47
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_A/rbd\udata.106dd406b8b4567.018c__snapdir_BCB2A61A__3
> 
> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel: bug? forgotten rbd files?

2017-08-07 Thread Stefan Priebe - Profihost AG
ceph-dencoder type object_info_t import /tmp/a decode dump_json

results in:
error: buffer::malformed_input: void
object_info_t::decode(ceph::buffer::list::iterator&) decode past end of
struct encoding
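
For context, a sketch of how the two xattr pieces would be pulled and
concatenated before decoding (the object path is a placeholder, and the
ceph._@1 piece only exists when the xattr overflows):

OBJ=./current/3.61a_head/DIR_A/.../rbd\udata.106dd406b8b4567...   # placeholder path
getfattr --only-values -n user.ceph._ "$OBJ" > /tmp/a
getfattr --only-values -n user.ceph._@1 "$OBJ" >> /tmp/a 2>/dev/null || true
ceph-dencoder type object_info_t import /tmp/a decode dump_json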

Greets,
Stefan

Am 05.08.2017 um 21:43 schrieb Gregory Farnum:
> is OSD 20 actually a member of the pg right now? It could be stray data
> that is slowly getting cleaned up.
> 
> Also, you've got "snapdir" listings there. Those indicate the object is
> snapshotted but the "head" got deleted. So it may just be delayed
> cleanup of snapshots.
> 
> On Sat, Aug 5, 2017 at 12:34 PM Stefan Priebe - Profihost AG
> <s.pri...@profihost.ag> wrote:
> 
> Hello,
> 
> today i deleted an rbd image which had the following
> prefix:
> 
> block_name_prefix: rbd_data.106dd406b8b4567
> 
> the rm command went fine.
> 
> also the rados list command does not show any objects with that string:
> # rados -p rbd ls | grep 106dd406b8b4567
> 
> But find on an osd still has them?
> 
> osd.20]#  find . -name "*106dd406b8b4567*" -exec ls -la "{}" \;
> -rw-r--r-- 1 ceph ceph 4194304 Aug  5 09:32
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_8/rbd\udata.106dd406b8b4567.2315__9d5e4_9E65861A__3
> -rw-r--r-- 1 ceph ceph 4194304 Aug  5 09:36
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_8/rbd\udata.106dd406b8b4567.2315__9d84a_9E65861A__3
> -rw-r--r-- 1 ceph ceph 0 Aug  5 11:47
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_8/rbd\udata.106dd406b8b4567.2315__snapdir_9E65861A__3
> -rw-r--r-- 1 ceph ceph 4194304 Aug  5 09:49
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_A/rbd\udata.106dd406b8b4567.018c__9d455_BCB2A61A__3
> -rw-r--r-- 1 ceph ceph 1400832 Aug  5 09:32
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_A/rbd\udata.106dd406b8b4567.018c__9d5e4_BCB2A61A__3
> -rw-r--r-- 1 ceph ceph 1400832 Aug  5 09:32
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_A/rbd\udata.106dd406b8b4567.018c__9d84a_BCB2A61A__3
> -rw-r--r-- 1 ceph ceph 0 Aug  5 11:47
> 
> ./current/3.61a_head/DIR_A/DIR_1/DIR_6/DIR_A/rbd\udata.106dd406b8b4567.018c__snapdir_BCB2A61A__3
> 
> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to migrate cached erasure pool to another type of erasure?

2017-08-07 Thread Малков Петр Викторович
Hello!

Luminous v 12.1.2

RGW SSD tiering over an EC pool works fine,
but I want to change the type of erasure coding (now and in the future).
Changing the erasure code type on the fly is not allowed;
the only option is a new pool with the new coding.

My first idea was to add a second tiering level,
SSD - EC - ISA, evict everything down, and then change to SSD - ISA.
But double tiering is also not allowed.

Is there any other scenario, ideally without dropping client connections?
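
For reference, the "new pool with new coding" route would look roughly like
this (profile name, k/m values and pool name are assumptions, not
recommendations), with the bucket data then copied or migrated and the zone
placement repointed:

# create an ISA erasure-code profile and a new EC pool next to the old one
ceph osd erasure-code-profile set isa-k4m2 plugin=isa k=4 m=2 crush-failure-domain=host
ceph osd pool create default.rgw.buckets.data.new 1024 1024 erasure isa-k4m2
# rados cppool <old-pool> <new-pool> can copy small pools, but it is not
# atomic, so clients will see an interruption during the switch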

--
Petr Malkov

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: concurrent access to the same file from multiple nodes

2017-08-07 Thread Andras Pataki

I've filed a tracker bug for this: http://tracker.ceph.com/issues/20938

Andras


On 08/01/2017 10:26 AM, Andras Pataki wrote:

Hi John,

Sorry for the delay, it took a bit of work to set up a luminous test 
environment.  I'm sorry to have to report that the 12.1.1 RC version 
also suffers from this problem - when two nodes open the same file for 
read/write, and read from it, the performance is awful (under 1 
operation/second).  The behavior is exactly the same as with the 
latest Jewel.


I'm running an all luminous setup (12.1.1 mon/mds/osds and fuse 
client).  My original mail has a small test program that easily 
reproduces the issue.  Let me know if there is anything I can help 
with for tracking the issue down further.


Andras


On 07/21/2017 05:41 AM, John Spray wrote:

On Thu, Jul 20, 2017 at 9:19 PM, Andras Pataki wrote:
We are having some difficulties with CephFS access to the same file from
multiple nodes concurrently.  After debugging some large-ish applications
with noticeable performance problems using CephFS (with the fuse client),
I have a small test program to reproduce the problem.

The core of the problem boils down to the following operation being run on
the same file on multiple nodes (in a loop in the test program):

 int fd = open(filename, mode);
 read(fd, buffer, 100);
 close(fd);

Here are some results on our cluster:

One node, mode=read-only: 7000 opens/second
One node, mode=read-write: 7000 opens/second
Two nodes, mode=read-only: 7000 opens/second/node
Two nodes, mode=read-write: around 0.5 opens/second/node (!!!)
Two nodes, one read-only, one read-write: around 0.5 opens/second/node (!!!)
Two nodes, mode=read-write, but remove the 'read(fd, buffer, 100)' line
from the code: 500 opens/second/node


So there seem to be some problems with opening the same file read/write and
reading from the file on multiple nodes.  That operation seems to be 3
orders of magnitude slower than other parallel access patterns to the same
file.  The 1 second time to open files almost seems like some timeout is
happening somewhere.  I have some suspicion that this has to do with
capability management between the fuse client and the MDS, but I don't know
enough about that protocol to make an educated assessment.

You're pretty much spot on.  Things happening at 0.5 per second is
characteristic of a particular class of bug where we are not flushing
the journal soon enough, and instead waiting for the next periodic
(every five second) flush.  Hence there is an average 2.5 second delay,
hence operations happening at approximately half an operation per
second.


[And an aside - how does this become a problem?  I.e. why open a file
read/write and read from it?  Well, it turns out gfortran-compiled code does
this by default if the user doesn't explicitly say otherwise.]

All the nodes in this test are very lightly loaded, so there does not seem
to be any noticeable performance bottleneck (network, CPU, etc.).  The code
to reproduce the problem is attached.  Simply compile it, create a test file
with a few bytes of data in it, and run the test code on two separate nodes
on the same file.

We are running Ceph 10.2.9 on the server side, and we use the 10.2.9 fuse
client on the client nodes.

Any input/help would be greatly appreciated.

If you have a test/staging environment, it would be great if you could
re-test this on the 12.1.1 release candidate.  There have been MDS
fixes for similar slowdowns that were shown up in multi-mds testing,
so it's possible that the issue you're seeing here was fixed along the
way.

John


Andras


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 1 pg inconsistent, 1 pg unclean, 1 pg degraded

2017-08-07 Thread Marc Roos

I tried to fix a '1 pg inconsistent' error by taking osd.12 out, hoping for 
the data to be copied to a different OSD and for that one to be used as the 
'active' copy.

- Would deleting the whole image in the rbd pool solve this? (Or would 
it fail because of this status?)

- Should I rather have done this with osd.9?

- Can't I force Ceph to just use one of the OSDs' copies of the 4MB object 
and then e.g. run fsck on the VM backed by this image? 


> ok
> PG_STAT STATE                                 UP       UP_PRIMARY ACTING   ACTING_PRIMARY
> 17.36   active+degraded+remapped+inconsistent [9,0,13] 9          [9,0,12] 9





{
"state": "active+degraded+remapped+inconsistent",
"snap_trimq": "[]",
"epoch": 8687,
"up": [
9,
0,
13
],
"acting": [
9,
0,
12
],
"backfill_targets": [
"13"
],
"actingbackfill": [
"0",
"9",
"12",
"13"
],
"info": {
"pgid": "17.36",
"last_update": "8686'95650",
"last_complete": "0'0",
"log_tail": "8387'91830",
"last_user_version": 95650,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": [
{
"start": "1",
"length": "3"
}
],
"history": {
"epoch_created": 3636,
"epoch_pool_created": 3636,
"last_epoch_started": 8685,
"last_interval_started": 8684,
"last_epoch_clean": 8487,
"last_interval_clean": 8486,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 8683,
"same_interval_since": 8684,
"same_primary_since": 8392,
"last_scrub": "8410'93917",
"last_scrub_stamp": "2017-08-05 19:42:14.906055",
"last_deep_scrub": "8410'93917",
"last_deep_scrub_stamp": "2017-08-05 19:42:14.906055",
"last_clean_scrub_stamp": "2017-07-29 20:21:18.626777"
},
"stats": {
"version": "8686'95650",
"reported_seq": "141608",
"reported_epoch": "8687",
"state": "active+degraded+remapped+inconsistent",
"last_fresh": "2017-08-07 17:16:25.902001",
"last_change": "2017-08-07 17:16:25.902001",
"last_active": "2017-08-07 17:16:25.902001",
"last_peered": "2017-08-07 17:16:25.902001",
"last_clean": "2017-08-06 16:52:33.999429",
"last_became_active": "2017-08-07 13:01:14.646736",
"last_became_peered": "2017-08-07 13:01:14.646736",
"last_unstale": "2017-08-07 17:16:25.902001",
"last_undegraded": "2017-08-07 13:01:13.683550",
"last_fullsized": "2017-08-07 17:16:25.902001",
"mapping_epoch": 8684,
"log_start": "8387'91830",
"ondisk_log_start": "8387'91830",
"created": 3636,
"last_epoch_clean": 8487,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "8410'93917",
"last_scrub_stamp": "2017-08-05 19:42:14.906055",
"last_deep_scrub": "8410'93917",
"last_deep_scrub_stamp": "2017-08-05 19:42:14.906055",
"last_clean_scrub_stamp": "2017-07-29 20:21:18.626777",
"log_size": 3820,
"ondisk_log_size": 3820,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 7953924096,
"num_objects": 1910,
"num_object_clones": 30,
"num_object_copies": 5730,
"num_objects_missing_on_primary": 1,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 1596,
"num_objects_unfound": 1,
"num_objects_dirty": 1910,
"num_whiteouts": 0,
"num_read": 32774,
"num_read_kb": 1341008,
"num_write": 189252,
"num_write_kb": 9369192,
"num_scrub_errors": 3,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 3,
"num_objects_recovered": 417,
"num_bytes_recovered": 1732386816,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode

Re: [ceph-users] 1 pg inconsistent, 1 pg unclean, 1 pg degraded

2017-08-07 Thread Etienne Menguy
Hi,


Removing the whole OSD will work, but it's overkill (if the inconsistency is not 
caused by a faulty disk [😉]).


Which Ceph version are you running? If you have a recent version, you can check 
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent


rados list-inconsistent-obj 17.36 --format=json-pretty


Depending on the cause of the inconsistent object, you may want to replace the 
disk, rerun a deep scrub to recheck its checksums, or repair the PG.
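
A minimal sketch of those last two options (PG id taken from your output):

ceph pg deep-scrub 17.36     # recheck the object checksums
ceph pg repair 17.36         # ask the primary to repair the inconsistent copies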


Étienne



From: ceph-users  on behalf of Marc Roos 

Sent: Monday, August 7, 2017 17:44
To: ceph-users
Subject: [ceph-users] 1 pg inconsistent, 1 pg unclean, 1 pg degraded


I tried to fix a 1 pg inconsistent by taking the osd 12 out, hoping for
the data to be copied to a different osd, and that one would be used as
'active?'.

- Would deleting the whole image in the rbd pool solve this? (or would
it fail because of this status)

- Should I have done this rather with osd 9?

- Can't I force ceph to just one of the osd's 4MB object and then eg.
Run fschk on the vm having this image?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] expanding cluster with minimal impact

2017-08-07 Thread Bryan Stillwell
Dan,

We recently went through an expansion of an RGW cluster and found that we 
needed 'norebalance' set whenever making CRUSH weight changes to avoid slow 
requests.  We were also increasing the CRUSH weight by 1.0 each time, which 
seemed to reduce the extra data movement we were seeing with smaller weight 
increases.  Maybe something to try out next time?
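
Roughly, that workflow looks like this (OSD id and weights are placeholders):

ceph osd set norebalance
ceph osd crush reweight osd.123 4.0    # e.g. bump the weight by 1.0 in a single step
# wait for peering to settle and check 'ceph -s', then allow data movement
ceph osd unset norebalance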

Bryan

From: ceph-users  on behalf of Dan van der 
Ster 
Date: Friday, August 4, 2017 at 1:59 AM
To: Laszlo Budai 
Cc: ceph-users 
Subject: Re: [ceph-users] expanding cluster with minimal impact

Hi Laszlo,

The script defaults are what we used to do a large intervention (the
default delta weight is 0.01). For our clusters going any faster
becomes disruptive, but this really depends on your cluster size and
activity.

BTW, in case it wasn't clear, to use this script for adding capacity
you need to add the new OSDs to your cluster with an initial crush
weight of 0.0:

osd crush initial weight = 0
osd crush update on start = true

-- Dan



On Thu, Aug 3, 2017 at 8:12 PM, Laszlo Budai  wrote:
Dear all,

I need to expand a ceph cluster with minimal impact. Reading previous
threads on this topic from the list I've found the ceph-gentle-reweight
script
(https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight)
created by Dan van der Ster (Thank you Dan for sharing the script with us!).

I've done some experiments, and it looks promising, but the parameters need
to be set properly. Has any of you tested this script before? What is the
recommended delta_weight to use? From the default parameters of the script I
can see that the default delta weight is .5% of the target weight, which
means 200 reweighting cycles. I have experimented with a reweight ratio of
5% while running a fio test on a client. The results were OK (I mean no slow
requests), but my test cluster was a very small one.

If any of you has done some larger experiments with this script I would be
really interested to read about your results.

Thank you!
Laszlo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] download.ceph.com rsync errors

2017-08-07 Thread David Galloway
Thanks for bringing this to our attention.  I've removed the lockfiles
from download.ceph.com.
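
Mirror operators hitting this can also skip the transient files on the
client side; a sketch (local destination path is a placeholder):

rsync -rtlv --exclude 'db/lockfile' --exclude '.*.??????' rsync://download.ceph.com/ceph/ /srv/mirror/ceph/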

On 08/06/2017 11:10 PM, Matthew Taylor wrote:
> Hi,
> 
> The rsync target (rsync://download.ceph.com/ceph/) has been throwing the
> following errors for a while:
> 
>> rsync: send_files failed to open "/debian-hammer/db/lockfile" (in ceph): 
>> Permission denied (13)
>> rsync: send_files failed to open "/debian-jewel/db/lockfile" (in ceph): 
>> Permission denied (13)
>> rsync: send_files failed to open 
>> "/debian-jewel/pool/main/c/ceph/.ceph-fuse-dbg_10.1.0-1~bpo80+1_amd64.deb.h0JvHM"
>>  (in ceph): Permission denied (13)
>> rsync: send_files failed to open "/debian-luminous/db/lockfile" (in ceph): 
>> Permission denied (13)
>> rsync: send_files failed to open "/debian-testing/db/lockfile" (in ceph): 
>> Permission denied (13)
>> rsync: send_files failed to open 
>> "/rpm-jewel/el7/x86_64/.ceph-10.1.0-0.el7.x86_64.rpm.2FtlL3" (in ceph): 
>> Permission denied (13)
>> rsync: send_files failed to open 
>> "/rpm-luminous/el7/aarch64/.ceph-debuginfo-12.0.3-0.el7.aarch64.rpm.yQ0WpX" 
>> (in ceph): Permission denied (13)
>> rsync error: some files/attrs were not transferred (see previous errors) 
>> (code 23) at main.c(1518) [generator=3.0.9]
> 
> I posted on the Ceph mirror admin list, although I never received a
> response.
> 
> Is anyone able to sort this out?
> 
> Thanks,
> Matthew.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] broken parent/child relationship

2017-08-07 Thread Jason Dillaman
Does the image "tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c" have
snapshots? If the deep-flatten feature isn't enabled, the flatten
operation is not able to dissociate child images from parents when
those child images have one or more snapshots.
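
A quick way to check both, using the image name from the quoted output below:

rbd snap ls tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
rbd info tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c | grep features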

On Fri, Aug 4, 2017 at 2:30 PM, Shawn Edwards  wrote:
> I have a child rbd that doesn't acknowledge its parent.  this is with Kraken
> (11.2.0)
>
> The misbehaving child was 'flatten'ed from its parent, but now I can't
> remove the snapshot because it thinks it has a child still.
>
> root@tyr-ceph-mon0:~# rbd snap ls
> tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete
> SNAPID NAMESIZE
>   2530 c20a31c5-fd88-4104-8579-a6b3cd723f2b 1000 GB
> root@tyr-ceph-mon0:~# rbd children
> tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete@c20a31c5-fd88-4104-8579-a6b3cd723f2b
> tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
> root@tyr-ceph-mon0:~# rbd flatten
> tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
> Image flatten: 0% complete...failed.
> rbd: flatten error: (22) Invalid argument
> 2017-08-04 08:33:09.719796 7f5bfb7d53c0 -1 librbd::Operations: image has no
> parent
> root@tyr-ceph-mon0:~# rbd snap unprotect
> tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete@c20a31c5-fd88-4104-8579-a6b3cd723f2b
> 2017-08-04 08:34:20.649532 7f91f5ffb700 -1 librbd::SnapshotUnprotectRequest:
> cannot unprotect: at least 1 child(ren) [1d0bce6194cfc3] in pool 'tyr-p0'
> 2017-08-04 08:34:20.649545 7f91f5ffb700 -1 librbd::SnapshotUnprotectRequest:
> encountered error: (16) Device or resource busy
> 2017-08-04 08:34:20.649550 7f91f5ffb700 -1 librbd::SnapshotUnprotectRequest:
> 0x55d69346da40 should_complete_error: ret_val=-16
> 2017-08-04 08:34:20.651800 7f91f5ffb700 -1 librbd::SnapshotUnprotectRequest:
> 0x55d69346da40 should_complete_error: ret_val=-16
> rbd: unprotecting snap failed: (16) Device or resource busy
> root@tyr-ceph-mon0:~# rbd info tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
> rbd image 'a56eae5f-fd35-4299-bcdc-65839a00f14c':
> size 1000 GB in 256000 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.1d0bce6194cfc3
> format: 2
> features: layering
> flags:
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] broken parent/child relationship

2017-08-07 Thread Shawn Edwards
Nailed it.  Did not have deep-flatten feature turned on for that image.

Deep-flatten cannot be added to an rbd after creation, correct?  What are
my options here?

On Mon, Aug 7, 2017 at 3:32 PM Jason Dillaman  wrote:

> Does the image "tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c" have
> snapshots? If the deep-flatten feature isn't enabled, the flatten
> operation is not able to dissociate child images from parents when
> those child images have one or more snapshots.
>
> On Fri, Aug 4, 2017 at 2:30 PM, Shawn Edwards 
> wrote:
> > I have a child rbd that doesn't acknowledge its parent.  this is with
> Kraken
> > (11.2.0)
> >
> > The misbehaving child was 'flatten'ed from its parent, but now I can't
> > remove the snapshot because it thinks it has a child still.
> >
> > root@tyr-ceph-mon0:~# rbd snap ls
> > tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete
> > SNAPID NAMESIZE
> >   2530 c20a31c5-fd88-4104-8579-a6b3cd723f2b 1000 GB
> > root@tyr-ceph-mon0:~# rbd children
> >
> tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete@c20a31c5-fd88-4104-8579-a6b3cd723f2b
> > tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
> > root@tyr-ceph-mon0:~# rbd flatten
> > tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
> > Image flatten: 0% complete...failed.
> > rbd: flatten error: (22) Invalid argument
> > 2017-08-04 08:33:09.719796 7f5bfb7d53c0 -1 librbd::Operations: image has
> no
> > parent
> > root@tyr-ceph-mon0:~# rbd snap unprotect
> >
> tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete@c20a31c5-fd88-4104-8579-a6b3cd723f2b
> > 2017-08-04 08:34:20.649532 7f91f5ffb700 -1
> librbd::SnapshotUnprotectRequest:
> > cannot unprotect: at least 1 child(ren) [1d0bce6194cfc3] in pool 'tyr-p0'
> > 2017-08-04 08:34:20.649545 7f91f5ffb700 -1
> librbd::SnapshotUnprotectRequest:
> > encountered error: (16) Device or resource busy
> > 2017-08-04 08:34:20.649550 7f91f5ffb700 -1
> librbd::SnapshotUnprotectRequest:
> > 0x55d69346da40 should_complete_error: ret_val=-16
> > 2017-08-04 08:34:20.651800 7f91f5ffb700 -1
> librbd::SnapshotUnprotectRequest:
> > 0x55d69346da40 should_complete_error: ret_val=-16
> > rbd: unprotecting snap failed: (16) Device or resource busy
> > root@tyr-ceph-mon0:~# rbd info
> tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
> > rbd image 'a56eae5f-fd35-4299-bcdc-65839a00f14c':
> > size 1000 GB in 256000 objects
> > order 22 (4096 kB objects)
> > block_name_prefix: rbd_data.1d0bce6194cfc3
> > format: 2
> > features: layering
> > flags:
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] broken parent/child relationship

2017-08-07 Thread Jason Dillaman
Correct -- deep-flatten can only be enabled at image creation time. If
you do still have snapshots on that image and you wish to delete the
parent, you will need to delete the snapshots.
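
A rough sketch of that cleanup (the child's snapshot names are placeholders;
this assumes nothing else still references the parent):

rbd snap ls tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
rbd snap rm tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c@<snap-name>   # repeat per snapshot
rbd snap unprotect tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete@c20a31c5-fd88-4104-8579-a6b3cd723f2b
rbd snap rm tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete@c20a31c5-fd88-4104-8579-a6b3cd723f2b
rbd rm tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete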

On Mon, Aug 7, 2017 at 4:52 PM, Shawn Edwards  wrote:
> Nailed it.  Did not have deep-flatten feature turned on for that image.
>
> Deep-flatten cannot be added to an rbd after creation, correct?  What are my
> options here?
>
> On Mon, Aug 7, 2017 at 3:32 PM Jason Dillaman  wrote:
>>
>> Does the image "tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c" have
>> snapshots? If the deep-flatten feature isn't enabled, the flatten
>> operation is not able to dissociate child images from parents when
>> those child images have one or more snapshots.
>>
>> On Fri, Aug 4, 2017 at 2:30 PM, Shawn Edwards 
>> wrote:
>> > I have a child rbd that doesn't acknowledge its parent.  this is with
>> > Kraken
>> > (11.2.0)
>> >
>> > The misbehaving child was 'flatten'ed from its parent, but now I can't
>> > remove the snapshot because it thinks it has a child still.
>> >
>> > root@tyr-ceph-mon0:~# rbd snap ls
>> > tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete
>> > SNAPID NAMESIZE
>> >   2530 c20a31c5-fd88-4104-8579-a6b3cd723f2b 1000 GB
>> > root@tyr-ceph-mon0:~# rbd children
>> >
>> > tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete@c20a31c5-fd88-4104-8579-a6b3cd723f2b
>> > tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
>> > root@tyr-ceph-mon0:~# rbd flatten
>> > tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
>> > Image flatten: 0% complete...failed.
>> > rbd: flatten error: (22) Invalid argument
>> > 2017-08-04 08:33:09.719796 7f5bfb7d53c0 -1 librbd::Operations: image has
>> > no
>> > parent
>> > root@tyr-ceph-mon0:~# rbd snap unprotect
>> >
>> > tyr-p0/51774a43-8d67-4d6d-9711-d0b1e4e6b5e9_delete@c20a31c5-fd88-4104-8579-a6b3cd723f2b
>> > 2017-08-04 08:34:20.649532 7f91f5ffb700 -1
>> > librbd::SnapshotUnprotectRequest:
>> > cannot unprotect: at least 1 child(ren) [1d0bce6194cfc3] in pool
>> > 'tyr-p0'
>> > 2017-08-04 08:34:20.649545 7f91f5ffb700 -1
>> > librbd::SnapshotUnprotectRequest:
>> > encountered error: (16) Device or resource busy
>> > 2017-08-04 08:34:20.649550 7f91f5ffb700 -1
>> > librbd::SnapshotUnprotectRequest:
>> > 0x55d69346da40 should_complete_error: ret_val=-16
>> > 2017-08-04 08:34:20.651800 7f91f5ffb700 -1
>> > librbd::SnapshotUnprotectRequest:
>> > 0x55d69346da40 should_complete_error: ret_val=-16
>> > rbd: unprotecting snap failed: (16) Device or resource busy
>> > root@tyr-ceph-mon0:~# rbd info
>> > tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c
>> > rbd image 'a56eae5f-fd35-4299-bcdc-65839a00f14c':
>> > size 1000 GB in 256000 objects
>> > order 22 (4096 kB objects)
>> > block_name_prefix: rbd_data.1d0bce6194cfc3
>> > format: 2
>> > features: layering
>> > flags:
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Jason



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FAILED assert(last_e.version.version < e.version.version) - Or: how to use ceph-kvstore-tool?

2017-08-07 Thread Ricardo J. Barberis
Sorry, forgot to mention it, it's Hammer 0.94.10.

But I already marked the OSDs as lost, after rebalancing finished.


I saw a bug report at http://tracker.ceph.com/issues/14471

I can post some debug logs there but I don't know if it'll be useful at this 
point.
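
For the archives, the basic shape of a ceph-kvstore-tool invocation against a
filestore OSD's omap store is roughly the following (the path is a placeholder
and the exact syntax may differ per release, so check ceph-kvstore-tool --help
first):

ceph-kvstore-tool /var/lib/ceph/osd/ceph-NN/current/omap list
ceph-kvstore-tool /var/lib/ceph/osd/ceph-NN/current/omap get <prefix> <key>
ceph-kvstore-tool /var/lib/ceph/osd/ceph-NN/current/omap rm <prefix> <key>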


Thank you,

On Wednesday 02/08/2017 at 12:56, 刘畅 wrote:
> what's your ceph version?
>
> > On 2 August 2017, at 11:03, Ricardo J. Barberis  wrote:
> >
> > Hello,
> >
> > We had a power failure and after some trouble 2 of our OSDs started
> > crashing with this error:
> >
> > "FAILED assert(last_e.version.version < e.version.version)"
> >
> >
> > I know what's the problematic PG, and searching the ceph lists and the
> > web I saw that ultimately I should fix that PG using ceph-kvstore-tool,
> > but I can't find an example of how to use it.
> >
> > Can anybody give me a hint?
> > Should I ask in ceph-devel?
> >
> > Thanks in advance.
> >
> >
> >
> > An extract from a debug log, can provide more if needed:
> >
> >-2> 2017-08-01 21:40:32.509895 7f4f506d8880 20 read_log
> > 1793746'173182313 (1793746'173182312) modify  
> > 3/44f40dd4/rbd_data.84873da26686612.00f1/head by
> > client.215472136.0:13700212 2017-08-01 12:26:06.904276
> >-1> 2017-08-01 21:40:32.509900 7f4f506d8880 20 read_log
> > 1794404'173182313 (1793746'173182220) modify  
> > 3/67fc6dd4/rbd_data.13957cd6ab8977c.0027/head by
> > client.267959820.0:11279245 2017-08-01 12:29:52.696307
> > 0> 2017-08-01 21:40:32.511324 7f4f506d8880 -1 osd/PGLog.cc: In
> > function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
> > ghobject_t, const pg_info_t&, std::map&,
> > PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
> > std::set >*)' thread 7f4f506d8880 time 2017-08-01
> > 21:40:32.509905
> > osd/PGLog.cc: 911: FAILED assert(last_e.version.version <
> > e.version.version)
> >
> >
> > Regards,
> > --
> > Ricardo J. Barberis
> > Usuario Linux Nº 250625: http://counter.li.org/
> > Usuario LFS Nº 5121: http://www.linuxfromscratch.org/
> > Senior SysAdmin / IT Architect - www.DonWeb.com
-- 
Ricardo J. Barberis
Usuario Linux Nº 250625: http://counter.li.org/
Usuario LFS Nº 5121: http://www.linuxfromscratch.org/
Senior SysAdmin / IT Architect - www.DonWeb.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer(0.94.5) librbd dead lock, i want to how to resolve

2017-08-07 Thread Jason Dillaman
I am not sure what you mean by "I stop ceph" (stopped all the OSDs?)
-- and I am not sure how you are seeing ETIMEDOUT errors on a
"rbd_write" call since it should just block assuming you are referring
to stopping the OSDs. What is your use-case? Are you developing your
own application on top of librbd?

Regardless, I can only assume there is another thread that is blocked
while it owns the librbd::ImageCtx::owner_lock.

On Mon, Aug 7, 2017 at 8:35 AM, Shilu  wrote:
> I write data with rbd_write; when I stop ceph, rbd_write times out and
> returns -110.
>
>
>
> Then I call rbd_write again and it deadlocks; the call stack is shown
> below.
>
>
>
>
>
>
>
> #0  pthread_rwlock_rdlock () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
>
> #1  0x7fafbf9f75a0 in RWLock::get_read (this=0x7fafc48e1198) at
> ./common/RWLock.h:76
>
> #2  0x7fafbfa31de0 in RLocker (lock=..., this=) at
> ./common/RWLock.h:130
>
> #3  librbd::aio_write (ictx=0x7fafc48e1000, off=71516229632, len=4096,
>
> buf=0x7fafc499e000 "\235?[\257\367n\255\263?\200\034\061\341\r",
> c=0x7fafab44ef80, op_flags=0) at librbd/internal.cc:3320
>
> #4  0x7fafbf9eff19 in Context::complete (this=0x7fafab4174c0,
> r=) at ./include/Context.h:65
>
> #5  0x7fafbfb00016 in ThreadPool::worker (this=0x7fafc4852c40,
> wt=0x7fafc4948550) at common/WorkQueue.cc:128
>
> #6  0x7fafbfb010b0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:408
>
> #7  0x7fafc59b6184 in start_thread (arg=0x7fafadbed700) at
> pthread_create.c:312
>
> #8  0x7fafc52aaffd in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> -
> This e-mail and its attachments contain confidential information from New
> H3C, which is
> intended only for the person or entity whose address is listed above. Any
> use of the
> information contained herein in any way (including, but not limited to,
> total or partial
> disclosure, reproduction, or dissemination) by persons other than the
> intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender
> by phone or email immediately and delete it!



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] implications of losing the MDS map

2017-08-07 Thread Daniel K
I finally figured out how to get the ceph-monstore-tool (compiled from
source) and am ready to attempt to recover my cluster.

I have one question -- in the instructions,
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/
under Recovery from OSDs, Known limitations:

->

   - *MDS Maps*: the MDS maps are lost.


What are the implications of this? Do I just need to rebuild this, or is
there a data loss component to it? -- Is my data stored in CephFS still
safe?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph cluster experiencing major performance issues

2017-08-07 Thread Mclean, Patrick
High CPU utilization and inexplicably slow I/O requests

We have been having similar performance issues across several Ceph
clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
for a while, but eventually performance worsens and the cluster becomes (at
first intermittently, then continually) HEALTH_WARN due to slow I/O
requests blocked for longer than 32 sec. These slow requests are
accompanied by "currently waiting for rw locks", but we have not found
any network issue of the kind that is normally responsible for this warning.

Examining the individual slow OSDs from `ceph health detail` has been
unproductive; there don't seem to be any slow disks and if we stop the
OSD the problem just moves somewhere else.
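
For reference, a sketch of pulling per-OSD detail on the blocked requests via
the admin socket, run on the node hosting that OSD (OSD id is a placeholder):

ceph daemon osd.12 dump_ops_in_flight    # in-flight ops, their age, and flag points such as "waiting for rw locks"
ceph daemon osd.12 dump_historic_ops     # recently completed slow ops with per-stage timings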

We also think this trends with increased number of RBDs on the clusters,
but not necessarily a ton of Ceph I/O. At the same time, user %CPU time
spikes up to 95-100%, at first frequently and then consistently,
simultaneously across all cores. We are running 12 OSDs on a 2.2 GHz CPU
with 6 cores and 64GiB RAM per node.

ceph1 ~ $ sudo ceph status
cluster ----
 health HEALTH_WARN
547 requests are blocked > 32 sec
 monmap e1: 3 mons at
{cephmon1.XXX=XXX.XXX.XXX.XXX:/0,cephmon1.XXX=XXX.XXX.XXX.XX:/0,cephmon1.XXX=XXX.XXX.XXX.XXX:/0}
election epoch 16, quorum 0,1,2
cephmon1.XXX,cephmon1.XXX,cephmon1.XXX
 osdmap e577122: 72 osds: 68 up, 68 in
flags sortbitwise,require_jewel_osds
  pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
126 TB used, 368 TB / 494 TB avail
4084 active+clean
  12 active+clean+scrubbing+deep
  client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr

ceph1 ~ $ vmstat 5 5
procs ---memory-- ---swap-- -io -system--
--cpu-
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy
id wa st
27  1  0 3112660 165544 3626169200   472  127401 22 
1 76  1  0
25  0  0 3126176 165544 3624650800   858 12692 12122 110478
97  2  1  0  0
22  0  0 3114284 165544 3625813600 1  6118 9586 118625
97  2  1  0  0
11  0  0 3096508 165544 3627624400 8  6762 10047 188618
89  3  8  0  0
18  0  0 2990452 165544 3638404800  1209 21170 11179 179878
85  4 11  0  0

There is no apparent memory shortage, and none of the HDDs or SSDs show
consistently high utilization, slow service times, or any other form of
hardware saturation, other than user CPU utilization. Can CPU starvation
be responsible for "waiting for rw locks"?

Our main pool (the one with all the data) currently has 1024 PGs,
leaving us room to add more PGs if needed, but we're concerned if we do
so that we'd consume even more CPU.

We have moved to running Ceph + jemalloc instead of tcmalloc, and that
has helped with CPU utilization somewhat, but we still see occurrences of
95-100% CPU with not terribly high Ceph workload.

Any suggestions of what else to look at? We have a peculiar use case
where we have many RBDs but only about 1-5% of them are active at the
same time, and we're constantly making and expiring RBD snapshots. Could
this lead to aberrant performance? For instance, is it normal to have
~40k snaps still in cached_removed_snaps?



[global]

cluster = 
fsid = ----

keyring = /etc/ceph/ceph.keyring

auth_cluster_required = none
auth_service_required = none
auth_client_required = none

mon_host = 
cephmon1.XXX,cephmon1.XXX,cephmon1.XXX
mon_addr = XXX.XXX.XXX.XXX:,XXX.XXX.XXX.XXX:XXX,XXX.XXX.XXX.XXX:
mon_initial_members = 
cephmon1.XXX,cephmon1.XXX,cephmon1.XXX

cluster_network = 172.20.0.0/18
public_network = XXX.XXX.XXX.XXX/20

mon osd full ratio = .80
mon osd nearfull ratio = .60

rbd default format = 2
rbd default order = 25
rbd_default_features = 1

osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 1024
osd pool default pgp num = 1024

osd_recovery_op_priority = 1

osd_max_backfills = 1

osd_recovery_threads = 1

osd_recovery_max_active = 1

osd_recovery_max_single_start = 1

osd_scrub_thread_suicide_timeout = 300

osd scrub during recovery = false

osd scrub sleep = 60
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-08-07 Thread Nikola Ciprich
Hi,

I tried balancing the number of OSDs per node, set their weights the same,
and increased the recovery op priority, but it still takes ages to recover..
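
For reference, the knob can be bumped at runtime roughly like this, 63 being
the temporary value suggested below:

ceph tell osd.* injectargs '--osd_recovery_op_priority 63'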

I've got my cluster OK now, so I'll try switching to kraken to see if
it behaves better..

nik



On Mon, Aug 07, 2017 at 11:36:10PM +0800, cgxu wrote:
> I encountered the same issue today and solved it by adjusting "osd 
> recovery op priority" to 63 temporarily.
> 
> It looks like the recovery PUSH/PULL ops get starved in the op_wq prioritized 
> queue; I never experienced this with the hammer version.
> 
> Any other ideas? 
> 
> 
> > Hi,
> > 
> > I'm trying to find reason for strange recovery issues I'm seeing on
> > our cluster..
> > 
> > it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
> > across nodes. jewel 10.2.9
> > 
> > the problem is that after some disk replaces and data moves, recovery
> > is progressing extremely slowly.. pgs seem to be stuck in 
> > active+recovering+degraded
> > state:
> > 
> > [root@v1d ~]# ceph -s
> > cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
> >  health HEALTH_WARN
> > 159 pgs backfill_wait
> > 4 pgs backfilling
> > 259 pgs degraded
> > 12 pgs recovering
> > 113 pgs recovery_wait
> > 215 pgs stuck degraded
> > 266 pgs stuck unclean
> > 140 pgs stuck undersized
> > 151 pgs undersized
> > recovery 37788/2327775 objects degraded (1.623%)
> > recovery 23854/2327775 objects misplaced (1.025%)
> > noout,noin flag(s) set
> >  monmap e21: 3 mons at 
> > {v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
> > election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
> >   fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
> >  osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
> > flags noout,noin,sortbitwise,require_jewel_osds
> >   pgmap v80995844: 3200 pgs, 4 pools, 2876 GB data, 757 kobjects
> > 9215 GB used, 35572 GB / 45365 GB avail
> > 37788/2327775 objects degraded (1.623%)
> > 23854/2327775 objects misplaced (1.025%)
> > 2912 active+clean
> >  130 active+undersized+degraded+remapped+wait_backfill
> >   97 active+recovery_wait+degraded
> >   29 active+remapped+wait_backfill
> >   12 active+recovery_wait+undersized+degraded+remapped
> >6 active+recovering+degraded
> >5 active+recovering+undersized+degraded+remapped
> >4 active+undersized+degraded+remapped+backfilling
> >4 active+recovery_wait+degraded+remapped
> >1 active+recovering+degraded+remapped
> >   client io 2026 B/s rd, 146 kB/s wr, 9 op/s rd, 21 op/s wr
> > 
> > 
> >  when I restart affected OSDs, it bumps the recovery, but then another
> > PGs get stuck.. All OSDs were restarted multiple times, none are even close 
> > to
> > nearfull, I just cant find what I'm doing wrong..
> > 
> > possibly related OSD options:
> > 
> > osd max backfills = 4
> > osd recovery max active = 15
> > debug osd = 0/0
> > osd op threads = 4
> > osd backfill scan min = 4
> > osd backfill scan max = 16
> > 
> > Any hints would be greatly appreciated
> > 
> > thanks
> > 
> > nik
> > 
> > 
> > -- 
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> > 
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz 
> > 
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > 
> > 
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to reencode an object with ceph-dencoder

2017-08-07 Thread Stefan Priebe - Profihost AG
Hello,

I want to modify an object_info_t xattr value. I grepped ceph._ and
ceph._@1 and decoded the object to JSON:
ceph-dencoder type object_info_t import /tmp/a decode dump_json

After modifying the JSON, how can I encode it back to binary?

I also found this old post,
https://www.spinics.net/lists/ceph-devel/msg16519.html, where somebody
directly modified the binary data, but I can't find the value I'm
searching for.

I want to replace:
"data_digest": 3180692938,

with
"data_digest": 1138105437,

But a hex editor does not show the little-endian value of 3180692938?

# printf '%x\n' 3180692938
bd9585ca

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to reencode an object with ceph-dencoder

2017-08-07 Thread Stefan Priebe - Profihost AG
Hello,

OK, I missed the reversed byte order of the little-endian value.

Now it works.
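
For the record, the byte sequences involved (hex computed from the digests in
the original mail):

printf '%08x\n' 3180692938   # bd9585ca -> stored on disk little-endian as the bytes ca 85 95 bd
printf '%08x\n' 1138105437   # 43d61c5d -> stored on disk little-endian as the bytes 5d 1c d6 43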

Greets,
Stefan

Am 08.08.2017 um 08:22 schrieb Stefan Priebe - Profihost AG:
> Hello,
> 
> i want to modify an object_info_t xattr value. I grepped ceph._ and
> ceph._@1 and decoded the object to json:
> ceph-dencoder type object_info_t import /tmp/a decode dump_json
> 
> After modifying the json how can i encode the json to binary?
> 
> I also found this old post
> https://www.spinics.net/lists/ceph-devel/msg16519.html where somebody
> directly modified the binary data but i can't find the value i'm
> searching for.
> 
> I want to to replace:
> "data_digest": 3180692938,
> 
> with
> "data_digest": 1138105437,
> 
> But a hex edit does not show the little endian value of 3180692938?
> 
> # printf '%x\n' 3180692938
> bd9585ca
> 
> Greets,
> Stefan
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to fix X is an unexpected clone

2017-08-07 Thread Stefan Priebe - Profihost AG
Hello,

How can I fix this one:

2017-08-08 08:42:52.265321 osd.20 [ERR] repair 3.61a
3:58654d3d:::rbd_data.106dd406b8b4567.018c:9d455 is an
unexpected clone
2017-08-08 08:43:04.914640 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
pgs repair; 1 scrub errors
2017-08-08 08:43:33.470246 osd.20 [ERR] 3.61a repair 1 errors, 0 fixed
2017-08-08 08:44:04.915148 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
scrub errors

If I just delete the relevant files manually, Ceph crashes. rados does
not list those objects at all.

How can I fix this?
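
For reference, the scrub error detail for that PG can be dumped with (PG id
from the log above):

rados list-inconsistent-obj 3.61a --format=json-pretty
rados list-inconsistent-snapset 3.61a --format=json-pretty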

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com