Re: [ceph-users] fio librbd result is poor

2016-12-19 Thread David Turner
All of our DC S3500 and S3510 drives ran out of writes this week after being in 
production for 1.5 years as journal drives for 4 disks each.  Having 43 drives 
say they have less than 1% of their writes left is scary. I'd recommend a 
monitoring check for your SSDs' durability in Ceph.
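
As a minimal example of such a check (assuming smartmontools is installed and
that your drives expose Intel's wear attribute; the attribute name can differ
per model/vendor):

smartctl -A /dev/sdX | grep -i Media_Wearout_Indicator
# Intel DC SSDs report this as a normalized value counting down from 100;
# alerting well before it approaches 1 leaves time to swap journal drives.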

As a note, the DC S3700 series is warrantied for almost 30x more writes than 
the S3500 series.

Sent from my iPhone

> On Dec 19, 2016, at 12:50 AM, Christian Balzer  wrote:
>
>
> Hello,
>
>> On Mon, 19 Dec 2016 15:05:05 +0800 (CST) mazhongming wrote:
>>
>> Hi Christian,
>> Thanks for your reply.
>>
>>
>> At 2016-12-19 14:01:57, "Christian Balzer"  wrote:
>>>
>>> Hello,
>>>
 On Mon, 19 Dec 2016 13:29:07 +0800 (CST) 马忠明 wrote:

 Hi guys,

 So recently I was testing our ceph cluster, which is mainly used for block 
 usage (rbd).

 We have 30 ssd drives in total (5 storage nodes, 6 ssd drives per 
 node). However, the result of fio is very poor.

>>> All relevant details are missing.
>>> SSD exact models, CPU/RAM config, network config, Ceph, OS/kernel, fio
>>> versions, the config you tested this with, as in replication.
>> SSD:Intel® SSD DC S3510 Series 1.2TB 2.5"
> Slower than mine, but not massively so and many more of them.
> But your distribution (CRUSH map based on 3 racks, right?) limits that
> number advantage.
> I'd expect them to be around 50-60% busy with the RBD engine fio.
>
> The endurance of 0.3 DWPD (0.1 really after in-line journals and other
> overhead like FS journals) would worry me.
> Are you monitoring their wear-out levels?
>
>> CPU:2×Intel E5-2630v4
> Slightly slower than the ones in my test cluster, but not significantly so.
>
>> MEM:128GB
>> Network config:2*10G bond4  LACP network connection
>> Ceph:Hammer 0.94.6
> I'd upgrade to the latest Hammer, just in case anybody ever plays with
> cache-tiering on there, which is deadly broken in that version.
>
>> OS/kernel:  Ubuntu 14.04.5 LTS/3.13.0-96-generic
> That kernel is a bit dated and vastly different from mine, but it
> shouldn't be a factor in the result.
>
>> Fio:2.12
>>
> Not missing a .1. in there?
>
> Fio 2.1.11 in my case, but I really dislike the RBD engine and the various
> bugs/inconsistencies people keep finding with it.
>
> Testing from within a (librbd backed) VM should be more realistic anyway.
>
> And this turns out to be one of those fio RBD engine corner cases, as I did
> run your fio command line against an image that was just 20GB in size.
>
> When running from a VM with libaio or with a reduced test size of 5GB
> the IOPS came down to about 8500, still faster than yours, but only 2x
> instead of 4x.
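>
> As a rough sketch of that (assuming a scratch RBD-backed disk shows up as
> /dev/vdb inside the VM), an equivalent libaio run would be something like:
>
> fio --ioengine=libaio --direct=1 --numjobs=1 --rw=randwrite \
>     --name=vm_4k_randwrite --bs=4k --iodepth=32 \
>     --filename=/dev/vdb --size=5G --runtime=60 --ramp_time=30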
>
>
>>
>>>
 We tested the workload on the ssd pool with the following parameters:

 "fio --size=50G \
   --ioengine=rbd \
   --direct=1 \
   --numjobs=1 \
   --rw=randwrite(randread) \
   --name=com_ssd_4k_randwrite(randread) \
   --bs=4k \
   --iodepth=32 \
   --pool=ssd_volumes \
   --runtime=60 \
   --ramp_time=30 \
   --rbdname=4k_test_image"

 and here is the result:

 random write: 4631 IOPS; random read: 21127 IOPS




 I also tested a pool (size=1, min_size=1, pg_num=256) consisting of only a 
 single ssd drive with the same workload pattern, which gave more acceptable 
 results (random write: 8303 IOPS; random read: 27859 IOPS).

>>> I'm only going to comment on the write part.
>>>
>>> On my staging cluster (* see below) I ran your fio against the cache tier
>>> (so only SSDs involved) with this result:
>>>
>>> write: io=4206.3MB, bw=71784KB/s, iops=17945, runt= 60003msec
>>>   slat (usec): min=0, max=531, avg= 3.26, stdev=11.33
>>>   clat (usec): min=5, max=41996, avg=1770.23, stdev=2260.61
>>>lat (usec): min=9, max=41997, avg=1773.36, stdev=2260.60
>>>
>>> So more than 2 times better than your non-replicated test.
>>>
>>> 4k randwrites stress the CPUs (run atop or such on your OSD nodes
>>> when doing a test run), so this might be your limit here.
>>> Along with less than optimal SSDs or a high latency network.
>>
>>>
>> Yes... CPU usage might be the bottleneck of the whole system. BTW, our ceph 
>> cluster is combined with Mirantis OpenStack, and the result above was run from 
>> one compute node. I also ran a stress test from all 10 compute nodes. The 
>> result is almost the same, and the CPU usage of every storage node is nearly 
>> 50-60%; the CPU usage of every ssd OSD is nearly 250-300%.
>>
>
> Yes, the OSD 300% CPU usage looks familiar.
> The hammer code seems to peter out there, even if there's still a core or
> 2 available.
>
> The Ceph latency is something that's obviously being addressed by the
> developers.
> Check the archives and google (Nick Fisk) for how to tune up your CPU
> settings to get every last IOPS from your HW.
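>
> The usual suspects there are the CPU frequency governor and deep C-states; a
> minimal sketch (exact tools and settings depend on your distro):
>
> cpupower frequency-set -g performance   # keep the cores at full clock
> # and booting with intel_idle.max_cstate=1 limits deep C-state wake-up latency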
>
> Another thing to always remember here is that you're testing network
> latency as well when running fio with the RBD engine.

Re: [ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)

2016-12-19 Thread Yoann Moulin
Hello,

Finally, I found time to do some new benchmarks with the latest jewel release 
(10.2.5) on 4 nodes. Each node has 10 OSDs.

I ran "ceph tell osd.* bench" twice across the 40 OSDs; here are the average speeds:

4.2.0-42-generic  97.45 MB/s
4.4.0-53-generic  55.73 MB/s
4.8.15-040815-generic 62.41 MB/s
4.9.0-040900-generic  60.88 MB/s

I see the same behaviour: at least a 35 to 40% performance drop between 
kernel 4.2 and kernels 4.4 and newer.

I can do further benches if needed.
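
For reference, a quick way to pull such an average out of the bench output (a
sketch, assuming the bytes_per_sec field that jewel prints and that awk is
available):

ceph tell osd.* bench 2>/dev/null \
    | awk -F'[:,]' '/bytes_per_sec/ { sum += $2; n++ } END { printf "%.2f MB/s\n", sum/n/1048576 }'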

Yoann

On 26/07/2016 at 09:09, Lomayani S. Laizer wrote:
> Hello,
> do you have journal on disk too ?
> 
> Yes, I have the journal on the same hard disk.
> 
> ok and could you do bench with kernel 4.2 ? just to see if you have better
> throughput. Thanks
> 
> In Ubuntu 14 I was running the 4.2 kernel. The throughput was the same, around 
> 80-90MB/s per OSD. I can't tell the difference because each test gives
> speeds in the same range. I did not test kernel 4.4 on Ubuntu 14.
> 
> 
> --
> Lomayani
> 
> On Tue, Jul 26, 2016 at 9:39 AM, Yoann Moulin wrote:
> 
> Hello,
> 
> > I am running Ubuntu 16 with kernel 4.4.0-31-generic and my speeds are
> > similar.
> 
> do you have journal on disk too ?
> 
> > I did tests on Ubuntu 14 and Ubuntu 16 and the speed is similar. I have around
> > 80-90MB/s of OSD speed in both operating systems.
> 
> ok and could you do bench with kernel 4.2 ? just to see if you have better
> throughput. Thanks
> 
> > The only issue I am observing now with Ubuntu 16 is that sometimes OSDs fail to
> > start on reboot until I start them manually or add start commands to rc.local.
> 
> in my case, it's a test environment, so I haven't noticed those 
> behaviours
> 
> --
> Yoann
> 
> > On Mon, Jul 25, 2016 at 6:45 PM, Yoann Moulin wrote:
> >
> > Hello,
> >
> > (this is a repost, my previous message seems to have slipped under the radar)
> >
> > Does anyone get a similar behaviour to the one described below ?
> >
> > I found a big performance drop between kernel 3.13.0-88 (default kernel on
> > Ubuntu Trusty 14.04) or kernel 4.2.0 and kernel 4.4.0.24.14 (default kernel on
> > Ubuntu Xenial 16.04)
> >
> > - ceph version is Jewel (10.2.2).
> > - All tests have been done under Ubuntu 14.04.
> > - Each cluster has 5 nodes strictly identical.
> > - Each node has 10 OSDs.
> > - Journals are on the disk.
> >
> > Kernel 4.4 has a drop of more than 50% compared to 4.2
> > Kernel 4.4 has a drop of 40% compared to 3.13
> >
> > details below :
> >
> > With all 3 kernels I have the same performance on the disks:
> >
> > Raw benchmark:
> > dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct => average ~230MB/s
> > dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct    => average ~220MB/s
> >
> > Filesystem mounted benchmark:
> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1              => average ~205MB/s
> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average ~214MB/s
> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average ~190MB/s
> >
> > Ceph osd Benchmark:
> > Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average  ~81MB/s
> > Kernel 4.2.0-38-generic  : ceph tell osd.ID bench => average ~109MB/s
> > Kernel 4.4.0-24-generic  : ceph tell osd.ID bench => average  ~50MB/s
> >
> > I then did new benchmarks on 3 fresh clusters.
> >
> > - Each cluster has 3 nodes strictly identical.
> > - Each node has 10 OSDs.
> > - Journals are on the disk.
> >
> > bench5 : Ubuntu 14.04 / Ceph Infernalis
> > bench6 : Ubuntu 14.04 / Ceph Jewel
> > bench7 : Ubuntu 16.04 / Ceph jewel
> >
> > this is the average of 2 runs of "ceph tell osd.* bench" on each cluster
> > (2 x 30 OSDs)
> >
> > bench5 / 14.04 / Infernalis / kernel 3.13 :  54.35 MB/s
> > bench6 / 14.04 / Jewel  / kernel 3.13 :  86.47 MB/s
> >
> > bench5 / 14.04 / Infernalis / kernel 4.2  :  63.38 MB/s
> > bench6 / 14.04 / Jewel  / kernel 4.2  : 107.75 MB/s
> > bench7 / 16.04 / Jewel  / kernel 4.2  : 101.54 MB/s
> >
> > bench5 / 14.04 / Infernalis / kernel 4.4  :  53.61 MB/s
> > bench6 / 14.04 / Jewel  / kernel 4.4  :  65.82 MB/s
> > bench7 / 16.04 / Jewel  / kernel 4.4  :  61.57 MB/s
> >
> > If needed, I have the raw output of "ceph tell osd.* bench"
> >
> > Best regards
> 
> 


-- 
Yoann Moulin
EPFL IC-IT

[ceph-users] CephFS metadata inconsistent PG Repair Problem

2016-12-19 Thread Sean Redmond
Hi Ceph-Users,

I have been running into a few issues with CephFS metadata pool corruption
over the last few weeks. For background, please see
tracker.ceph.com/issues/17177

# ceph -v
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

I am currently facing a side effect of this issue that is making repairing
an inconsistent PG in the metadata pool (pool 5) difficult and I could use
some pointers

The PG I am having the issue with is 5.c0:

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors;
noout,sortbitwise,require_jewel_osds flag(s) set
pg 5.c0 is active+clean+inconsistent, acting [38,10,29]
1 scrub errors
noout,sortbitwise,require_jewel_osds flag(s) set
#

ceph pg 5.c0 query = http://pastebin.com/9yqrArTg

rados list-inconsistent-obj 5.c0 | python -m json.tool =
http://pastebin.com/iZV1TfxE

I have looked at the error log and it reports:

2016-12-19 16:43:36.944457 osd.38 172.27.175.12:6800/194902 10 : cluster
[ERR] 5.c0 shard 38: soid 5:035881fa:::10002639cb6.:head
omap_digest 0xc54c7938 != best guess omap_digest 0xb6531260 from auth shard 10

If I attempted to repair this using 'ceph pg repair 5.c0' the cluster
health returns to OK, but if I force a deep scrub using 'ceph pg deep-scrub
5.c0' the same error is reported with exactly the same omap_digest values.

To understand the differences between the three OSDs I performed the below
steps on each of OSDs 38, 10, 29:

-Stop the osd
-ceph-objectstore-tool --op list --pgid 5.c0 --data-path
/var/lib/ceph/osd/ceph-$OSDID | grep 10002639cb6 (The output is used in the
next command)
- ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSDID
'["5.c0",{"oid":"10002639cb6.","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
list-omap

Taking the output of the above I ran a diff and found that osd.38 has the
below difference:

# diff osd10-5.c0.txt osd38-5.c0.txt
4405a4406
> B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
#

I assumed the above is a file name; using a find on the file system I
confirmed the file does not exist, so I must assume it was deleted and that
is expected, and I am happy to try and correct this difference.

As 'ceph pg repair 5.c0' was not working, next I tried following
http://ceph.com/planet/ceph-manually-repair-object/ to remove the object
from the file system. When doing a deep-scrub before a repair it reports
the object as missing; after running the repair the object is copied back
onto osd.38, but a further deep-scrub returns exactly the
same omap_digest values, with osd.38 having a difference (
http://pastebin.com/iZV1TfxE).

I assume it is because this omap data is stored inside LevelDB and not just
as extended attributes:

getfattr -d
/var/lib/ceph/osd/ceph-38/current/5.60_head/DIR_0/DIR_6/DIR_C/DIR_A/18ad724.__head_CD74AC60__5
= http://pastebin.com/4Mc2mNNj

I tried to dig further into this by looking at the value of the omap key
using:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38
'["5.c0",{"oid":"10002639cb6.","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
get-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
output = http://pastebin.com/vVUmw9Qi

I also tried this on osd.29 and found it strange that the value exists when
using the below, even though the key 'B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head' is
not listed in the output of list-omap:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29
'["5.c0",{"oid":"10002639cb6.","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
get-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head

I may be walking down the wrong track, but if anyone has any pointers that
could help with repairing this PG, or anything else I should be looking at
to investigate further, that would be very helpful.

Thanks


Re: [ceph-users] rgw civetweb ssl official documentation?

2016-12-19 Thread Christian Wuerdig
No official documentation but here is how I got it to work on Ubuntu
16.04.01 (in this case I'm using a self-signed certificate):

assuming you're running rgw on a computer called rgwnode:

1. create self-signed certificate

ssh rgwnode
openssl req -x509 -nodes -newkey rsa:4096 -keyout key.pem -out cert.pem
-days 1000

cp cert.pem /usr/share/ca-certificates/
cat key.pem >> /usr/share/ca-certificates/cert.pem
 ^--- without appending the key you get errors like this: "civetweb:
0x564d0357d8c0: set_ssl_option: cannot open
/usr/share/ca-certificates/cert.pem: error:0906D06C:PEM
routines:PEM_read_bio:no start line"

2. configure civetweb:

edit your ceph.conf on the admin node and add:

[client.rgw.rgwnode]
rgw_frontends = civetweb port=443s ssl_certificate=/usr/share/ca-certificates/cert.pem

push the config:
ceph-deploy config push rgwnode

ssh rgwnode 'sudo systemctl restart ceph-radosgw@rgw.rgwnode'

this ended up not being enough and I found log messages like these in the
logs:
2016-09-09 17:22:21.593231 7f36c33f8a00  0 civetweb: 0x555a3b7988c0:
load_dll: cannot load libssl.so
2016-09-09 17:22:21.593278 7f36c33f8a00  0 civetweb: 0x555a3b7988c0:
load_dll: cannot load libcrypto.so

to fix it:
ssh rgwnode
sudo ln -s /lib/x86_64-linux-gnu/libssl.so.1.0.0 /usr/lib/libssl.so
sudo ln -s /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 /usr/lib/libcrypto.so
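
A quick sanity check from another machine (using -k because the certificate is
self-signed) is something like:

curl -kv https://rgwnode/
# a successful TLS handshake followed by an XML response (ListAllMyBucketsResult)
# from radosgw means the civetweb SSL frontend is working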


On Thu, Dec 8, 2016 at 7:44 AM, Puff, Jonathon 
wrote:

> There are a few documents out there on this subject, but I can’t find anything
> official.  Can someone point me to any official documentation for deploying
> this?   Other alternatives appear to be an HAProxy frontend.  Currently
> running 10.2.3 with a single radosgw.
>
>
>
> -JP
>


Re: [ceph-users] Unwanted automatic restart of daemons during an upgrade since 10.2.5 (on Trusty)

2016-12-19 Thread Ken Dreyer
On Tue, Dec 13, 2016 at 4:42 AM, Francois Lafont
 wrote:
> So, now with 10.2.5 version, in my process, OSD daemons are stopped,
> then automatically restarted by the upgrade and then stopped again
> by the reboot. This is not an optimal process of course. ;)

We do not intend for anything in the packaging to restart the daemons.

The last time I looked into this issue, it behaved correctly (dpkg did
not restart the daemons during the apt-get process - the PID files
were the same before and after the upgrade).

Did you dig further to find out what is restarting them? Are you using
any configuration management system (ansible, chef, puppet) to do your
package upgrades?

- Ken


[ceph-users] centos 7.3 libvirt (2.0.0-10.el7_3.2) and openstack volume attachment w/ cephx broken

2016-12-19 Thread Mike Lowe
It looks like the libvirt (2.0.0-10.el7_3.2) that ships with centos 7.3 is 
broken out of the box when it comes to hot plugging new virtio-scsi devices 
backed by rbd and cephx auth.  If you use openstack, cephx auth, and centos, 
I’d caution against the upgrade to centos 7.3 right now.  


Re: [ceph-users] Unwanted automatic restart of daemons during an upgrade since 10.2.5 (on Trusty)

2016-12-19 Thread Ken Dreyer
On Mon, Dec 19, 2016 at 12:31 PM, Ken Dreyer  wrote:
> On Tue, Dec 13, 2016 at 4:42 AM, Francois Lafont
>  wrote:
>> So, now with 10.2.5 version, in my process, OSD daemons are stopped,
>> then automatically restarted by the upgrade and then stopped again
>> by the reboot. This is not an optimal process of course. ;)
>
> We do not intend for anything in the packaging to restart the daemons.
>
> The last time I looked into this issue, it behaved correctly (dpkg did
> not restart the daemons during the apt-get process - the PID files
> were the same before and after the upgrade).

I looked into this again on a Trusty VM today. I set up a single
mon+osd cluster on v10.2.3, with the following:

  # status ceph-osd id=0
  ceph-osd (ceph/0) start/running, process 1301

  # ceph daemon osd.0 version
  {"version":"10.2.3"}

I ran "apt-get upgrade" to get go 10.2.3 -> 10.2.5, and the OSD PID
(1301) and version from the admin socket (v10.2.3) remained the same.

Could something else be restarting the daemons in your case?

- Ken


[ceph-users] calamari monitoring multiple clusters

2016-12-19 Thread Vaysman, Marat
I am running the CentOS 6.6 Calamari version. I am able to monitor the nodes of a 
single cluster but cannot monitor two clusters simultaneously. I have the 
following questions:

Question 1: Does Calamari allow monitoring of multiple clusters? On the 
right side of the Calamari dashboard I see a drop-down list, but it contains 
only a single cluster.

Question 2: I was able to register the nodes of the second cluster and could see 
them when I click on the manage icon, but I could not see any statistics, and when I 
click on the dashboard icon these nodes are not present. In 
/var/log/calamari/cthulhu.log I could see "Ignoring event /salt/job" messages with the 
names of nodes from the second cluster.

Question 3: Will it help if I remove the monitored nodes of the first cluster, and 
how could I remove nodes without reinstalling Calamari?

Appreciate any help,




Re: [ceph-users] CephFS metadata inconsistent PG Repair Problem

2016-12-19 Thread Wido den Hollander

> On 19 December 2016 at 18:14, Sean Redmond wrote:
> 
> 
> Hi Ceph-Users,
> 
> I have been running into a few issues with CephFS metadata pool corruption
> over the last few weeks. For background, please see
> tracker.ceph.com/issues/17177
> 
> # ceph -v
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> 
> I am currently facing a side effect of this issue that is making repairing
> an inconsistent PG in the metadata pool (pool 5) difficult and I could use
> some pointers
> 
> The PG I am having the issue with is 5.c0:
> 
> # ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors;
> noout,sortbitwise,require_jewel_osds flag(s) set
> pg 5.c0 is active+clean+inconsistent, acting [38,10,29]
> 1 scrub errors
> noout,sortbitwise,require_jewel_osds flag(s) set
> #
> 
> ceph pg 5.c0 query = http://pastebin.com/9yqrArTg
> 
> rados list-inconsistent-obj 5.c0 | python -m json.tool =
> http://pastebin.com/iZV1TfxE
> 
> I have looked at the error log and it reports:
> 
> 2016-12-19 16:43:36.944457 osd.38 172.27.175.12:6800/194902 10 : cluster
> [ERR] 5.c0 shard 38: soid 5:035881fa:::10002639cb6.:head
> omap_digest 0xc54c7938 != best guess omap_digest 0xb6531260 from auth shard 10
> 
> If I attempted to repair this using 'ceph pg repair 5.c0' the cluster
> health returns to OK, but if I force a deep scrub using 'ceph pg deep-scrub
> 5.c0' the same error is reported with exactly the same omap_digest values.
> 
> To understand the differences between the three OSDs I performed the below
> steps on each of OSDs 38, 10, 29:
> 
> -Stop the osd
> -ceph-objectstore-tool --op list --pgid 5.c0 --data-path
> /var/lib/ceph/osd/ceph-$OSDID | grep 10002639cb6 (The output is used in the
> next command)
> - ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSDID
> '["5.c0",{"oid":"10002639cb6.","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> list-omap
> 
> Taking the output of the above I ran a diff and found that osd.38 has the
> below difference:
> 
> # diff osd10-5.c0.txt osd38-5.c0.txt
> 4405a4406
> > B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
> #
> 
> I assumed the above is a file name; using a find on the file system I
> confirmed the file does not exist, so I must assume it was deleted and that
> is expected, and I am happy to try and correct this difference.
> 
> As 'ceph pg repair 5.c0' was not working, next I tried following
> http://ceph.com/planet/ceph-manually-repair-object/ to remove the object
> from the file system. When doing a deep-scrub before a repair it reports
> the object as missing; after running the repair the object is copied back
> onto osd.38, but a further deep-scrub returns exactly the
> same omap_digest values, with osd.38 having a difference (
> http://pastebin.com/iZV1TfxE).
> 
> I assume it is because this omap data is stored inside LevelDB and not just
> as extended attributes:
> 
> getfattr -d
> /var/lib/ceph/osd/ceph-38/current/5.60_head/DIR_0/DIR_6/DIR_C/DIR_A/18ad724.__head_CD74AC60__5
> = http://pastebin.com/4Mc2mNNj
> 
> I tried to dig further into this by looking at the value of the omap key
> using:
> 
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38
> '["5.c0",{"oid":"10002639cb6.","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> get-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
> output = http://pastebin.com/vVUmw9Qi
> 
> I also tried this on osd.29 and found it strange that the value exists when
> using the below, even though the key 'B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head' is
> not listed in the output of list-omap:
> 
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29
> '["5.c0",{"oid":"10002639cb6.","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> get-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
> 
> I may be walking down the wrong track, but if anyone has any pointers that
> could help with repairing this PG, or anything else I should be looking at
> to investigate further, that would be very helpful.
> 

Thinking out loud, what about using ceph-objectstore-tool to export the PG from 
a healthy OSD (you have to stop it for a moment) and importing it with the 
same tool into osd.38?

1. Stop osd.38
2. Stop osd.10
3. Export on osd.10
4. Import on osd.38
5. Start osd.10
6. Wait 5 min for PG peering and recovery
7. Start osd.38

Haven't tried this on a system, but something that popped up in my mind.
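
Roughly (untested, paths assumed, and with both OSDs stopped while you run it):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
    --pgid 5.c0 --op export --file /tmp/pg5.c0.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 \
    --pgid 5.c0 --op remove        # drop the inconsistent copy first
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 \
    --op import --file /tmp/pg5.c0.export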

Wido

> Thanks


Re: [ceph-users] CephFS metadata inconsistent PG Repair Problem

2016-12-19 Thread Goncalo Borges
Hi Sean,
In our case, the last time we had this error, we stopped the OSD, marked it out, 
let ceph recover and then reinstalled it. We did it because we suspected 
issues with the OSD, and that is why we decided to take this approach. The fact 
is that the PG we kept seeing declared as inconsistent has not had 
problems for a couple of months now.
Cheers, 
Gonçalo

Re: [ceph-users] Unwanted automatic restart of daemons during an upgrade since 10.2.5 (on Trusty)

2016-12-19 Thread Christian Balzer

Hello,

On Mon, 19 Dec 2016 12:31:44 -0700 Ken Dreyer wrote:

> On Tue, Dec 13, 2016 at 4:42 AM, Francois Lafont
>  wrote:
> > So, now with 10.2.5 version, in my process, OSD daemons are stopped,
> > then automatically restarted by the upgrade and then stopped again
> > by the reboot. This is not an optimal process of course. ;)
> 
> We do not intend for anything in the packaging to restart the daemons.
>
I think you're fixating on the wrong aspect of the "restart" bit.

What Francois wrote and what most Ceph admins do is to manually stop
daemons before starting the upgrade process.

And what he neither wanted nor expected was for Ceph to start these daemons up
again during/after the upgrade.

I think this might be related to the new feature of Ceph restarting daemons
if they fail.
Something that's nice enough, but it ought to be configurable for people who
do NOT want to trawl log files to see if something had a hiccup.
The only time OSD or MON daemons died on me was during brutal tests on my
vastly underpowered (especially RAM) alpha-test cluster.
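
On Trusty that would be the upstart "respawn" stanza; a quick way to check (just
a guess at the mechanism) is something like:

grep -A1 respawn /etc/init/ceph-osd.conf
# a "respawn" plus "respawn limit" stanza means upstart will bring a dead OSD back up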

Christian
 
> The last time I looked into this issue, it behaved correctly (dpkg did
> not restart the daemons during the apt-get process - the PID files
> were the same before and after the upgrade).
> 
> Did you dig further to find out what is restarting them? Are you using
> any configuration management system (ansible, chef, puppet) to do your
> package upgrades?
> 
> - Ken


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] centos 7.3 libvirt (2.0.0-10.el7_3.2) and openstack volume attachment w/ cephx broken

2016-12-19 Thread Jason Dillaman
Do you happen to know if there is an existing bugzilla ticket against
this issue?

On Mon, Dec 19, 2016 at 3:46 PM, Mike Lowe  wrote:
> It looks like the libvirt (2.0.0-10.el7_3.2) that ships with centos 7.3 is 
> broken out of the box when it comes to hot plugging new virtio-scsi devices 
> backed by rbd and cephx auth.  If you use openstack, cephx auth, and centos, 
> I’d caution against the upgrade to centos 7.3 right now.



-- 
Jason


Re: [ceph-users] Unwanted automatic restart of daemons during an upgrade since 10.2.5 (on Trusty)

2016-12-19 Thread Francois Lafont
Hi,

On 12/19/2016 09:58 PM, Ken Dreyer wrote:

> I looked into this again on a Trusty VM today. I set up a single
> mon+osd cluster on v10.2.3, with the following:
> 
>   # status ceph-osd id=0
>   ceph-osd (ceph/0) start/running, process 1301
> 
>   #ceph daemon osd.0 version
>   {"version":"10.2.3"}
> 
> I ran "apt-get upgrade" to get go 10.2.3 -> 10.2.5, and the OSD PID
> (1301) and version from the admin socket (v10.2.3) remained the same.
> 
> Could something else be restarting the daemons in your case?

As Christian said, this is not _exactly_ the "problem" I described
in my first message. You can read it again; I gave _verbatim_ the commands
I launch on the host during an upgrade. Personally, I manually stop the
daemons before the "ceph" upgrade (which is not the case in your example
above):

1. I manually stop all OSD daemons on the host.
2. I perform the "ceph" upgrade (sudo apt-get update && sudo apt-get upgrade)

Then...

3(i).  Before the 10.2.5 version, the ceph daemons are still stopped.
3(ii). With the 10.2.5 version, the ceph daemons have been started 
automatically.

Personally I would prefer the 3(i) scenario (all details are in my first message).
I don't know exactly what, but something has changed with version 10.2.5.
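
If it helps the investigation, one place to look (just a guess on my side) is
whether the new packages' maintainer scripts start the daemons, e.g.:

grep -n -e invoke-rc.d -e 'start ceph' /var/lib/dpkg/info/ceph-osd*.postinst
# any match here would explain the daemons coming back during "apt-get upgrade"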

Regards.


Re: [ceph-users] centos 7.3 libvirt (2.0.0-10.el7_3.2) and openstack volume attachment w/ cephx broken

2016-12-19 Thread Mike Lowe
Not that I’ve found; it’s a little hard to search for.  I believe it’s related 
to this libvirt mailing list thread 
https://www.redhat.com/archives/libvir-list/2016-October/msg00396.html 

You’ll find this in the libvirt qemu log for the instance 'No secret with id 
'scsi0-0-0-1-secret0’’ and this in the nova-compute log 'libvirtError: internal 
error: unable to execute QEMU command '__com.redhat_drive_add': Device 
'drive-scsi0-0-0-1' could not be initialized’.  I was able to yum downgrade 
twice to get to something from the 1.2 series.
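
For anyone else needing to do the same, the downgrade was along these lines (the
exact package set and versions will depend on what your repos still carry):

yum downgrade libvirt libvirt-daemon libvirt-daemon-kvm libvirt-client
# repeat until "rpm -q libvirt" reports a 1.2.x build, then restart libvirtd
# and nova-compute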


> On Dec 19, 2016, at 6:40 PM, Jason Dillaman  wrote:
> 
> Do you happen to know if there is an existing bugzilla ticket against
> this issue?
> 
> On Mon, Dec 19, 2016 at 3:46 PM, Mike Lowe  wrote:
>> It looks like the libvirt (2.0.0-10.el7_3.2) that ships with centos 7.3 is 
>> broken out of the box when it comes to hot plugging new virtio-scsi devices 
>> backed by rbd and cephx auth.  If you use openstack, cephx auth, and centos, 
>> I’d caution against the upgrade to centos 7.3 right now.
> 
> 
> 
> -- 
> Jason



[ceph-users] How exactly does rgw work?

2016-12-19 Thread Gerald Spencer
Hello all,

We're currently waiting on a delivery of equipment for a small 50TB proof
of concept cluster, and I've been lurking/learning a ton from you. Thanks
for how active everyone is.

Question(s):
How does the rados gateway work exactly?
Does it introduce a single point of failure?
Does all of the traffic go through the host running the rgw server?

I just don't fully understand that side of things. As for architecture, our
PoC will have:
- 1 monitor
- 4 OSD nodes, each with 12 x 6TB drives and 1 x 800GB PCIe journal

If all goes as planned, this will scale up to:
- 3 monitors
- 48 OSD nodes

This should give us enough storage (~1.2PB) with enough throughput to handle
the data requirements of our machines and saturate our 100Gb link...
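
(Back-of-the-envelope, assuming 3x replication: 48 nodes x 12 drives x 6TB is
about 3,456TB raw, or roughly 1.15PB usable, which is where the ~1.2PB above
comes from.)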





Cheers,
G


Re: [ceph-users] tracker.ceph.com

2016-12-19 Thread Nathan Cutler

> Please let me know if you notice anything is amiss.


I haven't received any email notifications since the crash. Normally on 
a Monday I'd have several dozen.


--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037