Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-26 Thread Konstantin Shalygin

On 1/26/19 10:24 PM, Kevin Olbrich wrote:

I just had the time to check again: even after removing the broken
OSD, the mgr still crashes.
All OSDs are up and in.
If I run "ceph balancer on" on a HEALTH_OK cluster, an optimization
plan is generated and started. After some minutes, all MGRs die.

This is a major problem for me, as I still have that SSD OSD that is
imbalanced and limiting the whole pool's space.


Try running the mgr with `debug mgr = 4/5` and look at the mgr log file.
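
For example (a sketch — the ceph.conf route plus a runtime alternative via the
admin socket; <id> is a placeholder for the mgr name):

# in ceph.conf on the mgr host, then restart the mgr:
[mgr]
debug mgr = 4/5

systemctl restart ceph-mgr@<id>

# or at runtime, without a restart:
ceph daemon mgr.<id> config set debug_mgr 4/5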



k




Re: [ceph-users] How To Properly Failover a HA Setup

2019-01-26 Thread Charles Tassell
I tried setting noout, and that did provide a somewhat better result.
Basically, I could stop the OSD on the inactive server and everything
still worked (after a 2-3 second pause), but when I rebooted the
inactive server everything hung again until it came back online and
resynced with the cluster. This is what I saw in ceph -s:


    cluster eb2003cf-b16d-4551-adb7-892469447f89
     health HEALTH_WARN
            128 pgs degraded
            124 pgs stuck unclean
            128 pgs undersized
            recovery 805252/1610504 objects degraded (50.000%)
            mds cluster is degraded
            1/2 in osds are down
            noout flag(s) set
     monmap e1: 3 mons at {FILE1=10.1.1.201:6789/0,FILE2=10.1.1.202:6789/0,MON1=10.1.1.90:6789/0}
            election epoch 216, quorum 0,1,2 FILE1,FILE2,MON1
      fsmap e796: 1/1/1 up {0=FILE2=up:rejoin}
     osdmap e360: 2 osds: 1 up, 2 in; 128 remapped pgs
            flags noout,sortbitwise,require_jewel_osds
      pgmap v7056802: 128 pgs, 3 pools, 164 GB data, 786 kobjects
            349 GB used, 550 GB / 899 GB avail
            805252/1610504 objects degraded (50.000%)
                 128 active+undersized+degraded
      client io 1379 B/s rd, 1 op/s rd, 0 op/s wr

These are the commands I ran and the results:
ceph osd set noout
systemctl stop ceph-mds@FILE2.service
# Everything still works on the clients...
systemctl stop ceph-osd@0.service # This was on FILE2 while FILE1 was the active fsmap

# Fails over quickly, can still read content on the clients...
# Rebooted FILE2
# File access on the clients locked up until FILE2 rejoined


This is on Ubuntu 16 with kernel 4.4.0-141, so I'm not sure if that 
qualifies for David's warning about old kernels...


Is there a command or a logfile I can look at that will help me better
diagnose this issue? Are three servers (with only 2 OSDs) enough to run
an HA cluster on Ceph, or does it just die when it doesn't have 3 active
servers for a quorum? Would installing an MDS and MON on a 4th box (but
sticking with 2 OSDs) be what's required to resolve this? I really
don't want to do that, but if I have to I guess I can look into finding
another box.
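
One thing that may be worth checking here (an assumption on my part, not a
diagnosis): with only 2 OSDs and replicated pools, a pool whose min_size
equals its size will block I/O whenever one replica is down. For example:

ceph osd dump | grep 'pool'            # shows size and min_size per pool
ceph osd pool get <poolname> min_size
ceph health detail                     # names the stuck PGs and why

If min_size equals size, lowering it to 1 allows I/O with a single replica
during maintenance, at the cost of redundancy while it is set.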



On 2019-01-21 5:01 p.m., ceph-users-requ...@lists.ceph.com wrote:

Message: 14
Date: Mon, 21 Jan 2019 10:05:15 +0100
From: Robert Sander
To:ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How To Properly Failover a HA Setup
Message-ID:<587dac75-96bc-8719-ee62-38e71491c...@heinlein-support.de>
Content-Type: text/plain; charset="utf-8"

On 21.01.19 09:22, Charles Tassell wrote:


Hello Everyone,

I've got a 3-node Jewel cluster setup, and I think I'm missing
something. When I want to take one of my nodes down for maintenance
(kernel upgrades or the like), all of my clients (running the kernel
module for the cephfs filesystem) hang for a couple of minutes before
the redundant servers kick in.


Have you set the noout flag before doing cluster maintenance?

ceph osd set noout

and afterwards

ceph osd unset noout
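
A typical full sequence then looks roughly like this (a sketch; the target
name assumes systemd packaging):

ceph osd set noout
systemctl stop ceph-osd.target   # on the node going down for maintenance
# ... reboot / upgrade the node ...
# once its OSDs are back up and in:
ceph osd unset noout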

Regards
-- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin



[ceph-users] Questions about using existing HW for PoC cluster

2019-01-26 Thread Will Dennis
Hi all,

Kind of new to Ceph (I have been using 10.2.11 on a 3-node Proxmox 4.x cluster 
[hyperconverged], works great!) and now I'm thinking of perhaps using it for a 
bigger data storage project at work: a PoC at first, but built as correctly as 
possible for performance and availability. I have the following server 
equipment available to use for the PoC; if it all goes well, I'd think new 
hardware for an actual production installation would be in order :)

For the OSD servers, I have:

(5) Intel R2312GL4GS 2U servers (c. 2013) with the following specs --
  - (2) Intel Xeon E5-2660 CPUs (8-core, dual-threaded)
  - 64GB memory
  - (1) dual-port 10Gbase-T NIC (Intel X540-AT2)
  - (1) dual-port Infiniband HBA (Mellanox MT27500 ConnectX-3) (probably won't 
use, and would remove)
  - (4) Intel 1Gbase-T NICs (on mobo)
  - (1) Intel 240GB SATA SSD (OS)
  - (8) Hitachi 2TB SATA drives

I am not bound to using the existing disks in these servers, but I also want to 
keep the price down, as this is only a PoC. I was thinking of either putting an 
Intel Optane 900P PCIe SSD (480G) in for the journal, or else some sort of SATA SSD 
in one of the available front bays (it's a 12 hotswap-bay machine, plus two 
internal SSD mounts). I also could get some higher-capacity (and newer!) SATA 
drives, so as to keep the number of OSDs down for a given capacity (shooting 
for 25-50TB to start). However, I'd love it if I didn't have to ask for any 
money ;)

For monitor machines, I have available three Supermicro (c.2011) 1U servers 
with:
  - (2) Intel Xeon X5680 CPUs
  - 48GB memory
  - (2) 1Gbase-T NICs (on mobo)
  - (1) WD 2TB SATA drive

I am considering also the rack placement; the 5 servers I'd use for OSD all 
currently live in one rack, and the Mon servers in another. I could move them 
if necessary.

So, a few questions to start ;)

- Is the above an acceptable collection of useful equipment for a PoC of modern 
Ceph? (thinking of installing Mimic with Bluestore)
- Is putting the journal on a partition of the SATA drives a real I/O killer? 
(this is how my Proxmox boxes are set up)
- If YES to the above, then is a SATA SSD acceptable as the journal device, or 
should I definitely consider a PCIe SSD? (I'd have to limit to one per server, 
which I know isn't optimal, but price prevents otherwise... see the sketch after 
this list.)
- Should I spread the servers out over racks, which would probably force me to 
use 3 of the 5 available OSD servers and put bigger disks in them to get the 
desired capacity (I only have three racks to work with), or is it OK for a PoC 
to keep all OSD servers in one rack?
- Are the platforms I'm proposing to use for monitor servers acceptable as-is, 
or do they need more memory, SSD drives, or 10GbE NICs?
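
(Regarding the journal questions above: with Mimic and BlueStore there is no
journal as such; the rough equivalent is placing the RocksDB/WAL on a faster
device. A sketch, with placeholder device names:)

ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1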

OK, enough q's for now - thanks for helping a new Ceph'r out :)

Best,
Will





Re: [ceph-users] One host with 24 OSDs is offline - best way to get it back online

2019-01-26 Thread Christian Balzer

Hello,

this is where (depending on your topology) something like:
---
mon_osd_down_out_subtree_limit = host
---
can come in very handy.

Provided you have correct monitoring, alerting and operations, a down
node can often be restored long before any recovery would have
finished, and you also avoid the data movement back and forth.
And if you see that restoring the node will take a long time, just
manually mark its OSDs out for the time being.
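
(For reference, a sketch of the relevant bits; the OSD ids are placeholders:)

# in ceph.conf on the mons:
[mon]
mon_osd_down_out_subtree_limit = host

# or mark the down node's OSDs out manually while a long repair is under way:
ceph osd out 10 11 12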

Christian

On Sun, 27 Jan 2019 00:02:54 +0100 Götz Reinicke wrote:

> Dear Chris,
> 
> Thanks for your feedback. The node/OSDs in question are part of an erasure 
> coded pool and during the weekend the workload should be close to none.
> 
> But anyway, I could get a look at the console and the server; the power is 
> up, but I can't use the console: the login prompt is shown, but no key is 
> accepted.
> 
> I'll have to reboot the server and check what it is complaining about 
> tomorrow morning, as soon as I can access the server again.
> 
>   Fingers crossed and regards. Götz
> 
> 
> 
> > On 26.01.2019 at 23:41, Chris wrote:
> > 
> > It sort of depends on your workload/use case.  Recovery operations can be 
> > computationally expensive.  If your load is light because it's the weekend 
> > you should be able to turn that host back on as soon as you resolve 
> > whatever the issue is with minimal impact.  You can also increase the 
> > priority of the recovery operation to make it go faster if you feel you can 
> > spare additional IO and it won't affect clients.
> > 
> > We do this in our cluster regularly and have yet to see an issue (given 
> > that we take care to do it during periods of lower client io)
> > 
> > On January 26, 2019 17:16:38 Götz Reinicke  
> > wrote:
> >   
> >> Hi,
> >> 
> >> one host out of 10 is down for yet unknown reasons. I guess a power 
> >> failure. I could not yet see the server.
> >> 
> >> The Cluster is recovering and remapping fine, but still has some objects 
> >> to process.
> >> 
> >> My question: may I just switch the server back on, and in the best case the 24 
> >> OSDs come back online and recovery will do the job without problems?
> >> 
> >> Or what might be a good way to handle that host? Should I first wait till 
> >> the recovery is finished?
> >> 
> >> Thanks for feedback and suggestions - Happy Saturday Night  :) . Regards . 
> >> Götz  
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications


Re: [ceph-users] Bucket logging howto

2019-01-26 Thread Marc Roos




From the owner account of the bucket I am trying to enable logging, but 
I don't get how this should work. I see that s3:PutBucketLogging is 
supported, so I guess this should work. How do you enable it? And how do 
you access the log?


[@ ~]$ s3cmd -c .s3cfg accesslog s3://archive
Access logging for: s3://archive/
   Logging Enabled: False

[@ ~]$ s3cmd -c .s3cfg.archive accesslog s3://archive 
--access-logging-target-prefix=s3://archive/xx
ERROR: S3 error: 405 (MethodNotAllowed)
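
(For comparison — a sketch of the equivalent raw S3 call with the AWS CLI,
with a placeholder endpoint; the 405 above suggests RGW rejects the operation
regardless of which client makes it:)

aws s3api put-bucket-logging --endpoint-url http://rgw.example.com \
  --bucket archive \
  --bucket-logging-status '{"LoggingEnabled":{"TargetBucket":"archive","TargetPrefix":"xx/"}}'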


[ceph-users] Bucket logging howto

2019-01-26 Thread Marc Roos



From the owner account of the bucket I am trying to enable logging, but 
I don't get how this should work. I see that s3:PutBucketLogging is 
supported, so I guess this should work. How do you enable it? And how do 
you access the log?


[@ ~]$ s3cmd -c .s3cfg accesslog s3://archive
Access logging for: s3://archive/
   Logging Enabled: False


Re: [ceph-users] One host with 24 OSDs is offline - best way to get it back online

2019-01-26 Thread Brian Topping
I went through this as I reformatted all the OSDs on a much smaller cluster 
last weekend. When turning nodes back on, PGs would sometimes move, only to 
move back, prolonging the operation and the system stress. 

What I took away is that it's least stressful overall to get the OSD tree back 
to its target state as quickly as is safe and practical. Replication will happen 
as replication does, but if the strategy changes midway, it just means the same 
speed of movement over a longer time. 

> On Jan 26, 2019, at 15:41, Chris  wrote:
> 
> It sort of depends on your workload/use case.  Recovery operations can be 
> computationally expensive.  If your load is light because it's the weekend you 
> should be able to turn that host back on as soon as you resolve whatever the 
> issue is with minimal impact.  You can also increase the priority of the 
> recovery operation to make it go faster if you feel you can spare additional 
> IO and it won't affect clients.
> 
> We do this in our cluster regularly and have yet to see an issue (given that 
> we take care to do it during periods of lower client io)
> 
>> On January 26, 2019 17:16:38 Götz Reinicke  
>> wrote:
>> 
>> Hi,
>> 
>> one host out of 10 is down for yet unknown reasons. I guess a power failure. 
>> I could not yet see the server.
>> 
>> The Cluster is recovering and remapping fine, but still has some objects to 
>> process.
>> 
>> My question: may I just switch the server back on, and in the best case the 24 
>> OSDs come back online and recovery will do the job without problems?
>> 
>> Or what might be a good way to handle that host? Should I first wait till 
>> the recovery is finished?
>> 
>> Thanks for feedback and suggestions - Happy Saturday Night  :) . Regards . 
>> Götz
>> 
>> 


Re: [ceph-users] One host with 24 OSDs is offline - best way to get it back online

2019-01-26 Thread Götz Reinicke
Dear Chris,

Thanks for your feedback. The node/OSDs in question are part of an erasure 
coded pool, and during the weekend the workload should be close to none.

But anyway, I could get a look at the console and the server; the power is 
up, but I can't use the console: the login prompt is shown, but no key is 
accepted.

I'll have to reboot the server and check what it is complaining about tomorrow 
morning, as soon as I can access the server again.

Fingers crossed and regards. Götz



> On 26.01.2019 at 23:41, Chris wrote:
> 
> It sort of depends on your workload/use case.  Recovery operations can be 
> computationally expensive.  If your load is light because it's the weekend you 
> should be able to turn that host back on as soon as you resolve whatever the 
> issue is with minimal impact.  You can also increase the priority of the 
> recovery operation to make it go faster if you feel you can spare additional 
> IO and it won't affect clients.
> 
> We do this in our cluster regularly and have yet to see an issue (given that 
> we take care to do it during periods of lower client io)
> 
> On January 26, 2019 17:16:38 Götz Reinicke  
> wrote:
> 
>> Hi,
>> 
>> one host out of 10 is down for yet unknown reasons. I guess a power failure. 
>> I could not yet see the server.
>> 
>> The Cluster is recovering and remapping fine, but still has some objects to 
>> process.
>> 
>> My question: may I just switch the server back on, and in the best case the 24 
>> OSDs come back online and recovery will do the job without problems?
>> 
>> Or what might be a good way to handle that host? Should I first wait till 
>> the recovery is finished?
>> 
>> Thanks for feedback and suggestions - Happy Saturday Night  :) . Regards . 
>> Götz





Re: [ceph-users] One host with 24 OSDs is offline - best way to get it back online

2019-01-26 Thread Chris
It sort of depends on your workload/use case.  Recovery operations can be 
computationally expensive.  If your load is light because it's the weekend 
you should be able to turn that host back on as soon as you resolve 
whatever the issue is with minimal impact.  You can also increase the 
priority of the recovery operation to make it go faster if you feel you can 
spare additional IO and it won't affect clients.


We do this in our cluster regularly and have yet to see an issue (given 
that we take care to do it during periods of lower client io)


On January 26, 2019 17:16:38 Götz Reinicke  
wrote:



Hi,

one host out of 10 is down for yet unknown reasons. I guess a power 
failure. I could not yet see the server.


The Cluster is recovering and remapping fine, but still has some objects to 
process.


My question: may I just switch the server back on, and in the best case the 24 
OSDs come back online and recovery will do the job without problems?

Or what might be a good way to handle that host? Should I first wait till 
the recovery is finished?


Thanks for feedback and suggestions - Happy Saturday Night  :) . Regards . Götz








[ceph-users] One host with 24 OSDs is offline - best way to get it back online

2019-01-26 Thread Götz Reinicke
Hi,

one host out of 10 is down for yet unknown reasons. I guess a power failure. I 
could not yet see the server.

The Cluster is recovering and remapping fine, but still has some objects to 
process.

My question: may I just switch the server back on, and in the best case the 24 OSDs 
come back online and recovery will do the job without problems?

Or what might be a good way to handle that host? Should I first wait till the 
recovery is finished?

Thanks for feedback and suggestions - Happy Saturday Night  :) . 
Regards . Götz



Re: [ceph-users] repair do not work for inconsistent pg which three replica are the same

2019-01-26 Thread ceph


On 10 January 2019 at 08:43:30 CET, Wido den Hollander wrote:
>
>
>On 1/10/19 8:36 AM, hnuzhoulin2 wrote:
>> 
>> Hi cephers,
>> 
>> I have two inconsistent PGs. I tried to list the inconsistent objects and got nothing.
>> 
>> rados list-inconsistent-obj 388.c29
>> No scrub information available for pg 388.c29
>> error 2: (2) No such file or directory
>> 
>
>
>Have you tried to run a deep-scrub on this PG and see what that does?
>
>Wido
>
>> so I searched the log to find the object name, and I searched for this name in
>> all three replicas. Yes, all three replicas are the same (the md5 is the same).
>> error log is: 388.c29 shard 295: soid
>>
>388:9430fef2:::c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4__multipart_dumbo%2f180888654%2f20181221%2fxtrabackup_full_x19_30044_20181221025000%2fx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595:head
>> candidate had a read error

In addition, I would check the underlying disk... perhaps there is something in dmesg?

- Mehmet  
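
(For example — a sketch; the device name is a placeholder for the disk
behind the shard that reported the read error:)

ceph pg deep-scrub 388.c29
rados list-inconsistent-obj 388.c29   # retry once the deep-scrub has finished
dmesg | grep -iE 'error|sector|ata'
smartctl -a /dev/sdX                  # look for reallocated/pending sectors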
>> 
>> obj name is:
>>
>DIR_9/DIR_2/DIR_C/DIR_0/DIR_F/c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4\\u\\umultipart\\udumbo\\s180888654\\s20181221\\sxtrabackup\\ufull\\ux19\\u30044\\u20181221025000\\sx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595__head_4F7F0C29__184
>> the md5 of all three replicas is: 73281ed56c92a56da078b1ae52e888e0
>> 
>> stat info is:
>> root@cld-osd3-48:/home/ceph/var/lib/osd/ceph-33/current/388.c29_head#
>> stat
>>
>DIR_9/DIR_2/DIR_C/DIR_0/DIR_F/c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4\\u\\umultipart\\udumbo\\s180888654\\s20181221\\sxtrabackup\\ufull\\ux19\\u30044\\u20181221025000\\sx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595__head_4F7F0C29__184
>>   Size: 4194304   Blocks: 8200       IO Block: 4096   regular file
>> Device: 891h/2193d   Inode: 4300403471   Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (999/ceph)   Gid: (999/ceph)
>> Access: 2018-12-21 14:17:12.945132144 +0800
>> Modify: 2018-12-21 14:17:12.965132073 +0800
>> Change: 2018-12-21 14:17:13.761129235 +0800
>>  Birth: -
>> 
>>
>root@cld-osd24-48:/home/ceph/var/lib/osd/ceph-279/current/388.c29_head#
>> stat
>>
>DIR_9/DIR_2/DIR_C/DIR_0/DIR_F/c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4\\u\\umultipart\\udumbo\\s180888654\\s20181221\\sxtrabackup\\ufull\\ux19\\u30044\\u20181221025000\\sx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595__head_4F7F0C29__184
>>   Size: 4194304   Blocks: 8200       IO Block: 4096   regular file
>> Device: 831h/2097d   Inode: 8646464869   Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (999/ceph)   Gid: (999/ceph)
>> Access: 2019-01-07 10:54:23.010293026 +0800
>> Modify: 2019-01-07 10:54:23.010293026 +0800
>> Change: 2019-01-07 10:54:23.014293004 +0800
>>  Birth: -
>> 
>>
>root@cld-osd31-48:/home/ceph/var/lib/osd/ceph-363/current/388.c29_head#
>> stat
>>
>DIR_9/DIR_2/DIR_C/DIR_0/DIR_F/c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4\\u\\umultipart\\udumbo\\s180888654\\s20181221\\sxtrabackup\\ufull\\ux19\\u30044\\u20181221025000\\sx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595__head_4F7F0C29__184
>>   Size: 4194304   Blocks: 8200       IO Block: 4096   regular file
>> Device: 831h/2097d   Inode: 13141445890   Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (999/ceph)   Gid: (999/ceph)
>> Access: 2018-12-21 14:17:12.946862160 +0800
>> Modify: 2018-12-21 14:17:12.966862262 +0800
>> Change: 2018-12-21 14:17:13.762866312 +0800
>>  Birth: -
>> 
>> 
>> The other PG is the same. I tried running deep-scrub and repair; they do not work.
>> 


Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-26 Thread Kevin Olbrich
Hi!

I just had the time to check again: even after removing the broken
OSD, the mgr still crashes.
All OSDs are up and in.
If I run "ceph balancer on" on a HEALTH_OK cluster, an optimization
plan is generated and started. After some minutes, all MGRs die.

This is a major problem for me, as I still have that SSD OSD that is
imbalanced and limiting the whole pool's space.
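
As a stopgap until the mgr crash is understood, the balancer can also be
driven manually so each step can be inspected, and the overfull OSD can be
nudged by hand (a sketch; osd.4 is taken from the `ceph osd df` output below):

ceph balancer optimize myplan
ceph balancer show myplan        # review the proposed changes first
ceph balancer execute myplan

ceph osd reweight 4 0.90         # or reduce the weight of the full SSD OSD directly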


root@adminnode:~# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
 -1   29.91933 root default
-16   29.91933 datacenter dc01
-19   29.91933 pod dc01-agg01
-10   16.52396 rack dc01-rack02
 -46.29695 host node1001
  0   hdd  0.90999 osd.0 up  1.0 1.0
  1   hdd  0.90999 osd.1 up  1.0 1.0
  5   hdd  0.90999 osd.5 up  1.0 1.0
 29   hdd  0.90970 osd.29up  1.0 1.0
 33   hdd  0.90970 osd.33up  1.0 1.0
  2   ssd  0.43700 osd.2 up  1.0 1.0
  3   ssd  0.43700 osd.3 up  1.0 1.0
  4   ssd  0.43700 osd.4 up  1.0 1.0
 30   ssd  0.43660 osd.30up  1.0 1.0
 -76.29724 host node1002
  9   hdd  0.90999 osd.9 up  1.0 1.0
 10   hdd  0.90999 osd.10up  1.0 1.0
 11   hdd  0.90999 osd.11up  1.0 1.0
 12   hdd  0.90999 osd.12up  1.0 1.0
 35   hdd  0.90970 osd.35up  1.0 1.0
  6   ssd  0.43700 osd.6 up  1.0 1.0
  7   ssd  0.43700 osd.7 up  1.0 1.0
  8   ssd  0.43700 osd.8 up  1.0 1.0
 31   ssd  0.43660 osd.31up  1.0 1.0
-282.18318 host node1005
 34   ssd  0.43660 osd.34up  1.0 1.0
 36   ssd  0.87329 osd.36up  1.0 1.0
 37   ssd  0.87329 osd.37up  1.0 1.0
-291.74658 host node1006
 42   ssd  0.87329 osd.42up  1.0 1.0
 43   ssd  0.87329 osd.43up  1.0 1.0
-11   13.39537 rack dc01-rack03
-225.38794 host node1003
 17   hdd  0.90999 osd.17up  1.0 1.0
 18   hdd  0.90999 osd.18up  1.0 1.0
 24   hdd  0.90999 osd.24up  1.0 1.0
 26   hdd  0.90999 osd.26up  1.0 1.0
 13   ssd  0.43700 osd.13up  1.0 1.0
 14   ssd  0.43700 osd.14up  1.0 1.0
 15   ssd  0.43700 osd.15up  1.0 1.0
 16   ssd  0.43700 osd.16up  1.0 1.0
-255.38765 host node1004
 23   hdd  0.90999 osd.23up  1.0 1.0
 25   hdd  0.90999 osd.25up  1.0 1.0
 27   hdd  0.90999 osd.27up  1.0 1.0
 28   hdd  0.90970 osd.28up  1.0 1.0
 19   ssd  0.43700 osd.19up  1.0 1.0
 20   ssd  0.43700 osd.20up  1.0 1.0
 21   ssd  0.43700 osd.21up  1.0 1.0
 22   ssd  0.43700 osd.22up  1.0 1.0
-302.61978 host node1007
 38   ssd  0.43660 osd.38up  1.0 1.0
 39   ssd  0.43660 osd.39up  1.0 1.0
 40   ssd  0.87329 osd.40up  1.0 1.0
 41   ssd  0.87329 osd.41up  1.0 1.0



root@adminnode:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
 0   hdd 0.90999  1.0  932GiB  353GiB  579GiB 37.87 0.83  95
 1   hdd 0.90999  1.0  932GiB  400GiB  531GiB 42.98 0.94 108
 5   hdd 0.90999  1.0  932GiB  267GiB  664GiB 28.70 0.63  72
29   hdd 0.90970  1.0  932GiB  356GiB  576GiB 38.19 0.84  96
33   hdd 0.90970  1.0  932GiB  344GiB  587GiB 36.94 0.81  93
 2   ssd 0.43700  1.0  447GiB  273GiB  174GiB 61.09 1.34  52
 3   ssd 0.43700  1.0  447GiB  252GiB  195GiB 56.38 1.23  61
 4   ssd 0.43700  1.0  447GiB  308GiB  140GiB 68.78 1.51  59
30   ssd 0.43660  1.0  447GiB  231GiB  216GiB 51.77 1.13  48
 9   hdd 0.90999  1.0  932GiB  358GiB  573GiB 38.48 0.84  97
10   hdd 0.90999  1.0  932GiB  347GiB  585GiB 37.25 0.82  94
11   hdd 0.90999  

Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed

2019-01-26 Thread Götz Reinicke


> On 26.01.2019 at 14:16, Kevin Olbrich wrote:
> 
> On Sat, 26 Jan 2019 at 13:43, Götz Reinicke
> wrote:
>> 
>> Hi,
>> 
>> I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.
>> 
>> I grow that rbd and ext4 starting with an 2TB rbd that way:
>> 
>> rbd resize testpool/disk01 --size 4194304
>> 
>> resize2fs /dev/rbd0
>> 
>> Today I wanted to extend that ext4 to 8 TB and did:
>> 
>> rbd resize testpool/disk01 --size 8388608
>> 
>> resize2fs /dev/rbd0
>> 
>> => which gives an error: The filesystem is already 1073741824 blocks. 
>> Nothing to do.
>> 
>> 
>>I bet I missed something very simple. Any hint? Thanks and regards . 
>> Götz
> 
> Try "partprobe" to read device metrics again.

That did not change anything, and did not give any output or log messages. 

/Götz
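
(One more check that might help — an assumption, not a confirmed fix:
1073741824 ext4 blocks at 4 KiB each is exactly 4 TiB, so resize2fs
apparently still sees the old device size. Worth comparing what the kernel
and the cluster each report:)

blockdev --getsize64 /dev/rbd0   # size the kernel sees, in bytes
rbd info testpool/disk01         # size the cluster reports

If the kernel still reports 4 TiB, unmounting, unmapping and remapping the
rbd is a safe way to refresh it on older kernels.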






Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed

2019-01-26 Thread Kevin Olbrich
On Sat, 26 Jan 2019 at 13:43, Götz Reinicke
wrote:
>
> Hi,
>
> I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.
>
> I grow that rbd and ext4 starting with an 2TB rbd that way:
>
> rbd resize testpool/disk01 --size 4194304
>
> resize2fs /dev/rbd0
>
> Today I wanted to extend that ext4 to 8 TB and did:
>
> rbd resize testpool/disk01 --size 8388608
>
> resize2fs /dev/rbd0
>
> => which gives an error: The filesystem is already 1073741824 blocks. Nothing 
> to do.
>
>
> I bet I missed something very simple. Any hint? Thanks and regards . 
> Götz

Try "partprobe" to read device metrics again.



[ceph-users] Rezising an online mounted ext4 on a rbd - failed

2019-01-26 Thread Götz Reinicke
Hi,

I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.

I grow that rbd and ext4 starting with an 2TB rbd that way:

rbd resize testpool/disk01 --size 4194304

resize2fs /dev/rbd0

Today I wanted to extend that ext4 to 8 TB and did:

rbd resize testpool/disk01 --size 8388608

resize2fs /dev/rbd0

=> which gives an error: The filesystem is already 1073741824 blocks. Nothing 
to do.


I bet I missed something very simple. Any hint? Thanks and regards . 
Götz





Re: [ceph-users] Migrating to a dedicated cluster network

2019-01-26 Thread Simon Leinen
Paul Emmerich writes:
> Split networks is rarely worth it. One fast network is usually better.
> And since you mentioned having only two interfaces: one bond is way
> better than two independent interfaces.

> IPv4/6 dual-stack setups will be supported in Nautilus; currently you
> have to use either IPv4 or IPv6.

> Jumbo frames: often mentioned but usually not worth it.
> (Yes, I know that this is somewhat controversial and increasing MTU is
> often a standard trick for performance tuning, but I have yet to see
> a benchmark that actually shows significant performance
> improvements. Some quick tests show that I can save around 5-10% CPU
> load on a system doing ~50 gbit/s of IO traffic which is almost
> nothing given the total system load)

Agree with everything Paul said.  (I know this is lame, but I think all
of this bears repeating :-)

To address another question in Jan's original post:

I would not consider using link-local IPv6 addressing.  Not just because
I doubt that this would work (Ceph would always need to know/tell the OS
which interface it should use with such an address), but mainly because
even if it does work, it will only work as long as everything is on a
single logical IPv6 network.  This will artificially limit your options
for the evolution of your cluster.

Routable addresses are cheap in IPv6, use them!
-- 
Simon.


Re: [ceph-users] Using Ceph central backup storage - Best practice creating pools

2019-01-26 Thread Simon Leinen
cmonty14  writes:
> due to performance issues RGW is not an option.  This statement may be
> wrong, but there's the following aspect to consider.

> If I write a backup, which is typically a large file, this is normally a
> single IO stream.
> This causes massive performance issues on Ceph, because this single IO
> stream is written sequentially in small pieces to the OSDs.
> To overcome this issue, multiple IO streams should be used when writing
> large files, and this means the application writing the backup must
> support multi-stream IO.

RGW (and the S3 protocol in general) supports multi-stream uploads
nicely, via the "multipart upload" feature: You split your file into
many pieces, which can be uploaded in parallel.

RGW with multipart uploads seems like a good fit for your application.
It could solve your naming and permission issues, has low overhead, and
could give you good performance as long as you use multipart uploads
with parallel threads.  You just need to make sure that your RGW
gateways have enough throughput, but this capacity is relatively easy
and inexpensive to provide.
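
(For example, s3cmd performs multipart uploads automatically for large files —
a sketch with placeholder bucket and chunk size:)

s3cmd put --multipart-chunk-size-mb=64 backup.dump s3://db-backups/clientA/backup.dump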

> Considering this, the following question comes up: if I write a backup
> onto an RBD (which could be considered a network share), will Ceph
> use a single IO stream or multiple IO streams on the storage side?

Ceph should be able to handle multiple parallel streams of I/O to an RBD
device (in general, writes will go to different "chunks" of the RBD, and
those chunk objects will be on different OSDs).  But it's another
question whether your RBD client will be able to issue parallel streams
of requests.  Usually you have some kind of file system and kernel block
I/O layer on the client side, and it's possible that those will
serialize I/O, which will make it hard to get high throughput.
-- 
Simon.

> THX

> On Tue, 22 Jan 2019 at 23:20, Christian Wuerdig
> wrote:
>> 
>> If you use librados directly it's up to you to ensure you can
>> identify your objects. Generally RADOS stores objects and not files
>> so when you provide your object ids you need to come up with a
>> convention so you can correctly identify them. If you need to
>> provide meta data (i.e. a list of all existing backups, when they
>> were taken etc.) then again you need to manage that yourself
>> (probably in dedicated meta-data objects). Using RADOS namespaces
>> (like one per database) is probably a good idea.
>> Also keep in mind that, for example, Bluestore has a maximum object
>> size of 4GB, so mapping files 1:1 to objects is probably not a wise
>> approach and you should break up your files into smaller chunks when
>> storing them. There is libradosstriper which handles the striping of
>> large objects transparently but not sure if that has support for
>> RADOS namespaces.
>> 
>> Using RGW instead might be an easier route to go down
>> 
>> On Wed, 23 Jan 2019 at 10:10, cmonty14 <74cmo...@gmail.com> wrote:
>>> 
>>> My backup client is using librados.
>>> I understand that defining a pool for the same application is recommended.
>>> 
>>> However this would not answer my other questions:
>>> How can I identify a backup created by client A that I want to restore
>>> on another client Z?
>>> I mean typically client A would write a backup file identified by the
>>> filename.
>>> Would it be possible on client Z to identify this backup file by
>>> filename? If yes, how?
>>> 
>>> On Tue, 22 Jan 2019 at 15:07, wrote:
>>> >
>>> > Hi,
>>> >
>>> > Ceph's pools are meant to let you define specific engineering rules
>>> > and/or applications (rbd, cephfs, rgw).
>>> > They are not designed to be created en masse (see PGs etc.),
>>> > so create a pool for each engineering ruleset, and store your data in 
>>> > them.
>>> > For what is left of your project, I believe you have to implement that
>>> > on top of Ceph
>>> >
>>> > For instance, let's say you simply create a pool with an rbd volume in it.
>>> > You then create a filesystem on that, and map it on some server.
>>> > Finally, you can push your files onto that mountpoint, using ordinary
>>> > Linux users, ACLs or whatever: beyond that point, there is nothing more
>>> > specific to Ceph; it is "just" a mounted filesystem.
>>> >
>>> > Regards,
>>> >
>>> > On 01/22/2019 02:16 PM, cmonty14 wrote:
>>> > > Hi,
>>> > >
>>> > > my use case for Ceph is providing a central backup storage.
>>> > > This means I will backup multiple databases in Ceph storage cluster.
>>> > >
>>> > > This is my question:
>>> > > What is the best practice for creating pools & images?
>>> > > Should I create multiple pools, meaning one pool per database?
>>> > > Or should I create a single pool "backup" and use namespace when writing
>>> > > data in the pool?
>>> > >
>>> > > This is the security demand that should be considered:
>>> > > DB-owner A can only modify the files that belong to A; other files
>>> > > (owned by B, C or D) must not be accessible to A.
>>> > >
>>> > > And there's another issue:
>>> > > How can I identify a backup created by client A