[ceph-users] ceph pgs state forever stale+active+clean

2017-08-17 Thread Hyun Ha
Hi, Cephers!

I'm currently testing a double-failure scenario on a Ceph cluster, and I've
run into PGs that stay in the stale state forever.

Reproduction steps:
0. ceph version : jewel 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
1. Pool create : exp-volumes (size = 2, min_size = 1)
2. rbd create : testvol01
3. rbd map and mkfs.xfs
4. mount and create file
5. list rados object
6. check osd map of each object
 # ceph osd map exp-volumes rbd_data.4a41f238e1f29.017a
   osdmap e199 pool 'exp-volumes' (2) object
'rbd_data.4a41f238e1f29.017a'
-> pg 2.3f04d6e2 (2.62) -> up ([2,6], p2) acting ([2,6], p2)
7. stop primary osd.2 and secondary osd.6 of above object at the same time
8. check ceph status
health HEALTH_ERR
16 pgs are stuck inactive for more than 300 seconds
16 pgs stale
16 pgs stuck stale
 monmap e11: 3 mons at {10.105.176.85=10.105.176.85:
6789/0,10.110.248.154=10.110.248.154:6789/0,10.110.249.153=
10.110.249.153:6789/0}
election epoch 84, quorum 0,1,2 10.105.176.85,10.110.248.154,
10.110.249.153
 osdmap e248: 6 osds: 4 up, 4 in; 16 remapped pgs
flags sortbitwise,require_jewel_osds
  pgmap v112095: 128 pgs, 1 pools, 14659 kB data, 17 objects
165 MB used, 159 GB / 160 GB avail
 112 active+clean
  16 stale+active+clean

# ceph health detail
HEALTH_ERR 16 pgs are stuck inactive for more than 300 seconds; 16 pgs
stale; 16 pgs stuck stale
pg 2.67 is stuck stale for 689.171742, current state stale+active+clean,
last acting [2,6]
pg 2.5a is stuck stale for 689.171748, current state stale+active+clean,
last acting [6,2]
pg 2.52 is stuck stale for 689.171753, current state stale+active+clean,
last acting [2,6]
pg 2.4d is stuck stale for 689.171757, current state stale+active+clean,
last acting [2,6]
pg 2.56 is stuck stale for 689.171755, current state stale+active+clean,
last acting [6,2]
pg 2.d is stuck stale for 689.171811, current state stale+active+clean,
last acting [6,2]
pg 2.79 is stuck stale for 689.171808, current state stale+active+clean,
last acting [2,6]
pg 2.1f is stuck stale for 689.171782, current state stale+active+clean,
last acting [6,2]
pg 2.76 is stuck stale for 689.171809, current state stale+active+clean,
last acting [6,2]
pg 2.17 is stuck stale for 689.171794, current state stale+active+clean,
last acting [6,2]
pg 2.63 is stuck stale for 689.171794, current state stale+active+clean,
last acting [2,6]
pg 2.77 is stuck stale for 689.171816, current state stale+active+clean,
last acting [2,6]
pg 2.1b is stuck stale for 689.171793, current state stale+active+clean,
last acting [6,2]
pg 2.62 is stuck stale for 689.171765, current state stale+active+clean,
last acting [2,6]
pg 2.30 is stuck stale for 689.171799, current state stale+active+clean,
last acting [2,6]
pg 2.19 is stuck stale for 689.171798, current state stale+active+clean,
last acting [6,2]

 # ceph pg dump_stuck stale
ok
pg_stat  state               up     up_primary  acting  acting_primary
2.67     stale+active+clean  [2,6]  2           [2,6]   2
2.5a     stale+active+clean  [6,2]  6           [6,2]   6
2.52     stale+active+clean  [2,6]  2           [2,6]   2
2.4d     stale+active+clean  [2,6]  2           [2,6]   2
2.56     stale+active+clean  [6,2]  6           [6,2]   6
2.d      stale+active+clean  [6,2]  6           [6,2]   6
2.79     stale+active+clean  [2,6]  2           [2,6]   2
2.1f     stale+active+clean  [6,2]  6           [6,2]   6
2.76     stale+active+clean  [6,2]  6           [6,2]   6
2.17     stale+active+clean  [6,2]  6           [6,2]   6
2.63     stale+active+clean  [2,6]  2           [2,6]   2
2.77     stale+active+clean  [2,6]  2           [2,6]   2
2.1b     stale+active+clean  [6,2]  6           [6,2]   6
2.62     stale+active+clean  [2,6]  2           [2,6]   2
2.30     stale+active+clean  [2,6]  2           [2,6]   2
2.19     stale+active+clean  [6,2]  6           [6,2]   6

# ceph pg 2.62 query
Error ENOENT: i don't have pgid 2.62

 # rados ls -p exp-volumes
rbd_data.4a41f238e1f29.003f
^C --> hang

I understand that this is the expected result, because the PGs above have lost
both their primary and secondary OSDs. But since this situation can happen in
practice, I want to recover the Ceph cluster and the RBD images.

First of all, I want to know how to bring the cluster back to a clean state.
I read the documentation and tried to solve this, but nothing helped, including
the commands below.
 - ceph pg force_create_pg 2.6
 - ceph osd lost 2 --yes-i-really-mean-it
 - ceph osd lost 6 --yes-i-really-mean-it
 - ceph osd crush rm osd.2
 - ceph osd crush rm osd.6
 - ceph osd rm osd.2
 - ceph osd rm osd.6

Is there any command to force-delete these PGs or otherwise bring the cluster
back to a clean state?
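(One option I am considering, since exp-volumes is only a test pool: deleting and
recreating the pool should remove all of its PGs, including the stale ones, but it
also destroys every rbd image in it, so it only works when the data is expendable.
Roughly:

 # ceph osd pool delete exp-volumes exp-volumes --yes-i-really-really-mean-it
 # ceph osd pool create exp-volumes 128 128

I would prefer a way that keeps the pool.)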
Thank you in advance.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph Cluster attempt to access beyond end of device

2017-08-17 Thread Hauke Homburg
On 15.08.2017 at 16:34, ZHOU Yuan wrote:
> Hi Hauke,
>
> It's possibly the XFS issue as discussed in the previous thread. I
> also saw this issue in some JBOD setup, running with RHEL 7.3
>
>
> Sincerely, Yuan
>
> On Tue, Aug 15, 2017 at 7:38 PM, Hauke Homburg
> <hhomb...@w3-creative.de> wrote:
>
> Hello,
>
>
> I found some errors in the cluster with dmesg -T:
>
> attempt to access beyond end of device
>
> I found the following Post:
>
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39101.html
> 
>
> Is this a problem with the size of the filesystem itself, or "only" a
> driver bug? I ask because in each node we have 8 HDDs running in a
> hardware RAID 6, and the XFS partition sits on top of that RAID.
>
> Also, we have one big filesystem in one OSD per server, instead of one
> filesystem per HDD (8 HDDs in each server).
>
> greetings
>
> Hauke
>
>
> --
> www.w3-creative.de 
>
> www.westchat.de 
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
>
>

Hello,

I upgraded the CentOS 7 kernel to kernel 4.12:

https://www.tecmint.com/install-upgrade-kernel-version-in-centos-7/

After the upgrade the errors are gone so far.

Thanks for help.

Greeting


Hauke

-- 
www.w3-creative.de

www.westchat.de

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Delete PG because ceph pg force_create_pg doesnt help

2017-08-17 Thread Hauke Homburg
On 17.08.2017 at 22:35, Hauke Homburg wrote:
> On 16.08.2017 at 13:40, Hauke Homburg wrote:
>> Hello,
>>
>>
>> How can I delete a PG completely from a Ceph server? I think I have already
>> deleted all of its data manually from the server, but a ceph pg  query
>> still shows the PG, and a ceph pg force_create_pg doesn't create the PG.
>>
>> Ceph says it has created the PG, and the PG is stuck for more than 300 sec.
>>
>> Thanks for your help.
>>
> Hello,
>
>
> Sorry for my mistake. I didn't remove the complete PG from the HDD, I only
> removed one object.
>
> Today I removed the complete PG from the primary node and from one secondary
> with rm -Rf /var/lib/ceph/osd//.
> Before that I stopped the OSD and cleared the journal. The cluster restored
> the primary PG, but the inconsistent marker remained. After that I started
> ceph pg repair  again.
>
> Does anyone have an idea what I can do to put the PG back into a consistent state?
>
> Thanks for Help
>
>
> Hauke
>

Hello,

The cluster is healthy again. Removing the complete PG in the filesystem and
running pg repair after the sync helped.

Now I can go on with the data migration into the cluster.

Thanks for Help

Hauke

-- 
www.w3-creative.de

www.westchat.de

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to distribute data

2017-08-17 Thread David Turner
Do you mean a lot of snapshots or creating a lot of clones from a snapshot?
I can agree to the pain of creating a lot of snapshots of rbds in Ceph. I'm
assuming that you mean to say that you will have a template rbd with a
version snapshot that you clone each time you need to let someone log in.
Is that what you're planning?
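If so, the usual flow is to protect a snapshot of the template image and clone it
per desktop, something like this (just a sketch, the pool and image names are made
up):

# rbd snap create vdi/win10-template@gold
# rbd snap protect vdi/win10-template@gold
# rbd clone vdi/win10-template@gold vdi/desktop-0001

Each clone then shares the template's data and only stores its own changes via
copy-on-write.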

On Thu, Aug 17, 2017, 9:51 PM Christian Balzer  wrote:

>
> Hello,
>
> On Fri, 18 Aug 2017 03:31:56 +0200 Oscar Segarra wrote:
>
> > Hi Christian,
> >
> > Thanks a lot for helping...
> >
> > Have you read:
> > http://docs.ceph.com/docs/master/rbd/rbd-openstack/
> >
> > So just from the perspective of qcow2, you seem to be doomed.
> > --> Sorry, I've talking about RAW + QCOW2 when I meant RBD images and RBD
> > snapshots...
> >
> I tested Snapshots with Hammer and the release before it, found them
> immensely painful (resource intensive) and avoided them since.
> That said, there are supposedly quite some improvements in recent versions
> (I suppose you'll deploy with Luminous), as well as more (and
> working) control knobs to reduce the impact of snapshot operations.
>
> > A sufficiently large cache tier should help there immensely and the base
> image
> > should be in cache (RAM, pagecache on the OSD servers really) all the
> time
> > anyway.
> > --> If talking about RBD images and RBD snapshots... it helps immensely
> as
> > well?
> >
> No experience, so nothing conclusive and authoritative from my end.
> If the VMs write/read alot of the same data (as in 4MB RBD objects),
> cache-tiering should help again.
> But promoting and demoting things through it when dealing with snapshots
> and deletions of them might be a pain.
>
> Christian
>
> > Sizing this and specifying the correct type of SSDs/NVMes for the
> cache-tier
> > is something that only you can answer based on existing data or
> sufficiently
> > detailed and realistic tests.
> > --> Yes, the problem is that I have to buy a HW and for Windows 10 VDI...
> > and I cannot make realistic tests previously :( but I will work on this
> > line...
> >
> > Thanks a lot again!
> >
> >
> >
> > 2017-08-18 3:14 GMT+02:00 Christian Balzer :
> >
> > >
> > > Hello,
> > >
> > > On Thu, 17 Aug 2017 23:56:49 +0200 Oscar Segarra wrote:
> > >
> > > > Hi David,
> > > >
> > > > Thanks a lot again for your quick answer...
> > > >
> > > > *The rules in the CRUSH map will always be followed.  It is not
> possible
> > > > for Ceph to go against that and put data into a root that shouldn't
> have
> > > > it.*
> > > > --> I will work on your proposal of creating two roots in the CRUSH
> > > map...
> > > > just one question more:
> > > > --> Regarding to space consumption, with this proposal, is it
> possible to
> > > > know how many disk space is it free in each pool?
> > > >
> > > > *The problem with a cache tier is that Ceph is going to need to
> promote
> > > and
> > > > evict stuff all the time (not free).  A lot of people that want to
> use
> > > SSD
> > > > cache tiering for RBDs end up with slower performance because of
> this.
> > > > Christian Balzer is the expert on Cache Tiers for RBD usage.  His
> primary
> > > > stance is that it's most likely a bad idea, but there are definite
> cases
> > > > where it's perfect.*
> > > > --> Christian, is there any advice for VDI --> BASE IMAGE (raw) +
> 1000
> > > > linked clones (qcow2)
> > > >
> > > Have you read:
> > > http://docs.ceph.com/docs/master/rbd/rbd-openstack/
> > >
> > > So just from the perspective of qcow2, you seem to be doomed.
> > >
> > > Windows always appears to be very chatty when it comes to I/O and also
> > > paging/swapping seemingly w/o need, rhyme or reason.
> > > A sufficiently large cache tier should help there immensely and the
> base
> > > image should be in cache (RAM, pagecache on the OSD servers really)
> all the
> > > time anyway.
> > > Sizing this and specifying the correct type of SSDs/NVMes for the
> > > cache-tier is something that only you can answer based on existing
> data or
> > > sufficiently detailed and realistic tests.
> > >
> > > Christian
> > >
> > > > Thanks a lot.
> > > >
> > > >
> > > > 2017-08-17 22:42 GMT+02:00 David Turner :
> > > >
> > > > > The rules in the CRUSH map will always be followed.  It is not
> possible
> > > > > for Ceph to go against that and put data into a root that shouldn't
> > > have it.
> > > > >
> > > > > The problem with a cache tier is that Ceph is going to need to
> promote
> > > and
> > > > > evict stuff all the time (not free).  A lot of people that want to
> use
> > > SSD
> > > > > cache tiering for RBDs end up with slower performance because of
> this.
> > > > > Christian Balzer is the expert on Cache Tiers for RBD usage.  His
> > > primary
> > > > > stance is that it's most likely a bad idea, but there are definite
> > > cases
> > > > > where it's perfect.
> > > > >
> > > > >
> > > > > On Thu, Aug 17, 2017 at 4:20 PM Oscar Segarra <
> oscar.sega...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> Hi David,
> > > > >>
>

Re: [ceph-users] How to distribute data

2017-08-17 Thread Christian Balzer

Hello,

On Fri, 18 Aug 2017 03:31:56 +0200 Oscar Segarra wrote:

> Hi Christian,
> 
> Thanks a lot for helping...
> 
> Have you read:
> http://docs.ceph.com/docs/master/rbd/rbd-openstack/
> 
> So just from the perspective of qcow2, you seem to be doomed.
> --> Sorry, I've talking about RAW + QCOW2 when I meant RBD images and RBD  
> snapshots...
>
I tested Snapshots with Hammer and the release before it, found them
immensely painful (resource intensive) and avoided them since.
That said, there are supposedly quite some improvements in recent versions
(I suppose you'll deploy with Luminous), as well as more (and
working) control knobs to reduce the impact of snapshot operations. 
 
> A sufficiently large cache tier should help there immensely and the base image
> should be in cache (RAM, pagecache on the OSD servers really) all the time
> anyway.
> --> If talking about RBD images and RBD snapshots... it helps immensely as  
> well?
> 
No experience, so nothing conclusive and authoritative from my end.
If the VMs write/read a lot of the same data (as in 4MB RBD objects),
cache-tiering should help again.
But promoting and demoting things through it when dealing with snapshots
and deletions of them might be a pain.

Christian

> Sizing this and specifying the correct type of SSDs/NVMes for the cache-tier
> is something that only you can answer based on existing data or sufficiently
> detailed and realistic tests.
> --> Yes, the problem is that I have to buy a HW and for Windows 10 VDI...  
> and I cannot make realistic tests previously :( but I will work on this
> line...
> 
> Thanks a lot again!
> 
> 
> 
> 2017-08-18 3:14 GMT+02:00 Christian Balzer :
> 
> >
> > Hello,
> >
> > On Thu, 17 Aug 2017 23:56:49 +0200 Oscar Segarra wrote:
> >  
> > > Hi David,
> > >
> > > Thanks a lot again for your quick answer...
> > >
> > > *The rules in the CRUSH map will always be followed.  It is not possible
> > > for Ceph to go against that and put data into a root that shouldn't have
> > > it.*  
> > > --> I will work on your proposal of creating two roots in the CRUSH  
> > map...  
> > > just one question more:  
> > > --> Regarding to space consumption, with this proposal, is it possible to 
> > >  
> > > know how many disk space is it free in each pool?
> > >
> > > *The problem with a cache tier is that Ceph is going to need to promote  
> > and  
> > > evict stuff all the time (not free).  A lot of people that want to use  
> > SSD  
> > > cache tiering for RBDs end up with slower performance because of this.
> > > Christian Balzer is the expert on Cache Tiers for RBD usage.  His primary
> > > stance is that it's most likely a bad idea, but there are definite cases
> > > where it's perfect.*  
> > > --> Christian, is there any advice for VDI --> BASE IMAGE (raw) + 1000  
> > > linked clones (qcow2)
> > >  
> > Have you read:
> > http://docs.ceph.com/docs/master/rbd/rbd-openstack/
> >
> > So just from the perspective of qcow2, you seem to be doomed.
> >
> > Windows always appears to be very chatty when it comes to I/O and also
> > paging/swapping seemingly w/o need, rhyme or reason.
> > A sufficiently large cache tier should help there immensely and the base
> > image should be in cache (RAM, pagecache on the OSD servers really) all the
> > time anyway.
> > Sizing this and specifying the correct type of SSDs/NVMes for the
> > cache-tier is something that only you can answer based on existing data or
> > sufficiently detailed and realistic tests.
> >
> > Christian
> >  
> > > Thanks a lot.
> > >
> > >
> > > 2017-08-17 22:42 GMT+02:00 David Turner :
> > >  
> > > > The rules in the CRUSH map will always be followed.  It is not possible
> > > > for Ceph to go against that and put data into a root that shouldn't  
> > have it.  
> > > >
> > > > The problem with a cache tier is that Ceph is going to need to promote  
> > and  
> > > > evict stuff all the time (not free).  A lot of people that want to use  
> > SSD  
> > > > cache tiering for RBDs end up with slower performance because of this.
> > > > Christian Balzer is the expert on Cache Tiers for RBD usage.  His  
> > primary  
> > > > stance is that it's most likely a bad idea, but there are definite  
> > cases  
> > > > where it's perfect.
> > > >
> > > >
> > > > On Thu, Aug 17, 2017 at 4:20 PM Oscar Segarra  > >  
> > > > wrote:
> > > >  
> > > >> Hi David,
> > > >>
> > > >> Thanks a lot for your quick answer!
> > > >>
> > > >> *If I'm understanding you correctly, you want to have 2 different  
> > roots  
> > > >> that pools can be made using.  The first being entirely SSD storage.  
> > The  
> > > >> second being HDD Storage with an SSD cache tier on top of it.  *  
> > > >> --> Yes, this is what I mean.  
> > > >>
> > > >> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-
> > > >> and-ssd-within-the-same-box/  
> > > >> --> I'm not an expert in CRUSH rules... Whit this configuration, it is 
> > > >>  
> > > >> guaranteed that objects stored in ssd pool d

Re: [ceph-users] Optimise Setup with Bluestore

2017-08-17 Thread Christian Balzer
On Fri, 18 Aug 2017 00:09:48 +0200 Mehmet wrote:

> *resend... this Time to the list...*
> Hey David Thank you for the response!
> 
> My use case is actually only rbd for kvm Images where mostly Running Lamp 
> systems on Ubuntu or centos.
> All Images (rbds) are created with "proxmox" where the ceph defaults are used 
> (actually Jewel in the near Future luminous...)
> 
> What i want to know is Primary which constelation would be optimal for 
> bluestore?
> 
> I.e.
> Put db and RAW Device on HDD and the Wal on nvme in my case?
> Should i replace a HDD with an ssd and use this for the dbs so thats finally 
> HDD as RAW Device, db on ssd and Wal on nvme?
> Or... use the HDD for RAW and the nvme for Wal and db?
> 

Ideally (based on what I've been reading here) you'll have your WALs on
NVMe (with a 2nd NVMe to reduce the failure domain to 12 OSDs and improve
IOPS/durability or RAID1 to remove a SPOF).
Then you'll have the DB on decent (and larger) SSDs (not just one, same as
above), as small writes will use this essentially as a journal again.
And finally HDD OSDs as before. 

If you get a 2nd NVMe and use it individually for 12 OSDs, you could go
for something like 24GB for the 12 WALs and 30GB each for the DBs, which
would be a decent compromise I suppose.
But the DB size feels a bit small, somebody with actual experience ought
to pipe up here.
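As a rough sketch (using the example numbers above, not a tested recommendation),
that sizing would be expressed in ceph.conf before creating the bluestore OSDs as:

[osd]
bluestore_block_wal_size = 2147483648    # 2GB WAL per OSD
bluestore_block_db_size  = 32212254720   # 30GB DB per OSD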

Christian

> Hope you (and others) understand what i mean :)
> 
> - Mehmet 
> 
> On 16 August 2017 19:01:30 CEST, David Turner wrote:
> >Honestly there isn't enough information about your use case.  RBD usage
> >with small IO vs ObjectStore with large files vs ObjectStore with small
> >files vs any number of things.  The answer to your question might be
> >that
> >for your needs you should look at having a completely different
> >hardware
> >configuration than what you're running.  There is no correct way to
> >configure your cluster based on what hardware you have.  What hardware
> >you
> >use and what configuration settings you use should be based on your
> >needs
> >and use case.
> >
> >On Wed, Aug 16, 2017 at 12:13 PM Mehmet  wrote:
> >  
> >> :( no suggestions or recommendations on this?
> >>
> >> On 14 August 2017 16:50:15 CEST, Mehmet wrote:
> >>  
> >>> Hi friends,
> >>>
> >>> my actual hardware setup per OSD-node is as follow:
> >>>
> >>> # 3 OSD-Nodes with
> >>> - 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no
> >>> Hyper-Threading
> >>> - 64GB RAM
> >>> - 12x 4TB HGST 7K4000 SAS2 (6GB/s) Disks as OSDs
> >>> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device  
> >for  
> >>> 12 Disks (20G Journal size)
> >>> - 1x Samsung SSD 840/850 Pro only for the OS
> >>>
> >>> # and 1x OSD Node with
> >>> - 1x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (10 Cores 20 Threads)
> >>> - 64GB RAM
> >>> - 23x 2TB TOSHIBA MK2001TRKB SAS2 (6GB/s) Disks as OSDs
> >>> - 1x SEAGATE ST32000445SS SAS2 (6GB/s) Disk as OSDs
> >>> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device  
> >for  
> >>> 24 Disks (15G Journal size)
> >>> - 1x Samsung SSD 850 Pro only for the OS
> >>>
> >>> As you can see, i am using 1 (one) NVMe (Intel DC P3700 NVMe – 400G)
> >>> Device for whole Spinning Disks (partitioned) on each OSD-node.
> >>>
> >>> When „Luminous“ is available (as next LTE) i plan to switch vom
> >>> „filestore“ to „bluestore“ 😊
> >>>
> >>> As far as i have read bluestore consists of
> >>> - „the device“
> >>> - „block-DB“: device that store RocksDB metadata
> >>> - „block-WAL“: device that stores RocksDB „write-ahead journal“
> >>>
> >>> Which setup would be usefull in my case?
> >>> I Would setup the disks via "ceph-deploy".
> >>>
> >>> Thanks in advance for your suggestions!
> >>> - Mehmet
> >>> --
> >>>
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>> ___  
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>  


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to distribute data

2017-08-17 Thread Christian Balzer

Hello,

On Thu, 17 Aug 2017 23:56:49 +0200 Oscar Segarra wrote:

> Hi David,
> 
> Thanks a lot again for your quick answer...
> 
> *The rules in the CRUSH map will always be followed.  It is not possible
> for Ceph to go against that and put data into a root that shouldn't have
> it.*
> --> I will work on your proposal of creating two roots in the CRUSH map...  
> just one question more:
> --> Regarding to space consumption, with this proposal, is it possible to  
> know how many disk space is it free in each pool?
> 
> *The problem with a cache tier is that Ceph is going to need to promote and
> evict stuff all the time (not free).  A lot of people that want to use SSD
> cache tiering for RBDs end up with slower performance because of this.
> Christian Balzer is the expert on Cache Tiers for RBD usage.  His primary
> stance is that it's most likely a bad idea, but there are definite cases
> where it's perfect.*
> --> Christian, is there any advice for VDI --> BASE IMAGE (raw) + 1000  
> linked clones (qcow2)
> 
Have you read:
http://docs.ceph.com/docs/master/rbd/rbd-openstack/

So just from the perspective of qcow2, you seem to be doomed.

Windows always appears to be very chatty when it comes to I/O and also
paging/swapping seemingly w/o need, rhyme or reason.
A sufficiently large cache tier should help there immensely and the base
image should be in cache (RAM, pagecache on the OSD servers really) all the
time anyway.
Sizing this and specifying the correct type of SSDs/NVMes for the
cache-tier is something that only you can answer based on existing data or
sufficiently detailed and realistic tests.

Christian

> Thanks a lot.
> 
> 
> 2017-08-17 22:42 GMT+02:00 David Turner :
> 
> > The rules in the CRUSH map will always be followed.  It is not possible
> > for Ceph to go against that and put data into a root that shouldn't have it.
> >
> > The problem with a cache tier is that Ceph is going to need to promote and
> > evict stuff all the time (not free).  A lot of people that want to use SSD
> > cache tiering for RBDs end up with slower performance because of this.
> > Christian Balzer is the expert on Cache Tiers for RBD usage.  His primary
> > stance is that it's most likely a bad idea, but there are definite cases
> > where it's perfect.
> >
> >
> > On Thu, Aug 17, 2017 at 4:20 PM Oscar Segarra 
> > wrote:
> >  
> >> Hi David,
> >>
> >> Thanks a lot for your quick answer!
> >>
> >> *If I'm understanding you correctly, you want to have 2 different roots
> >> that pools can be made using.  The first being entirely SSD storage.  The
> >> second being HDD Storage with an SSD cache tier on top of it.  *  
> >> --> Yes, this is what I mean.  
> >>
> >> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-
> >> and-ssd-within-the-same-box/  
> >> --> I'm not an expert in CRUSH rules... Whit this configuration, it is  
> >> guaranteed that objects stored in ssd pool do not "go" to the hdd disks?
> >>
> >> *The above guide explains how to set up the HDD root and the SSD root.
> >> After that all you do is create a pool on the HDD root for RBDs, a pool on
> >> the SSD root for a cache tier to use with the HDD pool, and then a a pool
> >> on the SSD root for RBDs.  There aren't actually a lot of use cases out
> >> there where using an SSD cache tier on top of an HDD RBD pool is what you
> >> really want.  I would recommend testing this thoroughly and comparing your
> >> performance to just a standard HDD pool for RBDs without a cache tier.*  
> >> --> I'm working on a VDI solution where there are BASE IMAGES (raw) and  
> >> qcow2 linked clones... where I expect not all VDIs to be powered on at the
> >> same time and perform a configuration to avoid problems related to login
> >> storm. (1000 hosts)  
> >> --> Do you think it is not a good idea? do you know what does usually  
> >> people configure for this kind of scenarios?
> >>
> >> Thanks a lot.
> >>
> >>
> >> 2017-08-17 21:31 GMT+02:00 David Turner :
> >>  
> >>> If I'm understanding you correctly, you want to have 2 different roots
> >>> that pools can be made using.  The first being entirely SSD storage.  The
> >>> second being HDD Storage with an SSD cache tier on top of it.
> >>>
> >>> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-
> >>> and-ssd-within-the-same-box/
> >>>
> >>> The above guide explains how to set up the HDD root and the SSD root.
> >>> After that all you do is create a pool on the HDD root for RBDs, a pool on
> >>> the SSD root for a cache tier to use with the HDD pool, and then a a pool
> >>> on the SSD root for RBDs.  There aren't actually a lot of use cases out
> >>> there where using an SSD cache tier on top of an HDD RBD pool is what you
> >>> really want.  I would recommend testing this thoroughly and comparing your
> >>> performance to just a standard HDD pool for RBDs without a cache tier.
> >>>
> >>> On Thu, Aug 17, 2017 at 3:18 PM Oscar Segarra 
> >>> wrote:
> >>>  
>  Hi,
> 
>  Sorry guys, duri

Re: [ceph-users] Ceph cluster with SSDs

2017-08-17 Thread Christian Balzer

Hello,

On Fri, 18 Aug 2017 00:00:09 +0200 Mehmet wrote:

> Which ssds are used? Are they in production? If so how is your PG Count?
>
What he wrote.
W/o knowing which apples you're comparing to what oranges, this is
pointless.

Also testing osd bench is the LEAST relevant test you can do, as it only
deals with local bandwidth, while what people nearly always want/need in
the end is IOPS and low latency.
Which you test best from a real client perspective.
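For example, something as simple as this against a throwaway pool already tells
you more than osd bench (a sketch; the pool name and numbers are arbitrary):

# rados bench -p testpool 30 write -b 4096 -t 16 --no-cleanup
# rados bench -p testpool 30 rand -t 16
# rados -p testpool cleanup

or, better still, fio run from an actual client against an RBD image.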

Christian
 
> On 17 August 2017 20:04:25 CEST, M Ranga Swami Reddy wrote:
> >Hello,
> >I am using the Ceph cluster with HDDs and SSDs. Created separate pool
> >for each.
> >Now, when I ran the "ceph osd bench", HDD's OSDs show around 500 MB/s
> >and SSD's OSD show around 280MB/s.
> >
> >Ideally, what I expected was - SSD's OSDs should be at-least 40% high
> >as compared with HDD's OSD bench.
> >
> >Did I miss anything here? Any hint is appreciated.
> >
> >Thanks
> >Swami
> >___
> >ceph-users mailing list
> >ceph-users@lists.ceph.com
> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to distribute data

2017-08-17 Thread Oscar Segarra
Thanks a lot David,

for me it is a little bit difficult to run tests because I have to buy the
hardware first... and the price is different with an SSD cache tier or without it.

If anybody has experience with VDI/login storms... input will be really welcome!

Note: I have removed the ceph-users list because I get errors when I copy it.

2017-08-18 2:20 GMT+02:00 David Turner :

> Get it set up and start running tests. You can always enable or disable
> the cache tier later. I don't know if Christian will chime in. And please
> stop removing the ceph-users list from your responses.
>
> On Thu, Aug 17, 2017, 7:41 PM Oscar Segarra 
> wrote:
>
>> Thanks a lot David!!!
>>
>> Let's wait the opinion of Christian about the suggested configuration for
>> VDI...
>>
>> Óscar Segarra
>>
>> 2017-08-18 1:03 GMT+02:00 David Turner :
>>
>>> `ceph df` and `ceph osd df` should give you enough information to know
>>> how full each pool, root, osd, etc are.
>>>
>>> On Thu, Aug 17, 2017, 5:56 PM Oscar Segarra 
>>> wrote:
>>>
 Hi David,

 Thanks a lot again for your quick answer...


 *The rules in the CRUSH map will always be followed.  It is not
 possible for Ceph to go against that and put data into a root that
 shouldn't have it.*
 --> I will work on your proposal of creating two roots in the CRUSH
 map... just one question more:
 --> Regarding to space consumption, with this proposal, is it possible
 to know how many disk space is it free in each pool?


 *The problem with a cache tier is that Ceph is going to need to promote
 and evict stuff all the time (not free).  A lot of people that want to use
 SSD cache tiering for RBDs end up with slower performance because of this.
 Christian Balzer is the expert on Cache Tiers for RBD usage.  His primary
 stance is that it's most likely a bad idea, but there are definite cases
 where it's perfect.*
 --> Christian, is there any advice for VDI --> BASE IMAGE (raw) + 1000
 linked clones (qcow2)

 Thanks a lot.


 2017-08-17 22:42 GMT+02:00 David Turner :

> The rules in the CRUSH map will always be followed.  It is not
> possible for Ceph to go against that and put data into a root that
> shouldn't have it.
>
> The problem with a cache tier is that Ceph is going to need to promote
> and evict stuff all the time (not free).  A lot of people that want to use
> SSD cache tiering for RBDs end up with slower performance because of this.
> Christian Balzer is the expert on Cache Tiers for RBD usage.  His primary
> stance is that it's most likely a bad idea, but there are definite cases
> where it's perfect.
>
>
> On Thu, Aug 17, 2017 at 4:20 PM Oscar Segarra 
> wrote:
>
>> Hi David,
>>
>> Thanks a lot for your quick answer!
>>
>> *If I'm understanding you correctly, you want to have 2 different
>> roots that pools can be made using.  The first being entirely SSD 
>> storage.
>> The second being HDD Storage with an SSD cache tier on top of it.  *
>> --> Yes, this is what I mean.
>>
>> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-
>> and-ssd-within-the-same-box/
>> --> I'm not an expert in CRUSH rules... Whit this configuration, it
>> is guaranteed that objects stored in ssd pool do not "go" to the hdd 
>> disks?
>>
>> *The above guide explains how to set up the HDD root and the SSD
>> root.  After that all you do is create a pool on the HDD root for RBDs, a
>> pool on the SSD root for a cache tier to use with the HDD pool, and then 
>> a
>> a pool on the SSD root for RBDs.  There aren't actually a lot of use 
>> cases
>> out there where using an SSD cache tier on top of an HDD RBD pool is what
>> you really want.  I would recommend testing this thoroughly and comparing
>> your performance to just a standard HDD pool for RBDs without a cache 
>> tier.*
>> --> I'm working on a VDI solution where there are BASE IMAGES (raw)
>> and qcow2 linked clones... where I expect not all VDIs to be powered on 
>> at
>> the same time and perform a configuration to avoid problems related to
>> login storm. (1000 hosts)
>> --> Do you think it is not a good idea? do you know what does usually
>> people configure for this kind of scenarios?
>>
>> Thanks a lot.
>>
>>
>> 2017-08-17 21:31 GMT+02:00 David Turner :
>>
>>> If I'm understanding you correctly, you want to have 2 different
>>> roots that pools can be made using.  The first being entirely SSD 
>>> storage.
>>> The second being HDD Storage with an SSD cache tier on top of it.
>>>
>>> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-
>>> and-ssd-within-the-same-box/
>>>
>>> The above guide explains how to set up the HDD root and the SSD
>>> root.  After that all you do is create a 

Re: [ceph-users] Fwd: Can't get fullpartition space

2017-08-17 Thread David Clarke
On 18/08/17 06:10, Maiko de Andrade wrote:
> Hi, 
> 
> I want to install Ceph on 3 machines: CEPH, CEPH-OSD-1 and CEPH-OSD-2. Each
> machine has 2 disks in RAID 0 with 930GiB in total.
> 
> CEPH is mon and osd too..
> CEPH-OSD-1 osd
> CEPH-OSD-2 osd
> 
> I have installed and reinstalled Ceph many times. In every installation, Ceph
> doesn't get the full partition space; it only takes 1GB. How do I change that?
> 
> On the first machine I have this:
> 
> 
> CEPH$ df -Ph
> Filesystem  Size  Used Avail Use% Mounted on
> udev3.9G 0  3.9G   0% /dev
> tmpfs   796M  8.8M  787M   2% /run
> /dev/sda1   182G  2.2G  171G   2% /
> tmpfs   3.9G 0  3.9G   0% /dev/shm
> tmpfs   5.0M 0  5.0M   0% /run/lock
> tmpfs   3.9G 0  3.9G   0% /sys/fs/cgroup
> tmpfs   796M 0  796M   0% /run/user/1000
> /dev/sda3   738G   33M  738G   1% /var/lib/ceph/osd/ceph-0
> 
> CEPH$ ceph osd tree
> ID CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF
> -1   0.00980 root default
> -3   0.00980 host ceph
>  0   hdd 0.00980 osd.0 up  1.0 1.0
> 
> CEPH$ ceph -s
>   cluster:
> id: 6f3f162b-17ab-49b7-9e4b-904539cfce10
> health: HEALTH_OK
> 
>   services:
> mon: 1 daemons, quorum ceph
> mgr: ceph(active)
> osd: 1 osds: 1 up, 1 in
> 
>   data:
> pools:   0 pools, 0 pgs
> objects: 0 objects, 0 bytes
> usage:   1053 MB used, 9186 MB / 10240 MB avail
> pgs:
> 
> 
> I try use:
> CEPH$ ceph osd crush reweight osd.0 .72
> reweighted item id 0 name 'osd.0' to 0.72 in crush map
> 
> $ ceph osd tree
> ID CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF
> -1   0.71999 root default
> -3   0.71999 host ceph
>  0   hdd 0.71999 osd.0 up  1.0 1.0
> 
> 
> $ ceph -s
>   cluster:
> id: 6f3f162b-17ab-49b7-9e4b-904539cfce10
> health: HEALTH_OK
> 
>   services:
> mon: 1 daemons, quorum ceph
> mgr: ceph(active)
> osd: 1 osds: 1 up, 1 in
> 
>   data:
> pools:   0 pools, 0 pgs
> objects: 0 objects, 0 bytes
> usage:   1054 MB used, 9185 MB / 10240 MB avail
> pgs:

I had similar problems when installing to disks with existing non-Ceph
partitions on them, and ended up setting 'bluestore_block_size' to the
size (in bytes) that I wanted the OSD to be.
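The workaround was nothing more than a ceph.conf entry along these lines (the
value is just an example in bytes, roughly matching a 930GiB disk):

[osd]
bluestore_block_size = 1000000000000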

That is very probably not the correct solution, and I'd strongly
recommend passing Ceph full, unused, devices instead of using the same
disks as the OS was installed on.  This was just a cluster for a proof
of concept, and I didn't have any spare disks, so I didn't look any further.

It ended up creating a file 'block' in /var/lib/ceph/osd/ceph-${osd}/,
instead of using a separate partition like it should.

From 'ceph-disk list':

Correct:

/dev/sda :
 /dev/sda1 ceph data, active, cluster ceph, osd.0, block /dev/sda2
 /dev/sda2 ceph block, for /dev/sda1

Shared OS disk:

/dev/sdc :
 /dev/sdc1 other, linux_raid_member
 /dev/sdc2 other, linux_raid_member
 /dev/sdc3 other, linux_raid_member
 /dev/sdc4 other, xfs, mounted on /var/lib/ceph/osd/ceph-2


# ls -lh /var/lib/ceph/osd/ceph-2/block
-rw-r--r-- 1 ceph ceph 932G Aug 18 11:19 /var/lib/ceph/osd/ceph-2/block


-- 
David Clarke
Systems Architect
Catalyst IT



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Optimise Setup with Bluestore

2017-08-17 Thread Mehmet
*resend... this time to the list...*
Hey David, thank you for the response!

My use case is actually only RBD for KVM images, mostly running LAMP
systems on Ubuntu or CentOS.
All images (RBDs) are created with "proxmox", where the Ceph defaults are used
(actually Jewel, in the near future Luminous...).

What I primarily want to know is which constellation would be optimal for
Bluestore?

I.e.
Put the DB and the raw device on the HDD and the WAL on the NVMe in my case?
Should I replace one HDD with an SSD and use it for the DBs, so that I end up with
the HDD as raw device, the DB on SSD and the WAL on NVMe?
Or... use the HDD for the raw device and the NVMe for WAL and DB?

Hope you (and others) understand what i mean :)

- Mehmet 

On 16 August 2017 19:01:30 CEST, David Turner wrote:
>Honestly there isn't enough information about your use case.  RBD usage
>with small IO vs ObjectStore with large files vs ObjectStore with small
>files vs any number of things.  The answer to your question might be
>that
>for your needs you should look at having a completely different
>hardware
>configuration than what you're running.  There is no correct way to
>configure your cluster based on what hardware you have.  What hardware
>you
>use and what configuration settings you use should be based on your
>needs
>and use case.
>
>On Wed, Aug 16, 2017 at 12:13 PM Mehmet  wrote:
>
>> :( no suggestions or recommendations on this?
>>
>> On 14 August 2017 16:50:15 CEST, Mehmet wrote:
>>
>>> Hi friends,
>>>
>>> my actual hardware setup per OSD-node is as follow:
>>>
>>> # 3 OSD-Nodes with
>>> - 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no
>>> Hyper-Threading
>>> - 64GB RAM
>>> - 12x 4TB HGST 7K4000 SAS2 (6GB/s) Disks as OSDs
>>> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device
>for
>>> 12 Disks (20G Journal size)
>>> - 1x Samsung SSD 840/850 Pro only for the OS
>>>
>>> # and 1x OSD Node with
>>> - 1x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (10 Cores 20 Threads)
>>> - 64GB RAM
>>> - 23x 2TB TOSHIBA MK2001TRKB SAS2 (6GB/s) Disks as OSDs
>>> - 1x SEAGATE ST32000445SS SAS2 (6GB/s) Disk as OSDs
>>> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device
>for
>>> 24 Disks (15G Journal size)
>>> - 1x Samsung SSD 850 Pro only for the OS
>>>
>>> As you can see, i am using 1 (one) NVMe (Intel DC P3700 NVMe – 400G)
>>> Device for whole Spinning Disks (partitioned) on each OSD-node.
>>>
>>> When „Luminous“ is available (as next LTE) i plan to switch vom
>>> „filestore“ to „bluestore“ 😊
>>>
>>> As far as i have read bluestore consists of
>>> - „the device“
>>> - „block-DB“: device that store RocksDB metadata
>>> - „block-WAL“: device that stores RocksDB „write-ahead journal“
>>>
>>> Which setup would be usefull in my case?
>>> I Would setup the disks via "ceph-deploy".
>>>
>>> Thanks in advance for your suggestions!
>>> - Mehmet
>>> --
>>>
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster with SSDs

2017-08-17 Thread Mehmet
Which SSDs are used? Are they in production? If so, what is your PG count?

On 17 August 2017 20:04:25 CEST, M Ranga Swami Reddy wrote:
>Hello,
>I am using the Ceph cluster with HDDs and SSDs. Created separate pool
>for each.
>Now, when I ran the "ceph osd bench", HDD's OSDs show around 500 MB/s
>and SSD's OSD show around 280MB/s.
>
>Ideally, what I expected was - SSD's OSDs should be at-least 40% high
>as compared with HDD's OSD bench.
>
>Did I miss anything here? Any hint is appreciated.
>
>Thanks
>Swami
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-17 Thread Gregory Farnum
On Thu, Aug 17, 2017 at 1:02 PM, Andreas Calminder
 wrote:
> Hi!
> Thanks for getting back to me!
>
> Clients access the cluster through rgw (s3), we had some big buckets
> containing a lot of small files. Prior to this happening I removed a
> semi-stale bucket with a rather large index, 2.5 million objects, all but 30
> objects didn't actually exist which left the normal radosgw-admin bucket rm
> command to fail so I had to remove the bucket instances and bucket metadata
> by hand, leaving the remaining 30 objects floating around in the cluster.
>
> I don't have access to the logs at the moment, but I see the deep-scrub
> starting in the log for osd.34, after a while it starts with
>
> 1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread $THREADID' had timed out after 15
>
> the $THREADID seemingly is the same one as the deep scrub, after a while it
> will suicide and a lot of operations will happen until the deep scrub tries
> again for the same pg and the above repeats.
>
> The osd disk (we have 1 osd per disk) is rather large and pretty slow so it
> might be that, but I think the behaviour should've been observed elsewhere
> in the cluster as well since all osd disks are of the same type and size.
>
> One thought I had is to just kill the disk and re-add it since the data is
> supposed to be replicated to 3 nodes in the cluster, but I kind of want to
> find out what has happened and have it fixed.

Ah. Some people have also found that compacting the leveldb store
improves the situation a great deal. In most versions you can do this
by setting "leveldb_compact_on_mount = true" in the OSD's config file
and then restarting the daemon. You may also have admin socket
commands available to trigger it.
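For example (a sketch, adjust the OSD id to the affected one):

[osd.34]
leveldb_compact_on_mount = true

and then restart that OSD (e.g. systemctl restart ceph-osd@34 on systemd-based
systems); the compaction happens while the store is mounted at startup, so the
first start can take a while.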

I'd try out those and then turn it on again with the high suicide
timeout and see if things improve.
-Greg


>
> /andreas
>
>
> On 17 Aug 2017 20:21, "Gregory Farnum"  wrote:
>
> On Thu, Aug 17, 2017 at 12:14 AM Andreas Calminder
>  wrote:
>>
>> Thanks,
>> I've modified the timeout successfully, unfortunately it wasn't enough
>> for the deep-scrub to finish, so I increased the
>> osd_op_thread_suicide_timeout even higher (1200s), the deep-scrub
>> command will however get killed before this timeout is reached, I
>> figured it was osd_command_thread_suicide_timeout and adjusted it
>> accordingly and restarted the osd, but it still got killed
>> approximately 900s after starting.
>>
>> The log spits out:
>> 2017-08-17 09:01:35.723865 7f062e696700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>> 2017-08-17 09:01:40.723945 7f062e696700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>> 2017-08-17 09:01:45.012105 7f05cceee700  1 heartbeat_map reset_timeout
>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>>
>> I'm thinking having an osd in a cluster locked for ~900s maybe isn't
>> the best thing, is there any way of doing this deep-scrub operation
>> "offline" or in some way that wont affect or get affected by the rest
>> of the cluster?
>
>
> Deep scrub actually timing out a thread is pretty weird anyway — I think it
> requires some combination of abnormally large objects/omap indexes and buggy
> releases.
>
> Is there any more information in the log about the thread that's timing out?
> What's leading you to believe it's the deep scrub? What kind of data is in
> the pool?
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] docs.ceph.com broken since... days?!?

2017-08-17 Thread Gregory Farnum
Yeah, Alfredo said he would look into it. Presumably something
happened when he was fixing other broken pieces of things in the doc
links.

On Thu, Aug 17, 2017 at 12:44 PM, Jason Dillaman  wrote:
> It's up for me as well -- but for me the master branch docs are
> missing the table of contents on the nav pane on the left.
>
> On Thu, Aug 17, 2017 at 3:32 PM, David Turner  wrote:
>> I've been using docs.ceph.com all day and just double checked that it's up.
>> Make sure that your DNS, router, firewall, etc isn't blocking it.
>>
>> On Thu, Aug 17, 2017 at 3:28 PM  wrote:
>>>
>>> ... or at least since yesterday!
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Optimise Setup with Bluestore

2017-08-17 Thread Mehmet


Hey Mark :)

On 16 August 2017 21:43:34 CEST, Mark Nelson wrote:
>Hi Mehmet!
>
>On 08/16/2017 11:12 AM, Mehmet wrote:
>> :( no suggestions or recommendations on this?
>>
>> On 14 August 2017 16:50:15 CEST, Mehmet wrote:
>>
>> Hi friends,
>>
>> my actual hardware setup per OSD-node is as follow:
>>
>> # 3 OSD-Nodes with
>> - 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no
>> Hyper-Threading
>> - 64GB RAM
>> - 12x 4TB HGST 7K4000 SAS2 (6GB/s) Disks as OSDs
>> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling
>Device for
>> 12 Disks (20G Journal size)
>> - 1x Samsung SSD 840/850 Pro only for the OS
>>
>> # and 1x OSD Node with
>> - 1x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (10 Cores 20
>Threads)
>> - 64GB RAM
>> - 23x 2TB TOSHIBA MK2001TRKB SAS2 (6GB/s) Disks as OSDs
>> - 1x SEAGATE ST32000445SS SAS2 (6GB/s) Disk as OSDs
>> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling
>Device for
>> 24 Disks (15G Journal size)
>> - 1x Samsung SSD 850 Pro only for the OS
>
>The single P3700 for 23 spinning disks is pushing it.  They have high 
>write durability but based on the model that is the 400GB version?

Yes. It is a 400GB Version. 

>If you are doing a lot of writes you might wear it out pretty fast and

Actually the Intel isdct tool says this one should live for 40 years ^^
(EnduranceAnalyzer). But that remains to be proven ;)

>it's 
>a single point of failure for the entire node (if it dies you have a
>lot 
>of data dying with it).  General unbalanced setups like this are 
>trickier to get performing well as well.
>

Yes, that is true. That could happen to all of my 4 nodes. Perhaps the boss
should see what happens before I can get money to optimise the nodes...

>>
>> As you can see, i am using 1 (one) NVMe (Intel DC P3700 NVMe –
>400G)
>> Device for whole Spinning Disks (partitioned) on each OSD-node.
>>
>> When „Luminous“ is available (as next LTE) i plan to switch vom
>> „filestore“ to „bluestore“ 😊
>>
>> As far as i have read bluestore consists of
>> - „the device“
>> - „block-DB“: device that store RocksDB metadata
>> - „block-WAL“: device that stores RocksDB „write-ahead journal“
>>
>> Which setup would be usefull in my case?
>> I Would setup the disks via "ceph-deploy".
>
>So typically we recommend something like a 1-2GB WAL partition on the 
>NVMe drive per OSD and use the remaining space for DB.  If you run out 
>of DB space, bluestore will start using the spinning disks to store KV 
>data instead.  I suspect this will still be the advice you will want to
>
>follow, though at some point having so many WAL and DB partitions on
>the 
>NVMe may start becoming a bottleneck.  Something like 63K sequential 
>writes to heavily fragmented objects might be worth testing, but in
>most 
>cases I suspect DB and WAL on NVMe is still going to be faster.
>

Thanks, that's what I expected. Another idea would be to replace one spinning disk
in the nodes with an Intel SSD for WAL/DB... perhaps for the DBs?

- Mehmet

>>
>> Thanks in advance for your suggestions!
>> - Mehmet
>>
>
>>
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to distribute data

2017-08-17 Thread David Turner
The rules in the CRUSH map will always be followed.  It is not possible for
Ceph to go against that and put data into a root that shouldn't have it.

The problem with a cache tier is that Ceph is going to need to promote and
evict stuff all the time (not free).  A lot of people that want to use SSD
cache tiering for RBDs end up with slower performance because of this.
Christian Balzer is the expert on Cache Tiers for RBD usage.  His primary
stance is that it's most likely a bad idea, but there are definite cases
where it's perfect.

On Thu, Aug 17, 2017 at 4:20 PM Oscar Segarra 
wrote:

> Hi David,
>
> Thanks a lot for your quick answer!
>
> *If I'm understanding you correctly, you want to have 2 different roots
> that pools can be made using.  The first being entirely SSD storage.  The
> second being HDD Storage with an SSD cache tier on top of it.  *
> --> Yes, this is what I mean.
>
>
> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
> --> I'm not an expert in CRUSH rules... Whit this configuration, it is
> guaranteed that objects stored in ssd pool do not "go" to the hdd disks?
>
> *The above guide explains how to set up the HDD root and the SSD root.
> After that all you do is create a pool on the HDD root for RBDs, a pool on
> the SSD root for a cache tier to use with the HDD pool, and then a a pool
> on the SSD root for RBDs.  There aren't actually a lot of use cases out
> there where using an SSD cache tier on top of an HDD RBD pool is what you
> really want.  I would recommend testing this thoroughly and comparing your
> performance to just a standard HDD pool for RBDs without a cache tier.*
> --> I'm working on a VDI solution where there are BASE IMAGES (raw) and
> qcow2 linked clones... where I expect not all VDIs to be powered on at the
> same time and perform a configuration to avoid problems related to login
> storm. (1000 hosts)
> --> Do you think it is not a good idea? do you know what does usually
> people configure for this kind of scenarios?
>
> Thanks a lot.
>
>
> 2017-08-17 21:31 GMT+02:00 David Turner :
>
>> If I'm understanding you correctly, you want to have 2 different roots
>> that pools can be made using.  The first being entirely SSD storage.  The
>> second being HDD Storage with an SSD cache tier on top of it.
>>
>>
>> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
>>
>> The above guide explains how to set up the HDD root and the SSD root.
>> After that all you do is create a pool on the HDD root for RBDs, a pool on
>> the SSD root for a cache tier to use with the HDD pool, and then a a pool
>> on the SSD root for RBDs.  There aren't actually a lot of use cases out
>> there where using an SSD cache tier on top of an HDD RBD pool is what you
>> really want.  I would recommend testing this thoroughly and comparing your
>> performance to just a standard HDD pool for RBDs without a cache tier.
>>
>> On Thu, Aug 17, 2017 at 3:18 PM Oscar Segarra 
>> wrote:
>>
>>> Hi,
>>>
>>> Sorry guys, during theese days I'm asking a lot about how to distribute
>>> my data.
>>>
>>> I have two kinds of VM:
>>>
>>> 1.- Management VMs (linux) --> Full SSD dedicated disks
>>> 2.- Windows VM --> SSD + HHD (with tiering).
>>>
>>> I'm working on installing two clusters on the same host but I'm
>>> encountering lots of problems as named clusters look not be fully supported.
>>>
>>> In the same cluster, Is there any way to distribute my VMs as I like?
>>>
>>> Thanks a lot!
>>>
>>> Ó.
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Modify user metadata in RGW multi-tenant setup

2017-08-17 Thread Sander van Schie
Hello,

I'm trying to modify the metadata of a RGW user in a multi-tenant
setup. For a regular user with the default implicit tenant it works
fine using the following to get metadata:

# radosgw-admin metadata get user:

I however can't figure out how to do the same for a user with an
explicit tenant. How would I go about doing this, if this is possible
at all?

What I'm trying to do is change the default_placement and
placement_tag options. If there's other ways to do this that'd be
great too.

Running the latest version of Jewel.
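
The direction I'm currently experimenting with (untested, based on the tenant$user
notation used elsewhere in radosgw-admin) is:

# radosgw-admin metadata get user:tenant$username > user.json
  ... edit the default_placement / placement fields in user.json ...
# radosgw-admin metadata put user:tenant$username < user.json

but I'd appreciate confirmation that this is the supported way.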

Thanks!

- Sander
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] docs.ceph.com broken since... days?!?

2017-08-17 Thread Jason Dillaman
It's up for me as well -- but for me the master branch docs are
missing the table of contents on the nav pane on the left.

On Thu, Aug 17, 2017 at 3:32 PM, David Turner  wrote:
> I've been using docs.ceph.com all day and just double checked that it's up.
> Make sure that your DNS, router, firewall, etc isn't blocking it.
>
> On Thu, Aug 17, 2017 at 3:28 PM  wrote:
>>
>> ... or at least since yesterday!
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-17 Thread Andreas Calminder
Hi!
Thanks for getting back to me!

Clients access the cluster through rgw (s3), we had some big buckets
containing a lot of small files. Prior to this happening I removed a
semi-stale bucket with a rather large index of 2.5 million objects; all but
30 of the objects didn't actually exist, which caused the normal radosgw-admin
bucket rm command to fail, so I had to remove the bucket instances and the bucket
metadata by hand, leaving the remaining 30 objects floating around in the
cluster.

I don't have access to the logs at the moment, but I see the deep-scrub
starting in the log for osd.34, after a while it starts with

1 heartbeat_map is_healthy
'OSD::osd_op_tp thread $THREADID' had timed out after 15

the $THREADID seemingly is the same one as the deep scrub, after a while it
will suicide and a lot of operations will happen until the deep scrub tries
again for the same pg and the above repeats.

The osd disk (we have 1 osd per disk) is rather large and pretty slow so it
might be that, but I think the behaviour should've been observed elsewhere
in the cluster as well since all osd disks are of the same type and size.

One thought I had is to just kill the disk and re-add it since the data is
supposed to be replicated to 3 nodes in the cluster, but I kind of want to
find out what has happened and have it fixed.
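
For reference, what I put in ceph.conf for the affected OSD before restarting it
was along these lines (the values are still being experimented with):

[osd.34]
osd_op_thread_suicide_timeout = 1200
osd_command_thread_suicide_timeout = 1200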

/andreas


On 17 Aug 2017 20:21, "Gregory Farnum"  wrote:

On Thu, Aug 17, 2017 at 12:14 AM Andreas Calminder <
andreas.calmin...@klarna.com> wrote:

> Thanks,
> I've modified the timeout successfully, unfortunately it wasn't enough
> for the deep-scrub to finish, so I increased the
> osd_op_thread_suicide_timeout even higher (1200s), the deep-scrub
> command will however get killed before this timeout is reached, I
> figured it was osd_command_thread_suicide_timeout and adjusted it
> accordingly and restarted the osd, but it still got killed
> approximately 900s after starting.
>
> The log spits out:
> 2017-08-17 09:01:35.723865 7f062e696700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
> 2017-08-17 09:01:40.723945 7f062e696700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
> 2017-08-17 09:01:45.012105 7f05cceee700  1 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>
> I'm thinking having an osd in a cluster locked for ~900s maybe isn't
> the best thing, is there any way of doing this deep-scrub operation
> "offline" or in some way that wont affect or get affected by the rest
> of the cluster?
>

Deep scrub actually timing out a thread is pretty weird anyway — I think it
requires some combination of abnormally large objects/omap indexes and
buggy releases.

Is there any more information in the log about the thread that's timing
out? What's leading you to believe it's the deep scrub? What kind of data
is in the pool?


Re: [ceph-users] docs.ceph.com broken since... days?!?

2017-08-17 Thread David Turner
I've been using docs.ceph.com all day and just double-checked that it's
up.  Make sure that your DNS, router, firewall, etc. aren't blocking it.

On Thu, Aug 17, 2017 at 3:28 PM  wrote:

> ... or at least since yesterday!


Re: [ceph-users] docs.ceph.com broken since... days?!?

2017-08-17 Thread David Turner
If you are on a different version of Ceph than that, replace the version in
the URL: jewel, hammer, etc.
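For example, http://docs.ceph.com/docs/jewel/ or
http://docs.ceph.com/docs/hammer/ should point at the matching release's docs.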

On Thu, Aug 17, 2017 at 3:37 PM Jason Dillaman  wrote:

> I'm not sure what's going on w/ the master branch docs today, but in
> the meantime you can use the luminous docs [1] until this is sorted
> out since they should be nearly identical.
>
> [1] http://docs.ceph.com/docs/luminous/
>
> On Thu, Aug 17, 2017 at 2:52 PM,   wrote:
> > ... or at least since yesterday!
> --
> Jason


Re: [ceph-users] docs.ceph.com broken since... days?!?

2017-08-17 Thread Jason Dillaman
I'm not sure what's going on w/ the master branch docs today, but in
the meantime you can use the luminous docs [1] until this is sorted
out since they should be nearly identical.

[1] http://docs.ceph.com/docs/luminous/

On Thu, Aug 17, 2017 at 2:52 PM,   wrote:
> ... or at least since yesterday!



-- 
Jason


Re: [ceph-users] How to distribute data

2017-08-17 Thread David Turner
If I'm understanding you correctly, you want 2 different CRUSH roots that
pools can be created on: the first entirely SSD storage, the second HDD
storage with an SSD cache tier on top of it.

https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

The above guide explains how to set up the HDD root and the SSD root.
After that, all you do is create a pool on the HDD root for RBDs, a pool on
the SSD root to act as a cache tier for the HDD pool, and then a pool
on the SSD root for RBDs.  There aren't actually a lot of use cases out
there where an SSD cache tier on top of an HDD RBD pool is what you
really want.  I would recommend testing this thoroughly and comparing your
performance to just a standard HDD pool for RBDs without a cache tier.
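As a rough sketch of that layout, assuming the crush rules from the guide
are named ssd and hdd (the pool names and PG counts below are placeholders,
not a tuned recommendation):

  ceph osd pool create rbd-ssd 128 128 replicated ssd
  ceph osd pool create rbd-hdd 512 512 replicated hdd
  ceph osd pool create rbd-hdd-cache 128 128 replicated ssd
  ceph osd tier add rbd-hdd rbd-hdd-cache
  ceph osd tier cache-mode rbd-hdd-cache writeback
  ceph osd tier set-overlay rbd-hdd rbd-hdd-cache
  ceph osd pool set rbd-hdd-cache hit_set_type bloom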

On Thu, Aug 17, 2017 at 3:18 PM Oscar Segarra 
wrote:

> Hi,
>
> Sorry guys, during these days I'm asking a lot about how to distribute my
> data.
>
> I have two kinds of VM:
>
> 1.- Management VMs (linux) --> Full SSD dedicated disks
> 2.- Windows VMs --> SSD + HDD (with tiering).
>
> I'm working on installing two clusters on the same host, but I'm
> encountering lots of problems, as named clusters do not seem to be fully supported.
>
> Within the same cluster, is there any way to distribute my VMs as I like?
>
> Thanks a lot!
>
> Ó.


[ceph-users] docs.ceph.com broken since... days?!?

2017-08-17 Thread ceph . novice
... or at least since yesterday!


[ceph-users] How to distribute data

2017-08-17 Thread Oscar Segarra
Hi,

Sorry guys, during these days I'm asking a lot about how to distribute my
data.

I have two kinds of VM:

1.- Management VMs (linux) --> Full SSD dedicated disks
2.- Windows VMs --> SSD + HDD (with tiering).

I'm working on installing two clusters on the same host, but I'm
encountering lots of problems, as named clusters do not seem to be fully supported.

Within the same cluster, is there any way to distribute my VMs as I like?

Thanks a lot!

Ó.


Re: [ceph-users] RBD only keyring for client

2017-08-17 Thread Jason Dillaman
You should be able to set a CEPH_ARGS='--id rbd' environment variable.
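For example, assuming the client.rbd keyring is already at
/etc/ceph/ceph.client.rbd.keyring (the path is an assumption), something
like this should make every rbd invocation on that host authenticate as
client.rbd without needing -n:

  export CEPH_ARGS='--id rbd'
  rbd ls          # now runs as client.rbd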

On Thu, Aug 17, 2017 at 2:25 PM, David Turner  wrote:
> I already tested putting name, user, and id in the global section with
> client.rbd and rbd as the value (one at a time, testing in between). None of
> them had any effect. This is on a 10.2.7 cluster.
>
>
> On Thu, Aug 17, 2017, 2:06 PM Gregory Farnum  wrote:
>>
>> I think you just specify "name = client.rbd" as a config in the global
>> section of the machine's ceph.conf and it will use that automatically.
>> -Greg
>>
>> On Thu, Aug 17, 2017 at 10:34 AM, David Turner 
>> wrote:
>> > I created a user/keyring to be able to access RBDs, but I'm trying to
>> > find a
>> > way to set the config file on the client machine such that I don't need
>> > to
>> > use -n client.rbd in my commands when I'm on that host.  Currently I'm
>> > testing rbd-fuse vs rbd-nbd for our use case, but I'm having a hard time
>> > with the authentication because of the cephx user name.
>> >
>> > Does anyone have some experience they'd like to shine on this situation?
>> > Thank you.
>> >
>> > David Turner
>> >



-- 
Jason


Re: [ceph-users] Ceph Delete PG because ceph pg force_create_pg doesnt help

2017-08-17 Thread Hauke Homburg
On 16.08.2017 at 13:40, Hauke Homburg wrote:
> Hello,
>
>
> How can I delete a pg completely from a ceph server? I think I have
> deleted all data manually from the server, but a ceph pg  query
> still shows the pg, and a ceph pg force_create_pg doesn't create the pg.
>
> Ceph says it has created the pg, and a pg is stuck for more than 300 sec.
>
> Thanks for your help.
>

Hello,


Sorry for my mistake. I didn't remove the complete PG from the HDD, I only
removed 1 object.

Today I removed the complete PG from the primary node and from 1 secondary
with rm -Rf /var/lib/ceph/osd//.
Before that I stopped the OSD and cleared the journal. The cluster restored
the primary PG, but the inconsistent marker remained. After that I started
ceph pg repair  again.
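For reference, the usual tool for exporting and then removing a single PG
copy from a stopped filestore OSD is ceph-objectstore-tool rather than a
plain rm; a rough sketch, with the OSD id and pgid as placeholders:

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
      --pgid <pgid> --op export --file /root/<pgid>.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
      --pgid <pgid> --op remove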

Does anyone have an idea what I can do to put the PG into a consistent state?

Thanks for Help


Hauke

-- 
www.w3-creative.de

www.westchat.de



Re: [ceph-users] RBD only keyring for client

2017-08-17 Thread David Turner
I already tested putting name, user, and id in the global section with
client.rbd and rbd as the value (one at a time, testing in between). None
of them had any effect. This is on a 10.2.7 cluster.

On Thu, Aug 17, 2017, 2:06 PM Gregory Farnum  wrote:

> I think you just specify "name = client.rbd" as a config in the global
> section of the machine's ceph.conf and it will use that automatically.
> -Greg
>
> On Thu, Aug 17, 2017 at 10:34 AM, David Turner 
> wrote:
> > I created a user/keyring to be able to access RBDs, but I'm trying to
> find a
> > way to set the config file on the client machine such that I don't need
> to
> > use -n client.rbd in my commands when I'm on that host.  Currently I'm
> > testing rbd-fuse vs rbd-nbd for our use case, but I'm having a hard time
> > with the authentication because of the cephx user name.
> >
> > Does anyone have some experience they'd like to shine on this situation?
> > Thank you.
> >
> > David Turner
> >


Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-17 Thread Gregory Farnum
On Thu, Aug 17, 2017 at 12:14 AM Andreas Calminder <
andreas.calmin...@klarna.com> wrote:

> Thanks,
> I've modified the timeout successfully; unfortunately it wasn't enough
> for the deep-scrub to finish, so I increased
> osd_op_thread_suicide_timeout even higher (1200s). The deep-scrub
> command will, however, get killed before this timeout is reached, so I
> figured it was osd_command_thread_suicide_timeout and adjusted it
> accordingly and restarted the osd, but it still got killed
> approximately 900s after starting.
>
> The log spits out:
> 2017-08-17 09:01:35.723865 7f062e696700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
> 2017-08-17 09:01:40.723945 7f062e696700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
> 2017-08-17 09:01:45.012105 7f05cceee700  1 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>
> I'm thinking having an osd in a cluster locked for ~900s maybe isn't
> the best thing, is there any way of doing this deep-scrub operation
> "offline" or in some way that wont affect or get affected by the rest
> of the cluster?
>

Deep scrub actually timing out a thread is pretty weird anyway — I think it
requires some combination of abnormally large objects/omap indexes and
buggy releases.

Is there any more information in the log about the thread that's timing
out? What's leading you to believe it's the deep scrub? What kind of data
is in the pool?


[ceph-users] Fwd: Can't get fullpartition space

2017-08-17 Thread Maiko de Andrade
Hi,

I want to install Ceph on 3 machines: CEPH, CEPH-OSD-1, and CEPH-OSD-2. Each
machine has 2 disks in RAID 0, 930 GiB in total.

CEPH is mon and osd too.
CEPH-OSD-1 is an osd.
CEPH-OSD-2 is an osd.

I have installed and reinstalled Ceph many times. In every installation,
Ceph doesn't get the full partition space; it only takes 1 GB. How do I
change this?

On the first machine I have this:


CEPH$ df -Ph
Filesystem  Size  Used Avail Use% Mounted on
udev3.9G 0  3.9G   0% /dev
tmpfs   796M  8.8M  787M   2% /run
/dev/sda1   182G  2.2G  171G   2% /
tmpfs   3.9G 0  3.9G   0% /dev/shm
tmpfs   5.0M 0  5.0M   0% /run/lock
tmpfs   3.9G 0  3.9G   0% /sys/fs/cgroup
tmpfs   796M 0  796M   0% /run/user/1000
/dev/sda3   738G   33M  738G   1% /var/lib/ceph/osd/ceph-0

CEPH$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF
-1   0.00980 root default
-3   0.00980 host ceph
 0   hdd 0.00980 osd.0 up  1.0 1.0

CEPH$ ceph -s
  cluster:
id: 6f3f162b-17ab-49b7-9e4b-904539cfce10
health: HEALTH_OK

  services:
mon: 1 daemons, quorum ceph
mgr: ceph(active)
osd: 1 osds: 1 up, 1 in

  data:
pools:   0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage:   1053 MB used, 9186 MB / 10240 MB avail
pgs:


I tried using:
CEPH$ ceph osd crush reweight osd.0 .72
reweighted item id 0 name 'osd.0' to 0.72 in crush map

$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF
-1   0.71999 root default
-3   0.71999 host ceph
 0   hdd 0.71999 osd.0 up  1.0 1.0


$ ceph -s
  cluster:
id: 6f3f162b-17ab-49b7-9e4b-904539cfce10
health: HEALTH_OK

  services:
mon: 1 daemons, quorum ceph
mgr: ceph(active)
osd: 1 osds: 1 up, 1 in

  data:
pools:   0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage:   1054 MB used, 9185 MB / 10240 MB avail
pgs:
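As a diagnostic, comparing the size each OSD reports to the cluster against
the size of its partition can help narrow this down; for example:

  ceph osd df tree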



[]´s
Maiko de Andrade
MAX Brasil
Desenvolvedor de Sistemas
+55 51 91251756
http://about.me/maiko


Re: [ceph-users] RBD only keyring for client

2017-08-17 Thread Gregory Farnum
I think you just specify "name = client.rbd" as a config in the global
section of the machine's ceph.conf and it will use that automatically.
-Greg
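A minimal sketch of what Greg describes above, in the client machine's
/etc/ceph/ceph.conf (untested; the keyring path is an assumption):

  [global]
      name = client.rbd
      keyring = /etc/ceph/ceph.client.rbd.keyring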

On Thu, Aug 17, 2017 at 10:34 AM, David Turner  wrote:
> I created a user/keyring to be able to access RBDs, but I'm trying to find a
> way to set the config file on the client machine such that I don't need to
> use -n client.rbd in my commands when I'm on that host.  Currently I'm
> testing rbd-fuse vs rbd-nbd for our use case, but I'm having a hard time
> with the authentication because of the cephx user name.
>
> Does anyone have some experience they'd like to shine on this situation?
> Thank you.
>
> David Turner
>


[ceph-users] Ceph cluster with SSDs

2017-08-17 Thread M Ranga Swami Reddy
Hello,
I am using a Ceph cluster with HDDs and SSDs, with a separate pool for each.
Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
and the SSD OSDs show around 280 MB/s.
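For reference, the per-OSD bench referred to here is presumably the "ceph
tell" form; osd.0 and the sizes below are only examples:

  ceph tell osd.0 bench 1073741824 4194304   # write 1 GiB in 4 MiB blocks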

Ideally, I expected the SSD OSDs to be at least 40% higher than the
HDD OSDs in this bench.

Did I miss anything here? Any hint is appreciated.

Thanks
Swami


[ceph-users] RBD only keyring for client

2017-08-17 Thread David Turner
I created a user/keyring to be able to access RBDs, but I'm trying to find
a way to set the config file on the client machine such that I don't need
to use -n client.rbd in my commands when I'm on that host.  Currently I'm
testing rbd-fuse vs rbd-nbd for our use case, but I'm having a hard time
with the authentication because of the cephx user name.

Does anyone have some experience they'd like to shine on this situation?
Thank you.

David Turner