Re: [ceph-users] "rbd diff" disparity vs mounted usage

2016-04-27 Thread Tyler Wilson
Hello Jason,

Yes, I believe that is my question. Is there any way I can reclaim the space
for this disk?
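
One idea I'm considering, based on your note below that the copy only skips
objects that are entirely zero: zero-fill the free space inside the
filesystem and then re-copy the image. This is only a sketch (the image is
still mapped as /dev/rbd3 from the steps in my first mail; the zerofill file
and the "-sparse" image name are just examples):

$ mount /dev/rbd3p1 /tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/
$ dd if=/dev/zero of=/tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/zerofill bs=1M
  # dd is expected to stop with "No space left on device"
$ sync; rm /tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/zerofill
$ umount /tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/; rbd unmap /dev/rbd3
$ rbd cp backup/cd4e5d37-3023-4640-be5a-5577d3f9307e backup/cd4e5d37-sparse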

On Wed, Apr 27, 2016 at 1:25 PM, Jason Dillaman  wrote:

> The image size (50G) minus the fstrim size (1.7G) approximately equals
> the actual usage (48.19G).  Therefore, I guess the question is why
> doesn't fstrim think it can discard more space?
>
> On a semi-related note, we should probably improve the rbd copy
> sparsify logic.  Right now it requires the full stripe period (or
> object size if striping is disabled) to be zeroes before it skips the
> write operation to the destination.
>
> On Wed, Apr 27, 2016 at 2:26 PM, Tyler Wilson 
> wrote:
> > Hello Jason,
> >
> > Thanks for the quick reply, this was copied from a VM instance snapshot
> > to my backup pool (rbd snap create, rbd cp (to backup pool), rbd snap
> > rm). I've tried piping through grep per your recommendation and it still
> > reports the same usage:
> >
> > $ rbd diff backup/cd4e5d37-3023-4640-be5a-5577d3f9307e | grep data | awk
> > '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'
> > 49345.4 MB
> >
> > Thanks for the help.
> >
> > On Wed, Apr 27, 2016 at 12:22 PM, Jason Dillaman 
> > wrote:
> >>
> >> On Wed, Apr 27, 2016 at 2:07 PM, Tyler Wilson 
> >> wrote:
> >> > $ rbd diff backup/cd4e5d37-3023-4640-be5a-5577d3f9307e | awk '{ SUM +=
> >> > $2 }
> >> > END { print SUM/1024/1024 " MB" }'
> >> > 49345.4 MB
> >>
> >> Is this a cloned image?  That awk trick doesn't account for discarded
> >> regions (i.e. when column three says "zero" instead of "data"). Does
> >> the number change when you pipe the "rbd diff" results through "grep
> >> data" before piping to awk?
> >>
> >> > Could this be affected by replica counts somehow? It seems to be twice
> >> > as large as what is reported in the filesystem, which matches my
> >> > replica count.
> >>
> >> No, the "rbd diff" output is only reporting image data and zeroed
> >> extents -- so the replication factor is not included.
> >>
> >> --
> >> Jason
> >
> >
>
>
>
> --
> Jason
>


Re: [ceph-users] "rbd diff" disparity vs mounted usage

2016-04-27 Thread Tyler Wilson
Hello Jason,

Thanks for the quick reply; this was copied from a VM instance snapshot to
my backup pool (rbd snap create, rbd cp (to backup pool), rbd snap rm). I've
tried piping through grep per your recommendation and it still reports the
same usage:

$ rbd diff backup/cd4e5d37-3023-4640-be5a-5577d3f9307e | grep data | awk '{
SUM += $2 } END { print SUM/1024/1024 " MB" }'
49345.4 MB
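
For completeness, the same awk sum over the discarded extents (the rows
where column three says "zero" rather than "data"), in case those account
for part of the gap:

$ rbd diff backup/cd4e5d37-3023-4640-be5a-5577d3f9307e | grep zero | awk '{
SUM += $2 } END { print SUM/1024/1024 " MB" }'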

Thanks for the help.

On Wed, Apr 27, 2016 at 12:22 PM, Jason Dillaman 
wrote:

> On Wed, Apr 27, 2016 at 2:07 PM, Tyler Wilson 
> wrote:
> > $ rbd diff backup/cd4e5d37-3023-4640-be5a-5577d3f9307e | awk '{ SUM +=
> > $2 } END { print SUM/1024/1024 " MB" }'
> > 49345.4 MB
>
> Is this a cloned image?  That awk trick doesn't account for discarded
> regions (i.e. when column three says "zero" instead of "data"). Does
> the number change when you pipe the "rbd diff" results through "grep
> data" before piping to awk?
>
> > Could this be affected by replica counts somehow? It seems to be twice
> > as large as what is reported in the filesystem, which matches my replica
> > count.
>
> No, the "rbd diff" output is only reporting image data and zeroed
> extents -- so the replication factor is not included.
>
> --
> Jason
>


[ceph-users] "rbd diff" disparity vs mounted usage

2016-04-27 Thread Tyler Wilson
Hello All,

I am currently trying to get an accurate count of bytes used for an RBD
image. I've tried trimming the filesystem, which frees about 1.7 GB, but
there is still a huge disparity between the size reported by the filesystem
and what 'rbd diff' shows:

$ rbd map backup/cd4e5d37-3023-4640-be5a-5577d3f9307e
/dev/rbd3

$ mount -o discard /dev/rbd3p1 /tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/

$ df -h /tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/
Filesystem  Size  Used Avail Use% Mounted on
/dev/rbd3p1  50G   24G   24G  50%
/tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e

$ df -i /tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/
Filesystem  Inodes  IUsed   IFree IUse% Mounted on
/dev/rbd3p13276800 582930 2693870   18%
/tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e

$ fstrim -v /tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/
/tmp/cd4e5d37-3023-4640-be5a-5577d3f9307e/: 1.7 GiB (1766875136 bytes)
trimmed

$ rbd diff backup/cd4e5d37-3023-4640-be5a-5577d3f9307e | awk '{ SUM += $2 }
END { print SUM/1024/1024 " MB" }'
49345.4 MB

Could this be affected by replica counts somehow? It seems to be twice as
large as what is reported in the filesystem, which matches my replica count.
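
For what it's worth, I can also sanity-check this against the raw object
count behind the image. This is only a rough upper bound, and
<block_name_prefix> below is a placeholder for whatever rbd info reports
(I'm assuming the default 4 MB object size and no striping):

$ rbd info backup/cd4e5d37-3023-4640-be5a-5577d3f9307e | grep -E 'order|block_name_prefix'
$ rados -p backup ls | grep -c <block_name_prefix>

Multiplying the object count by the object size gives the most space the
image can be consuming before replication.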

Thanks for any and all assistance!


[ceph-users] Remove incomplete PG

2016-04-20 Thread Tyler Wilson
Hello All,

Are there any documented steps to remove a placement group that is stuck
inactive? I had a situation where two nodes went offline; I tried rescuing
with https://ceph.com/community/incomplete-pgs-oh-my/, but the PG remained
inactive after importing and starting. Now I am just trying to get the
cluster able to read and write again with the remaining placement groups.
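
For reference, the rough sequence I have been considering is below. This is
only a sketch, it assumes the data on that PG can be written off, and I
would appreciate confirmation before running it on a live cluster:

$ ceph pg <pgid> query                            # see which OSDs the PG is still waiting on
$ ceph osd lost <osd-id> --yes-i-really-mean-it   # declare the dead OSDs lost
$ ceph pg force_create_pg <pgid>                  # recreate the PG empty so I/O can resume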



Thanks for any and all assistance!


Re: [ceph-users] High CPU/Delay when Removing Layered Child RBD Image

2014-12-18 Thread Tyler Wilson
Okay, this is rather unrelated to Ceph, but I might as well mention how this
was fixed. With the Juno-release OpenStack packages, 'rbd_store_chunk_size =
8' now results in a chunk size of 8192 bytes rather than 8192 kB (8 MB),
causing quite a few more objects to be stored and deleted. Setting it to
8192 got me the expected object size of 8 MB.
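
For anyone hitting the same thing, the object size an image actually ended
up with is visible in rbd info (the image below is the parent from my clone
command); after the change it should report something like "order 23
(8192 kB objects)":

$ rbd info images/136dd921-f6a2-432f-b4d6-e9902f71baa6 | grep order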


On Thu, Dec 18, 2014 at 6:22 PM, Tyler Wilson  wrote:
>
> Hey All,
>
> On a new Cent7 deployment with Firefly, I'm noticing strange behavior
> when deleting RBD child disks. It appears that upon deletion, CPU usage on
> each OSD process rises to about 75% for 30+ seconds. On my previous
> deployments with CentOS 6.x and Ubuntu 12/14 this was never a problem.
>
> Each RBD Disk is 4GB created with 'rbd clone
> images/136dd921-f6a2-432f-b4d6-e9902f71baa6@snap compute/test'
>
> ## Ubuntu12 3.11.0-18-generic with Ceph 0.80.7
> root@node-1:~# date; rbd rm compute/test123; date
> Fri Dec 19 01:09:31 UTC 2014
> Removing image: 100% complete...done.
> Fri Dec 19 01:09:31 UTC 2014
>
> ## Cent7 3.18.1-1.el7.elrepo.x86_64 with Ceph 0.80.7
> [root@hvm003 ~]# date; rbd rm compute/test; date
> Fri Dec 19 01:08:32 UTC 2014
> Removing image: 100% complete...done.
> Fri Dec 19 01:09:00 UTC 2014
>
> [root@cpl001 ~]# ceph -s
> cluster d033718a-2cb9-409e-b968-34370bd67bd0
>  health HEALTH_OK
>  monmap e1: 3 mons at {cpl001=
> 10.0.0.1:6789/0,mng001=10.0.0.3:6789/0,net001=10.0.0.2:6789/0}, election
> epoch 10, quorum 0,1,2 cpl001,net001,mng001
>  osdmap e84: 9 osds: 9 up, 9 in
>   pgmap v618: 1792 pgs, 12 pools, 4148 MB data, 518 kobjects
> 15106 MB used, 4257 GB / 4272 GB avail
> 1792 active+clean
>
>
> Any assistance would be appreciated.
>


[ceph-users] High CPU/Delay when Removing Layered Child RBD Image

2014-12-18 Thread Tyler Wilson
Hey All,

On a new Cent7 deployment with Firefly, I'm noticing strange behavior when
deleting RBD child disks. It appears that upon deletion, CPU usage on each
OSD process rises to about 75% for 30+ seconds. On my previous deployments
with CentOS 6.x and Ubuntu 12/14 this was never a problem.

Each RBD Disk is 4GB created with 'rbd clone
images/136dd921-f6a2-432f-b4d6-e9902f71baa6@snap compute/test'

## Ubuntu12 3.11.0-18-generic with Ceph 0.80.7
root@node-1:~# date; rbd rm compute/test123; date
Fri Dec 19 01:09:31 UTC 2014
Removing image: 100% complete...done.
Fri Dec 19 01:09:31 UTC 2014

## Cent7 3.18.1-1.el7.elrepo.x86_64 with Ceph 0.80.7
[root@hvm003 ~]# date; rbd rm compute/test; date
Fri Dec 19 01:08:32 UTC 2014
Removing image: 100% complete...done.
Fri Dec 19 01:09:00 UTC 2014

[root@cpl001 ~]# ceph -s
cluster d033718a-2cb9-409e-b968-34370bd67bd0
 health HEALTH_OK
 monmap e1: 3 mons at {cpl001=
10.0.0.1:6789/0,mng001=10.0.0.3:6789/0,net001=10.0.0.2:6789/0}, election
epoch 10, quorum 0,1,2 cpl001,net001,mng001
 osdmap e84: 9 osds: 9 up, 9 in
  pgmap v618: 1792 pgs, 12 pools, 4148 MB data, 518 kobjects
15106 MB used, 4257 GB / 4272 GB avail
1792 active+clean
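
A couple of stock commands I have been running while the rbd rm is in
flight, to see where the time goes (nothing exotic, just the built-in admin
interfaces; osd.0 is only an example, run it against whichever OSD is busy,
on that OSD's host):

$ ceph osd perf                        # per-OSD commit/apply latency during the removal
$ ceph daemon osd.0 dump_historic_ops  # recent slowest ops on that OSD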


Any assistance would be appreciated.


Re: [ceph-users] Which OS for fresh install?

2014-07-23 Thread Tyler Wilson
Brian,

Please see http://ceph.com/docs/master/start/os-recommendations/ and go with
anything that has a 'C' rating for the version of Ceph you want to install.


On Wed, Jul 23, 2014 at 11:12 AM, Brian Lovett 
wrote:

> I'm evaluating ceph for our new private and public cloud environment. I
> have a
> "working" ceph cluster running on centos 6.5, but have had a heck of a time
> figuring out how to get rbd support to connect to cloudstack. Today I found
> out that the default kernel is too old, and while I could compile a new
> one in
> the 3.x series, I would rather look at switching to a newer OS that
> supports
> that natively. I see that centos 7 is out now, and has the newer kernel I
> need. Since we are just now starting the project, would it be better to go
> with centos 7, or are there known issues that should push me to another
> distro
> entirely?
>


Re: [ceph-users] Poor performance on all SSD cluster

2014-06-20 Thread Tyler Wilson
Greg,

Not a real fix for you, but I too run a full-SSD cluster and am able to get
112 MB/s with your command:
[root@plesk-test ~]# dd if=/dev/zero of=testfilasde bs=16k count=65535
oflag=direct
65535+0 records in
65535+0 records out
1073725440 bytes (1.1 GB) copied, 9.59092 s, 112 MB/s

This, of course, is in a VM; here is my ceph config:

[global]
fsid = 
mon_initial_members = node-1 node-2 node-3
mon_host = 192.168.0.3 192.168.0.4 192.168.0.5
auth_supported = cephx
osd_journal_size = 2048
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 1024
public_network = 192.168.0.0/24
osd_mkfs_type = xfs
cluster_network = 192.168.1.0/24
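
Not part of your original test, but it may be worth separating queue depth
from per-request latency with something like fio, since a single dd with
oflag=direct only ever has one request in flight (a sketch, assuming fio is
available in the guest):

$ fio --name=seqwrite --filename=testfile --rw=write --bs=16k --size=1G \
  --direct=1 --ioengine=libaio --iodepth=32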



On Fri, Jun 20, 2014 at 11:08 AM, Greg Poirier 
wrote:

> I recently created a 9-node Firefly cluster backed by all SSDs. We have
> had some pretty severe performance degradation when using O_DIRECT in our
> tests (as this is how MySQL will be interacting with RBD volumes, this
> makes the most sense for a preliminary test). Running the following test:
>
> dd if=/dev/zero of=testfilasde bs=16k count=65535 oflag=direct
>
> 779829248 bytes (780 MB) copied, 604.333 s, 1.3 MB/s
>
> Shows us only about 1.5 MB/s throughput and 100 IOPS from the single dd
> thread. Running a second dd process does show increased throughput which is
> encouraging, but I am still concerned by the low throughput of a single
> thread w/ O_DIRECT.
>
> Two threads:
> 779829248 bytes (780 MB) copied, 604.333 s, 1.3 MB/s
> 126271488 bytes (126 MB) copied, 99.2069 s, 1.3 MB/s
>
> I am testing with an RBD volume mounted with the kernel module (I have
> also tested from within KVM, similar performance).
>
> If allow caching, we start to see reasonable numbers from a single dd
> process:
>
> dd if=/dev/zero of=testfilasde bs=16k count=65535
> 65535+0 records in
> 65535+0 records out
> 1073725440 bytes (1.1 GB) copied, 2.05356 s, 523 MB/s
>
> I can get >1GB/s from a single host with three threads.
>
> Rados bench produces similar results.
>
> Is there something I can do to increase the performance of O_DIRECT? I
> expect performance degradation, but so much?
>
> If I increase the blocksize to 4M, I'm able to get significantly higher
> throughput:
>
> 3833593856 bytes (3.8 GB) copied, 44.2964 s, 86.5 MB/s
>
> This still seems very low.
>
> I'm using the deadline scheduler in all places. With noop scheduler, I do
> not see a performance improvement.
>
> Suggestions?
>
>


[ceph-users] RBD Export-Diff With Children Snapshots

2014-06-06 Thread Tyler Wilson
Hey All,

Simple question, does 'rbd export-diff' work with children snapshot aka;

root:~# rbd children images/03cb46f7-64ab-4f47-bd41-e01ced45f0b4@snap
compute/2b65c0b9-51c3-4ab1-bc3c-6b734cc796b8_disk
compute/54f3b23c-facf-4a23-9eaa-9d221ddb7208_disk
compute/592065d1-264e-4f7d-8504-011c2ea3bce3_disk
compute/9ce6d6af-c4df-442c-b433-be2bb1cef9f6_disk
compute/f0714add-683a-4ba2-a6f3-ded7dbf193eb_disk

Could I export a diff from that image snapshot vs one of the compute disks?
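
For reference, the invocation I had in mind looks roughly like this. It is
only a sketch; "base" and the output filename are example names, and the
snapshot is taken on the child image itself, since I'm not sure the parent's
snapshot can be used as --from-snap on a different image (which is really
the question):

$ rbd snap create compute/2b65c0b9-51c3-4ab1-bc3c-6b734cc796b8_disk@base
$ rbd export-diff --from-snap base \
    compute/2b65c0b9-51c3-4ab1-bc3c-6b734cc796b8_disk disk.diff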

Thanks for your help!


Re: [ceph-users] PCI-E SSD Journal for SSD-OSD Disks

2014-05-15 Thread Tyler Wilson
Hey All,

Thanks for the quick responses! I have chosen the Micron PCI-e card due to
its benchmark results at
http://www.storagereview.com/micron_realssd_p320h_enterprise_pcie_review .
Per the vendor, the card has a 25 PB life expectancy, so I'm not terribly
worried about it failing on me too soon :)

Christian Balzer  writes:

>
> On Wed, 14 May 2014 19:28:17 -0500 Mark Nelson wrote:
>
> > On 05/14/2014 06:36 PM, Tyler Wilson wrote:
> > > Hey All,
> >
> > Hi!
> >
> > >
> > > I am setting up a new storage cluster that absolutely must have the
> > > best read/write sequential speed @ 128k and the highest IOps at 4k
> > > read/write as possible.
> >
> > I assume random?
> >
> > >
> > > My current specs for each storage node are currently;
> > > CPU: 2x E5-2670V2
> > > Motherboard: SM X9DRD-EF
> > > OSD Disks: 20-30 Samsung 840 1TB
> > > OSD Journal(s): 1-2 Micron RealSSD P320h
> > > Network: 4x 10gb, Bridged
> I assume you mean 2x10Gb bonded for public and 2x10Gb for cluster network?
>
> The SSDs you specified would read at about 500MB/s, meaning that only 4 of
> them would already saturate your network uplink.
> For writes (assuming journal on SSDs, see below) you reach that point with
> just 8 SSDs.
>

The 4x 10 Gb links will carry Ceph storage traffic only, with public and
management traffic on the on-board interfaces. This is expandable to 80 Gbps
if needed.


> > > Memory: 32-96GB depending on need
> RAM is pretty cheap these days and a large pagecache on the storage nodes
> is always quite helpful.
>

Noted; I wasn't sure how Ceph used the Linux page cache or whether it would
benefit us.

> > >
>
> How many of these nodes are you planning to deploy initially?
> As always and especially when going for performance, more and smaller
> nodes tend to be better, also less impact if one goes down.
> And in your case it is easier to balance storage and network bandwidth,
> see above.
>

Two storage nodes per location at the start; these are serving OpenStack
VMs, so we will add more whenever utilization warrants it.

> > > Does anyone see any potential bottlenecks in the above specs? What kind
> > > of improvements or configurations can we make on the OSD config side?
> > > We are looking to run this with 2 replication.
> >
> > Likely you'll run into latency due to context switching and lock
> > contention in the OSDs and maybe even some kernel slowness.  Potentially
> > you could end up CPU limited too, even with E5-2670s given how fast all
> > of those SSDs are.  I'd suggest considering a chassis without an
> > expander backplane and using multiple controllers with the drives
> > directly attached.
> >
>
> Indeed, I'd be worried about that as well, same with the
> chassis/controller bit.
>


Thanks for the advice on the controller card; we will look into different
chassis options with the LSI cards recommended in the Inktank docs.
Would running a different distribution affect this at all? Our target was
CentOS 6, but if a more recent kernel would make a difference we could
switch.

> > There's work going into improving things on the Ceph side but I don't
> > know how much of it has even hit our wip branches in github yet.  So for
> > now ymmv, but there's a lot of work going on in this area as it's
> > something that lots of folks are interested in.
> >
> If you look at the current "Slow IOPS on RBD compared to journal and
> backing devices" thread and the Inktank document referenced in it
>
>
> https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
>
> you should probably assume no more than 800 random write IOPS and 4000
> random read IOPS per OSD (4KB block size).
> That later number I can also reproduce with my cluster.
>
> Now I expect those numbers to go up as Ceph is improved, but for the time
> being those limits might influence your choice of hardware.
>
> > I'd also suggest testing whether or not putting all of the journals on
> > the RealSSD cards actually helps you that much over just putting your
> > journals on the other SSDs.  The advantage here is that by putting
> > journals on the 2.5" SSDs, you don't lose a pile of OSDs if one of those
> > PCIE cards fails.
> >
> More than seconded, I could only find READ values on the Micron site which
> makes me very suspicious, as the journal's main role is to be able to
> WRITE as fast as possible. Also all journals combined ought to be faster
> than your final storage.
> Lastly there was no endurance data on the Micron site either and

[ceph-users] PCI-E SSD Journal for SSD-OSD Disks

2014-05-14 Thread Tyler Wilson
Hey All,

I am setting up a new storage cluster that absolutely must have the best
possible sequential read/write speed @ 128k and the highest possible IOps at
4k read/write.

My current specs for each storage node are currently;
CPU: 2x E5-2670V2
Motherboard: SM X9DRD-EF
OSD Disks: 20-30 Samsung 840 1TB
OSD Journal(s): 1-2 Micron RealSSD P320h
Network: 4x 10gb, Bridged
Memory: 32-96GB depending on need

Does anyone see any potential bottlenecks in the above specs? What kinds of
improvements or configuration changes can we make on the OSD side? We are
looking to run this with a replication factor of 2.
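
For the replication part, the plan is simply the standard pool settings
(a sketch; the pool name and PG count are placeholders, not final numbers):

$ ceph osd pool create volumes 4096 4096
$ ceph osd pool set volumes size 2
$ ceph osd pool set volumes min_size 1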

Thanks for your assistance with this.