[ceph-users] OSDs going down during radosbench benchmark

2016-09-12 Thread Deneau, Tom
Trying to understand why some OSDs (6 out of 21) went down in my cluster while 
running a CBT radosbench benchmark.  From the logs below, is this a networking 
problem between systems, or is it some kind of FileStore problem.

Looking at one crashed OSD log, I see the following crash error:

2016-09-09 21:30:29.757792 7efc6f5f1700 -1 FileStore: sync_entry timed out 
after 600 seconds.
 ceph version 10.2.1-13.el7cp (f15ca93643fee5f7d32e62c3e8a7016c1fc1e6f4)

just before that I see things like:

2016-09-09 21:18:07.391760 7efc755fd700 -1 osd.12 165 heartbeat_check: no reply 
from osd.6 since back 2016-09-09 21:17:47.261601 front 2016-09-09 
21:17:47.261601 (cutoff 2016-09-09 21:17:47.391758)

and also

2016-09-09 19:03:45.788327 7efc53905700  0 -- 10.0.1.2:6826/58682 >> 
10.0.1.1:6832/19713 pipe(0x7efc8bfbc800 sd=65 :52000 s=1 pgs=12 cs=1 l=0\
 c=0x7efc8bef5b00).connect got RESETSESSION

and many warnings for slow requests.


All the other osds that died seem to have died with:

2016-09-09 19:11:01.663262 7f2157e65700 -1 common/HeartbeatMap.cc: In function 
'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, 
time_t)' thread 7f2157e65700 time 2016-09-09 19:11:01.660671
common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")


-- Tom Deneau, AMD





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mounting a VM rbd image as a /dev/rbd0 device

2016-08-25 Thread Deneau, Tom
If I have an rbd image that is being used by a VM and I want to mount it
as a read-only /dev/rbd0 kernel device, is that possible?

When I try it I get:

mount: /dev/rbd0 is write-protected, mounting read-only
mount: wrong fs type, bad option, bad superblock on /dev/rbd0,
   missing codepage or helper program, or other error

The rbd image when viewed from the VM has a /dev/vda disk with 2 partitions

Number  Start   End SizeType File system  Flags
 1  1049kB  525MB   524MB   primary  xfs  boot
 2  525MB   12.9GB  12.4GB  primary   lvm

I wanted to view it thru the /dev/rbd0 mount because on one of my systems,
the VM is not booting from the image.

-- Tom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd cache command thru admin socket

2016-07-01 Thread Deneau, Tom
Thanks, Jason--

Turns out AppArmor was indeed enabled (I was not aware of that).
Disabled it and now I see the socket but it seems to only be there
temporarily while some client app is running.

The original reason I wanted to use this socket was that I am also
using an rbd images thru kvm and I wanted to be able to flush and 
invalidate the rbd cache as an experiment.

I would have thought the socket would get created and stay there as
long as kvm is active (since kvm is using librbd).  But even when I 
access the rbd disk from the VM, I don't see any socket created at all.

-- Tom


> -Original Message-
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: Thursday, June 30, 2016 6:15 PM
> To: Deneau, Tom <tom.den...@amd.com>
> Cc: ceph-users <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] rbd cache command thru admin socket
> 
> Can you check the permissions on "/var/run/ceph/" and ensure that the user
> your client runs under has permissions to access the directory?
> If the permissions are OK, do you have SElinux or AppArmor enabled and
> enforcing?
> 
> On Thu, Jun 30, 2016 at 5:37 PM, Deneau, Tom <tom.den...@amd.com> wrote:
> > I was following the instructions in
> >
> > https://www.sebastien-han.fr/blog/2015/09/02/ceph-validate-that-the-rb
> > d-cache-is-active/
> >
> > because I wanted to look at some of the rbd cache state and possibly
> > flush and invalidate it
> >
> > My ceph.conf has
> > [client]
> > rbd default features = 1
> > rbd default format = 2
> > admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
> > log file = /var/log/ceph/
> >
> > I have a client only node (no osds) and on that node I ran fio with the
> librbd engine, which worked fine.  But I did not see anything in
> /var/run/ceph on that client node.  Is there something else I have to do?
> >
> > -- Tom Deneau, AMD
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> --
> Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd cache command thru admin socket

2016-06-30 Thread Deneau, Tom
I was following the instructions in
 
https://www.sebastien-han.fr/blog/2015/09/02/ceph-validate-that-the-rbd-cache-is-active/

because I wanted to look at some of the rbd cache state and possibly flush and 
invalidate it

My ceph.conf has
[client]
rbd default features = 1
rbd default format = 2
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/ceph/

I have a client only node (no osds) and on that node I ran fio with the librbd 
engine, which worked fine.  But I did not see anything in /var/run/ceph on that 
client node.  Is there something else I have to do?

-- Tom Deneau, AMD


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw pool names

2016-06-10 Thread Deneau, Tom
Ah that makes sense.  The places where it was not adding the "default"
prefix were all pre-jewel.

-- Tom


> -Original Message-
> From: Yehuda Sadeh-Weinraub [mailto:yeh...@redhat.com]
> Sent: Friday, June 10, 2016 2:36 PM
> To: Deneau, Tom <tom.den...@amd.com>
> Cc: ceph-users <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] rgw pool names
> 
> On Fri, Jun 10, 2016 at 11:44 AM, Deneau, Tom <tom.den...@amd.com> wrote:
> > When I start radosgw, I create the pool .rgw.buckets manually to
> > control whether it is replicated or erasure coded and I let the other
> > pools be created automatically.
> >
> > However, I have noticed that sometimes the pools get created with the
> "default"
> > prefix, thus
> > rados lspools
> >   .rgw.root
> >   default.rgw.control
> >   default.rgw.data.root
> >   default.rgw.gc
> >   default.rgw.log
> >   .rgw.buckets  # the one I created
> >   default.rgw.users.uid
> >   default.rgw.users.keys
> >   default.rgw.meta
> >   default.rgw.buckets.index
> >   default.rgw.buckets.data  # the one actually being used
> >
> > What controls whether these pools have the "default" prefix or not?
> >
> 
> The prefix is the name of the zone ('default' by default). This was added
> for the jewel release, as well as dropping the requirement of having the
> pool names starts with a dot.
> 
> Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw pool names

2016-06-10 Thread Deneau, Tom
When I start radosgw, I create the pool .rgw.buckets manually to control
whether it is replicated or erasure coded and I let the other pools be
created automatically.

However, I have noticed that sometimes the pools get created with the "default"
prefix, thus
rados lspools
  .rgw.root
  default.rgw.control
  default.rgw.data.root
  default.rgw.gc
  default.rgw.log
  .rgw.buckets  # the one I created
  default.rgw.users.uid
  default.rgw.users.keys
  default.rgw.meta
  default.rgw.buckets.index
  default.rgw.buckets.data  # the one actually being used

What controls whether these pools have the "default" prefix or not?

-- Tom Deneau, AMD

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mount -t ceph

2016-04-27 Thread Deneau, Tom
I was using SLES 12, SP1 which has 3.12.49 

It did have a /usr/sbin/mount.ceph command but using it gave 
  modprobe: FATAL: Module ceph not found.
  failed to load ceph kernel module (1)

-- Tom


> -Original Message-
> From: Gregory Farnum [mailto:gfar...@redhat.com]
> Sent: Wednesday, April 27, 2016 2:59 PM
> To: Deneau, Tom <tom.den...@amd.com>
> Cc: ceph-users <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] mount -t ceph
> 
> On Wed, Apr 27, 2016 at 2:55 PM, Deneau, Tom <tom.den...@amd.com> wrote:
> > What kernel versions are required to be able to use CephFS thru mount -t
> ceph?
> 
> The CephFS kernel client has been in for ages (2.6.34, I think?), but you
> want the absolute latest you can make happen if you're going to try it
> out.
> The actual mount command requires you have mount.ceph, which is in
> different places/availabilities depending on your distro.
> -Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mount -t ceph

2016-04-27 Thread Deneau, Tom
What kernel versions are required to be able to use CephFS thru mount -t ceph?

-- Tom Deneau
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] yum install ceph on RHEL 7.2

2016-03-09 Thread Deneau, Tom


> -Original Message-
> From: Ken Dreyer [mailto:kdre...@redhat.com]
> Sent: Tuesday, March 08, 2016 10:24 PM
> To: Shinobu Kinjo
> Cc: Deneau, Tom; ceph-users
> Subject: Re: [ceph-users] yum install ceph on RHEL 7.2
> 
> On Tue, Mar 8, 2016 at 4:11 PM, Shinobu Kinjo <shinobu...@gmail.com>
> wrote:
> > If you register subscription properly, you should be able to install
> > the Ceph without the EPEL.
> 
> The opposite is true (when installing upstream / ceph.com).
> 
> We rely on EPEL for several things, like leveldb and xmlstarlet.
> 
> - Ken

Ken --

What about when you just do yum install from the preconfigured repos?
Should EPEL be required for that?

-- Tom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] yum install ceph on RHEL 7.2

2016-03-08 Thread Deneau, Tom
Yes, that is what lsb_release is showing...

> -Original Message-
> From: Shinobu Kinjo [mailto:shinobu...@gmail.com]
> Sent: Tuesday, March 08, 2016 5:01 PM
> To: Deneau, Tom
> Cc: ceph-users
> Subject: Re: [ceph-users] yum install ceph on RHEL 7.2
> 
> On Wed, Mar 9, 2016 at 7:52 AM, Deneau, Tom <tom.den...@amd.com> wrote:
> > Just checking...
> >
> > On vanilla RHEL 7.2 (x64), should I be able to yum install ceph without
> adding the EPEL repository?
> 
> Are you talking about?
> 
> # lsb_release -a
>  ...
> Description:Red Hat Enterprise Linux Server release 7.2 (Maipo)
> Release:7.2
> Codename:Maipo
> 
> Cheers,
> S
> 
> > (looks like the version being installed is 0.94.6)
> >
> > -- Tom Deneau, AMD
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> --
> Email:
> shin...@linux.com
> GitHub:
> shinobu-x
> Blog:
> Life with Distributed Computational System based on OpenSource
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] yum install ceph on RHEL 7.2

2016-03-08 Thread Deneau, Tom
Just checking...

On vanilla RHEL 7.2 (x64), should I be able to yum install ceph without adding 
the EPEL repository?
(looks like the version being installed is 0.94.6)

-- Tom Deneau, AMD


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd kernel mapping on 3.13

2016-01-29 Thread Deneau, Tom
The commands shown below had successfully mapped rbd images in the past on 
kernel version 4.1.

Now I need to map one on a system running the 3.13 kernel.
Ceph version is 9.2.0.  Rados bench operations work with no problem.
I get the same error message whether I use format 1 or format 2 or 
--image-shared.
Is there something different I need to with the 3.13 kernel?

-- Tom

  # rbd create --size 1000 --image-format 1 rbd/rbddemo
  # rbd info rbddemo
rbd image 'rbddemo':
  size 1000 MB in 250 objects
  order 22 (4096 kB objects)
  block_name_prefix: rb.0.4f08.77bd73c7
  format: 1

  # rbd map rbd/rbddemo
rbd: sysfs write failed
rbd: map failed: (5) Input/output error

I
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd kernel mapping on 3.13

2016-01-29 Thread Deneau, Tom
Ah, yes I see this...
   feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
which looks like CEPH_FEATURE_CRUSH_V2

Is there any workaround for that?
Or what ceph version would I have to back up to?

The cbt librbdfio benchmark worked fine (once I had installed librbd-dev on the 
client).

-- Tom

> -Original Message-
> From: Ilya Dryomov [mailto:idryo...@gmail.com]
> Sent: Friday, January 29, 2016 4:53 PM
> To: Deneau, Tom
> Cc: ceph-users; c...@lists.ceph.com
> Subject: Re: [ceph-users] rbd kernel mapping on 3.13
> 
> On Fri, Jan 29, 2016 at 11:43 PM, Deneau, Tom <tom.den...@amd.com> wrote:
> > The commands shown below had successfully mapped rbd images in the past
> on kernel version 4.1.
> >
> > Now I need to map one on a system running the 3.13 kernel.
> > Ceph version is 9.2.0.  Rados bench operations work with no problem.
> > I get the same error message whether I use format 1 or format 2 or --
> image-shared.
> > Is there something different I need to with the 3.13 kernel?
> >
> > -- Tom
> >
> >   # rbd create --size 1000 --image-format 1 rbd/rbddemo
> >   # rbd info rbddemo
> > rbd image 'rbddemo':
> >   size 1000 MB in 250 objects
> >   order 22 (4096 kB objects)
> >   block_name_prefix: rb.0.4f08.77bd73c7
> >   format: 1
> >
> >   # rbd map rbd/rbddemo
> > rbd: sysfs write failed
> > rbd: map failed: (5) Input/output error
> 
> You are likely missing feature bits - 3.13 was released way before 9.2.0.
> The exact error is printed to the kernel log - do dmesg | tail or so.
> 
> Thanks,
> 
> Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] s3cmd --disable-multipart

2015-12-10 Thread Deneau, Tom
If using s3cmd to radosgw and using s3cmd's --disable-multipart option, is 
there any limit to the size of the object that can be stored thru radosgw?

Also, is there a recommendation for multipart chunk size for radosgw?

-- Tom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs per OSD

2015-11-05 Thread Deneau, Tom
I have the following 4 pools:

pool 1 'rep2host' replicated size 2 min_size 1 crush_ruleset 1 object_hash 
rjenkins pg_num 128 pgp_num 128 last_change 88 flags hashpspool stripe_width 0
pool 17 'rep2osd' replicated size 2 min_size 1 crush_ruleset 1 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 154 flags hashpspool stripe_width 0
pool 20 'ec104osd' erasure size 14 min_size 10 crush_ruleset 7 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 163 flags hashpspool stripe_width 
4160
pool 21 'ec32osd' erasure size 5 min_size 3 crush_ruleset 6 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 165 flags hashpspool stripe_width 
4128

with 15 up osds.

and ceph health tells me I have too many PGs per OSD (375 > 300)

I'm not sure where the 375 comes from, since there are 896 pgs and 15 osds = 
approx. 60 pgs per OSD.

-- Tom

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] erasure pool, ruleset-root

2015-09-17 Thread Deneau, Tom
I see that I can create a crush rule that only selects osds
from a certain node by this:
   ceph osd crush rule create-simple byosdn1 myhostname osd

and if I then create a replicated pool that uses that rule,
it does indeed select osds only from that node.

I would like to do a similar thing with an erasure pool.

When creating the ec-profile, I have successfully used
   ruleset-failure-domain=osd
but when I try to use
   ruleset-root=myhostname
and then use that profile to create an erasure pool,
the resulting pool doesn't seem to limit to that node.

What is the correct syntax for creating such an erasure pool?

-- Tom Deneau
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench seq throttling

2015-09-16 Thread Deneau, Tom


> -Original Message-
> From: Gregory Farnum [mailto:gfar...@redhat.com]
> Sent: Monday, September 14, 2015 5:32 PM
> To: Deneau, Tom
> Cc: ceph-users
> Subject: Re: [ceph-users] rados bench seq throttling
> 
> On Thu, Sep 10, 2015 at 1:02 PM, Deneau, Tom <tom.den...@amd.com> wrote:
> > Running 9.0.3 rados bench on a 9.0.3 cluster...
> > In the following experiments this cluster is only 2 osd nodes, 6 osds
> > each and a separate mon node (and a separate client running rados
> bench).
> >
> > I have two pools populated with 4M objects.  The pools are replicated
> > x2 with identical parameters.  The objects appear to be spread evenly
> across the 12 osds.
> >
> > In all cases I drop caches on all nodes before doing a rados bench seq
> test.
> > In all cases I run rados bench seq for identical times (30 seconds)
> > and in that time we do not run out of objects to read from the pool.
> >
> > I am seeing significant bandwidth differences between the following:
> >
> >* running a single instance of rados bench reading from one pool with
> 32 threads
> >  (bandwidth approx 300)
> >
> >* running two instances rados bench each reading from one of the two
> pools
> >  with 16 threads per instance (combined bandwidth approx. 450)
> >
> > I have already increased the following:
> >   objecter_inflight_op_bytes = 10485760
> >   objecter_inflight_ops = 8192
> >   ms_dispatch_throttle_bytes = 1048576000  #didn't seem to have any
> > effect
> >
> > The disks and network are not reaching anywhere near 100% utilization
> >
> > What is the best way to diagnose what is throttling things in the one-
> instance case?
> 
> Pretty sure the rados bench main threads are just running into their
> limits. There's some work that Piotr (I think?) has been doing to make it
> more efficient if you want to browse the PRs, but I don't think they're
> even in a dev release yet.
> -Greg

Some further experiments with numbers of rados-bench clients:
   * All of the following are reading 4M sized objects with dropped caches as
 described above:
   * When we run multiple clients, they are run on different pools but from
 the same separate client node, which is not anywhere near CPU or 
network-limited
* threads is the total across all clients, as is BW

Case 1: two node cluster, 3 osds on each node
total  BW  BW  BW 
threads   1 cli   2cli4cli
---   -     
  4174 185 194
  8214 273 301
 16198 309 399
 32226 309 409
 64246 341 421


Case 2: one node cluster, 6 osds on one node.
total  BW  BW  BW 
threads   1 cli   2cli4cli
---   -     
  4   339  262 236
  8   465  426 383
 16   467  433 353
 32   470  432 339
 64   471  429 345

So, from the above data, having multiple clients definitely helps
in the 2-node case (Case 1) but hurts in the single-node case.
Still interested in any tools that would help analyze this more deeply...

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados bench seq throttling

2015-09-10 Thread Deneau, Tom
Running 9.0.3 rados bench on a 9.0.3 cluster...
In the following experiments this cluster is only 2 osd nodes, 6 osds each
and a separate mon node (and a separate client running rados bench).

I have two pools populated with 4M objects.  The pools are replicated x2
with identical parameters.  The objects appear to be spread evenly across the 
12 osds.

In all cases I drop caches on all nodes before doing a rados bench seq test.
In all cases I run rados bench seq for identical times (30 seconds) and in that 
time
we do not run out of objects to read from the pool.

I am seeing significant bandwidth differences between the following:

   * running a single instance of rados bench reading from one pool with 32 
threads
 (bandwidth approx 300)

   * running two instances rados bench each reading from one of the two pools
 with 16 threads per instance (combined bandwidth approx. 450)

I have already increased the following:
  objecter_inflight_op_bytes = 10485760
  objecter_inflight_ops = 8192
  ms_dispatch_throttle_bytes = 1048576000  #didn't seem to have any effect

The disks and network are not reaching anywhere near 100% utilization

What is the best way to diagnose what is throttling things in the one-instance 
case?

-- Tom Deneau, AMD
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ensuring write activity is finished

2015-09-08 Thread Deneau, Tom
When measuring read bandwidth using rados bench, I've been doing the
following:
   * write some objects using rados bench write --no-cleanup
   * drop caches on the osd nodes
   * use rados bench seq to read.

I've noticed that on the first rados bench seq immediately following the rados 
bench write,
there is often activity on the journal partitions which must be a carry over 
from the rados
bench write.

What is the preferred way to ensure that all write activity is finished before 
starting
to use rados bench seq?

-- Tom Deneau

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds on 2 nodes vs. on one node

2015-09-03 Thread Deneau, Tom
Rewording to remove confusion...

Config 1: set up a cluster with 1 node with 6 OSDs
Config 2: identical hardware, set up a cluster with 2 nodes with 3 OSDs each

In each case I do the following:
   1) rados bench write --no-cleanup the same number of 4M size objects
   2) drop caches on all osd nodes
   3) rados bench seq  -t 4 to sequentially read the objects
  and record the read bandwidth

Rados bench is running on a separate client, not on an OSD node.
The client has plenty of spare CPU power and the network and disk
utilization are not limiting factors.

With Config 1, I see approximately 70% more sequential read bandwidth than with 
Config 2.

In both cases the primary OSDs of the objecgts appear evenly distributed across 
OSDs.

Yes, replication factor is 2 but since we are only measuring read performance,
I don't think that matters. 

Question is whether there is a ceph parameter that might be throttling the
2 node configuation?

-- Tom

> -Original Message-
> From: Christian Balzer [mailto:ch...@gol.com]
> Sent: Wednesday, September 02, 2015 7:29 PM
> To: ceph-users
> Cc: Deneau, Tom
> Subject: Re: [ceph-users] osds on 2 nodes vs. on one node
> 
> 
> Hello,
> 
> On Wed, 2 Sep 2015 22:38:12 + Deneau, Tom wrote:
> 
> > In a small cluster I have 2 OSD nodes with identical hardware, each
> > with
> > 6 osds.
> >
> > * Configuration 1:  I shut down the osds on one node so I am using 6
> > OSDS on a single node
> >
> Shut down how?
> Just a "service blah stop" or actually removing them from the cluster aka
> CRUSH map?
> 
> > * Configuration 2:  I shut down 3 osds on each node so now I have 6
> > total OSDS but 3 on each node.
> >
> Same as above.
> And in this case even more relevant, because just shutting down random OSDs
> on both nodes would result in massive recovery action at best and more likely
> a broken cluster.
> 
> > I measure read performance using rados bench from a separate client node.
> Default parameters?
> 
> > The client has plenty of spare CPU power and the network and disk
> > utilization are not limiting factors. In all cases, the pool type is
> > replicated so we're just reading from the primary.
> >
> Replicated as in size 2?
> We can guess/assume that from your cluster size, but w/o you telling us or
> giving us all the various config/crush outputs that is only a guess.
> 
> > With Configuration 1, I see approximately 70% more bandwidth than with
> > configuration 2.
> 
> Never mind that bandwidth is mostly irrelevant in real life, which bandwidth,
> read or write?
> 
> > In general, any configuration where the osds span 2 nodes gets poorer
> > performance but in particular when the 2 nodes have equal amounts of
> > traffic.
> >
> 
> Again, guessing from what you're actually doing this isn't particular
> surprising.
> Because with a single node, default rules and replication of 2 your OSDs
> never have to replicate anything when it comes to writes.
> Whereas with 2 nodes replication happens and takes more time (latency) and
> might also saturate your network (we have of course no idea how your cluster
> looks like).
> 
> Christian
> 
> > Is there any ceph parameter that might be throttling the cases where
> > osds span 2 nodes?
> >
> > -- Tom Deneau, AMD
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> 
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com Global OnLine Japan/Fusion Communications
> http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds on 2 nodes vs. on one node

2015-09-03 Thread Deneau, Tom
After running some other experiments, I see now that the high single-node
bandwidth only occurs when ceph-mon is also running on that same node.
(In these small clusters I only had one ceph-mon running).
If I compare to a single-node where ceph-mon is not running, I see
basically identical performance to the two-node arrangement.

So now my question is:  Is it expected that there would be such
a large performance difference between using osds on a single node
where ceph-mon is running vs. using osds on a single node where
ceph-mon is not running?

-- Tom

> -Original Message-
> From: Deneau, Tom
> Sent: Thursday, September 03, 2015 10:39 AM
> To: 'Christian Balzer'; ceph-users
> Subject: RE: [ceph-users] osds on 2 nodes vs. on one node
> 
> Rewording to remove confusion...
> 
> Config 1: set up a cluster with 1 node with 6 OSDs Config 2: identical
> hardware, set up a cluster with 2 nodes with 3 OSDs each
> 
> In each case I do the following:
>1) rados bench write --no-cleanup the same number of 4M size objects
>2) drop caches on all osd nodes
>3) rados bench seq  -t 4 to sequentially read the objects
>   and record the read bandwidth
> 
> Rados bench is running on a separate client, not on an OSD node.
> The client has plenty of spare CPU power and the network and disk utilization
> are not limiting factors.
> 
> With Config 1, I see approximately 70% more sequential read bandwidth than
> with Config 2.
> 
> In both cases the primary OSDs of the objecgts appear evenly distributed
> across OSDs.
> 
> Yes, replication factor is 2 but since we are only measuring read
> performance, I don't think that matters.
> 
> Question is whether there is a ceph parameter that might be throttling the
> 2 node configuation?
> 
> -- Tom
> 
> > -Original Message-
> > From: Christian Balzer [mailto:ch...@gol.com]
> > Sent: Wednesday, September 02, 2015 7:29 PM
> > To: ceph-users
> > Cc: Deneau, Tom
> > Subject: Re: [ceph-users] osds on 2 nodes vs. on one node
> >
> >
> > Hello,
> >
> > On Wed, 2 Sep 2015 22:38:12 + Deneau, Tom wrote:
> >
> > > In a small cluster I have 2 OSD nodes with identical hardware, each
> > > with
> > > 6 osds.
> > >
> > > * Configuration 1:  I shut down the osds on one node so I am using 6
> > > OSDS on a single node
> > >
> > Shut down how?
> > Just a "service blah stop" or actually removing them from the cluster
> > aka CRUSH map?
> >
> > > * Configuration 2:  I shut down 3 osds on each node so now I have 6
> > > total OSDS but 3 on each node.
> > >
> > Same as above.
> > And in this case even more relevant, because just shutting down random
> > OSDs on both nodes would result in massive recovery action at best and
> > more likely a broken cluster.
> >
> > > I measure read performance using rados bench from a separate client node.
> > Default parameters?
> >
> > > The client has plenty of spare CPU power and the network and disk
> > > utilization are not limiting factors. In all cases, the pool type is
> > > replicated so we're just reading from the primary.
> > >
> > Replicated as in size 2?
> > We can guess/assume that from your cluster size, but w/o you telling
> > us or giving us all the various config/crush outputs that is only a guess.
> >
> > > With Configuration 1, I see approximately 70% more bandwidth than
> > > with configuration 2.
> >
> > Never mind that bandwidth is mostly irrelevant in real life, which
> > bandwidth, read or write?
> >
> > > In general, any configuration where the osds span 2 nodes gets
> > > poorer performance but in particular when the 2 nodes have equal
> > > amounts of traffic.
> > >
> >
> > Again, guessing from what you're actually doing this isn't particular
> > surprising.
> > Because with a single node, default rules and replication of 2 your
> > OSDs never have to replicate anything when it comes to writes.
> > Whereas with 2 nodes replication happens and takes more time (latency)
> > and might also saturate your network (we have of course no idea how
> > your cluster looks like).
> >
> > Christian
> >
> > > Is there any ceph parameter that might be throttling the cases where
> > > osds span 2 nodes?
> > >
> > > -- Tom Deneau, AMD
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
> >
> > --
> > Christian BalzerNetwork/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osds on 2 nodes vs. on one node

2015-09-02 Thread Deneau, Tom
In a small cluster I have 2 OSD nodes with identical hardware, each with 6 osds.

* Configuration 1:  I shut down the osds on one node so I am using 6 OSDS on a 
single node

* Configuration 2:  I shut down 3 osds on each node so now I have 6 total OSDS 
but 3 on each node.

I measure read performance using rados bench from a separate client node.
The client has plenty of spare CPU power and the network and disk utilization 
are not limiting factors.
In all cases, the pool type is replicated so we're just reading from the 
primary.

With Configuration 1, I see approximately 70% more bandwidth than with 
configuration 2.
In general, any configuration where the osds span 2 nodes gets poorer 
performance but in particular
when the 2 nodes have equal amounts of traffic.

Is there any ceph parameter that might be throttling the cases where osds span 
2 nodes?

-- Tom Deneau, AMD
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] a couple of radosgw questions

2015-08-31 Thread Deneau, Tom
I see that the objects that were deleted last Friday are indeed gone now (via 
gc I guess).
gc list does not show anything even after right after objects are deleted.
I couldn't get temp remove to do anything.

-- Tom


> -Original Message-
> From: Ben Hines [mailto:bhi...@gmail.com]
> Sent: Saturday, August 29, 2015 5:27 PM
> To: Brad Hubbard
> Cc: Deneau, Tom; ceph-users
> Subject: Re: [ceph-users] a couple of radosgw questions
> 
> I'm not the OP, but in my particular case, gc is proceeding normally
> (since 94.2, i think) -- i just have millions of older objects
> (months-old) which will not go away.
> 
> (see my other post --
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003967.html
>  )
> 
> -Ben
> 
> On Fri, Aug 28, 2015 at 5:14 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> > - Original Message -
> >> From: "Ben Hines" <bhi...@gmail.com>
> >> To: "Brad Hubbard" <bhubb...@redhat.com>
> >> Cc: "Tom Deneau" <tom.den...@amd.com>, "ceph-users" <ceph-us...@ceph.com>
> >> Sent: Saturday, 29 August, 2015 9:49:00 AM
> >> Subject: Re: [ceph-users] a couple of radosgw questions
> >>
> >> 16:22:38 root@sm-cephrgw4 /etc/ceph $ radosgw-admin temp remove
> >> unrecognized arg remove
> >> usage: radosgw-admin  [options...]
> >> commands:
> >> 
> >>   temp removeremove temporary objects that were created up
> to
> >>  specified date (and optional time)
> >
> > Looking into this ambiguity, thanks.
> >
> >>
> >>
> >> On Fri, Aug 28, 2015 at 4:24 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> >> > emove an object, it is no longer visible
> >> >> from the S3 API, but the objects
> >> >>that comprised it are still there in .rgw.buckets pool.  When do
> they
> >> >>get
> >> >>removed?
> >> >
> >> > Does the following command remove them?
> >> >
> >> > http://ceph.com/docs/master/radosgw/purge-temp/
> >>
> >
> > Does "radosgw-admin gc list" show anything?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] a couple of radosgw questions

2015-08-28 Thread Deneau, Tom
A couple of questions on the radosgw...

1.  I noticed when I use s3cmd to put a 10M object into a bucket in the rados 
object gateway,
I get the following objects created in .rgw.buckets:
 0.5M
   4M
   4M
 1.5M

I assume the 4M breakdown is controlled by rgw obj stripe size.  What 
causes the small initial 0.5M piece?

Also, is there any diagram showing which parts of this striping, if any, 
occur in parallel?


2. I noticed when I use s3cmd to remove an object, it is no longer visible from 
the S3 API, but the objects
   that comprised it are still there in .rgw.buckets pool.  When do they get 
removed?

-- Tom Deneau, AMD


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench object not correct errors on v9.0.3

2015-08-26 Thread Deneau, Tom

 -Original Message-
 From: Dałek, Piotr [mailto:piotr.da...@ts.fujitsu.com]
 Sent: Wednesday, August 26, 2015 2:02 AM
 To: Sage Weil; Deneau, Tom
 Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com
 Subject: RE: rados bench object not correct errors on v9.0.3
 
  -Original Message-
  From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
  ow...@vger.kernel.org] On Behalf Of Sage Weil
  Sent: Tuesday, August 25, 2015 7:43 PM
 
   I have built rpms from the tarball http://ceph.com/download/ceph-
  9.0.3.tar.bz2.
   Have done this for fedora 21 x86_64 and for aarch64.  On both
   platforms when I run a single node cluster with a few osds and run
   rados bench read tests (either seq or rand) I get occasional reports
   like
  
   benchmark_data_myhost_20729_object73 is not correct!
  
   I never saw these with similar rpm builds on these platforms from
   9.0.2
  sources.
  
   Also, if I go to an x86-64 system running Ubuntu trusty for which I
   am able to install prebuilt binary packages via
   ceph-deploy install --dev v9.0.3
  
   I do not see the errors there.
 
  Hrm.. haven't seen it on this end, but we're running/testing master
  and not
  9.0.2 specifically.  If you can reproduce this on master, that'd be very
 helpful!
 
  There have been some recent changes to rados bench... Piotr, does this
  seem like it might be caused by your changes?
 
 Yes. My PR #4690 (https://github.com/ceph/ceph/pull/4690) caused rados bench
 to be fast enough to sometimes run into race condition between librados's AIO
 and objbencher processing. That was fixed in PR #5152
 (https://github.com/ceph/ceph/pull/5152) which didn't make it into 9.0.3.
 Tom, you can confirm this by inspecting the contents of objects questioned
 (their contents should be perfectly fine and I in line with other objects).
 In the meantime you can either apply patch from PR #5152 on your own or use -
 -no-verify.
 
 With best regards / Pozdrawiam
 Piotr Dałek

Piotr --

Thank you.  Yes, when I looked at the contents of the objects they always
looked correct.  And yes a single object would sometimes report an error
and sometimes not.  So a race condition makes sense.

A couple of questions:

   * Why would I not see this behavior using the pre-built 9.0.3 binaries
 that get installed using ceph-deploy install --dev v9.0.3?  I would 
assume
 this is built from the same sources as the 9.0.3 tarball.

   * So I assume one should not compare pre 9.0.3 rados bench numbers with 
9.0.3 and after?
 The pull request https://github.com/ceph/ceph/pull/4690 did not mention the
 effect on final bandwidth numbers, did you notice what that effect was?

-- Tom

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados bench object not correct errors on v9.0.3

2015-08-25 Thread Deneau, Tom


 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
 ow...@vger.kernel.org] On Behalf Of Sage Weil
 Sent: Monday, August 24, 2015 12:45 PM
 To: ceph-annou...@ceph.com; ceph-de...@vger.kernel.org; ceph-us...@ceph.com;
 ceph-maintain...@ceph.com
 Subject: v9.0.3 released
 
 This is the second to last batch of development work for the Infernalis
 cycle.  The most intrusive change is an internal (non user-visible) change to
 the OSD's ObjectStore interface.  Many fixes and improvements elsewhere
 across RGW, RBD, and another big pile of CephFS scrub/repair improvements.
 
 
 Getting Ceph
 
 
 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-9.0.3.tar.gz
 * For packages, see http://ceph.com/docs/master/install/get-packages
 * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-
 deploy
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in the
 body of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html

I have built rpms from the tarball http://ceph.com/download/ceph-9.0.3.tar.bz2.
Have done this for fedora 21 x86_64 and for aarch64.  On both platforms when I 
run
a single node cluster with a few osds and run rados bench read tests
(either seq or rand) I get occasional reports like

benchmark_data_myhost_20729_object73 is not correct!

I never saw these with similar rpm builds on these platforms from 9.0.2 sources.

Also, if I go to an x86-64 system running Ubuntu trusty for which I am able to
install prebuilt binary packages via
ceph-deploy install --dev v9.0.3

I do not see the errors there.

Any suggestions welcome.

-- Tom Deneau, AMD



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench object not correct errors on v9.0.3

2015-08-25 Thread Deneau, Tom


 -Original Message-
 From: Sage Weil [mailto:sw...@redhat.com]
 Sent: Tuesday, August 25, 2015 12:43 PM
 To: Deneau, Tom
 Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com;
 piotr.da...@ts.fujitsu.com
 Subject: Re: rados bench object not correct errors on v9.0.3
 
 On Tue, 25 Aug 2015, Deneau, Tom wrote:
   -Original Message-
   From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
   ow...@vger.kernel.org] On Behalf Of Sage Weil
   Sent: Monday, August 24, 2015 12:45 PM
   To: ceph-annou...@ceph.com; ceph-de...@vger.kernel.org;
   ceph-us...@ceph.com; ceph-maintain...@ceph.com
   Subject: v9.0.3 released
  
   This is the second to last batch of development work for the
   Infernalis cycle.  The most intrusive change is an internal (non
   user-visible) change to the OSD's ObjectStore interface.  Many fixes
   and improvements elsewhere across RGW, RBD, and another big pile of
 CephFS scrub/repair improvements.
  
  
   Getting Ceph
   
  
   * Git at git://github.com/ceph/ceph.git
   * Tarball at http://ceph.com/download/ceph-9.0.3.tar.gz
   * For packages, see http://ceph.com/docs/master/install/get-packages
   * For ceph-deploy, see
   http://ceph.com/docs/master/install/install-ceph-
   deploy
   --
   To unsubscribe from this list: send the line unsubscribe
   ceph-devel in the body of a message to majord...@vger.kernel.org
   More majordomo info at http://vger.kernel.org/majordomo-info.html
 
  I have built rpms from the tarball http://ceph.com/download/ceph-
 9.0.3.tar.bz2.
  Have done this for fedora 21 x86_64 and for aarch64.  On both
  platforms when I run a single node cluster with a few osds and run
  rados bench read tests (either seq or rand) I get occasional reports
  like
 
  benchmark_data_myhost_20729_object73 is not correct!
 
  I never saw these with similar rpm builds on these platforms from 9.0.2
 sources.
 
  Also, if I go to an x86-64 system running Ubuntu trusty for which I am
  able to install prebuilt binary packages via
  ceph-deploy install --dev v9.0.3
 
  I do not see the errors there.
 
 Hrm.. haven't seen it on this end, but we're running/testing master and not
 9.0.2 specifically.  If you can reproduce this on master, that'd be very
 helpful!
 
 There have been some recent changes to rados bench... Piotr, does this seem
 like it might be caused by your changes?
 
 sage
 

Just as a reminder this is with 9.0.3, not 9.0.2.

I just tried with the osds running on the fedora machine (with rpms that I 
built from the tarball)
and rados bench running on the Ubuntu machine (with pre-built binary packages)
and I do not see the errors with that combination.

Will see what happens with master.

-- Tom

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench object not correct errors on v9.0.3

2015-08-25 Thread Deneau, Tom


 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
 ow...@vger.kernel.org] On Behalf Of Deneau, Tom
 Sent: Tuesday, August 25, 2015 1:24 PM
 To: Sage Weil
 Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com;
 piotr.da...@ts.fujitsu.com
 Subject: RE: rados bench object not correct errors on v9.0.3
 
 
 
  -Original Message-
  From: Sage Weil [mailto:sw...@redhat.com]
  Sent: Tuesday, August 25, 2015 12:43 PM
  To: Deneau, Tom
  Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com;
  piotr.da...@ts.fujitsu.com
  Subject: Re: rados bench object not correct errors on v9.0.3
 
  On Tue, 25 Aug 2015, Deneau, Tom wrote:
-Original Message-
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
ow...@vger.kernel.org] On Behalf Of Sage Weil
Sent: Monday, August 24, 2015 12:45 PM
To: ceph-annou...@ceph.com; ceph-de...@vger.kernel.org;
ceph-us...@ceph.com; ceph-maintain...@ceph.com
Subject: v9.0.3 released
   
This is the second to last batch of development work for the
Infernalis cycle.  The most intrusive change is an internal (non
user-visible) change to the OSD's ObjectStore interface.  Many
fixes and improvements elsewhere across RGW, RBD, and another big
pile of
  CephFS scrub/repair improvements.
   
   
Getting Ceph

   
* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-9.0.3.tar.gz
* For packages, see
http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see
http://ceph.com/docs/master/install/install-ceph-
deploy
--
To unsubscribe from this list: send the line unsubscribe
ceph-devel in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
  
   I have built rpms from the tarball http://ceph.com/download/ceph-
  9.0.3.tar.bz2.
   Have done this for fedora 21 x86_64 and for aarch64.  On both
   platforms when I run a single node cluster with a few osds and run
   rados bench read tests (either seq or rand) I get occasional reports
   like
  
   benchmark_data_myhost_20729_object73 is not correct!
  
   I never saw these with similar rpm builds on these platforms from
   9.0.2
  sources.
  
   Also, if I go to an x86-64 system running Ubuntu trusty for which I
   am able to install prebuilt binary packages via
   ceph-deploy install --dev v9.0.3
  
   I do not see the errors there.
 
  Hrm.. haven't seen it on this end, but we're running/testing master
  and not
  9.0.2 specifically.  If you can reproduce this on master, that'd be
  very helpful!
 
  There have been some recent changes to rados bench... Piotr, does this
  seem like it might be caused by your changes?
 
  sage
 
 
 Just as a reminder this is with 9.0.3, not 9.0.2.
 
 I just tried with the osds running on the fedora machine (with rpms that I
 built from the tarball) and rados bench running on the Ubuntu machine (with
 pre-built binary packages) and I do not see the errors with that combination.
 
 Will see what happens with master.
 
 -- Tom


For making a tarball to build rpms from master, I did the following steps:
 
#  git checkout master
#  ./autogen.sh
#  ./configure
#  make dist-bzip2

then put the  .bz2 file in the rpmbuild/SOURCES and put the spec file in 
rpmbuild/SPECS

Are those the correct steps?  Asking because when I do rpmbuild from the above 
I eventually get

Processing files: ceph-9.0.3-0.fc21.x86_64
error: File not found: 
/root/rpmbuild/BUILDROOT/ceph-9.0.3-0.fc21.x86_64/usr/sbin/ceph-disk-activate
error: File not found: 
/root/rpmbuild/BUILDROOT/ceph-9.0.3-0.fc21.x86_64/usr/sbin/ceph-disk-prepare

-- Tom

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] load-gen throughput numbers

2015-07-22 Thread Deneau, Tom
If I run rados load-gen with the following parameters:
   --num-objects 50
   --max-ops 16
   --min-object-size 4M
   --max-object-size 4M
   --min-op-len 4M
   --max-op-len 4M
   --percent 100
   --target-throughput 2000

So every object is 4M in size and all the ops are reads of the entire 4M.
I would assume this is equivalent to running rados bench rand on that pool
if the pool has been previously filled with 50 4M objects.  And I am assuming
the --max-ops=16 is equivalent to having 16 concurrent threads in rados bench.
And I have set the target throughput higher than is possible with my network.

But when I run both rados load-gen and rados bench as described, I see that 
rados bench gets
about twice the throughput of rados load-gen.  Why would that be?

I see there is a --max-backlog parameter, is there some setting of that 
parameter
that would help the throughput?

-- Tom Deneau
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] load-gen throughput numbers

2015-07-22 Thread Deneau, Tom
Ah, I see that --max-backlog must be expressed in bytes/sec,
in spite of what the --help message says.

-- Tom

 -Original Message-
 From: Deneau, Tom
 Sent: Wednesday, July 22, 2015 5:09 PM
 To: 'ceph-users@lists.ceph.com'
 Subject: load-gen throughput numbers
 
 If I run rados load-gen with the following parameters:
--num-objects 50
--max-ops 16
--min-object-size 4M
--max-object-size 4M
--min-op-len 4M
--max-op-len 4M
--percent 100
--target-throughput 2000
 
 So every object is 4M in size and all the ops are reads of the entire 4M.
 I would assume this is equivalent to running rados bench rand on that pool
 if the pool has been previously filled with 50 4M objects.  And I am assuming
 the --max-ops=16 is equivalent to having 16 concurrent threads in rados
 bench.
 And I have set the target throughput higher than is possible with my network.
 
 But when I run both rados load-gen and rados bench as described, I see that
 rados bench gets
 about twice the throughput of rados load-gen.  Why would that be?
 
 I see there is a --max-backlog parameter, is there some setting of that
 parameter
 that would help the throughput?
 
 -- Tom Deneau
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any workaround for ImportError: No module named ceph_argparse?

2015-07-16 Thread Deneau, Tom
False alarm, things seem to be fine now.

-- Tom

 -Original Message-
 From: Deneau, Tom
 Sent: Wednesday, July 15, 2015 1:11 PM
 To: ceph-users@lists.ceph.com
 Subject: Any workaround for ImportError: No module named ceph_argparse?
 
 I just installed 9.0.2 on Trusty using ceph-deploy install --testing and I am
 hitting
 the ImportError:  No module named ceph_argparse issue.
 
 What is the best way to get around this issue and still run a version that is
 compatible with other (non-Ubuntu) nodes in the cluster that are running
 9.0.1?
 
 -- Tom Deneau

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Any workaround for ImportError: No module named ceph_argparse?

2015-07-15 Thread Deneau, Tom
I just installed 9.0.2 on Trusty using ceph-deploy install --testing and I am 
hitting
the ImportError:  No module named ceph_argparse issue.

What is the best way to get around this issue and still run a version that is
compatible with other (non-Ubuntu) nodes in the cluster that are running 9.0.1?

-- Tom Deneau

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests going up and down

2015-07-14 Thread Deneau, Tom
I don't think there were any stale or unclean PGs,  (when there are,
I have seen health detail list them and it did not in this case).
I have since restarted the 2 osds and the health went immediately to HEALTH_OK.

-- Tom

 -Original Message-
 From: Will.Boege [mailto:will.bo...@target.com]
 Sent: Monday, July 13, 2015 10:19 PM
 To: Deneau, Tom; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] slow requests going up and down
 
 Does the ceph health detail show anything about stale or unclean PGs, or
 are you just getting the blocked ops messages?
 
 On 7/13/15, 5:38 PM, Deneau, Tom tom.den...@amd.com wrote:
 
 I have a cluster where over the weekend something happened and successive
 calls to ceph health detail show things like below.
 What does it mean when the number of blocked requests goes up and down
 like this?
 Some clients are still running successfully.
 
 -- Tom Deneau, AMD
 
 
 
 HEALTH_WARN 20 requests are blocked  32 sec; 2 osds have slow requests
 20 ops are blocked  536871 sec
 2 ops are blocked  536871 sec on osd.5
 18 ops are blocked  536871 sec on osd.7
 2 osds have slow requests
 
 HEALTH_WARN 4 requests are blocked  32 sec; 2 osds have slow requests
 4 ops are blocked  536871 sec
 2 ops are blocked  536871 sec on osd.5
 2 ops are blocked  536871 sec on osd.7
 2 osds have slow requests
 
 HEALTH_WARN 27 requests are blocked  32 sec; 2 osds have slow requests
 27 ops are blocked  536871 sec
 2 ops are blocked  536871 sec on osd.5
 25 ops are blocked  536871 sec on osd.7
 2 osds have slow requests
 
 HEALTH_WARN 34 requests are blocked  32 sec; 2 osds have slow requests
 34 ops are blocked  536871 sec
 9 ops are blocked  536871 sec on osd.5
 25 ops are blocked  536871 sec on osd.7
 2 osds have slow requests
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow requests going up and down

2015-07-13 Thread Deneau, Tom
I have a cluster where over the weekend something happened and successive calls 
to ceph health detail show things like below.
What does it mean when the number of blocked requests goes up and down like 
this?
Some clients are still running successfully.

-- Tom Deneau, AMD



HEALTH_WARN 20 requests are blocked  32 sec; 2 osds have slow requests
20 ops are blocked  536871 sec
2 ops are blocked  536871 sec on osd.5
18 ops are blocked  536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 4 requests are blocked  32 sec; 2 osds have slow requests
4 ops are blocked  536871 sec
2 ops are blocked  536871 sec on osd.5
2 ops are blocked  536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 27 requests are blocked  32 sec; 2 osds have slow requests
27 ops are blocked  536871 sec
2 ops are blocked  536871 sec on osd.5
25 ops are blocked  536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 34 requests are blocked  32 sec; 2 osds have slow requests
34 ops are blocked  536871 sec
9 ops are blocked  536871 sec on osd.5
25 ops are blocked  536871 sec on osd.7
2 osds have slow requests
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados gateway to use ec pools

2015-06-19 Thread Deneau, Tom
what is the correct way to make radosgw create its pools as erasure coded pools?

-- Tom Deneau, AMD
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] journal raw partition

2015-04-30 Thread Deneau, Tom
I am experimenting with different external journal partitions as raw partitions 
(no file system).

using 

ceph-deploy osd prepare foo:/mount-point-for-data-partition:journal-partition

followed by 
ceph-deploy osd activate (same arguments)

When the specified journal-partition is on an ssd drive I notice that
osd prepare reports
[WARNIN] DEBUG:ceph-disk:Journal /dev/sdb2 is a partition
[WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the 
same device as the osd data
WARNIN] INFO:ceph-disk:Running command: /usr/sbin/blkid -p -o udev /dev/sdb2
WARNIN] WARNING:ceph-disk:Journal /dev/sdb2 was not prepared with ceph-disk. 
Symlinking directly.
[WARNIN] DEBUG:ceph-disk:Preparing osd data dir /var/local//dev/sdc1
[WARNIN] DEBUG:ceph-disk:Creating symlink /var/local//dev/sdc1/journal - 
/dev/sdb2

and the later ceph-deploy osd activate works fine

But when the journal-partition is a small partition on the beginning of the 
data drive
osd prepare reports
[WARNIN] DEBUG:ceph-disk:Journal is file /dev/sdc2   
=
[WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the 
same device as the osd data
[WARNIN] DEBUG:ceph-disk:Preparing osd data dir /var/local//dev/sdc1
[WARNIN] DEBUG:ceph-disk:Creating symlink /var/local//dev/sdc1/journal - 
/dev/sdc2

and then the later ceph-deploy osd activate fails with

[WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster ceph 
--mkfs --mkkey -i 0 --monmap /var/local//dev/sd\
c1/activate.monmap --osd-data /var/local//dev/sdc1 --osd-journal 
/var/local//dev/sdc1/journal --osd-uuid 05b3933e-9ac4-453a-9072-c7ebf242ba7\
0 --keyring /var/local//dev/sdc1/keyring
[WARNIN] 2015-04-30 13:09:30.869745 3ff9df5cec0 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
[WARNIN] 2015-04-30 13:09:30.869810 3ff9df5cec0 -1 journal check: ondisk fsid 
---- doesn't match expected 
05b3933e-9ac4-453a-9072-c7ebf242ba70, invalid (someone else's?) journal
[WARNIN] 2015-04-30 13:09:30.869863 3ff9df5cec0 -1 
filestore(/var/local//dev/sdc1) mkjournal error creating journal on 
/var/local//dev/sdc1/journal: (22) Invalid argument
[WARNIN] 2015-04-30 13:09:30.869883 3ff9df5cec0 -1 OSD::mkfs: ObjectStore::mkfs 
failed with error -22
[WARNIN] 2015-04-30 13:09:30.869934 3ff9df5cec0 -1  ** ERROR: error creating 
empty object store in /var/local//dev/sdc1: (22) Invalid argument

I'm assuming the problem started with osd prepare thinking that /dev/sdc2 was a 
file rather than a partition.
Is there some partition table thing I am missing here?

parted /dev/sdc print gives

Disk /dev/sdc: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End SizeFile system  Name Flags
 2  1049kB  5000MB  4999MB   primary
 1  5000MB  2000GB  1995GB  xfs  primary

Not sure if it is related but I do know that in the past I had created a
single partition on /dev/sdc and used that as an xfs data partition.

-- Tom Deneau, AMD



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] switching journal location

2015-04-16 Thread Deneau, Tom
If my cluster is quiet and on one node I want to switch the location of the 
journal from
the default location to a file on an SSD drive (or vice versa), what is the
quickest way to do that?  Can I make a soft link to the new location and
do it without restarting the OSDs?

-- Tom Deneau, AMD


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] installing and updating while leaving osd drive data intact

2015-04-09 Thread Deneau, Tom
Referencing this old thread below, I am wondering what is the proper way
to install say new versions of ceph and start up daemons but keep
all the data on the osd drives.

I had been using ceph-deploy new which I guess creates a new cluster fsid.
Normally for my testing I had been starting with clean osd drives but
I would also like to be able to restart and leave the osd drives as is.

-- Tom


 Hi,
 I have faced a similar issue. This happens if the ceph disks aren't
 purged/cleaned completely. Clear of the contents in the /dev/sdb1 device.
 There is a file named ceph_fsid in the disk which would  have the old
 cluster's fsid. This needs to be deleted for it to work.

 Hope it helps.

 Sharmila


On Mon, May 26, 2014 at 2:52 PM, JinHwan Hwang calanchue at gmail.com wrote:

 I'm trying to install ceph 0.80.1 on ubuntu 14.04. All other things goes
 well except 'activate osd' phase. It tells me they can't find proper fsid
 when i do 'activate osd'. This is not my first time of installing ceph, and
 all the process i did was ok when i did on other(though they were ubuntu
 12.04 , virtual machines, ceph-emperor)

 ceph at ceph-mon:~$ ceph-deploy osd activate ceph-osd0:/dev/sdb1
 ceph-osd0:/dev/sdc1 ceph-osd1:/dev/sdb1 ceph-osd1:/dev/sdc1
 ...
 [ceph-osd0][WARNIN] ceph-disk: Error: No cluster conf found in /etc/ceph
 with fsid 05b994a0-20f9-48d7-8d34-107ffcb39e5b
 ..
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] object size in rados bench write

2015-04-08 Thread Deneau, Tom
I've noticed when I use large object sizes like 100M with rados bench write, I 
get 
rados -p data2 bench 60 write --no-cleanup -b 100M
 Maintaining 16 concurrent writes of 104857600 bytes for up to 60 seconds or 0 
objects
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
 0   0 0 0 0 0 - 0
 1   3 3 0 0 0 - 0
 2   5 5 0 0 0 - 0
 3   8 8 0 0 0 - 0
 4  1010 0 0 0 - 0
 5  1313 0 0 0 - 0
 6  1515 0 0 0 - 0
error during benchmark: -5
error 5: (5) Input/output error

An object_size of 32M works fine and the cluster seems otherwise fine.

Seems related to this issue 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-March/028288.html
But I didn't see a resolution for that.

Is there a timeout that is kicking in?

-- Tom Deneau

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] object size in rados bench write

2015-04-08 Thread Deneau, Tom
Ah, I see there is an osd parameter for this

osd max write size
Description:The maximum size of a write in megabytes.
Default:90

 -Original Message-
 From: Deneau, Tom
 Sent: Wednesday, April 08, 2015 3:57 PM
 To: 'ceph-users@lists.ceph.com'
 Subject: object size in rados bench write
 
 I've noticed when I use large object sizes like 100M with rados bench write,
 I get
 rados -p data2 bench 60 write --no-cleanup -b 100M
  Maintaining 16 concurrent writes of 104857600 bytes for up to 60 seconds or
 0 objects
sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0   0 0 0 0 0 - 0
  1   3 3 0 0 0 - 0
  2   5 5 0 0 0 - 0
  3   8 8 0 0 0 - 0
  4  1010 0 0 0 - 0
  5  1313 0 0 0 - 0
  6  1515 0 0 0 - 0
 error during benchmark: -5
 error 5: (5) Input/output error
 
 An object_size of 32M works fine and the cluster seems otherwise fine.
 
 Seems related to this issue http://lists.ceph.com/pipermail/ceph-users-
 ceph.com/2014-March/028288.html
 But I didn't see a resolution for that.
 
 Is there a timeout that is kicking in?
 
 -- Tom Deneau

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados bench seq read with single thread

2015-04-08 Thread Deneau, Tom
Say I have a single node cluster with 5 disks.
And using dd iflag=direct on that node, I can see disk read bandwidth at 160 
MB/s

I populate a pool with 4MB objects.
And then on that same single node, I run
$ drop-caches using /proc/sys/vm/drop_caches
$ rados -p mypool bench nn seq -t 1

What bandwidth should I expect for the rados bench seq command here?
(I am seeing approximately 70 MB/s with -t 1)

-- Tom Deneau


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] clients and monitors

2015-03-25 Thread Deneau, Tom
A couple of client-monitor questions:

1) When a client contacts a monitor to get the cluster map, how does it
   decide which monitor to try to contact?

2) Having gotten the cluster map, assuming a client wants to do multiple reads 
and writes, 
   does the client have to re-contact the monitor to get the latest cluster map
   for each operation?

-- Tom Deneau, AMD


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] who is using radosgw with civetweb?

2015-02-26 Thread Deneau, Tom
Robert --

We are still having trouble with this.

Can you share your [client.radosgw.gateway] section of ceph.conf and
were there any other special things to be aware of?

-- Tom

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Robert LeBlanc
Sent: Thursday, February 26, 2015 12:27 PM
To: Sage Weil
Cc: Ceph-User; ceph-devel
Subject: Re: [ceph-users] who is using radosgw with civetweb?

Thanks, we were able to get it up and running very quickly. If it performs 
well, I don't see any reason to use Apache+fast_cgi. I don't have any problems 
just focusing on civetweb.

On Wed, Feb 25, 2015 at 2:49 PM, Sage Weil sw...@redhat.com wrote:
 On Wed, 25 Feb 2015, Robert LeBlanc wrote:
 We tried to get radosgw working with Apache + mod_fastcgi, but due to 
 the changes in radosgw, Apache, mode_*cgi, etc and the documentation 
 lagging and not having a lot of time to devote to it, we abandoned it.
 Where it the documentation for civetweb? If it is appliance like and 
 easy to set-up, we would like to try it to offer some feedback on 
 your question.

 In giant and hammer, it is enabled by default on port 7480.  On 
 firefly, you need to add the line

  rgw frontends = fastcgi, civetweb port=7480

 to ceph.conf (you can of course adjust the port number if you like) 
 and radosgw will run standalone w/ no apache or anything else.

 sage



 Thanks,
 Robert LeBlanc

 On Wed, Feb 25, 2015 at 12:31 PM, Sage Weil sw...@redhat.com wrote:
  Hey,
 
  We are considering switching to civetweb (the embedded/standalone 
  rgw web
  server) as the primary supported RGW frontend instead of the 
  current apache + mod-fastcgi or mod-proxy-fcgi approach.  
  Supported here means both the primary platform the upstream 
  development focuses on and what the downstream Red Hat product will 
  officially support.
 
  How many people are using RGW standalone using the embedded 
  civetweb server instead of apache?  In production?  At what scale?  
  What
  version(s) (civetweb first appeared in firefly and we've backported 
  most fixes).
 
  Have you seen any problems?  Any other feedback?  The hope is to 
  (vastly) simplify deployment.
 
  Thanks!
  sage
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel 
 in the body of a message to majord...@vger.kernel.org More majordomo 
 info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mixed ceph versions

2015-02-25 Thread Deneau, Tom
I need to set up a cluster where the rados client (for running rados
bench) may be on a different architecture and hence running a different
ceph version from the osd/mon nodes.  Is there a list of which ceph
versions work together for a situation like this?

-- Tom



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] erasure coded pool

2015-02-20 Thread Deneau, Tom
Is it possible to run an erasure coded pool using default k=2, m=2 profile on a 
single node?
(this is just for functionality testing). The single node has 3 OSDs. 
Replicated pools run fine.

ceph.conf does contain:
   osd crush chooseleaf type = 0


-- Tom Deneau

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com