CBT on an existing cluster

2016-01-05 Thread Deneau, Tom
Having trouble getting a reply from c...@cbt.com, so I'm trying the ceph-devel list...

To get familiar with CBT, I first wanted to use it on an existing cluster.
(i.e., not have CBT do any cluster setup).

Is there a .yaml example that illustrates how to use cbt to run, for example,
its radosbench benchmark on an existing cluster?

-- Tom Deneau, AMD
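
A rough sketch of the sort of .yaml being asked about here, pieced together from memory
of the example configs in the cbt repository. Hostnames and pool settings are placeholders,
and the key names (in particular use_existing under cluster:) are assumptions to check
against cbt's shipped example .yaml files rather than a verified working config:

  cluster:
    user: 'ceph'
    head: "headnode"
    clients: ["client01"]
    osds: ["osd01"]
    mons:
      osd01:
        a: "192.168.1.10:6789"
    use_existing: True        # assumption: skip cluster deploy/teardown, use what is running
    clusterid: "ceph"
    iterations: 1
    tmp_dir: "/tmp/cbt"
    pool_profiles:
      rados:
        pg_size: 512
        pgp_size: 512
        replication: 2
  benchmarks:
    radosbench:
      op_size: [4194304, 4096]
      write_only: False
      time: 300
      concurrent_ops: [32]
      concurrent_procs: 1
      pool_profile: 'rados'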



RE: ceph-mon terminated with status 28

2015-12-15 Thread Deneau, Tom
Brad --

The issue is in tracker now..
http://tracker.ceph.com/issues/14088

-- Tom

> -Original Message-
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Monday, December 14, 2015 3:47 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: ceph-mon terminated with status 28
> 
> - Original Message -
> > From: "Tom Deneau" 
> > To: "Brad Hubbard" 
> > Cc: ceph-devel@vger.kernel.org
> > Sent: Tuesday, 15 December, 2015 3:21:27 AM
> > Subject: RE: ceph-mon terminated with status 28
> >
> > Thanks, Brad.  That was the problem.
> 
> Np.
> 
> >
> > Is there a reason why we don't log more descriptive info for this kind
> of
> > failure?
> 
> I guess it may not have been anticipated that init would swallow these
> types of
> errors early in the process and just report the return code.
> 
> If you wouldn't mind opening a tracker for "Fatal errors at start-up are
> not
> logged", or something similar,  I can take a look at getting some
> meaningful log
> entries reported during these early failures.
> 
> Let me know the tracker number.
> 
> Cheers,
> Brad
> 
> >
> > -- Tom
> >
> > > -Original Message-
> > > From: Brad Hubbard [mailto:bhubb...@redhat.com]
> > > Sent: Sunday, December 13, 2015 4:19 PM
> > > To: Deneau, Tom
> > > Cc: ceph-devel@vger.kernel.org
> > > Subject: Re: ceph-mon terminated with status 28
> > >
> > > - Original Message -
> > > > From: "Tom Deneau" 
> > > > To: ceph-devel@vger.kernel.org
> > > > Sent: Sunday, 13 December, 2015 11:49:16 PM
> > > > Subject: ceph-mon terminated with status 28
> > > >
> > > > I am trying to understand the following failure:
> > > >
> > > > A small cluster was running fine, and then was left unused for a
> while.
> > > > When I went to try to use it again, the mon socket wasn't there and
> I
> > > > could see that ceph-mon was not running.  I saw the lines below at
> the
> > > > end of dmesg output.
> > > > When I tried to restart ceph-mon using sudo start ceph-mon
> id=monhost,
> > > > I got the same set of errors newly appended to dmesg output.
> > > >
> > > > I don't see anything more descriptive in /var/log/ceph/ceph-mon.log,
> > > > just the recording of new mon processes starting.
> > > >
> > > > In this particular small cluster, the mon process was running on the
> > > > same node with 7 osd processes.  sudo initctl list shows that the
> osd
> > > > procs are still up, although logging the fact that they can't
> > > > communicate with the mon socket.
> > > >
> > > > Is there someplace else I should look for more details as to why mon
> > > > is down and can't be restarted?
> > > >
> > > > -- Tom Deneau
> > > >
> > > > dmesg output:
> > > > --
> > > >  init: ceph-mon (ceph/monhost) main process (16538) terminated with
> > > > status 28
> > > >  init: ceph-mon (ceph/monhost) main process ended, respawning
> > > >  init: ceph-create-keys main process (16227) killed by TERM signal
> > > >  init: ceph-mon (ceph/monhost) main process (16546) terminated with
> > > > status 28
> > > >  init: ceph-mon (ceph/monhost) main process ended, respawning
> > > >  init: ceph-create-keys main process (16548) killed by TERM signal
> > > >  init: ceph-mon (ceph/monhost) main process (16556) terminated with
> > > > status 28
> > > >  init: ceph-mon (ceph/monhost) main process ended, respawning
> > > >  init: ceph-create-keys main process (16558) killed by TERM signal
> > > >  init: ceph-mon (ceph/monhost) main process (16566) terminated with
> > > > status 28
> > > >  init: ceph-mon (ceph/monhost) respawning too fast, stopped
> > > >  init: ceph-create-keys main process (16568) killed by TERM signal
> > >
> > > It looks like it's complaining about lack of space?
> > >
> > > src/ceph_mon.cc:
> > >
> > > 204 int main(int argc, const char **argv)·
> > > 205 {
> > > 8<
> > > 475   {
> > > 476 // check fs stats. don't start if it's critically close to
> full.
> > > 477 ceph_data_stats_t stats;
> > > 478 int err = get_fs_sta

RE: ceph-mon terminated with status 28

2015-12-14 Thread Deneau, Tom
Thanks, Brad.  That was the problem.

Is there a reason why we don't log more descriptive info for this kind of 
failure?

-- Tom

> -Original Message-
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Sunday, December 13, 2015 4:19 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: ceph-mon terminated with status 28
> 
> - Original Message -
> > From: "Tom Deneau" 
> > To: ceph-devel@vger.kernel.org
> > Sent: Sunday, 13 December, 2015 11:49:16 PM
> > Subject: ceph-mon terminated with status 28
> >
> > I am trying to understand the following failure:
> >
> > A small cluster was running fine, and then was left unused for a while.
> > When I went to try to use it again, the mon socket wasn't there and I
> > could see that ceph-mon was not running.  I saw the lines below at the
> > end of dmesg output.
> > When I tried to restart ceph-mon using sudo start ceph-mon id=monhost,
> > I got the same set of errors newly appended to dmesg output.
> >
> > I don't see anything more descriptive in /var/log/ceph/ceph-mon.log,
> > just the recording of new mon processes starting.
> >
> > In this particular small cluster, the mon process was running on the
> > same node with 7 osd processes.  sudo initctl list shows that the osd
> > procs are still up, although logging the fact that they can't
> > communicate with the mon socket.
> >
> > Is there someplace else I should look for more details as to why mon
> > is down and can't be restarted?
> >
> > -- Tom Deneau
> >
> > dmesg output:
> > --
> >  init: ceph-mon (ceph/monhost) main process (16538) terminated with
> > status 28
> >  init: ceph-mon (ceph/monhost) main process ended, respawning
> >  init: ceph-create-keys main process (16227) killed by TERM signal
> >  init: ceph-mon (ceph/monhost) main process (16546) terminated with
> > status 28
> >  init: ceph-mon (ceph/monhost) main process ended, respawning
> >  init: ceph-create-keys main process (16548) killed by TERM signal
> >  init: ceph-mon (ceph/monhost) main process (16556) terminated with
> > status 28
> >  init: ceph-mon (ceph/monhost) main process ended, respawning
> >  init: ceph-create-keys main process (16558) killed by TERM signal
> >  init: ceph-mon (ceph/monhost) main process (16566) terminated with
> > status 28
> >  init: ceph-mon (ceph/monhost) respawning too fast, stopped
> >  init: ceph-create-keys main process (16568) killed by TERM signal
> 
> It looks like it's complaining about lack of space?
> 
> src/ceph_mon.cc:
> 
> 204 int main(int argc, const char **argv)·
> 205 {
> 8<
> 475   {
> 476 // check fs stats. don't start if it's critically close to full.
> 477 ceph_data_stats_t stats;
> 478 int err = get_fs_stats(stats, g_conf->mon_data.c_str());
> 479 if (err < 0) {
> 480   cerr << "error checking monitor data's fs stats: " <<
> cpp_strerror(err)
> 481<< std::endl;
> 482   exit(-err);
> 483 }
> 484 if (stats.avail_percent <= g_conf->mon_data_avail_crit) {
> 485   cerr << "error: monitor data filesystem reached concerning
> levels of"
> 486<< " available storage space (available: "
> 487<< stats.avail_percent << "% " <<
> prettybyte_t(stats.byte_avail)
> 488<< ")\nyou may adjust 'mon data avail crit' to a lower
> value"
> 489<< " to make this go away (default: " << g_conf-
> >mon_data_avail_crit
> 490<< "%)\n" << std::endl;
> 491   exit(ENOSPC);
> 492 }
> 
> #define ENOSPC  28  /* No space left on device */
> 
> Try starting ceph-mon from the command line and see if you get the above
> message.
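
A minimal sketch of the checks the quoted code points at, assuming the default mon data
path /var/lib/ceph/mon/<cluster>-<id> and the mon id used earlier in this thread; the
ceph.conf override is only a stopgap while space is freed up:

  # how full is the filesystem holding the mon data directory?
  df -h /var/lib/ceph/mon/ceph-monhost

  # run the monitor in the foreground so the error is printed to stderr
  sudo ceph-mon -i monhost -d

  # temporary workaround while freeing space: lower the threshold in ceph.conf
  # [mon]
  #     mon data avail crit = 1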

ceph-mon terminated with status 28

2015-12-13 Thread Deneau, Tom
I am trying to understand the following failure:

A small cluster was running fine, and then was left unused for a while.
When I went to try to use it again, the mon socket wasn't there and I could see 
that
ceph-mon was not running.  I saw the lines below at the end of dmesg output.
When I tried to restart ceph-mon using sudo start ceph-mon id=monhost,
I got the same set of errors newly appended to dmesg output.

I don't see anything more descriptive in /var/log/ceph/ceph-mon.log, just
the recording of new mon processes starting.

In this particular small cluster, the mon process was running on the same
node with 7 osd processes.  sudo initctl list shows that the osd procs are still
up, although logging the fact that they can't communicate with the mon socket.

Is there someplace else I should look for more details as to why mon is down
and can't be restarted?

-- Tom Deneau

dmesg output:
--
 init: ceph-mon (ceph/monhost) main process (16538) terminated with status 28
 init: ceph-mon (ceph/monhost) main process ended, respawning
 init: ceph-create-keys main process (16227) killed by TERM signal
 init: ceph-mon (ceph/monhost) main process (16546) terminated with status 28
 init: ceph-mon (ceph/monhost) main process ended, respawning
 init: ceph-create-keys main process (16548) killed by TERM signal
 init: ceph-mon (ceph/monhost) main process (16556) terminated with status 28
 init: ceph-mon (ceph/monhost) main process ended, respawning
 init: ceph-create-keys main process (16558) killed by TERM signal
 init: ceph-mon (ceph/monhost) main process (16566) terminated with status 28
 init: ceph-mon (ceph/monhost) respawning too fast, stopped
 init: ceph-create-keys main process (16568) killed by TERM signal
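
For reference, the exit status reported by init here is an errno value. A quick sketch of
decoding it on a typical Linux box (the header path may vary by distribution):

  # map the exit status back to an errno name and message
  python -c 'import os, errno; print(errno.errorcode[28], os.strerror(28))'

  # or look it up in the kernel headers
  grep -n 'ENOSPC' /usr/include/asm-generic/errno-base.h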



tarball for 10.0.0

2015-11-30 Thread Deneau, Tom
I did not see the source tarball for 10.0.0 at 
http://download.ceph.com/tarballs/ceph-10.0.0.tar.gz

-- Tom Deneau



keyring issues, 9.1.0

2015-10-22 Thread Deneau, Tom
My current situation as I upgrade to v9.1.0 is that the client.admin keyring seems
to work fine, for instance for the ceph status command.  But commands that use
client.bootstrap-osd, such as

/usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise 
a428120d-99ec-4a73-999f-75d8a6bfcb2e

are getting "EACCES: access denied"

with log entries in ceph.audit.log such as

2015-10-22 13:50:24.070249 mon.0 10.0.2.132:6789/0 33 : audit [INF] 
from='client.? 10.0.2.132:0/263577121' entity='client.bootstrap-osd' 
cmd=[{"prefix": "osd create", "uuid": "a428120d-99ec-4a73-999f-75d8a6bfcb2e"}]: 
 access denied

I tried setting
debug auth = 0
in ceph.conf but couldn't tell anything from that output.

Is there anything special I should look for here?
Note: I do have /var/lib/ceph and subdirectories owned by ceph:ceph

-- Tom
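
A minimal sketch of the auth checks that apply to an EACCES like this, assuming the
standard bootstrap-osd profile (key values and ids are placeholders):

  # does the local bootstrap-osd keyring match what the monitors have?
  sudo cat /var/lib/ceph/bootstrap-osd/ceph.keyring
  ceph auth get client.bootstrap-osd

  # the entity normally needs the bootstrap-osd profile on the mon, i.e.
  #   caps mon = "allow profile bootstrap-osd"
  # if that cap is missing, it can be (re)set with:
  ceph auth caps client.bootstrap-osd mon 'allow profile bootstrap-osd'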




symbol lookup error v9.1.0

2015-10-16 Thread Deneau, Tom
On an Ubuntu Trusty system,
   * I installed v9.1.0 and could bring up a single node cluster with it.
   * I did a git checkout of v9.1.0, followed by ./autogen.sh; ./configure; make

Then when I try to run, for example, the rados binary I just built using
 "./src/.libs/rados -v"

I get
./src/.libs/rados: symbol lookup error: ./src/.libs/rados: undefined symbol: 
_ZN8librados20ObjectWriteOperation9copy_fromERKSsRKNS_5IoCtxEmj

which decodes to
 librados::ObjectWriteOperation::copy_from(std::string const&, librados::IoCtx 
const&, unsigned long, unsigned int)

Is there any reason why this symbol should not be resolved by the installed 
libraries of v9.1.0?

-- Tom Deneau
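
A minimal sketch of how to check which librados the freshly built binary actually
resolves against, assuming an autotools build tree like the one described above
(the installed library path may be /usr/lib/x86_64-linux-gnu on Ubuntu):

  # which librados.so is the binary picking up?
  ldd ./src/.libs/rados | grep librados

  # does the installed library export the symbol in question?
  nm -D /usr/lib/librados.so.2 | c++filt | grep 'ObjectWriteOperation::copy_from'

  # prefer the just-built libraries over the installed ones
  LD_LIBRARY_PATH=./src/.libs ./src/.libs/rados -v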


RE: osd activation under 9.1.0

2015-10-16 Thread Deneau, Tom


> -Original Message-
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: Friday, October 16, 2015 4:35 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: osd activation under 9.1.0
> 
> On Fri, 16 Oct 2015, Deneau, Tom wrote:
> > Using 9.1.0 I am getting the error shown below at ceph-deploy osd
> activate time.
> >
> > + ceph-deploy --overwrite-conf osd activate Intel-2P-Sandy-Bridge-
> 04:/var/local//dev/sdf2:/dev/sdf1
> > ...
> > [][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster
> ceph --mkfs --mkkey -i 4 --monmap /var/local/\
> > /dev/sdf2/activate.monmap --osd-data /var/local//dev/sdf2 --osd-journal
> /var/local//dev/sdf2/journal --osd-uuid 204865df-8dbf-4f26-91f2-
> 5dfa7c3a49f8 --keyring /var/local//dev/sdf2/keyring --setuser ceph --
> setgroup ceph
> > [][WARNIN] 2015-10-16 13:13:41.464615 7f3f40642940 -1
> filestore(/var/local//dev/sdf2) mkjournal error creating journ\
> > al on /var/local//dev/sdf2/journal: (13) Permission denied
> > [][WARNIN] 2015-10-16 13:13:41.464635 7f3f40642940 -1 OSD::mkfs:
> ObjectStore::mkfs failed with error -13
> > [][WARNIN] 2015-10-16 13:13:41.464669 7f3f40642940 -1  ** ERROR: error
> creating empty object store in /var/local//de\
> > v/sdf2: (13) Permission denied
> > [][WARNIN] Traceback (most recent call last):
> > [][WARNIN]   File "/usr/sbin/ceph-disk", line 3576, in 
> > [][WARNIN] main(sys.argv[1:])
> > [][WARNIN]   File "/usr/sbin/ceph-disk", line 3530, in main
> > [][WARNIN] args.func(args)
> > [][WARNIN]   File "/usr/sbin/ceph-disk", line 2432, in main_activate
> > [][WARNIN] init=args.mark_init,
> > [][WARNIN]   File "/usr/sbin/ceph-disk", line 2258, in activate_dir
> > [][WARNIN] (osd_id, cluster) = activate(path, activate_key_template,
> init)
> > [][WARNIN]   File "/usr/sbin/ceph-disk", line 2360, in activate
> > [][WARNIN] keyring=keyring,
> > [][WARNIN]   File "/usr/sbin/ceph-disk", line 1950, in mkfs
> > [][WARNIN] '--setgroup', get_ceph_user(),
> > [][WARNIN]   File "/usr/sbin/ceph-disk", line 349, in command_check_call
> > [][WARNIN] return subprocess.check_call(arguments)
> > [][WARNIN]   File "/usr/lib/python2.7/subprocess.py", line 540, in
> check_call
> > [][WARNIN] raise CalledProcessError(retcode, cmd)
> > [][WARNIN] subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd',
> '--cluster', 'ceph', '--mkfs', '--mkkey', '
> > -i', '4', '--monmap', '/var/local//dev/sdf2/activate.monmap', '--osd-
> data', '/var/local//dev/sdf2', '--osd-journal',
> '/var/local//dev/sdf2/journal', '--osd-uuid', '204865df-8dbf-4f26-91f2-
> 5dfa7c3a49f8', '--keyring', '/var/local//dev/sdf2/keyring', '--setuser',
> 'ceph', '--setgroup\
> > ', 'ceph']' returned non-zero exit status 1
> > [][ERROR ] RuntimeError: command returned non-zero exit status: 1
> > [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk
> -v activate --mark-init upstart --mount /var/local//dev/sdf2
> >
> > When I look at the data disk, I see the following.
> >
> >   -rw-r--r-- 1 root ceph   210 Oct 16 13:13 activate.monmap
> >   -rw-r--r-- 1 ceph ceph37 Oct 16 13:13 ceph_fsid
> >   drwxr-sr-x 3 ceph ceph  4096 Oct 16 13:13 current
> >   -rw-r--r-- 1 ceph ceph37 Oct 16 13:13 fsid
> >   lrwxrwxrwx 1 root ceph 9 Oct 16 13:13 journal -> /dev/sdf1
> 
> My guess is that /dev/sdf1 needs to be chowned to user ceph.
> 
> sage
> 

Yes, that seemed to do the trick.
I had done that chown on the data mount point /var/local/dev/sdf2
but had not done it on the journal partition.

-- Tom
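
A minimal sketch of the fix described above, for reference (device and mount paths as
used in this thread; note that ownership of a raw device node may not survive a reboot
unless the packaged udev rules handle it, which is not verified here):

  # give the ceph user access to the journal partition as well as the data dir
  sudo chown ceph:ceph /dev/sdf1
  sudo chown -R ceph:ceph /var/local//dev/sdf2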

> 
> >   -rw-r--r-- 1 ceph ceph21 Oct 16 13:13 magic
> >   -rw-r--r-- 1 ceph ceph 4 Oct 16 13:13 store_version
> >   -rw-r--r-- 1 ceph ceph53 Oct 16 13:13 superblock
> >   -rw-r--r-- 1 ceph ceph 2 Oct 16 13:13 whoami
> >
> > (The parent directory has
> >   drwxr-sr-x 3 ceph ceph  4096 Oct 16 13:13 sdf2)
> >
> > I had been creating the partitions myself and then passing them to ceph-
> deploy osd prepare and osd activate.
> > Which worked fine before 9.1.0.
> > Is there some extra permissions setup I need to do for 9.1.0?
> >
> > Alternatively, is there a single-node setup script for 9.1.0 that I can
> look at?
> >
> > -- Tom Deneau
> >


osd activation under 9.1.0

2015-10-16 Thread Deneau, Tom
Using 9.1.0 I am getting the error shown below at ceph-deploy osd activate time.

+ ceph-deploy --overwrite-conf osd activate 
Intel-2P-Sandy-Bridge-04:/var/local//dev/sdf2:/dev/sdf1
...
[][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster ceph 
--mkfs --mkkey -i 4 --monmap /var/local/\
/dev/sdf2/activate.monmap --osd-data /var/local//dev/sdf2 --osd-journal 
/var/local//dev/sdf2/journal --osd-uuid 204865df-8dbf-4f26-91f2-5dfa7c3a49f8 
--keyring /var/local//dev/sdf2/keyring --setuser ceph --setgroup ceph
[][WARNIN] 2015-10-16 13:13:41.464615 7f3f40642940 -1 
filestore(/var/local//dev/sdf2) mkjournal error creating journ\
al on /var/local//dev/sdf2/journal: (13) Permission denied
[][WARNIN] 2015-10-16 13:13:41.464635 7f3f40642940 -1 OSD::mkfs: 
ObjectStore::mkfs failed with error -13
[][WARNIN] 2015-10-16 13:13:41.464669 7f3f40642940 -1  ** ERROR: error creating 
empty object store in /var/local//de\
v/sdf2: (13) Permission denied
[][WARNIN] Traceback (most recent call last):
[][WARNIN]   File "/usr/sbin/ceph-disk", line 3576, in 
[][WARNIN] main(sys.argv[1:])
[][WARNIN]   File "/usr/sbin/ceph-disk", line 3530, in main
[][WARNIN] args.func(args)
[][WARNIN]   File "/usr/sbin/ceph-disk", line 2432, in main_activate
[][WARNIN] init=args.mark_init,
[][WARNIN]   File "/usr/sbin/ceph-disk", line 2258, in activate_dir
[][WARNIN] (osd_id, cluster) = activate(path, activate_key_template, init)
[][WARNIN]   File "/usr/sbin/ceph-disk", line 2360, in activate
[][WARNIN] keyring=keyring,
[][WARNIN]   File "/usr/sbin/ceph-disk", line 1950, in mkfs
[][WARNIN] '--setgroup', get_ceph_user(),
[][WARNIN]   File "/usr/sbin/ceph-disk", line 349, in command_check_call
[][WARNIN] return subprocess.check_call(arguments)
[][WARNIN]   File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
[][WARNIN] raise CalledProcessError(retcode, cmd)
[][WARNIN] subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', 
'--cluster', 'ceph', '--mkfs', '--mkkey', '
-i', '4', '--monmap', '/var/local//dev/sdf2/activate.monmap', '--osd-data', 
'/var/local//dev/sdf2', '--osd-journal', '/var/local//dev/sdf2/journal', 
'--osd-uuid', '204865df-8dbf-4f26-91f2-5dfa7c3a49f8', '--keyring', 
'/var/local//dev/sdf2/keyring', '--setuser', 'ceph', '--setgroup\
', 'ceph']' returned non-zero exit status 1
[][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v 
activate --mark-init upstart --mount /var/local//dev/sdf2

When I look at the data disk, I see the following.  

  -rw-r--r-- 1 root ceph  210 Oct 16 13:13 activate.monmap
  -rw-r--r-- 1 ceph ceph   37 Oct 16 13:13 ceph_fsid
  drwxr-sr-x 3 ceph ceph 4096 Oct 16 13:13 current
  -rw-r--r-- 1 ceph ceph   37 Oct 16 13:13 fsid
  lrwxrwxrwx 1 root ceph    9 Oct 16 13:13 journal -> /dev/sdf1
  -rw-r--r-- 1 ceph ceph   21 Oct 16 13:13 magic
  -rw-r--r-- 1 ceph ceph    4 Oct 16 13:13 store_version
  -rw-r--r-- 1 ceph ceph   53 Oct 16 13:13 superblock
  -rw-r--r-- 1 ceph ceph    2 Oct 16 13:13 whoami

(The parent directory has
  drwxr-sr-x 3 ceph ceph  4096 Oct 16 13:13 sdf2)

I had been creating the partitions myself and then passing them to ceph-deploy
osd prepare and osd activate, which worked fine before 9.1.0.
Is there some extra permissions setup I need to do for 9.1.0?

Alternatively, is there a single-node setup script for 9.1.0 that I can look at?

-- Tom Deneau



RE: v9.1.0 Infernalis release candidate released

2015-10-14 Thread Deneau, Tom


> -Original Message-
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: Wednesday, October 14, 2015 4:30 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: v9.1.0 Infernalis release candidate released
> 
> On Wed, 14 Oct 2015, Deneau, Tom wrote:
> > > -Original Message-
> > > From: Sage Weil [mailto:s...@newdream.net]
> > > Sent: Wednesday, October 14, 2015 3:59 PM
> > > To: Deneau, Tom
> > > Cc: ceph-devel@vger.kernel.org
> > > Subject: RE: v9.1.0 Infernalis release candidate released
> > >
> > > On Wed, 14 Oct 2015, Deneau, Tom wrote:
> > > > Trying to bring up a cluster using the pre-built binary packages
> > > > on
> > > Ubuntu Trusty:
> > > > Installed using "ceph-deploy install --dev infernalis `hostname`"
> > > >
> > > > This install seemed to work but then when I later tried
> > > >ceph-deploy --overwrite-conf mon create-initial it failed with
> > > > [][INFO  ] Running command: ceph --cluster=ceph --admin-daemon
> > > > /var/run/ceph/ceph-mon.myhost.asok \ mon_status [][ERROR ]
> > > > admin_socket: exception getting command descriptions: [Errno 2] No
> > > > such file or directory
> > > >
> > > > and indeed /var/run/ceph was empty.
> > > >
> > > > I wasn't sure if this was due to an existing user named ceph (I
> > > > hadn't
> > > > checked) but I did a userdel of ceph and ceph-deploy uninstall and
> > > reinstall.
> > > >
> > > > Now the install part is getting an error near where it tries to
> > > > create
> > > the ceph user.
> > > >
> > > > [][DEBUG ] Adding system user cephdone [][DEBUG ] Setting
> > > > system user ceph properties..Processing triggers for libc-bin
> > > > (2.19-0ubuntu6.6)
> > > ...
> > > > [][WARNIN] usermod: user 'ceph' does not exist
> > > >
> > > > Any suggestions for recovering from this situation?
> > >
> > > I'm guessing this is.. trusty?  Did you remove the package, then
> > > verify the user is deleted, then (re)install?  You may need to do a
> > > dpkg purge (not just uninstall/remove) to make it forget its state...
> > >
> > > I'm re-running the ceph-deploy test suite (centos7, trusty) to make
> > > sure nothing is awry...
> > >
> > >   http://pulpito.ceph.com/sage-2015-10-14_13:55:41-ceph-deploy-
> > > infernalis---basic-vps/
> > >
> > > sage
> > >
> >
> > Yes, did the steps above including purge.
> > Could I just manually create the ceph user to get around this?
> 
> You could, but since the above tests just passed, I'm super curious why
> it's failing for you.  This is the relevant piece of code:
> 
>   https://github.com/ceph/ceph/blob/infernalis/debian/ceph-
> common.postinst#L60
> 
> After it fails, is ceph in /etc/passwd?  Is there anything in
> /etc/default/ceph that could be clobbering the defaults?
> 
> sage
> 
> 
Ah, I see part of the problem was that the old user ceph was also part of group ceph
and I had not done a groupdel of group ceph.  Having done that, the user creation
during install now works and I see in /etc/passwd
  ceph:x:64045:64045:Ceph storage service:/var/lib/ceph:/bin/false


I am still getting this error, however, from "ceph-deploy --overwrite-conf mon create-initial":

[INFO  ] Running command: ceph --cluster=ceph --admin-daemon 
/var/run/ceph/ceph-mon.Intel-2P-Sandy-Bridge-04.asok 
mon_status
[ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No 
such file or directory

I see the /var/run/ceph directory is there and owned by user ceph, but it is empty.

I will keep poking around.

-- Tom
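
A minimal debugging sketch for the empty /var/run/ceph on trusty/upstart, assuming the
default cluster name and a mon id equal to the hostname; the ownership checks follow
from the "daemons run as the ceph user" change quoted later in this thread:

  # is the upstart job running, and what does its log say?
  sudo initctl list | grep ceph-mon
  sudo tail -n 50 /var/log/ceph/ceph-mon.$(hostname).log

  # the mon data and run directories must be writable by the ceph user now
  ls -ld /var/run/ceph /var/lib/ceph/mon/ceph-$(hostname)

  # run the monitor in the foreground as the ceph user to see why no asok appears
  sudo -u ceph ceph-mon -i $(hostname) -d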


> 
> >
> > -- Tom
> >
> > >
> > > > -- Tom
> > > >
> > > > > -Original Message-
> > > > > From: Sage Weil [mailto:s...@newdream.net]
> > > > > Sent: Wednesday, October 14, 2015 12:40 PM
> > > > > To: Deneau, Tom
> > > > > Cc: ceph-devel@vger.kernel.org
> > > > > Subject: RE: v9.1.0 Infernalis release candidate released
> > > > >
> > > > > On Wed, 14 Oct 2015, Deneau, Tom wrote:
> > > > > > I tried an rpmbuild on Fedora21 from the tarball which seemed
> > > > > > to work
> > > > > ok.
> > > > > > But having trouble doing "ceph-deploy --overwrite-conf mon
> 

RE: v9.1.0 Infernalis release candidate released

2015-10-14 Thread Deneau, Tom


> -Original Message-
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: Wednesday, October 14, 2015 3:59 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: v9.1.0 Infernalis release candidate released
> 
> On Wed, 14 Oct 2015, Deneau, Tom wrote:
> > Trying to bring up a cluster using the pre-built binary packages on
> Ubuntu Trusty:
> > Installed using "ceph-deploy install --dev infernalis `hostname`"
> >
> > This install seemed to work but then when I later tried
> >ceph-deploy --overwrite-conf mon create-initial it failed with
> > [][INFO  ] Running command: ceph --cluster=ceph --admin-daemon
> > /var/run/ceph/ceph-mon.myhost.asok \ mon_status [][ERROR ]
> > admin_socket: exception getting command descriptions: [Errno 2] No
> > such file or directory
> >
> > and indeed /var/run/ceph was empty.
> >
> > I wasn't sure if this was due to an existing user named ceph (I hadn't
> > checked) but I did a userdel of ceph and ceph-deploy uninstall and
> reinstall.
> >
> > Now the install part is getting an error near where it tries to create
> the ceph user.
> >
> > [][DEBUG ] Adding system user cephdone [][DEBUG ] Setting system
> > user ceph properties..Processing triggers for libc-bin (2.19-0ubuntu6.6)
> ...
> > [][WARNIN] usermod: user 'ceph' does not exist
> >
> > Any suggestions for recovering from this situation?
> 
> I'm guessing this is.. trusty?  Did you remove the package, then verify
> the user is deleted, then (re)install?  You may need to do a dpkg purge
> (not just uninstall/remove) to make it forget its state...
> 
> I'm re-running the ceph-deploy test suite (centos7, trusty) to make sure
> nothing is awry...
> 
>   http://pulpito.ceph.com/sage-2015-10-14_13:55:41-ceph-deploy-
> infernalis---basic-vps/
> 
> sage
> 

Yes, did the steps above including purge.
Could I just manually create the ceph user to get around this?

-- Tom

> 
> > -- Tom
> >
> > > -Original Message-
> > > From: Sage Weil [mailto:s...@newdream.net]
> > > Sent: Wednesday, October 14, 2015 12:40 PM
> > > To: Deneau, Tom
> > > Cc: ceph-devel@vger.kernel.org
> > > Subject: RE: v9.1.0 Infernalis release candidate released
> > >
> > > On Wed, 14 Oct 2015, Deneau, Tom wrote:
> > > > I tried an rpmbuild on Fedora21 from the tarball which seemed to
> > > > work
> > > ok.
> > > > But having trouble doing "ceph-deploy --overwrite-conf mon create-
> > > initial" with 9.1.0".
> > > > This is using ceph-deploy version 1.5.24.
> > > > Is this part of the "needs Fedora 22 or later" story?
> > >
> > > Yeah I think so, but it's probably mostly a "tested fc22 and it worked"
> > > situation.  This is probably what is failing:
> > >
> > > https://github.com/ceph/ceph-
> > > deploy/blob/master/ceph_deploy/hosts/fedora/__init__.py#L21
> > >
> > > So maybe the specfile isn't using systemd for fc21?
> > >
> > > sage
> > >
> > >
> > > >
> > > > -- Tom
> > > >
> > > > [myhost][DEBUG ] create a done file to avoid re-doing the mon
> > > > deployment [myhost][DEBUG ] create the init path if it does not
> > > > exist [myhost][DEBUG ] locating the `service` executable...
> > > > [myhost][INFO  ] Running command: /usr/sbin/service ceph -c
> > > > /etc/ceph/ceph.conf start mon.myhost [myhost][WARNIN] The service
> > > > command supports only basic LSB actions (start, stop, restart,
> > > > try-
> > > restart, reload, force-reload, sta\ tus). For other actions, please
> > > try to use systemctl.
> > > > [myhost][ERROR ] RuntimeError: command returned non-zero exit
> status:
> > > > 2 [ceph_deploy.mon][ERROR ] Failed to execute command:
> > > > /usr/sbin/service ceph -c /etc/ceph/ceph.conf start mon.myhost
> > > > [ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > > > > ow...@vger.kernel.org] On Behalf Of Sage Weil
> > > > > Sent: Tuesday, October 13, 2015 4:02 PM
> > > > > To: ceph-annou...@ceph.com; ceph-devel@vger.kernel.org; ceph-
> > > > > us...@ceph.com; ceph-maintain...@ceph.com
> > > > > Subje

RE: v9.1.0 Infernalis release candidate released

2015-10-14 Thread Deneau, Tom
Trying to bring up a cluster using the pre-built binary packages on Ubuntu 
Trusty:
Installed using "ceph-deploy install --dev infernalis `hostname`"

This install seemed to work but then when I later tried
   ceph-deploy --overwrite-conf mon create-initial
it failed with
[][INFO  ] Running command: ceph --cluster=ceph --admin-daemon 
/var/run/ceph/ceph-mon.myhost.asok \
mon_status
[][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No 
such file or directory

and indeed /var/run/ceph was empty.

I wasn't sure if this was due to an existing user named ceph (I hadn't checked) 
but I did a userdel of ceph
and ceph-deploy uninstall and reinstall.

Now the install part is getting an error near where it tries to create the ceph 
user.

[][DEBUG ] Adding system user cephdone
[][DEBUG ] Setting system user ceph properties..Processing triggers for 
libc-bin (2.19-0ubuntu6.6) ...
[][WARNIN] usermod: user 'ceph' does not exist

Any suggestions for recovering from this situation?

-- Tom

> -Original Message-
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: Wednesday, October 14, 2015 12:40 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: v9.1.0 Infernalis release candidate released
> 
> On Wed, 14 Oct 2015, Deneau, Tom wrote:
> > I tried an rpmbuild on Fedora21 from the tarball which seemed to work
> ok.
> > But having trouble doing "ceph-deploy --overwrite-conf mon create-
> initial" with 9.1.0".
> > This is using ceph-deploy version 1.5.24.
> > Is this part of the "needs Fedora 22 or later" story?
> 
> Yeah I think so, but it's probably mostly a "tested fc22 and it worked"
> situation.  This is probably what is failing:
> 
> https://github.com/ceph/ceph-
> deploy/blob/master/ceph_deploy/hosts/fedora/__init__.py#L21
> 
> So maybe the specfile isn't using systemd for fc21?
> 
> sage
> 
> 
> >
> > -- Tom
> >
> > [myhost][DEBUG ] create a done file to avoid re-doing the mon
> > deployment [myhost][DEBUG ] create the init path if it does not exist
> > [myhost][DEBUG ] locating the `service` executable...
> > [myhost][INFO  ] Running command: /usr/sbin/service ceph -c
> > /etc/ceph/ceph.conf start mon.myhost [myhost][WARNIN] The service
> > command supports only basic LSB actions (start, stop, restart, try-
> restart, reload, force-reload, sta\ tus). For other actions, please try to
> use systemctl.
> > [myhost][ERROR ] RuntimeError: command returned non-zero exit status:
> > 2 [ceph_deploy.mon][ERROR ] Failed to execute command:
> > /usr/sbin/service ceph -c /etc/ceph/ceph.conf start mon.myhost
> > [ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
> >
> >
> > > -Original Message-
> > > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > > ow...@vger.kernel.org] On Behalf Of Sage Weil
> > > Sent: Tuesday, October 13, 2015 4:02 PM
> > > To: ceph-annou...@ceph.com; ceph-devel@vger.kernel.org; ceph-
> > > us...@ceph.com; ceph-maintain...@ceph.com
> > > Subject: v9.1.0 Infernalis release candidate released
> > >
> > > This is the first Infernalis release candidate.  There have been
> > > some major changes since hammer, and the upgrade process is non-
> trivial.
> > > Please read carefully.
> > >
> > > Getting the release candidate
> > > -
> > >
> > > The v9.1.0 packages are pushed to the development release
> repositories::
> > >
> > >   http://download.ceph.com/rpm-testing
> > >   http://download.ceph.com/debian-testing
> > >
> > > For more info, see::
> > >
> > >   http://docs.ceph.com/docs/master/install/get-packages/
> > >
> > > Or install with ceph-deploy via::
> > >
> > >   ceph-deploy install --testing HOST
> > >
> > > Known issues
> > > 
> > >
> > > * librbd and librados ABI compatibility is broken.  Be careful
> > >   installing this RC on client machines (e.g., those running qemu).
> > >   It will be fixed in the final v9.2.0 release.
> > >
> > > Major Changes from Hammer
> > > -
> > >
> > > * *General*:
> > >   * Ceph daemons are now managed via systemd (with the exception of
> > > Ubuntu Trusty, which still uses upstart).
> > >   * Ceph daemons run as 'ceph' user instead root.
> > >   * On Red Hat distros, there is also an SELinux policy.
> > > * *RADOS*:
> > >   * The RADOS cache 

RE: v9.1.0 Infernalis release candidate released

2015-10-14 Thread Deneau, Tom
I tried an rpmbuild on Fedora21 from the tarball which seemed to work ok.
But having trouble doing "ceph-deploy --overwrite-conf mon create-initial" with 
9.1.0".
This is using ceph-deploy version 1.5.24.
Is this part of the "needs Fedora 22 or later" story?

-- Tom

[myhost][DEBUG ] create a done file to avoid re-doing the mon deployment
[myhost][DEBUG ] create the init path if it does not exist
[myhost][DEBUG ] locating the `service` executable...
[myhost][INFO  ] Running command: /usr/sbin/service ceph -c /etc/ceph/ceph.conf 
start mon.myhost
[myhost][WARNIN] The service command supports only basic LSB actions (start, 
stop, restart, try-restart, reload, force-reload, sta\
tus). For other actions, please try to use systemctl.
[myhost][ERROR ] RuntimeError: command returned non-zero exit status: 2
[ceph_deploy.mon][ERROR ] Failed to execute command: /usr/sbin/service ceph -c 
/etc/ceph/ceph.conf start mon.myhost
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
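
A minimal sketch of what could be tried by hand on fc21 while ceph-deploy is still calling
the LSB service wrapper, assuming the infernalis packages install systemd units named
ceph-mon@<id>.service (an assumption the first command is there to verify):

  # are systemd units installed for ceph at all?
  systemctl list-unit-files | grep ceph

  # if so, start the monitor directly instead of via /usr/sbin/service
  sudo systemctl start ceph-mon@myhost
  sudo systemctl status ceph-mon@myhost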


> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Tuesday, October 13, 2015 4:02 PM
> To: ceph-annou...@ceph.com; ceph-devel@vger.kernel.org; ceph-
> us...@ceph.com; ceph-maintain...@ceph.com
> Subject: v9.1.0 Infernalis release candidate released
> 
> This is the first Infernalis release candidate.  There have been some
> major changes since hammer, and the upgrade process is non-trivial.
> Please read carefully.
> 
> Getting the release candidate
> -
> 
> The v9.1.0 packages are pushed to the development release repositories::
> 
>   http://download.ceph.com/rpm-testing
>   http://download.ceph.com/debian-testing
> 
> For more info, see::
> 
>   http://docs.ceph.com/docs/master/install/get-packages/
> 
> Or install with ceph-deploy via::
> 
>   ceph-deploy install --testing HOST
> 
> Known issues
> 
> 
> * librbd and librados ABI compatibility is broken.  Be careful
>   installing this RC on client machines (e.g., those running qemu).
>   It will be fixed in the final v9.2.0 release.
> 
> Major Changes from Hammer
> -
> 
> * *General*:
>   * Ceph daemons are now managed via systemd (with the exception of
> Ubuntu Trusty, which still uses upstart).
>   * Ceph daemons run as 'ceph' user instead root.
>   * On Red Hat distros, there is also an SELinux policy.
> * *RADOS*:
>   * The RADOS cache tier can now proxy write operations to the base
> tier, allowing writes to be handled without forcing migration of
> an object into the cache.
>   * The SHEC erasure coding support is no longer flagged as
> experimental. SHEC trades some additional storage space for faster
> repair.
>   * There is now a unified queue (and thus prioritization) of client
> IO, recovery, scrubbing, and snapshot trimming.
>   * There have been many improvements to low-level repair tooling
> (ceph-objectstore-tool).
>   * The internal ObjectStore API has been significantly cleaned up in
> order
> to facilitate new storage backends like NewStore.
> * *RGW*:
>   * The Swift API now supports object expiration.
>   * There are many Swift API compatibility improvements.
> * *RBD*:
>   * The ``rbd du`` command shows actual usage (quickly, when
> object-map is enabled).
>   * The object-map feature has seen many stability improvements.
>   * Object-map and exclusive-lock features can be enabled or disabled
> dynamically.
>   * You can now store user metadata and set persistent librbd options
> associated with individual images.
>   * The new deep-flatten features allows flattening of a clone and all
> of its snapshots.  (Previously snapshots could not be flattened.)
>   * The export-diff command command is now faster (it uses aio).  There is
> also
> a new fast-diff feature.
>   * The --size argument can be specified with a suffix for units
> (e.g., ``--size 64G``).
>   * There is a new ``rbd status`` command that, for now, shows who has
> the image open/mapped.
> * *CephFS*:
>   * You can now rename snapshots.
>   * There have been ongoing improvements around administration,
> diagnostics,
> and the check and repair tools.
>   * The caching and revocation of client cache state due to unused
> inodes has been dramatically improved.
>   * The ceph-fuse client behaves better on 32-bit hosts.
> 
> Distro compatibility
> 
> 
> We have decided to drop support for many older distributions so that we
> can move to a newer compiler toolchain (e.g., C++11).  Although it is
> still possible to build Ceph on older distributions by installing
> backported development tools, we are not building and publishing release
> packages for ceph.com.
> 
> In particular,
> 
> * CentOS 7 or later; we have dropped support for CentOS 6 (and other
>   RHEL 6 derivatives, like Scientific Linux 6).
> * Debian Jessie 8.x or later; Debian Wheezy 7.x's g++ has i

RE: throttles

2015-10-13 Thread Deneau, Tom
I remember previously there were some options that could be reset
thru the admin socket and some that required an osd restart.
Do the ones below require an osd restart?

-- Tom
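
A minimal sketch of checking this empirically, assuming an osd.0 admin socket in the
default location. Whether a given filestore value actually takes effect without a restart
is exactly the open question in this thread, so the config get afterwards only confirms
that the new value was accepted, not that it is being honoured:

  # try changing a throttle at runtime on all osds
  ceph tell osd.* injectargs '--ms_dispatch_throttle_bytes 0'

  # or through one osd's admin socket, then read it back
  ceph daemon osd.0 config set osd_client_message_cap 0
  ceph daemon osd.0 config get osd_client_message_cap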

> -Original Message-
> From: Somnath Roy [mailto:somnath@sandisk.com]
> Sent: Tuesday, October 13, 2015 10:57 AM
> To: Deneau, Tom; Sage Weil
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: throttles
> 
> BTW, you can completely turn off these throttles ( other than the
> filestore throttle ) by setting the value to 0.
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Deneau, Tom
> Sent: Tuesday, October 13, 2015 8:55 AM
> To: Sage Weil
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: throttles
> 
> > -Original Message-
> > From: Sage Weil [mailto:s...@newdream.net]
> > Sent: Tuesday, October 13, 2015 7:44 AM
> > To: Deneau, Tom
> > Cc: ceph-devel@vger.kernel.org
> > Subject: Re: throttles
> >
> > On Mon, 12 Oct 2015, Deneau, Tom wrote:
> > > Looking at the perf counters on my osds, I see wait counts for the
> > > following throttle related perf counters:  (This is from trying to
> > > benchmark using multiple rados bench client processes).
> > >
> > >throttle-filestore_bytes
> >
> > OPTION(filestore_queue_max_ops, OPT_INT, 50)
> > OPTION(filestore_queue_max_bytes, OPT_INT, 100 << 20)
> >
> > >throttle-msgr_dispatch_throttler-client
> >
> > OPTION(ms_dispatch_throttle_bytes, OPT_U64, 100 << 20)
> >
> > >throttle-osd_client_bytes
> > >throttle-osd_client_messages
> >
> > OPTION(osd_client_message_size_cap, OPT_U64, 500*1024L*1024L) //
> > client data allowed in-memory (in bytes)
> > OPTION(osd_client_message_cap, OPT_U64, 100)  // num client
> > messages allowed in-memory
> >
> > > What are the config variables that would allow me to experiment with
> > these throttle limits?
> > > (When I look at the output from --admin-daemon osd.xx.asok config
> > > show, it's not clear which items these correspond to).
> >
> > These are all involved in slowing down clients to the rate of the
> > storage...
> >
> > sage
> 
> Thanks, Sage and Somnath --
> 
> The reason I was interested is that I am seeing a levelling off of read
> bandwidth while the sar data shows we are not anywhere near the limits for
> cpu utilization (client or osd node) nor for disk utilization (no disk
> above 40% utilization) nor for network utilization (this is with 10Gig
> Ethernet).
> 
> -- Tom
> 


RE: throttles

2015-10-13 Thread Deneau, Tom
> -Original Message-
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: Tuesday, October 13, 2015 7:44 AM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: throttles
> 
> On Mon, 12 Oct 2015, Deneau, Tom wrote:
> > Looking at the perf counters on my osds, I see wait counts for the
> > following throttle related perf counters:  (This is from trying to
> > benchmark using multiple rados bench client processes).
> >
> >throttle-filestore_bytes
> 
> OPTION(filestore_queue_max_ops, OPT_INT, 50)
> OPTION(filestore_queue_max_bytes, OPT_INT, 100 << 20)
> 
> >throttle-msgr_dispatch_throttler-client
> 
> OPTION(ms_dispatch_throttle_bytes, OPT_U64, 100 << 20)
> 
> >throttle-osd_client_bytes
> >throttle-osd_client_messages
> 
> OPTION(osd_client_message_size_cap, OPT_U64, 500*1024L*1024L) // client
> data allowed in-memory (in bytes)
> OPTION(osd_client_message_cap, OPT_U64, 100)  // num client
> messages allowed in-memory
> 
> > What are the config variables that would allow me to experiment with
> these throttle limits?
> > (When I look at the output from --admin-daemon osd.xx.asok config
> > show, it's not clear which items these correspond to).
> 
> These are all involved in slowing down clients to the rate of the
> storage...
> 
> sage

Thanks, Sage and Somnath --

The reason I was interested is that I am seeing a levelling
off of read bandwidth while the sar data shows we are not anywhere
near the limits for cpu utilization (client or osd node) nor
for disk utilization (no disk above 40% utilization) nor for
network utilization (this is with 10Gig Ethernet).

-- Tom



throttles

2015-10-12 Thread Deneau, Tom
Looking at the perf counters on my osds, I see wait counts for the following
throttle related perf counters:  (This is from trying to benchmark using
multiple rados bench client processes).

   throttle-filestore_bytes
   throttle-msgr_dispatch_throttler-client
   throttle-osd_client_bytes
   throttle-osd_client_messages

What are the config variables that would allow me to experiment with these 
throttle limits?
(When I look at the output from --admin-daemon osd.xx.asok config show, it's
not clear which items these correspond to).

-- Tom Deneau
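
For completeness, a minimal sketch of pulling both the counters and the corresponding
config values from one OSD's admin socket (socket path assumes the default layout):

  # throttle-related perf counters, including the wait counts
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump | grep -A 12 throttle

  # the config options currently in effect
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'throttle|queue_max|message_cap'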




dump_historic_ops, slow requests

2015-10-12 Thread Deneau, Tom
I have a small ceph cluster (3 nodes, 5 osds each, journals all just partitions
on the spinner disks) and I have noticed that when I hit it with a bunch of
rados bench clients all doing writes of large (40M) objects with --no-cleanup,
the rados bench commands seem to finish OK, but I often get health warnings like

    HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests
    3 ops are blocked > 32.768 sec on osd.9
    1 ops are blocked > 32.768 sec on osd.10
    2 osds have slow requests
After a couple of minutes, health goes to HEALTH_OK.

But if I go to the node containing osd.10, for example, and do dump_historic_ops,
I see lots of durations of around 20 sec, but nothing over 32 sec.

The 20-sec or so ops are always  "ack+ondisk+write+known_if_redirected"
with type_data = "commit sent: apply or cleanup"
and the following are typical event timings

   initiated:                       14:06:58.205937
   reached_pg:                      14:07:01.823288, gap=  3617.351
   started:                         14:07:01.823359, gap=     0.071
   waiting for subops from 3:       14:07:01.855259, gap=    31.900
   commit_queued_for_journal_write: 14:07:03.132697, gap=  1277.438
   write_thread_in_journal_buffer:  14:07:03.143356, gap=    10.659
   journaled_completion_queued:     14:07:04.175863, gap=  1032.507
   op_commit:                       14:07:04.585040, gap=   409.177
   op_applied:                      14:07:04.589751, gap=     4.711
   sub_op_commit_rec from 3:        14:07:14.682925, gap= 10093.174
   commit_sent:                     14:07:14.683081, gap=     0.156
   done:                            14:07:14.683119, gap=     0.038

Should I expect to see a historic op with duration greater than 32 sec?

-- Tom Deneau
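
One thing worth checking, sketched below: the historic-ops buffer only keeps a limited
window, controlled by osd_op_history_size / osd_op_history_duration, so a >32s op may
already have been evicted by the time the dump is taken. The option names here are from
memory and worth verifying against config show on the running osd:

  # how big is the historic ops window on this osd?
  ceph daemon osd.10 config get osd_op_history_size
  ceph daemon osd.10 config get osd_op_history_duration

  # enlarge it while reproducing, then dump again
  ceph daemon osd.10 config set osd_op_history_size 200
  ceph daemon osd.10 dump_historic_ops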



RE: perf counters from a performance discrepancy

2015-10-08 Thread Deneau, Tom


> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: Wednesday, October 07, 2015 9:48 PM
> To: Deneau, Tom
> Cc: Mark Nelson; Gregory Farnum; ceph-devel@vger.kernel.org
> Subject: RE: perf counters from a performance discrepancy
> 
> > I finally got around to looking at the dump_historic_ops output for
> > the 1-client and 2-client cases.
> > As you recall these are all read-ops.  so the events in the dump are
> >initiated
> >reached_pg
> >started
> >done
> >
> > The pattern I see for most of the slow ops recorded in the dump is:
> >
> >* In the 1-client case the typical slow op has duration between 50-65
> ms
> >  and usually most of this is the interval between reached_pg and
> started.
> >
> >* In the 2-client case the typical slow op has duration between 95-
> 120 ms
> >  and again usually most of this is the interval between reached_pg
> >  and started.
> >
> > Could someone describe what the interval between reached_pg and
> > started means?
> 
> I think the slow part is probably find_object_context() (although to be
> fair tons of stuff happens here, see do_op()).  You could test this theory
> or otherwise narrow this down with additional event markers lke
> 
> diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc index
> d6f3084..6faccc2 100644
> --- a/src/osd/ReplicatedPG.cc
> +++ b/src/osd/ReplicatedPG.cc
> @@ -1691,10 +1691,12 @@ void ReplicatedPG::do_op(OpRequestRef& op)
>  return;
>}
> 
> +  op->mark_event("about to find");
>int r = find_object_context(
>  oid, &obc, can_create,
>  m->has_flag(CEPH_OSD_FLAG_MAP_SNAP_CLONE),
>  &missing_oid);
> +  op->mark_event("found");
> 
>if (r == -EAGAIN) {
>  // If we're not the primary of this OSD, and we have
> 
> 
> sage

Sage --

Is it likely that find_object_context would take longer when there are two 
clients
each using their own pool (compared to one client using one pool)?

And would two clients using the same pool spend less time in 
find_object_context?

-- Tom


RE: perf counters from a performance discrepancy

2015-10-07 Thread Deneau, Tom
> -Original Message-
> From: Deneau, Tom
> Sent: Wednesday, September 23, 2015 3:05 PM
> To: 'Mark Nelson'; Gregory Farnum; Sage Weil
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: perf counters from a performance discrepancy
> 
> 
> 
> > -Original Message-
> > From: Mark Nelson [mailto:mnel...@redhat.com]
> > Sent: Wednesday, September 23, 2015 1:43 PM
> > To: Gregory Farnum; Sage Weil
> > Cc: Deneau, Tom; ceph-devel@vger.kernel.org
> > Subject: Re: perf counters from a performance discrepancy
> >
> >
> >
> > On 09/23/2015 01:25 PM, Gregory Farnum wrote:
> > > On Wed, Sep 23, 2015 at 11:19 AM, Sage Weil  wrote:
> > >> On Wed, 23 Sep 2015, Deneau, Tom wrote:
> > >>> Hi all --
> > >>>
> > >>> Looking for guidance with perf counters...
> > >>> I am trying to see whether the perf counters can tell me anything
> > >>> about the following discrepancy
> > >>>
> > >>> I populate a number of 40k size objects in each of two pools,
> > >>> poolA
> > and poolB.
> > >>> Both pools cover osds on a single node, 5 osds total.
> > >>>
> > >>> * Config 1 (1p):
> > >>>* use single rados bench client with 32 threads to do seq
> > >>> read
> > of 2 objects from poolA.
> > >>>
> > >>> * Config 2 (2p):
> > >>>* use two concurrent rados bench clients (running on same
> > client node) with 16 threads each,
> > >>> one reading 1 objects from poolA,
> > >>> one reading 1 objects from poolB,
> > >>>
> > >>> So in both configs, we have 32 threads total and the number of
> > >>> objects
> > read is the same.
> > >>> Note: in all cases, we drop the caches before doing the seq reads
> > >>>
> > >>> The combined bandwidth (MB/sec) for the 2 clients in config 2 is
> > >>> about 1/3 of the bandwidth for the single client in config 1.
> > >>
> > >> How were the object written?  I assume the cluster is backed by
> > >> spinning disks?
> > >>
> > >> I wonder if this is a disk layout issue.  If the 20,000 objects are
> > >> written in order, they willb e roughly sequential on disk, and the
> > >> 32 thread case will read them in order.  In the 2x 10,000 case, the
> > >> two clients are reading two sequences of objects written at
> > >> different times, and the disk arms will be swinging around more.
> > >>
> > >> My guess is that if the reads were reading the objects in a random
> > >> order the performance would be the same... I'm not sure that rados
> > >> bench does that though?
> > >>
> > >> sage
> > >>
> > >>>
> > >>>
> > >>> I gathered perf counters before and after each run and looked at
> > >>> the difference of the before and after counters for both the 1p
> > >>> and 2p cases.  Here are some things I noticed that are different
> > >>> between the two runs.  Can someone take a look and let me know
> > >>> whether any of these differences are significant.  In particular,
> > >>> for the
> > throttle-msgr_dispatch_throttler ones, since I don't know the detailed
> > definitions of these fields.
> > >>> Note: these are the numbers for one of the 5 osds, the other osds
> > >>> are
> > similar...
> > >>>
> > >>> * The field osd/loadavg is always about 3 times higher on the 2p c
> > >>>
> > >>> some latency-related counters
> > >>> --
> > >>> osd/op_latency/sum 1p=6.24801117205061, 2p=579.722513078945
> > >>> osd/op_process_latency/sum 1p=3.48506945394911,
> > >>> 2p=42.6278494549915 osd/op_r_latency/sum 1p=6.2480111719924,
> > >>> 2p=579.722513079003 osd/op_r_process_latency/sum
> > >>> 1p=3.48506945399276,
> > >>> 2p=42.6278494550061
> > >
> > > So, yep, the individual read ops are taking much longer in the
> > > two-client case. Naively that's pretty odd.
> > >
> > >>>
> > >>>
> > >>> and some throttle-msgr_dispatch_throttler related counters
> > >>> ---

RE: perf counters from a performance discrepancy

2015-09-23 Thread Deneau, Tom


> -Original Message-
> From: Gregory Farnum [mailto:gfar...@redhat.com]
> Sent: Wednesday, September 23, 2015 3:39 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: perf counters from a performance discrepancy
> 
> On Wed, Sep 23, 2015 at 9:33 AM, Deneau, Tom  wrote:
> > Hi all --
> >
> > Looking for guidance with perf counters...
> > I am trying to see whether the perf counters can tell me anything
> > about the following discrepancy
> >
> > I populate a number of 40k size objects in each of two pools, poolA and
> poolB.
> > Both pools cover osds on a single node, 5 osds total.
> >
> >* Config 1 (1p):
> >   * use single rados bench client with 32 threads to do seq read of
> 2 objects from poolA.
> >
> >* Config 2 (2p):
> >   * use two concurrent rados bench clients (running on same client
> node) with 16 threads each,
> >one reading 1 objects from poolA,
> >one reading 1 objects from poolB,
> >
> > So in both configs, we have 32 threads total and the number of objects
> read is the same.
> > Note: in all cases, we drop the caches before doing the seq reads
> >
> > The combined bandwidth (MB/sec) for the 2 clients in config 2 is about
> > 1/3 of the bandwidth for the single client in config 1.
> >
> >
> > I gathered perf counters before and after each run and looked at the
> > difference of the before and after counters for both the 1p and 2p
> > cases.  Here are some things I noticed that are different between the
> > two runs.  Can someone take a look and let me know whether any of
> > these differences are significant.  In particular, for the throttle-
> msgr_dispatch_throttler ones, since I don't know the detailed definitions
> of these fields.
> > Note: these are the numbers for one of the 5 osds, the other osds are
> similar...
> >
> > * The field osd/loadavg is always about 3 times higher on the 2p c
> >
> > some latency-related counters
> > --
> > osd/op_latency/sum 1p=6.24801117205061, 2p=579.722513078945
> > osd/op_process_latency/sum 1p=3.48506945394911, 2p=42.6278494549915
> > osd/op_r_latency/sum 1p=6.2480111719924, 2p=579.722513079003
> > osd/op_r_process_latency/sum 1p=3.48506945399276, 2p=42.6278494550061
> 
> So if you've got 20k objects and 5 OSDs then each OSD is getting ~4k reads
> during this test. Which if I'm reading these properly means OSD-side
> latency is something like 1.5 milliseconds for the single client and...144
> milliseconds for the two-client case! You might try dumping some of the
> historic ops out of the admin socket and seeing where the time is getting
> spent (is it all on disk accesses?). And trying to reproduce something
> like this workload on your disks without Ceph involved.
> -Greg

Greg --

Not sure how much it matters, but in looking at the pools more closely I realized
I was getting mixed up with an earlier experiment with pools that used just 5 osds.
The pools for this example are actually distributed across 15 osds on 3 nodes.

What is the recommended command for dumping historic ops out of the admin 
socket?

-- Tom
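
For reference, a minimal sketch of the admin-socket commands in question, run on the
node hosting the osd (the socket path assumes the default layout):

  # ops currently in flight and recently completed slow ops
  ceph daemon osd.3 dump_ops_in_flight
  ceph daemon osd.3 dump_historic_ops

  # equivalent form going straight at the socket
  ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_historic_ops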






RE: perf counters from a performance discrepancy

2015-09-23 Thread Deneau, Tom
I will be out of office for a week but will put this on the list of things to 
try when I get back.

-- Tom
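
A minimal sketch of the messenger switch Sam suggests, assuming the option is ms_type in
that build and noting that the async messenger may still have been flagged experimental
at the time, which is not verified here:

  # in ceph.conf on all daemons and clients, then restart everything
  [global]
      ms_type = simple      # baseline run
      # ms_type = async     # second run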

> -Original Message-
> From: Samuel Just [mailto:sj...@redhat.com]
> Sent: Wednesday, September 23, 2015 3:28 PM
> To: Deneau, Tom
> Cc: Mark Nelson; Gregory Farnum; Sage Weil; ceph-devel@vger.kernel.org
> Subject: Re: perf counters from a performance discrepancy
> 
> Just to eliminate a variable, can you reproduce this on master, first with
> the simple messenger, and then with the async messenger? (make sure to
> switch the messengers on all daemons and clients, just put it in the
> [global] section on all configs).
> -Sam
> 
> On Wed, Sep 23, 2015 at 1:05 PM, Deneau, Tom  wrote:
> >
> >
> >> -Original Message-
> >> From: Mark Nelson [mailto:mnel...@redhat.com]
> >> Sent: Wednesday, September 23, 2015 1:43 PM
> >> To: Gregory Farnum; Sage Weil
> >> Cc: Deneau, Tom; ceph-devel@vger.kernel.org
> >> Subject: Re: perf counters from a performance discrepancy
> >>
> >>
> >>
> >> On 09/23/2015 01:25 PM, Gregory Farnum wrote:
> >> > On Wed, Sep 23, 2015 at 11:19 AM, Sage Weil 
> wrote:
> >> >> On Wed, 23 Sep 2015, Deneau, Tom wrote:
> >> >>> Hi all --
> >> >>>
> >> >>> Looking for guidance with perf counters...
> >> >>> I am trying to see whether the perf counters can tell me anything
> >> >>> about the following discrepancy
> >> >>>
> >> >>> I populate a number of 40k size objects in each of two pools,
> >> >>> poolA
> >> and poolB.
> >> >>> Both pools cover osds on a single node, 5 osds total.
> >> >>>
> >> >>> * Config 1 (1p):
> >> >>>* use single rados bench client with 32 threads to do seq
> >> >>> read
> >> of 20000 objects from poolA.
> >> >>>
> >> >>> * Config 2 (2p):
> >> >>>* use two concurrent rados bench clients (running on same
> >> client node) with 16 threads each,
> >> >>> one reading 10000 objects from poolA,
> >> >>> one reading 10000 objects from poolB,
> >> >>>
> >> >>> So in both configs, we have 32 threads total and the number of
> >> >>> objects
> >> read is the same.
> >> >>> Note: in all cases, we drop the caches before doing the seq reads
> >> >>>
> >> >>> The combined bandwidth (MB/sec) for the 2 clients in config 2 is
> >> >>> about 1/3 of the bandwidth for the single client in config 1.
> >> >>
> >> >> How were the objects written?  I assume the cluster is backed by
> >> >> spinning disks?
> >> >>
> >> >> I wonder if this is a disk layout issue.  If the 20,000 objects
> >> >> are written in order, they will be roughly sequential on disk, and
> >> >> the 32 thread case will read them in order.  In the 2x 10,000
> >> >> case, the two clients are reading two sequences of objects written
> >> >> at different times, and the disk arms will be swinging around more.
> >> >>
> >> >> My guess is that if the reads were reading the objects in a random
> >> >> order the performance would be the same... I'm not sure that rados
> >> >> bench does that though?
> >> >>
> >> >> sage
> >> >>
> >> >>>
> >> >>>
> >> >>> I gathered perf counters before and after each run and looked at
> >> >>> the difference of the before and after counters for both the 1p
> >> >>> and 2p cases.  Here are some things I noticed that are different
> >> >>> between the two runs.  Can someone take a look and let me know
> >> >>> whether any of these differences are significant.  In particular,
> >> >>> for the
> >> throttle-msgr_dispatch_throttler ones, since I don't know the
> >> detailed definitions of these fields.
> >> >>> Note: these are the numbers for one of the 5 osds, the other osds
> >> >>> are
> >> similar...
> >> >>>
> >> >>> * The field osd/loadavg is always about 3 times higher on the 2p
> >> >>> case.
> >> >>>
> >> >>> some latency-related counters

RE: perf counters from a performance discrepancy

2015-09-23 Thread Deneau, Tom


> -Original Message-
> From: Mark Nelson [mailto:mnel...@redhat.com]
> Sent: Wednesday, September 23, 2015 1:43 PM
> To: Gregory Farnum; Sage Weil
> Cc: Deneau, Tom; ceph-devel@vger.kernel.org
> Subject: Re: perf counters from a performance discrepancy
> 
> 
> 
> On 09/23/2015 01:25 PM, Gregory Farnum wrote:
> > On Wed, Sep 23, 2015 at 11:19 AM, Sage Weil  wrote:
> >> On Wed, 23 Sep 2015, Deneau, Tom wrote:
> >>> Hi all --
> >>>
> >>> Looking for guidance with perf counters...
> >>> I am trying to see whether the perf counters can tell me anything
> >>> about the following discrepancy
> >>>
> >>> I populate a number of 40k size objects in each of two pools, poolA
> and poolB.
> >>> Both pools cover osds on a single node, 5 osds total.
> >>>
> >>> * Config 1 (1p):
> >>>* use single rados bench client with 32 threads to do seq read
> of 20000 objects from poolA.
> >>>
> >>> * Config 2 (2p):
> >>>* use two concurrent rados bench clients (running on same
> client node) with 16 threads each,
> >>> one reading 10000 objects from poolA,
> >>> one reading 10000 objects from poolB,
> >>>
> >>> So in both configs, we have 32 threads total and the number of objects
> read is the same.
> >>> Note: in all cases, we drop the caches before doing the seq reads
> >>>
> >>> The combined bandwidth (MB/sec) for the 2 clients in config 2 is
> >>> about 1/3 of the bandwidth for the single client in config 1.
> >>
> >> How were the objects written?  I assume the cluster is backed by
> >> spinning disks?
> >>
> >> I wonder if this is a disk layout issue.  If the 20,000 objects are
> >> written in order, they will be roughly sequential on disk, and the 32
> >> thread case will read them in order.  In the 2x 10,000 case, the two
> >> clients are reading two sequences of objects written at different
> >> times, and the disk arms will be swinging around more.
> >>
> >> My guess is that if the reads were reading the objects in a random
> >> order the performance would be the same... I'm not sure that rados
> >> bench does that though?
> >>
> >> sage
> >>
> >>>
> >>>
> >>> I gathered perf counters before and after each run and looked at the
> >>> difference of the before and after counters for both the 1p and 2p
> >>> cases.  Here are some things I noticed that are different between
> >>> the two runs.  Can someone take a look and let me know whether any
> >>> of these differences are significant.  In particular, for the
> throttle-msgr_dispatch_throttler ones, since I don't know the detailed
> definitions of these fields.
> >>> Note: these are the numbers for one of the 5 osds, the other osds are
> similar...
> >>>
> >>> * The field osd/loadavg is always about 3 times higher on the 2p case.
> >>>
> >>> some latency-related counters
> >>> --
> >>> osd/op_latency/sum 1p=6.24801117205061, 2p=579.722513078945
> >>> osd/op_process_latency/sum 1p=3.48506945394911, 2p=42.6278494549915
> >>> osd/op_r_latency/sum 1p=6.2480111719924, 2p=579.722513079003
> >>> osd/op_r_process_latency/sum 1p=3.48506945399276,
> >>> 2p=42.6278494550061
> >
> > So, yep, the individual read ops are taking much longer in the
> > two-client case. Naively that's pretty odd.
> >
> >>>
> >>>
> >>> and some throttle-msgr_dispatch_throttler related counters
> >>> --
> >>> throttle-msgr_dispatch_throttler-client/get 1p=1337, 2p=1339, diff=2
> >>> throttle-msgr_dispatch_throttler-client/get_sum 1p=222877,
> >>> 2p=223088, diff=211 throttle-msgr_dispatch_throttler-client/put
> >>> 1p=1337, 2p=1339, diff=2
> >>> throttle-msgr_dispatch_throttler-client/put_sum 1p=222877,
> >>> 2p=223088, diff=211
> >>> throttle-msgr_dispatch_throttler-hb_back_server/get 1p=58, 2p=134,
> >>> diff=76 throttle-msgr_dispatch_throttler-hb_back_server/get_sum
> >>> 1p=2726, 2p=6298, diff=3572
> >>> throttle-msgr_dispatch_throttler-hb_back_server/put 1p=58, 2p=134,
> >>> diff=76 throttle-msgr_dispatch_throttler-hb_back_server/put_sum
> >>> 1p=2726, 2p=6298, diff=3572

perf counters from a performance discrepancy

2015-09-23 Thread Deneau, Tom
Hi all --

Looking for guidance with perf counters...
I am trying to see whether the perf counters can tell me anything about the 
following discrepancy

I populate a number of 40k size objects in each of two pools, poolA and poolB.
Both pools cover osds on a single node, 5 osds total.

   * Config 1 (1p): 
   * use single rados bench client with 32 threads to do seq read of 20000 
objects from poolA.

   * Config 2 (2p):
  * use two concurrent rados bench clients (running on same client node) 
with 16 threads each,
   one reading 10000 objects from poolA,
   one reading 10000 objects from poolB,

So in both configs, we have 32 threads total and the number of objects read is 
the same.
Note: in all cases, we drop the caches before doing the seq reads
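
(For concreteness, each run amounts to something like the following; the pool names and the
40k object size match the description above, everything else is illustrative:

    rados bench -p poolA 60 write -t 32 -b 40960 --no-cleanup    # populate
    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches             # on the osd node, before reading

    # config 1: one client, 32 threads
    rados bench -p poolA 60 seq -t 32

    # config 2: two clients on the same node, 16 threads each
    rados bench -p poolA 60 seq -t 16 &
    rados bench -p poolB 60 seq -t 16 &
    wait
)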

The combined bandwidth (MB/sec) for the 2 clients in config 2 is about 1/3 of 
the bandwidth for
the single client in config 1.


I gathered perf counters before and after each run and looked at the difference 
of
the before and after counters for both the 1p and 2p cases.  Here are some 
things I noticed
that are different between the two runs.  Can someone take a look and let me 
know
whether any of these differences are significant.  In particular, for the
throttle-msgr_dispatch_throttler ones, since I don't know the detailed 
definitions of these fields.
Note: these are the numbers for one of the 5 osds, the other osds are similar...

* The field osd/loadavg is always about 3 times higher on the 2p case.

some latency-related counters
--
osd/op_latency/sum 1p=6.24801117205061, 2p=579.722513078945
osd/op_process_latency/sum 1p=3.48506945394911, 2p=42.6278494549915
osd/op_r_latency/sum 1p=6.2480111719924, 2p=579.722513079003
osd/op_r_process_latency/sum 1p=3.48506945399276, 2p=42.6278494550061


and some throttle-msgr_dispatch_throttler related counters
--
throttle-msgr_dispatch_throttler-client/get 1p=1337, 2p=1339, diff=2
throttle-msgr_dispatch_throttler-client/get_sum 1p=222877, 2p=223088, diff=211
throttle-msgr_dispatch_throttler-client/put 1p=1337, 2p=1339, diff=2
throttle-msgr_dispatch_throttler-client/put_sum 1p=222877, 2p=223088, diff=211
throttle-msgr_dispatch_throttler-hb_back_server/get 1p=58, 2p=134, diff=76
throttle-msgr_dispatch_throttler-hb_back_server/get_sum 1p=2726, 2p=6298, 
diff=3572
throttle-msgr_dispatch_throttler-hb_back_server/put 1p=58, 2p=134, diff=76
throttle-msgr_dispatch_throttler-hb_back_server/put_sum 1p=2726, 2p=6298, 
diff=3572
throttle-msgr_dispatch_throttler-hb_front_server/get 1p=58, 2p=134, diff=76
throttle-msgr_dispatch_throttler-hb_front_server/get_sum 1p=2726, 2p=6298, 
diff=3572
throttle-msgr_dispatch_throttler-hb_front_server/put 1p=58, 2p=134, diff=76
throttle-msgr_dispatch_throttler-hb_front_server/put_sum 1p=2726, 2p=6298, 
diff=3572
throttle-msgr_dispatch_throttler-hbclient/get 1p=168, 2p=252, diff=84
throttle-msgr_dispatch_throttler-hbclient/get_sum 1p=7896, 2p=11844, diff=3948
throttle-msgr_dispatch_throttler-hbclient/put 1p=168, 2p=252, diff=84
throttle-msgr_dispatch_throttler-hbclient/put_sum 1p=7896, 2p=11844, diff=3948

-- Tom Deneau, AMD

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: rados bench object not correct errors on v9.0.3

2015-08-26 Thread Deneau, Tom

> -Original Message-
> From: Dałek, Piotr [mailto:piotr.da...@ts.fujitsu.com]
> Sent: Wednesday, August 26, 2015 2:02 AM
> To: Sage Weil; Deneau, Tom
> Cc: ceph-devel@vger.kernel.org; ceph-us...@ceph.com
> Subject: RE: rados bench object not correct errors on v9.0.3
> 
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > ow...@vger.kernel.org] On Behalf Of Sage Weil
> > Sent: Tuesday, August 25, 2015 7:43 PM
> 
> > > I have built rpms from the tarball http://ceph.com/download/ceph-
> > 9.0.3.tar.bz2.
> > > Have done this for fedora 21 x86_64 and for aarch64.  On both
> > > platforms when I run a single node "cluster" with a few osds and run
> > > rados bench read tests (either seq or rand) I get occasional reports
> > > like
> > >
> > > benchmark_data_myhost_20729_object73 is not correct!
> > >
> > > I never saw these with similar rpm builds on these platforms from
> > > 9.0.2
> > sources.
> > >
> > > Also, if I go to an x86-64 system running Ubuntu trusty for which I
> > > am able to install prebuilt binary packages via
> > > ceph-deploy install --dev v9.0.3
> > >
> > > I do not see the errors there.
> >
> > Hrm.. haven't seen it on this end, but we're running/testing master
> > and not
> > 9.0.2 specifically.  If you can reproduce this on master, that'd be very
> helpful!
> >
> > There have been some recent changes to rados bench... Piotr, does this
> > seem like it might be caused by your changes?
> 
> Yes. My PR #4690 (https://github.com/ceph/ceph/pull/4690) caused rados bench
> to be fast enough to sometimes run into race condition between librados's AIO
> and objbencher processing. That was fixed in PR #5152
> (https://github.com/ceph/ceph/pull/5152) which didn't make it into 9.0.3.
> Tom, you can confirm this by inspecting the contents of the objects in question
> (their contents should be perfectly fine and in line with other objects).
> In the meantime you can either apply the patch from PR #5152 on your own or use
> --no-verify.
> 
> With best regards / Pozdrawiam
> Piotr Dałek

Piotr --

Thank you.  Yes, when I looked at the contents of the objects they always
looked correct.  And yes a single object would sometimes report an error
and sometimes not.  So a race condition makes sense.

A couple of questions:

   * Why would I not see this behavior using the pre-built 9.0.3 binaries
 that get installed using "ceph-deploy install --dev v9.0.3"?  I would 
assume
 this is built from the same sources as the 9.0.3 tarball.

   * So I assume one should not compare pre 9.0.3 rados bench numbers with 
9.0.3 and after?
 The pull request https://github.com/ceph/ceph/pull/4690 did not mention the
 effect on final bandwidth numbers, did you notice what that effect was?

-- Tom

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: rados bench object not correct errors on v9.0.3

2015-08-25 Thread Deneau, Tom


> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Deneau, Tom
> Sent: Tuesday, August 25, 2015 1:24 PM
> To: Sage Weil
> Cc: ceph-devel@vger.kernel.org; ceph-us...@ceph.com;
> piotr.da...@ts.fujitsu.com
> Subject: RE: rados bench object not correct errors on v9.0.3
> 
> 
> 
> > -Original Message-
> > From: Sage Weil [mailto:sw...@redhat.com]
> > Sent: Tuesday, August 25, 2015 12:43 PM
> > To: Deneau, Tom
> > Cc: ceph-devel@vger.kernel.org; ceph-us...@ceph.com;
> > piotr.da...@ts.fujitsu.com
> > Subject: Re: rados bench object not correct errors on v9.0.3
> >
> > On Tue, 25 Aug 2015, Deneau, Tom wrote:
> > > > -Original Message-
> > > > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > > > ow...@vger.kernel.org] On Behalf Of Sage Weil
> > > > Sent: Monday, August 24, 2015 12:45 PM
> > > > To: ceph-annou...@ceph.com; ceph-devel@vger.kernel.org;
> > > > ceph-us...@ceph.com; ceph-maintain...@ceph.com
> > > > Subject: v9.0.3 released
> > > >
> > > > This is the second to last batch of development work for the
> > > > Infernalis cycle.  The most intrusive change is an internal (non
> > > > user-visible) change to the OSD's ObjectStore interface.  Many
> > > > fixes and improvements elsewhere across RGW, RBD, and another big
> > > > pile of
> > CephFS scrub/repair improvements.
> > > >
> > > >
> > > > Getting Ceph
> > > > 
> > > >
> > > > * Git at git://github.com/ceph/ceph.git
> > > > * Tarball at http://ceph.com/download/ceph-9.0.3.tar.gz
> > > > * For packages, see
> > > > http://ceph.com/docs/master/install/get-packages
> > > > * For ceph-deploy, see
> > > > http://ceph.com/docs/master/install/install-ceph-
> > > > deploy
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe
> > > > ceph-devel" in the body of a message to majord...@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> > > I have built rpms from the tarball http://ceph.com/download/ceph-
> > 9.0.3.tar.bz2.
> > > Have done this for fedora 21 x86_64 and for aarch64.  On both
> > > platforms when I run a single node "cluster" with a few osds and run
> > > rados bench read tests (either seq or rand) I get occasional reports
> > > like
> > >
> > > benchmark_data_myhost_20729_object73 is not correct!
> > >
> > > I never saw these with similar rpm builds on these platforms from
> > > 9.0.2
> > sources.
> > >
> > > Also, if I go to an x86-64 system running Ubuntu trusty for which I
> > > am able to install prebuilt binary packages via
> > > ceph-deploy install --dev v9.0.3
> > >
> > > I do not see the errors there.
> >
> > Hrm.. haven't seen it on this end, but we're running/testing master
> > and not
> > 9.0.2 specifically.  If you can reproduce this on master, that'd be
> > very helpful!
> >
> > There have been some recent changes to rados bench... Piotr, does this
> > seem like it might be caused by your changes?
> >
> > sage
> >
> 
> Just as a reminder this is with 9.0.3, not 9.0.2.
> 
> I just tried with the osds running on the fedora machine (with rpms that I
> built from the tarball) and rados bench running on the Ubuntu machine (with
> pre-built binary packages) and I do not see the errors with that combination.
> 
> Will see what happens with master.
> 
> -- Tom
>

For making a tarball to build rpms from master, I did the following steps:
 
#  git checkout master
#  ./autogen.sh
#  ./configure
#  make dist-bzip2

then put the  .bz2 file in the rpmbuild/SOURCES and put the spec file in 
rpmbuild/SPECS
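
(and then ran rpmbuild against that spec file, roughly:

    rpmbuild -ba /root/rpmbuild/SPECS/ceph.spec
)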

Are those the correct steps?  Asking because when I do rpmbuild from the above 
I eventually get

Processing files: ceph-9.0.3-0.fc21.x86_64
error: File not found: 
/root/rpmbuild/BUILDROOT/ceph-9.0.3-0.fc21.x86_64/usr/sbin/ceph-disk-activate
error: File not found: 
/root/rpmbuild/BUILDROOT/ceph-9.0.3-0.fc21.x86_64/usr/sbin/ceph-disk-prepare

-- Tom

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: rados bench object not correct errors on v9.0.3

2015-08-25 Thread Deneau, Tom


> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: Tuesday, August 25, 2015 12:43 PM
> To: Deneau, Tom
> Cc: ceph-devel@vger.kernel.org; ceph-us...@ceph.com;
> piotr.da...@ts.fujitsu.com
> Subject: Re: rados bench object not correct errors on v9.0.3
> 
> On Tue, 25 Aug 2015, Deneau, Tom wrote:
> > > -Original Message-
> > > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > > ow...@vger.kernel.org] On Behalf Of Sage Weil
> > > Sent: Monday, August 24, 2015 12:45 PM
> > > To: ceph-annou...@ceph.com; ceph-devel@vger.kernel.org;
> > > ceph-us...@ceph.com; ceph-maintain...@ceph.com
> > > Subject: v9.0.3 released
> > >
> > > This is the second to last batch of development work for the
> > > Infernalis cycle.  The most intrusive change is an internal (non
> > > user-visible) change to the OSD's ObjectStore interface.  Many fixes
> > > and improvements elsewhere across RGW, RBD, and another big pile of
> CephFS scrub/repair improvements.
> > >
> > >
> > > Getting Ceph
> > > 
> > >
> > > * Git at git://github.com/ceph/ceph.git
> > > * Tarball at http://ceph.com/download/ceph-9.0.3.tar.gz
> > > * For packages, see http://ceph.com/docs/master/install/get-packages
> > > * For ceph-deploy, see
> > > http://ceph.com/docs/master/install/install-ceph-
> > > deploy
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe
> > > ceph-devel" in the body of a message to majord...@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > I have built rpms from the tarball http://ceph.com/download/ceph-
> 9.0.3.tar.bz2.
> > Have done this for fedora 21 x86_64 and for aarch64.  On both
> > platforms when I run a single node "cluster" with a few osds and run
> > rados bench read tests (either seq or rand) I get occasional reports
> > like
> >
> > benchmark_data_myhost_20729_object73 is not correct!
> >
> > I never saw these with similar rpm builds on these platforms from 9.0.2
> sources.
> >
> > Also, if I go to an x86-64 system running Ubuntu trusty for which I am
> > able to install prebuilt binary packages via
> > ceph-deploy install --dev v9.0.3
> >
> > I do not see the errors there.
> 
> Hrm.. haven't seen it on this end, but we're running/testing master and not
> 9.0.2 specifically.  If you can reproduce this on master, that'd be very
> helpful!
> 
> There have been some recent changes to rados bench... Piotr, does this seem
> like it might be caused by your changes?
> 
> sage
> 

Just as a reminder this is with 9.0.3, not 9.0.2.

I just tried with the osds running on the fedora machine (with rpms that I 
built from the tarball)
and rados bench running on the Ubuntu machine (with pre-built binary packages)
and I do not see the errors with that combination.

Will see what happens with master.

-- Tom

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


rados bench object not correct errors on v9.0.3

2015-08-25 Thread Deneau, Tom


> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Monday, August 24, 2015 12:45 PM
> To: ceph-annou...@ceph.com; ceph-devel@vger.kernel.org; ceph-us...@ceph.com;
> ceph-maintain...@ceph.com
> Subject: v9.0.3 released
> 
> This is the second to last batch of development work for the Infernalis
> cycle.  The most intrusive change is an internal (non user-visible) change to
> the OSD's ObjectStore interface.  Many fixes and improvements elsewhere
> across RGW, RBD, and another big pile of CephFS scrub/repair improvements.
> 
> 
> Getting Ceph
> 
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://ceph.com/download/ceph-9.0.3.tar.gz
> * For packages, see http://ceph.com/docs/master/install/get-packages
> * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-
> deploy
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

I have built rpms from the tarball http://ceph.com/download/ceph-9.0.3.tar.bz2.
Have done this for fedora 21 x86_64 and for aarch64.  On both platforms when I 
run
a single node "cluster" with a few osds and run rados bench read tests
(either seq or rand) I get occasional reports like

benchmark_data_myhost_20729_object73 is not correct!

I never saw these with similar rpm builds on these platforms from 9.0.2 sources.

Also, if I go to an x86-64 system running Ubuntu trusty for which I am able to
install prebuilt binary packages via
ceph-deploy install --dev v9.0.3

I do not see the errors there.

Any suggestions welcome.

-- Tom Deneau, AMD



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


registering for tracker.ceph.com

2015-07-23 Thread Deneau, Tom
I wanted to register for tracker.ceph.com to enter a few issues but never
got the confirming email and my registration is now in some stuck state
(not complete but name/email in use so can't re-register).  Any suggestions?

-- Tom Deneau

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


building just src/tools/rados

2015-07-22 Thread Deneau, Tom
Is there a make command that would build just the src/tools or even just 
src/tools/rados ?

-- Tom Deneau

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


9.0.2 test/perf_local.cc on non-x86 architectures

2015-07-21 Thread Deneau, Tom
I was trying to do an rpmbuild of v9.0.2 for aarch64 and got the following 
error:

test/perf_local.cc: In function 'double div32()':
test/perf_local.cc:396:31: error: impossible constraint in 'asm'
  "cc");

Probably should have an if defined (__i386__) around it.

-- Tom

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: osd suicide timeout

2015-07-13 Thread Deneau, Tom
Greg --

Not sure how to tell whether rebalancing occurred at that time.
I do see in other osd logs complaints that they do not get a reply from
osd.8 starting around 15:52 on that day.

I see the deep-scrub of pool 14 but that was almost 30 minutes earlier.

-- Tom


> -Original Message-
> From: Gregory Farnum [mailto:g...@gregs42.com]
> Sent: Monday, July 13, 2015 11:45 AM
> To: Deneau, Tom
> Cc: ceph-devel
> Subject: Re: osd suicide timeout
> 
> heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x3ff6eb0efd0' had suicide
> timed out after 150
> 
> So that's the OSD's "op thread", which is the one that does most of the work.
> You often see the FileStore::op_tp when it's the disk or filesystem breaking,
> but I do see the line
> 
> waiting 51 > 50 ops || 57248008 > 104857600
> 
> which leaves me feeling pretty confident that the disk is just getting more
> work than it can keep up with. It looks like there was some rebalancing
> happening around this time?
> -Greg
> 
> 
> 
> On Mon, Jul 13, 2015 at 5:04 PM, Deneau, Tom  wrote:
> > Greg --
> >
> > Thanks.  I put the osd.log file at
> >
> > https://drive.google.com/file/d/0B_rfwWh40kPwQjZ3OXdjLUZNRVU/view?usp=
> > sharing
> >
> > I noticed the following from journalctl output around that time, so other
> nodes were complaining they could not reach osd.8.
> >
> > Jul 09 15:53:04 seattle-04-ausisv bash[8486]: 2015-07-09
> > 15:53:03.905386 3ffa0d9efd0 -1 osd.9 2487 heartbeat_check: no reply
> > from osd.8 since back 2015-07-09 15:52:43.256581 front 2015-07-09
> > 15:52:43.256581 (cutoff 2015-07-09 15:52:43.905384) Jul 09 15:53:06
> > seattle-04-ausisv bash[1060]: 2015-07-09 15:53:06.784069 3ff916fefd0
> > -1 osd.7 2487 heartbeat_check: no reply from osd.8 since back
> > 2015-07-09 15:52:46.474273 front 2015-07-09 15:52:46.474273 (cutoff
> > 2015-07-09 15:52:46.784066)
> >
> > and here is some sar data for the disk that osd.8 was controlling
> > (sde1=journal partition, sde2=data partition)
> >
> > 03:40:02 PM   DEV   tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz
> await svctm %util
> > 03:50:17 PM  sde1  9.67  0.00  19046.40   1970.32  0.35
> 35.86 30.62 29.60
> > 03:50:17 PM  sde2 60.47   1524.27  14597.67266.63 24.45
> 404.30  8.54 51.67
> > 03:50:32 PM  sde1 12.13  0.00  18158.93   1496.62  0.25
> 20.66 17.58 21.33
> > 03:50:32 PM  sde2 28.00   1550.93  17958.33696.76 10.54
> 376.50 13.52 37.87
> > 03:50:47 PM  sde1 12.73  0.00  25446.40   1998.41  0.31
> 24.19 22.30 28.40
> > 03:50:47 PM  sde2 51.60338.67  18091.73357.18 13.05
> 252.91  8.02 41.40
> > 03:51:02 PM  sde1 12.27  0.00  18790.40   1531.83  0.31
> 25.33 18.53 22.73
> > 03:51:02 PM  sde2 33.13   2635.20  18026.67623.60  5.02
> 151.57 10.99 36.40
> > 03:51:17 PM  sde1 10.13  0.00  14557.87   1436.63  0.16
> 16.18 12.76 12.93
> > 03:51:17 PM  sde2 46.73   1107.73  12067.00281.91  8.55
> 182.88  5.46 25.53
> > 03:51:32 PM  sde1 11.93  0.00  18594.13   1558.17  0.35
> 29.27 16.42 19.60
> > 03:51:32 PM  sde2 22.20555.20  18834.33873.40  4.24
> 191.08 13.51 30.00
> > 03:51:47 PM  sde1 18.00  0.00  13926.40773.69  0.19
> 10.78 10.07 18.13
> > 03:51:47 PM  sde2 47.27   1652.80  10775.53262.94 12.24
> 259.01  6.66 31.47
> > 03:52:02 PM  sde1 21.60  0.00  10845.87502.12  0.24
> 11.08  9.75 21.07
> > 03:52:02 PM  sde2 34.33   1652.80   9089.13312.87  7.43
> 216.41  8.45 29.00
> > 03:52:17 PM  sde1 19.87  0.00  20198.40   1016.70  0.33
> 16.85 13.46 26.73
> > 03:52:17 PM  sde2 35.60   2752.53  16355.53536.74 11.90
> 333.33 10.90 38.80
> > 03:52:32 PM  sde1 22.54  0.00   8434.04374.18  0.15
> 6.67  6.17 13.90
> > 03:52:32 PM  sde2 35.84   2738.30   4586.30204.38  2.01
> 28.11  6.53 23.40
> > 03:52:47 PM  sde1  0.00  0.00  0.00  0.00  0.00
> 0.00  0.00  0.00
> > 03:52:47 PM  sde2 13.37 35.83   1101.80 85.09  1.87
> 218.65  5.75  7.69
> > 03:53:02 PM  sde1  0.00  0.00  0.00  0.00  0.00
> 0.00  0.00  0.00
> > 03:53:02 PM  sde2  0.00  

RE: osd suicide timeout

2015-07-13 Thread Deneau, Tom
Greg --

Thanks.  I put the osd.log file at

https://drive.google.com/file/d/0B_rfwWh40kPwQjZ3OXdjLUZNRVU/view?usp=sharing

I noticed the following from journalctl output around that time, so other nodes 
were complaining they could not reach osd.8.

Jul 09 15:53:04 seattle-04-ausisv bash[8486]: 2015-07-09 15:53:03.905386 
3ffa0d9efd0 -1 osd.9 2487 heartbeat_check: no reply from osd.8 since back 
2015-07-09 15:52:43.256581 front 2015-07-09 15:52:43.256581 (cutoff 2015-07-09 
15:52:43.905384)
Jul 09 15:53:06 seattle-04-ausisv bash[1060]: 2015-07-09 15:53:06.784069 
3ff916fefd0 -1 osd.7 2487 heartbeat_check: no reply from osd.8 since back 
2015-07-09 15:52:46.474273 front 2015-07-09 15:52:46.474273 (cutoff 2015-07-09 
15:52:46.784066)

and here is some sar data for the disk that osd.8 was controlling (sde1=journal 
partition, sde2=data partition)

03:40:02 PM       DEV     tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await  svctm  %util
03:50:17 PM      sde1    9.67      0.00  19046.40   1970.32      0.35   35.86  30.62  29.60
03:50:17 PM      sde2   60.47   1524.27  14597.67    266.63     24.45  404.30   8.54  51.67
03:50:32 PM      sde1   12.13      0.00  18158.93   1496.62      0.25   20.66  17.58  21.33
03:50:32 PM      sde2   28.00   1550.93  17958.33    696.76     10.54  376.50  13.52  37.87
03:50:47 PM      sde1   12.73      0.00  25446.40   1998.41      0.31   24.19  22.30  28.40
03:50:47 PM      sde2   51.60    338.67  18091.73    357.18     13.05  252.91   8.02  41.40
03:51:02 PM      sde1   12.27      0.00  18790.40   1531.83      0.31   25.33  18.53  22.73
03:51:02 PM      sde2   33.13   2635.20  18026.67    623.60      5.02  151.57  10.99  36.40
03:51:17 PM      sde1   10.13      0.00  14557.87   1436.63      0.16   16.18  12.76  12.93
03:51:17 PM      sde2   46.73   1107.73  12067.00    281.91      8.55  182.88   5.46  25.53
03:51:32 PM      sde1   11.93      0.00  18594.13   1558.17      0.35   29.27  16.42  19.60
03:51:32 PM      sde2   22.20    555.20  18834.33    873.40      4.24  191.08  13.51  30.00
03:51:47 PM      sde1   18.00      0.00  13926.40    773.69      0.19   10.78  10.07  18.13
03:51:47 PM      sde2   47.27   1652.80  10775.53    262.94     12.24  259.01   6.66  31.47
03:52:02 PM      sde1   21.60      0.00  10845.87    502.12      0.24   11.08   9.75  21.07
03:52:02 PM      sde2   34.33   1652.80   9089.13    312.87      7.43  216.41   8.45  29.00
03:52:17 PM      sde1   19.87      0.00  20198.40   1016.70      0.33   16.85  13.46  26.73
03:52:17 PM      sde2   35.60   2752.53  16355.53    536.74     11.90  333.33  10.90  38.80
03:52:32 PM      sde1   22.54      0.00   8434.04    374.18      0.15    6.67   6.17  13.90
03:52:32 PM      sde2   35.84   2738.30   4586.30    204.38      2.01   28.11   6.53  23.40
03:52:47 PM      sde1    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:52:47 PM      sde2   13.37     35.83   1101.80     85.09      1.87  218.65   5.75   7.69
03:53:02 PM      sde1    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:53:02 PM      sde2    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:53:17 PM      sde1    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:53:17 PM      sde2    0.13      0.00      0.20      1.50      0.00   20.00  20.00   0.27
03:53:32 PM      sde1    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:53:32 PM      sde2    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:53:47 PM      sde1    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:53:47 PM      sde2    0.13      0.00      0.20      1.50      0.00    5.00   5.00   0.07
03:54:02 PM      sde1    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:54:02 PM      sde2    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:54:17 PM      sde1    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:54:17 PM      sde2    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:54:32 PM      sde1    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
03:54:32 PM      sde2    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00

-- Tom

> -Original Message-
> From: Gregory Farnum [mailto:g...@gregs42.com]
> Sent: Monday, July 13, 2015 5:07 AM
> To: Deneau, Tom
> Cc: ceph-devel
> Subject: Re: osd suicide timeout
> 
> On Fri, Jul 10, 2015 at 10:45 PM, Deneau, Tom  wrote:
> > I have an osd log file from an osd that h

osd suicide timeout

2015-07-10 Thread Deneau, Tom
I have an osd log file from an osd that hit a suicide timeout (with the 
previous 10000 events logged).
(On this node I have also seen this suicide timeout happen once before, and also
a sync_entry timeout.)

I can see that 6 minutes or so before that osd died, other osds on the same 
node were logging
messages such as
heartbeat_check: no reply from osd.8
so it appears that osd8 stopped responding quite some time before it died.

I'm wondering if there is enough information in the osd8 log file to deduce why 
osd 8 stopped responding?
I don't know enough to figure it out myself.

Is there any expert who would be willing to take a look at the log file?

-- Tom Deneau, AMD

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: load-gen from an osd node

2015-07-08 Thread Deneau, Tom
With the newer rados, just running  rados -p somepool load-gen, we get

preparing 200 objects
load-gen will run 60 seconds
1: throughput=0MB/sec pending data=0
2: throughput=0MB/sec pending data=0
3: throughput=0MB/sec pending data=0
4: throughput=0MB/sec pending data=0
5: throughput=0MB/sec pending data=0
6: throughput=0MB/sec pending data=0

But I see the change that caused this behavior was explicitly initializing 
lg.max_ops to 0.
Before this, max_ops was just whatever happened to be in memory.

So the newer load-gen works fine as long as you explicitly set --max-ops on the 
command line.
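
For example, something like this now behaves as expected (the number is arbitrary):

    rados -p somepool load-gen --max-ops 16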

-- Tom Deneau


> -Original Message-
> From: Gregory Farnum [mailto:g...@gregs42.com]
> Sent: Wednesday, July 01, 2015 11:05 AM
> To: Deneau, Tom
> Cc: ceph-devel
> Subject: Re: load-gen from an osd node
> 
> Hmm, the only changes I see between those two versions are some pretty
> precise cleanups which shouldn't cause this. But it means that a bisect or
> determined look should be easy. Can you create a ticket which includes the
> exact output you're seeing and the exact versions you're running?
> -Greg
> 
> On Mon, Jun 29, 2015 at 11:27 PM, Deneau, Tom  wrote:
> > Oh, I just noticed that the client nodes I spoke of where load-gen
> > actually worked were running 0.94, not 9.0.1.  And when I upgrade them to
> 9.0.1, load-gen no longer works.
> >
> > So more likely this is just a problem with newer rados load-gens
> >
> > -- Tom
> >
> >
> >
> >> -Original Message-
> >> From: Deneau, Tom
> >> Sent: Friday, June 26, 2015 7:48 PM
> >> To: ceph-devel
> >> Subject: load-gen from an osd node
> >>
> >> I am running 9.0.1 and I noticed when I run rados load-gen from one
> >> of the osd nodes, it creates the objects but then always reports a
> >> throughput of 0 MB/sec.
> >>
> >> But if I run it from a separate client node, it works fine.
> >> Why would this be?
> >>
> >> I'm not sure but I thought in earlier versions load-gen could be run
> >> from an osd node.
> >>
> >> -- Tom Deneau
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majord...@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html


osd aborts, sync entry timeout and suicide timeout

2015-07-06 Thread Deneau, Tom
I had a small (4 nodes, 19 OSDs) cluster that I was running a sort of
stress test on over the weekend.  Let's call the 4 nodes, A, B, C and
D.  (Node A had the monitor running on it).

Anyway, node C died with a hardware problem, and, I think at about
that same time two of the 5 osds on node B aborted with asserts.  The
other 3 OSDS on node B carried on without problem as did the OSDS on
nodes A and D.  And the client tests continued to run without error.

I attached the stack traces below from the aborting OSDs below.  If
necessary, I can send the full osd logs (which include the dump of the
10000 most recent events).

I don't know enough about the ceph internals to know what these aborts
really mean.  Have others seen these kinds of aborts before?  (I would
assume these kinds of aborts are not normal).  Are they an indication
of some kind of ceph configuration problem or build problem?  As can
be seen I am running 9.0.1 which I built from sources for the aarch64
platform.

-- Tom Deneau, AMD


Aborting OSD #1

2015-07-03 20:27:47.013337 3ff7255efd0 -1 FileStore: sync_entry timed out after 
600 seconds.
 ceph version 9.0.1 (997b3f998d565a744bfefaaf34b08b891f8dbf64)
 1: (Context::complete(int)+0x1c) [0x6cdbe4]
 2: (SafeTimer::timer_thread()+0x320) [0xbfed58]
 3: (SafeTimerThread::entry()+0x10) [0xc00af8]
 4: (()+0x6ed4) [0x3ff7f456ed4]
 5: (()+0xe08b0) [0x3ff7f0408b0]

2015-07-03 20:27:47.022743 3ff7255efd0 -1 os/FileStore.cc: In function 'virtual 
void SyncEntryTimeout::finish(int)' thread 3ff7255efd0 time 2015-07-03 
20:27:47.013404
os/FileStore.cc: 3524: FAILED assert(0)

 ceph version 9.0.1 (997b3f998d565a744bfefaaf34b08b891f8dbf64)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8c) 
[0xc14b9c]
 2: (SyncEntryTimeout::finish(int)+0xc4) [0x91552c]
 3: (Context::complete(int)+0x1c) [0x6cdbe4]
 4: (SafeTimer::timer_thread()+0x320) [0xbfed58]
 5: (SafeTimerThread::entry()+0x10) [0xc00af8]
 6: (()+0x6ed4) [0x3ff7f456ed4]
 7: (()+0xe08b0) [0x3ff7f0408b0]


Aborting OSD #2

2015-07-03 20:28:14.693496 3ff6d2eefd0 -1 common/HeartbeatMap.cc: In function 
'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, 
time_t)' thread 3ff6d2eefd0 time 2015-07-03 20:28:14.665989
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 9.0.1 (997b3f998d565a744bfefaaf34b08b891f8dbf64)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8c) 
[0xc14b9c]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, 
long)+0x328) [0xb5afc0]
 3: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, long, 
long)+0x19c) [0xb5b2b4]
 4: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x6fc) [0xc06fdc]
 5: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xc07a98]
 6: (()+0x6ed4) [0x3ff85696ed4]
 7: (()+0xe08b0) [0x3ff852808b0]


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: load-gen from an osd node

2015-06-29 Thread Deneau, Tom
Oh, I just noticed that the client nodes I spoke of where load-gen actually 
worked were
running 0.94, not 9.0.1.  And when I upgrade them to 9.0.1, load-gen no longer 
works.

So more likely this is just a problem with newer rados load-gens

-- Tom



> -Original Message-
> From: Deneau, Tom
> Sent: Friday, June 26, 2015 7:48 PM
> To: ceph-devel
> Subject: load-gen from an osd node
> 
> I am running 9.0.1 and I noticed when I run rados load-gen from one of the
> osd nodes, it creates the objects but then always reports a throughput of 0
> MB/sec.
> 
> But if I run it from a separate client node, it works fine.
> Why would this be?
> 
> I'm not sure but I thought in earlier versions load-gen could be run from an
> osd node.
> 
> -- Tom Deneau
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pgs stuck undersized and degraded

2015-06-29 Thread Deneau, Tom
On a very small (3 node) cluster, I have one pool with a replication size of 3 
that is showing some stuck PGs.
This pool has 64 pgs and the other pgs in the pool seem fine, mapped to 3 osds 
each.
And all the pgs in other pools are also fine.
Why would these pgs be stuck with only 2 osds?
The osd crush chooseleaf type is 1 for host and the osd tree is shown below.

-- Tom Deneau

pg_stat state   up  up_primary  acting  acting_primary
--
13.34   active+undersized+degraded  [0,7]   0   [0,7]   0
13.3a   active+undersized+degraded  [2,8]   2   [2,8]   2
13.aactive+undersized+degraded  [8,2]   8   [8,2]   8
13.eactive+undersized+degraded  [0,8]   0   [0,8]   0
13.3c   active+undersized+degraded  [2,5]   2   [2,5]   2
13.22   active+undersized+degraded  [8,2]   8   [8,2]   8
13.1b   active+undersized+degraded  [2,8]   2   [2,8]   2
13.21   active+undersized+degraded  [8,0]   8   [8,0]   8
13.1e   active+undersized+degraded  [8,3]   8   [8,3]   8
13.1f   active+undersized+degraded  [4,6]   4   [4,6]   4
13.2a   active+remapped[7,4]7   [7,4,0] 7
13.33   active+undersized+degraded  [7,2]   7   [7,2]   7
13.0active+undersized+degraded  [0,7]   0   [0,7]   0

ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.94997 root default
-2 1.7 host aus01
 0 0.45000 osd.0   up  1.0  1.0
 2 0.45000 osd.2   up  1.0  1.0
 3 0.45000 osd.3   up  1.0  1.0
 4 0.45000 osd.4   up  1.0  1.0
-3 1.7 host aus05
 5 0.45000 osd.5   up  1.0  1.0
 6 0.45000 osd.6   up  1.0  1.0
 7 0.45000 osd.7   up  1.0  1.0
 8 0.45000 osd.8   up  1.0  1.0
-4 1.34999 host aus06
 9 0.45000 osd.9   up  1.0  1.0
10 0.45000 osd.10  up  1.0  1.0
11 0.45000 osd.11down0  1.0
 1   0 osd.1 down0  1.0
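
(For what it's worth, the generic next step for a pg stuck like this is to query it directly,
for example:

    ceph pg 13.34 query

which dumps the peering and recovery state for that pg. That is just the standard diagnostic,
not a diagnosis of this particular cluster.)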

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


load-gen from an osd node

2015-06-26 Thread Deneau, Tom
I am running 9.0.1 and I noticed when I run rados load-gen from one of the osd 
nodes,
it creates the objects but then always reports a throughput of 0 MB/sec.

But if I run it from a separate client node, it works fine.
Why would this be?

I'm not sure but I thought in earlier versions load-gen could be run from an 
osd node.

-- Tom Deneau
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: deleting objects from a pool

2015-06-25 Thread Deneau, Tom
Igor --

Good command to know, but this is still very slow on an erasure pool.
For example, on my cluster it took 10 seconds with rados bench to write 10,000 
40K size objects to an ecpool.
And it took almost 6 minutes to delete them using the command below.

-- Tom


> -Original Message-
> From: Podoski, Igor [mailto:igor.podo...@ts.fujitsu.com]
> Sent: Thursday, June 25, 2015 1:06 AM
> To: Deneau, Tom; Dałek, Piotr; ceph-devel
> Subject: RE: deleting objects from a pool
> 
> Hi,
> 
> It appears, that cleanup can be used as a purge:
> 
> rados -p <pool> cleanup --prefix ""
> 
> Regards,
> Igor.
> 
> 
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Deneau, Tom
> Sent: Wednesday, June 24, 2015 10:22 PM
> To: Dałek, Piotr; ceph-devel
> Subject: RE: deleting objects from a pool
> 
> I've noticed that deleting objects from a basic k=2 m=1 erasure pool is much
> much slower than deleting a similar number of objects from a replicated size
> 3 pool (so the same number of files to be deleted).   It looked like the ec
> pool object deletion was almost 20x slower.  Is there a lot more work to be
> done to delete an ec pool object?
> 
> -- Tom
> 
> 
> 
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > ow...@vger.kernel.org] On Behalf Of Dalek, Piotr
> > Sent: Wednesday, June 24, 2015 11:56 AM
> > To: ceph-devel
> > Subject: Re: deleting objects from a pool
> >
> > > -Original Message-
> > > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > > ow...@vger.kernel.org] On Behalf Of Deneau, Tom
> > > Sent: Wednesday, June 24, 2015 6:44 PM
> > >
> > > I have benchmarking situations where I want to leave a pool around
> > > but delete a lot of objects from the pool.  Is there any really fast
> > > way to do
> > that?
> > > I noticed rados rmpool is fast but I don't want to remove the pool.
> > >
> > > I have been spawning multiple threads, each deleting a subset of the
> > objects
> > > (which I believe is what rados bench write does) but even that can
> > > be very slow.
> >
> > For now, apart from "rados -p <pool> cleanup" (which doesn't purge
> > the pool, but merely removes objects written during last benchmark
> > run), the only option is by brute force:
> >
> > for i in $(rados -p <pool> ls); do (rados -p <pool> rm $i
> > &>/dev/null &); done;
> >
> > There's no "purge pool" command in rados -- not yet, at least. I was
> > thinking about one, but never really had time to implement one.
> >
> > With best regards / Pozdrawiam
> > Piotr Dałek
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majord...@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: deleting objects from a pool

2015-06-24 Thread Deneau, Tom
I've noticed that deleting objects from a basic k=2 m=1 erasure pool is much 
much slower than deleting a similar number of objects from a replicated size 3 
pool (so the same number of files to be deleted).   It looked like the ec pool 
object deletion was almost 20x slower.  Is there a lot more work to be done to 
delete an ec pool object?

-- Tom



> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Dalek, Piotr
> Sent: Wednesday, June 24, 2015 11:56 AM
> To: ceph-devel
> Subject: Re: deleting objects from a pool
> 
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > ow...@vger.kernel.org] On Behalf Of Deneau, Tom
> > Sent: Wednesday, June 24, 2015 6:44 PM
> >
> > I have benchmarking situations where I want to leave a pool around but
> > delete a lot of objects from the pool.  Is there any really fast way to do
> that?
> > I noticed rados rmpool is fast but I don't want to remove the pool.
> >
> > I have been spawning multiple threads, each deleting a subset of the
> objects
> > (which I believe is what rados bench write does) but even that can be very
> > slow.
> 
> For now, apart from "rados -p <pool> cleanup" (which doesn't purge the
> pool, but merely removes objects written during last benchmark run), the only
> option is by brute force:
> 
> for i in $(rados -p <pool> ls); do (rados -p <pool> rm $i &>/dev/null
> &); done;
> 
> There's no "purge pool" command in rados -- not yet, at least. I was thinking
> about one, but never really had time to implement one.
> 
> With best regards / Pozdrawiam
> Piotr Dałek
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


deleting objects from a pool

2015-06-24 Thread Deneau, Tom
I have benchmarking situations where I want to leave a pool around but
delete a lot of objects from the pool.  Is there any really fast way to do that?
I noticed rados rmpool is fast but I don't want to remove the pool.

I have been spawning multiple threads, each deleting a subset of the objects
(which I believe is what rados bench write does) but even that can be very slow.
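
(A rough sketch of that brute-force approach, assuming object names contain no whitespace and
"mypool" is just a placeholder:

    rados -p mypool ls | xargs -P 8 -n 1 rados -p mypool rm
)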

-- Tom Deneau

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


erasure pool with isa plugin

2015-06-22 Thread Deneau, Tom
If one has a cluster with some nodes that can run with the ISA plugin
and some that cannot, is there a way to define a pool such that the
ISA-capable nodes can use the ISA plugin and the others can use say
the jerasure plugin?

-- Tom Deneau

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in


RE: osd pool erasure code stripe width

2015-06-19 Thread Deneau, Tom
So if I am looking at the performance of a system with
ceph_erasure_code_benchmark, does this mean I should concentrate
on the performance with --size 4096 (or whatever stripe width my
ec pools are going to use)?
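
(For reference, the kind of invocation I have in mind; exact flag names may vary by version:

    ceph_erasure_code_benchmark --plugin jerasure --parameter k=2 --parameter m=1 \
        --workload decode --size 4096 --iterations 100000
)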

-- Tom

> -Original Message-
> From: Loic Dachary [mailto:l...@dachary.org]
> Sent: Friday, June 19, 2015 5:47 PM
> To: Deneau, Tom; ceph-devel
> Subject: Re: osd pool erasure code stripe width
> 
> Hi Tom,
> 
> A stripe width of 4KB (the default) means the object is encoded 4KB at a
> time. It does not show in the file written to disk.
> 
> Cheers
> 
> On 19/06/2015 22:11, Deneau, Tom wrote:
> > I am trying to understand the use of "osd pool erasure code stripe width"
> > For example, I have a single-node system with a k=2,m=1 ec pool and I
> > write a single 40M object to this pool using rados bench.
> > But when I look on the disk, I still see only the 3 20M pieces for this
> object.
> > Where does the striping get used?
> >
> > -- Tom Deneau, AMD
> >
> >
> > Description:
> > Sets the desired size, in bytes, of an object stripe on every erasure coded
> pool. Every object of size S will be stored as N stripes and each stripe
> will be encoded/decoded individually.
> > Type:
> > Unsigned 32-bit Integer
> > Default:
> > 4096
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in
> >
> 
> --
> Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in


osd pool erasure code stripe width

2015-06-19 Thread Deneau, Tom
I am trying to understand the use of "osd pool erasure code stripe width"
For example, I have a single-node system with a k=2,m=1 ec pool
and I write a single 40M object to this pool using rados bench.
But when I look on the disk, I still see only the 3 20M pieces for this object.
Where does the striping get used?

-- Tom Deneau, AMD


Description:
Sets the desired size, in bytes, of an object stripe on every erasure coded
pool. Every object of size S will be stored as N stripes and each stripe will
be encoded/decoded individually.
Type:
Unsigned 32-bit Integer
Default:
4096


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in


RE: rados bench throughput with no disk or network activity

2015-05-28 Thread Deneau, Tom


> -Original Message-
> From: Gregory Farnum [mailto:g...@gregs42.com]
> Sent: Thursday, May 28, 2015 6:18 PM
> To: Deneau, Tom
> Cc: ceph-devel
> Subject: Re: rados bench throughput with no disk or network activity
> 
> On Thu, May 28, 2015 at 4:09 PM, Deneau, Tom  wrote:
> > I've noticed that
> >* with a single node cluster with 4 osds
> >* and running rados bench rand on that same node so no network traffic
> >* with a number of objects small enough so that everything is in
> > the cache so no disk traffic
> >
> > we still peak out at about 1600 MB/sec.
> >
> > And the cpu is 40% idle. (and a good chunk of the cpu activity is the
> > rados benchmark itself)
> >
> > What is likely causing the throttling here?
> 
> Well, rados bench itself is essentially single-threaded, so if it's using
> 100% CPU that's probably the bottleneck you're hitting.
> 
> Otherwise, by default it will limit itself to 100MB of outstanding IO
> (there's an objecter config value you can change for this; it's been
> discussed recently) and that might not be enough given the latencies of
> hopping packets across different CPUs, and the OSDs have a slightly-
> embarrassing amount of CPU computation and thread hopping they have to
> perform on every op (around half a millisecond's worth on each read, I
> think?).
> -Greg

Right.  I was involved in the objecter config discussion :) and 
I have set the limits higher.   And this 1600 MB/sec limit seems to be
the same whatever the size of the objects.
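
(For reference, the limits in question are the client-side objecter throttles, raised with
something like this in the client's ceph.conf; the values are just examples:

    [client]
        objecter inflight ops = 10240
        objecter inflight op bytes = 1073741824
)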

rados bench is using about 30% of the cpu and the total cpu usage is about 60%
(the rest being mostly from the 4 osds).

Hmm, I just tried running 4 copies of rados bench rand, and I can get a little
bit higher combined totals, but not much higher, maybe 1800 MB/sec.
 
-- Tom


rados bench throughput with no disk or network activity

2015-05-28 Thread Deneau, Tom
I've noticed that
   * with a single node cluster with 4 osds
   * and running rados bench rand on that same node so no network traffic
   * with a number of objects small enough so that everything is in the cache 
so no disk traffic

we still peak out at about 1600 MB/sec.

And the cpu is 40% idle. (and a good chunk of the cpu activity is the rados 
benchmark itself)

What is likely causing the throttling here?

-- Tom Deneau, AMD




--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: journal writes when running rados bench seq on ec pool

2015-05-20 Thread Deneau, Tom
Hi --

I never saw an explanation for the writes that are occurring during ecpool 
reads.

-- Tom


> -Original Message-
> From: Somnath Roy [mailto:somnath@sandisk.com]
> Sent: Wednesday, May 13, 2015 6:48 PM
> To: Deneau, Tom; ceph-devel
> Subject: RE: journal writes when running rados bench seq on ec pool
> 
> Tom,
> Good that you brought this up !
> I was also seeing the small writes during reads but forgot to mention in my
> last report on EC.
> Basically, the Ceph code base seems to be issuing small writes during reads,
> and they go to both the data and journal partitions (with the filestore
> backend). But since the data partition syncs every 5 seconds, it is not
> obvious there.
> 
> Loic,
> Could you please explain a bit more on this ? What exactly we are writing
> during reads ?
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Deneau, Tom
> Sent: Wednesday, May 13, 2015 4:40 PM
> To: ceph-devel
> Subject: journal writes when running rados bench seq on ec pool
> 
> I am running a rados bench seq read benchmark on an erasure-coded pool (k=2,
> m=1),
> and recording the disk activity.   I notice that I always see a small number
> of writes
> to the journal partitions during the seq read run.  (I always drop the caches
> on the osd nodes before starting the seq read run).
> 
> I never see this journal write behavior with a replicated pool and I didn't
> think there would be any journal writes with either kind of pool on a read
> benchmark.
> 
> The write activity is low enough that it's probably not that important
> performance-wise but I as just curious as to what would cause this.
> 
> -- Tom Deneau
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> PLEASE NOTE: The information contained in this electronic mail message is
> intended only for the use of the designated recipient(s) named above. If the
> reader of this message is not the intended recipient, you are hereby notified
> that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly
> prohibited. If you have received this communication in error, please notify
> the sender by telephone or e-mail (as shown above) immediately and destroy
> any and all copies of this message in your possession (whether hard copies or
> electronically stored copies).

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: ceph tell osd bench

2015-04-23 Thread Deneau, Tom
> -Original Message-
> From: Gregory Farnum [mailto:g...@gregs42.com]
> Sent: Thursday, April 23, 2015 12:37 PM
> To: Deneau, Tom
> Cc: ceph-devel
> Subject: Re: ceph tell osd bench
> 
> On Thu, Apr 23, 2015 at 6:58 AM, Deneau, Tom  wrote:
> > While running ceph tell osd bench and playing around with the total_bytes
> and block_size parameters,
> > I have noticed that if the total_bytes written is less than about 0.5G, the
> bytes/sec is much higher.
> > Why is that?
> 
> It's probably only writing the data into the journal at that size. I'm
> a bit surprised because I thought the osd bench took care to only
> provide stable numbers, but maybe it's just using well-chosen defaults
> and at half a gig you're running well below them.
> -Greg

so is an osd write (as used by tell osd bench) considered complete when
it is written to the journal?

-- Tom



ceph tell osd bench

2015-04-23 Thread Deneau, Tom
While running ceph tell osd bench and playing around with the total_bytes and 
block_size parameters,
I have noticed that if the total_bytes written is less than about 0.5G, the 
bytes/sec is much higher.
Why is that?

-- Tom Deneau
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


hashing variation in rados bench runs

2015-04-20 Thread Deneau, Tom
I have been running rados bench and I've noticed a lot of variation
from run to run.
The runs generally write data with --no-cleanup and then read it back (seq),
dropping the caches in between.
I admit this is on a single-node "cluster" with 5 data disks, so maybe not
realistic, but...

In my runs I also collect disk activity traces.  When I look at the seq read
scores, I've noticed the low-scoring runs always have a "hot" disk which maxes
out while others might be at 30% to 40% usage, whereas in the high-scoring runs
the disk activity is much more evenly distributed.
I realize the hashing of objects to primary osds depends on the object names,
which are different for each run (in rados bench, the object names include the
pid).  But I was surprised at the sometimes marked unevenness of the hashing.

Have others seen this and is there a good workaround?
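
For reference, the runs look roughly like this (a sketch; the pool name is a
placeholder, and --run-name, if this release supports it, pins the object-name
prefix so placement is at least repeatable across runs):

    rados bench -p testpool 120 write --no-cleanup --run-name hashtest
    # on each osd node, between write and read
    sync; echo 3 > /proc/sys/vm/drop_caches
    rados bench -p testpool 120 seq --run-name hashtest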

-- Tom Deneau, AMD
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: ms_crc_data false

2015-04-09 Thread Deneau, Tom


> -Original Message-
> From: Gregory Farnum [mailto:g...@gregs42.com]
> Sent: Wednesday, April 08, 2015 6:26 PM
> To: Deneau, Tom
> Cc: ceph-devel
> Subject: Re: ms_crc_data false
> 
> On Wed, Apr 8, 2015 at 3:38 PM, Deneau, Tom  wrote:
> > With 0.93, I tried
> > ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--
> ms_crc_header=false'
> >
> > and saw the changes reflected in ceph admin-daemon
> >
> > But having done that, perf top still shows time being spent in crc32
> routines.
> > Is there some other parameter that needs changing?
> 
> You can change this config value, but unfortunately it won't have any effect
> on a running daemon. You'll need to specify it in a config and restart.
> -Greg

Thanks, Greg.

Here are some observations on my single node system 
after changing the ceph.conf and restarting all daemons...

loop: rados bench seq reading 4M objects (1 thread), dropping caches before each
invocation
   * BW went up with ms crc xxx = false
   * but still see a significant crc32 line attributed to ceph-osd
   * no writes happening
   * note: with ms crc xxx = true, there were 2 crc32 lines in perf top, the 2nd
     one attributed to librados; that one is gone with ms crc xxx = false

loop: rados get of a single large object (1.6G) to /dev/null, dropping caches
before each invocation
   * no change in BW from ms crc xxx = true
   * no crc32 line in perf top
   * whereas with ms crc xxx = true there were 2 lines, one for ceph-osd and one
     for librados
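
For completeness, the ceph.conf change was of this form (a sketch; exact section
placement may differ, and the daemons were restarted for it to take effect):

    [global]
        ms crc data = false
        ms crc header = false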

-- Tom Deneau





RE: ms_crc_data false

2015-04-08 Thread Deneau, Tom
> -Original Message-
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: Wednesday, April 08, 2015 5:40 PM
> To: Deneau, Tom
> Cc: ceph-devel
> Subject: Re: ms_crc_data false
> 
> On Wed, 8 Apr 2015, Deneau, Tom wrote:
> > With 0.93, I tried
> > ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--
> ms_crc_header=false'
> >
> > and saw the changes reflected in ceph admin-daemon
> >
> > But having done that, perf top still shows time being spent in crc32
> routines.
> > Is there some other parameter that needs changing?
> 
> The osd still does a CRC for the purposes of write journaling.  This can't be
> disabled currently.  You shouldn't see this come up on reads...
> 
> sage

Hmm, I am doing rados bench seq so only reads.

Is this in 0.93 or do I need 0.94?

-- Tom
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ms_crc_data false

2015-04-08 Thread Deneau, Tom
With 0.93, I tried 
ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--ms_crc_header=false'

and saw the changes reflected in ceph admin-daemon

But having done that, perf top still shows time being spent in crc32 routines.
Is there some other parameter that needs changing?

-- Tom Deneau

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


rpmbuild version

2015-03-28 Thread Deneau, Tom
Starting from a ceph git checkout I want to create a source tar.bz2 to feed to 
rpmbuild.
I see there is a "make dist-bzip2" target to create the source tarball, but if I
want the built binaries to identify themselves as a custom version, what is the
best way to do that?
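
The flow I have so far is roughly this (a sketch, assuming the usual autotools
steps):

    ./autogen.sh && ./configure
    make dist-bzip2              # produces the ceph-<version>.tar.bz2 source tarball
    rpmbuild -ta ceph-*.tar.bz2

but it is not obvious to me where a custom version string is best injected so
that the binaries and package names reflect it.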

-- Tom Deneau

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: seg fault in ceph-osd on aarch64

2015-03-26 Thread Deneau, Tom
Any suggestions for stress tests, etc. that might make this happen sooner?
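
The logging bump suggested below would look something like this (a sketch);
either in ceph.conf, followed by an osd restart:

    [osd]
        debug ms = 20

or injected into the running osds:

    ceph tell 'osd.*' injectargs '--debug-ms 20'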

-- Tom

> -Original Message-
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: Thursday, March 26, 2015 12:17 PM
> To: Deneau, Tom
> Cc: ceph-devel
> Subject: Re: seg fault in ceph-osd on aarch64
> 
> On Thu, 26 Mar 2015, Deneau, Tom wrote:
> > I've been exercising the 64-bit arm (aarch64) version of ceph.
> > This is from self-built rpms from the v0.93 snapshot.
> > The "cluster" is a single system with 6 hard drives, one osd each.
> > I've been letting it run with some rados bench and rados load-gen
> > loops and running bonnie++ on an rbd mount.
> >
> > Occasionally (in the latest case after 2 days) I've seen ceph-osd
> > crashes like the one shown below.  (showing last 10 events as well).
> > If I am reading the objdump correctly this is from the while loop in
> > the following code in Pipe::connect
> >
> > I assume this is not seen on ceph builds from other architectures?
> >
> > What is the recommended way to get more information on this osd crash?
> > (looks like osd log levels are 0/5)
> 
> In this case, debug ms = 20 should tell us what we need!
> 
> Thanks-
> sage
> 
> 
> >
> > -- Tom Deneau, AMD
> >
> >
> >
> >   if (reply.tag == CEPH_MSGR_TAG_SEQ) {
> > ldout(msgr->cct,10) << "got CEPH_MSGR_TAG_SEQ, reading acked_seq
> and writing in_seq" << dendl;
> > uint64_t newly_acked_seq = 0;
> > if (tcp_read((char*)&newly_acked_seq, sizeof(newly_acked_seq)) < 0)
> {
> >   ldout(msgr->cct,2) << "connect read error on newly_acked_seq" <<
> dendl;
> >   goto fail_locked;
> > }
> > ldout(msgr->cct,2) << " got newly_acked_seq " << newly_acked_seq
> ><< " vs out_seq " << out_seq << dendl;
> > while (newly_acked_seq > out_seq) {
> >   Message *m = _get_next_outgoing();
> >   assert(m);
> >   ldout(msgr->cct,2) << " discarding previously sent " << m-
> >get_seq()
> >  << " " << *m << dendl;
> >   assert(m->get_seq() <= newly_acked_seq);
> >   m->put();
> >   ++out_seq;
> > }
> > if (tcp_write((char*)&in_seq, sizeof(in_seq)) < 0) {
> >   ldout(msgr->cct,2) << "connect write error on in_seq" << dendl;
> >   goto fail_locked;
> > }
> >   }
> >
> >
> >
> >
> >   -10> 2015-03-25 09:41:11.950684 3ff8f05f010  5 -- op tracker -- seq:
> > 3499479, time: 2015-03-25 09:41:11.950683, event: done, op: osd_op(c\
> > lient.8322.0:1640 benchmark_data_b0c-upstairs_5647_object343 [read
> 0~4194304] 1.5c587e9e ack+read+known_if_redirected e316)
> > -9> 2015-03-25 09:41:11.951356 3ff8659f010  1 --
> > 10.236.136.224:6804/4928 <== client.8322 10.236.136.224:0/1020871 256
> >  osd_op(clien\
> > t.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read
> > 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316) v5 
> > 201+0+0 (280\
> > 2495612 0 0) 0x1e67cd80 con 0x71f4c80
> > -8> 2015-03-25 09:41:11.951397 3ff8659f010  5 -- op tracker --
> > seq: 3499480, time: 2015-03-25 09:41:11.951205, event: header_read,
> > op: o\
> > sd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read
> 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
> > -7> 2015-03-25 09:41:11.951411 3ff8659f010  5 -- op tracker --
> > seq: 3499480, time: 2015-03-25 09:41:11.951214, event: throttled, op:
> > osd\
> > _op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read
> 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
> > -6> 2015-03-25 09:41:11.951420 3ff8659f010  5 -- op tracker --
> > seq: 3499480, time: 2015-03-25 09:41:11.951351, event: all_read, op:
> > osd_\
> > op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read
> 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
> > -5> 2015-03-25 09:41:11.951429 3ff8659f010  5 -- op tracker --
> > seq: 3499480, time: 0.00, event: dispatched, op:
> > osd_op(client.8322.0\
> > :1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304]
> 1.f2b5749d ack+read+known_if_redirected e316)
> > -4> 2015-03-25 09:41:11.951561 3ff9205f010  5 -- op trac

seg fault in ceph-osd on aarch64

2015-03-26 Thread Deneau, Tom
I've been exercising the 64-bit arm (aarch64) version of ceph.
This is from self-built rpms from the v0.93 snapshot.
The "cluster" is a single system with 6 hard drives, one osd each.
I've been letting it run with some rados bench and rados load-gen loops
and running bonnie++ on an rbd mount.

Occasionally (in the latest case after 2 days) I've seen ceph-osd crashes
like the one shown below.  (showing last 10 events as well).
If I am reading the objdump correctly this is from the while loop
in the following code in Pipe::connect

I assume this is not seen on ceph builds from other architectures?

What is the recommended way to get more information on this osd crash?
(looks like osd log levels are 0/5)

-- Tom Deneau, AMD



  if (reply.tag == CEPH_MSGR_TAG_SEQ) {
    ldout(msgr->cct,10) << "got CEPH_MSGR_TAG_SEQ, reading acked_seq and writing in_seq" << dendl;
    uint64_t newly_acked_seq = 0;
    if (tcp_read((char*)&newly_acked_seq, sizeof(newly_acked_seq)) < 0) {
      ldout(msgr->cct,2) << "connect read error on newly_acked_seq" << dendl;
      goto fail_locked;
    }
    ldout(msgr->cct,2) << " got newly_acked_seq " << newly_acked_seq
                       << " vs out_seq " << out_seq << dendl;
    while (newly_acked_seq > out_seq) {
      Message *m = _get_next_outgoing();
      assert(m);
      ldout(msgr->cct,2) << " discarding previously sent " << m->get_seq()
                         << " " << *m << dendl;
      assert(m->get_seq() <= newly_acked_seq);
      m->put();
      ++out_seq;
    }
    if (tcp_write((char*)&in_seq, sizeof(in_seq)) < 0) {
      ldout(msgr->cct,2) << "connect write error on in_seq" << dendl;
      goto fail_locked;
    }
  }




  -10> 2015-03-25 09:41:11.950684 3ff8f05f010  5 -- op tracker -- seq: 3499479, 
time: 2015-03-25 09:41:11.950683, event: done, op: osd_op(c\
lient.8322.0:1640 benchmark_data_b0c-upstairs_5647_object343 [read 0~4194304] 
1.5c587e9e ack+read+known_if_redirected e316)
-9> 2015-03-25 09:41:11.951356 3ff8659f010  1 -- 10.236.136.224:6804/4928 
<== client.8322 10.236.136.224:0/1020871 256  osd_op(clien\
t.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 
1.f2b5749d ack+read+known_if_redirected e316) v5  201+0+0 (280\
2495612 0 0) 0x1e67cd80 con 0x71f4c80
-8> 2015-03-25 09:41:11.951397 3ff8659f010  5 -- op tracker -- seq: 
3499480, time: 2015-03-25 09:41:11.951205, event: header_read, op: o\
sd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 
0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
-7> 2015-03-25 09:41:11.951411 3ff8659f010  5 -- op tracker -- seq: 
3499480, time: 2015-03-25 09:41:11.951214, event: throttled, op: osd\
_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 
0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
-6> 2015-03-25 09:41:11.951420 3ff8659f010  5 -- op tracker -- seq: 
3499480, time: 2015-03-25 09:41:11.951351, event: all_read, op: osd_\
op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 
0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
-5> 2015-03-25 09:41:11.951429 3ff8659f010  5 -- op tracker -- seq: 
3499480, time: 0.00, event: dispatched, op: osd_op(client.8322.0\
:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d 
ack+read+known_if_redirected e316)
-4> 2015-03-25 09:41:11.951561 3ff9205f010  5 -- op tracker -- seq: 
3499480, time: 2015-03-25 09:41:11.951560, event: reached_pg, op: os\
d_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 
0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
-3> 2015-03-25 09:41:11.951627 3ff9205f010  5 -- op tracker -- seq: 
3499480, time: 2015-03-25 09:41:11.951627, event: started, op: osd_o\
p(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 
0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
-2> 2015-03-25 09:41:11.961959 3ff9205f010  1 -- 10.236.136.224:6804/4928 
--> 10.236.136.224:0/1020871 -- osd_op_reply(1642 benchmark_da\
ta_b0c-upstairs_5647_object411 [read 0~4194304] v0'0 uv2 ondisk = 0) v6 -- ?+0 
0x3b39340 con 0x71f4c80
-1> 2015-03-25 09:41:11.962043 3ff9205f010  5 -- op tracker -- seq: 
3499480, time: 2015-03-25 09:41:11.962043, event: done, op: osd_op(c\
lient.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 
1.f2b5749d ack+read+known_if_redirected e316)
 0> 2015-03-25 09:41:12.030725 3ff8619f010 -1 *** Caught signal 
(Segmentation fault) **
 in thread 3ff8619f010

 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1: /usr/bin/ceph-osd() [0xacf140]
 2: [0x3ffa9520510]
 3: (Pipe::connect()+0x301c) [0xc8c37c]
 4: (Pipe::Writer::entry()+0x10) [0xc96b9c]
 5: (Thread::entry_wrapper()+0x50) [0xba3bec]
 6: (()+0x6f30) [0x3ffa9116f30]
 7: (()+0xdd910) [0x3ffa8d8d910]
 NOTE: a copy of the executable, or `objdump -rdS ` i

RE: packages on download.ceph.com

2015-03-11 Thread Deneau, Tom


> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: Monday, March 09, 2015 4:05 PM
> To: Danny Al-Gaaf
> Cc: Mark Nelson; Deneau, Tom; ceph-devel
> Subject: Re: packages on download.ceph.com
> 
> On Mon, 9 Mar 2015, Danny Al-Gaaf wrote:
> > Am 09.03.2015 um 20:35 schrieb Mark Nelson:
> > > On 03/09/2015 02:06 PM, Deneau, Tom wrote:
> > >> I'm trying to gather information on what it would take to get
> > >> packages for an architecture other than x86_64 up on
> > >> http://download.ceph.com
> 
> I'm guessing you're interested in 64-bit arm (aarch64?)?
> 
> I would love to have packages built for all architectures people are
> interested in.  In practice, there are two limitations: build hardware and
> the maintenance overhead.  For example, we have all this armv7l gear but
> stopped doing builds because it was time consuming to keep it running.
> The moment someone signs up to do that management we can set up vpn access to
> the lab and add it back it.  The other problem we hit was that the armv7l
> builds took so much longer than the x86_64 ones and the current release
> process is easier when it's all done it one go.  I suspect what we'd end up
> with is a situation where the packages for some architectures get posted
> before others (which probably isn't a big deal).  Alfredo, you should chime
> in if there are other reasons why this would make things harder.
> 
> > It would be much easier to use OpenBuildService [1] for package build.
> > It supports many distributions and architectures.
> >
> > If you don't care that it's openSUSE infrastructure you/we could use
> > build.opensuse.org to build packages e.g. for RHEL/Centos/Fedora,
> > openSUSE/SLES, Debian, Ubuntu and others (I did so in the past.)
> >
> > At least openSUSE/SLES packages could also be built on armv7l and e.g.
> > ppc/s390x ... for other distros we have to check.
> >
> > The question is: should we build packages (and which) or is this more
> > a task for the distributions?
> 
> I tried OBS way back when but found it difficult to use and not particularly
> flexible.  My main concern is that we will run into problems and not have the
> ability to address them (for example, missing or broken distro dependencies
> or something like that).
> 
> More generally, I think it has been hugely valuable to have up to date
> packages on ceph.com as the distros tend to relatively slow to release
> things.  Users also typically choose between several different major ceph
> releases.  I'm pretty hesitant to abandon this...
> 
> sage

Sage --

Yes, my original question was for aarch64.

On that topic, since Ceph is part of Red Hat,
can the Ceph developers lobby for getting some version of Ceph
included in the first RHEL aarch64 release?  My understanding is that currently
Ceph packages are not planned to be included in the default package lists.

-- Tom


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: packages on download.ceph.com

2015-03-09 Thread Deneau, Tom


> -Original Message-
> From: Danny Al-Gaaf [mailto:danny.al-g...@bisect.de]
> Sent: Monday, March 09, 2015 3:46 PM
> To: Mark Nelson; Deneau, Tom; ceph-devel
> Subject: Re: packages on download.ceph.com
> 
> Am 09.03.2015 um 20:35 schrieb Mark Nelson:
> >
> >
> > On 03/09/2015 02:06 PM, Deneau, Tom wrote:
> >> I'm trying to gather information on what it would take to get packages
> >> for an architecture other than x86_64 up on http://download.ceph.com
> >
> > What we've done in the past for certain non-X86 architectures (such as
> > ARM) is to get build nodes in place that can be used with our gitbuilder
> > setup to continuously make development builds:
> >
> > http://www.ceph.com/gitbuilder.cgi
> >
> > Which get put here:
> >
> > http://gitbuilder.ceph.com/
> >
> > Usually the problem is twofold: someone's time to set it up, and
> > making sure we have enough (or fast enough) build systems for that
> > architecture to keep up.
> 
> It would be much easier to use OpenBuildService [1] for package build.
> It supports many distributions and architectures.
> 
> If you don't care that it's openSUSE infrastructure you/we could use
> build.opensuse.org to build packages e.g. for RHEL/Centos/Fedora,
> openSUSE/SLES, Debian, Ubuntu and others (I did so in the past.)
> 
> At least openSUSE/SLES packages could also be built on armv7l and e.g.
> ppc/s390x ... for other distros we have to check.
> 
> The question is: should we build packages (and which) or is this more a
> task for the distributions?
> 
> Danny
> 
> [1] http://openbuildservice.org
> [2] https://build.opensuse.org

Building packages should indeed be a task for the distributions, but some
distros are slow to add new packages, so this would be a stopgap measure until
they do.

-- Tom
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


packages on download.ceph.com

2015-03-09 Thread Deneau, Tom
I'm trying to gather information on what it would take to get packages
for an architecture other than x86_64 up on http://download.ceph.com 

-- Tom Deneau, AMD
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [ceph-users] who is using radosgw with civetweb?

2015-02-26 Thread Deneau, Tom
Robert --

We are still having trouble with this.

Can you share your [client.radosgw.gateway] section of ceph.conf, and
were there any other special things to be aware of?
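
For concreteness, the sort of minimal section I would expect is something like
this (a sketch only; the host name and keyring path are placeholders, and the
frontends line is the one from Sage's reply quoted below for firefly):

    [client.radosgw.gateway]
        host = gatewayhost
        keyring = /etc/ceph/ceph.client.radosgw.keyring
        rgw frontends = fastcgi, civetweb port=7480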

-- Tom

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Robert LeBlanc
Sent: Thursday, February 26, 2015 12:27 PM
To: Sage Weil
Cc: Ceph-User; ceph-devel
Subject: Re: [ceph-users] who is using radosgw with civetweb?

Thanks, we were able to get it up and running very quickly. If it performs 
well, I don't see any reason to use Apache+fast_cgi. I don't have any problems 
just focusing on civetweb.

On Wed, Feb 25, 2015 at 2:49 PM, Sage Weil  wrote:
> On Wed, 25 Feb 2015, Robert LeBlanc wrote:
>> We tried to get radosgw working with Apache + mod_fastcgi, but due to 
>> the changes in radosgw, Apache, mode_*cgi, etc and the documentation 
>> lagging and not having a lot of time to devote to it, we abandoned it.
>> Where it the documentation for civetweb? If it is appliance like and 
>> easy to set-up, we would like to try it to offer some feedback on 
>> your question.
>
> In giant and hammer, it is enabled by default on port 7480.  On 
> firefly, you need to add the line
>
>  rgw frontends = fastcgi, civetweb port=7480
>
> to ceph.conf (you can of course adjust the port number if you like) 
> and radosgw will run standalone w/ no apache or anything else.
>
> sage
>
>
>>
>> Thanks,
>> Robert LeBlanc
>>
>> On Wed, Feb 25, 2015 at 12:31 PM, Sage Weil  wrote:
>> > Hey,
>> >
>> > We are considering switching to civetweb (the embedded/standalone 
>> > rgw web
>> > server) as the primary supported RGW frontend instead of the 
>> > current apache + mod-fastcgi or mod-proxy-fcgi approach.  
>> > "Supported" here means both the primary platform the upstream 
>> > development focuses on and what the downstream Red Hat product will 
>> > officially support.
>> >
>> > How many people are using RGW standalone using the embedded 
>> > civetweb server instead of apache?  In production?  At what scale?  
>> > What
>> > version(s) (civetweb first appeared in firefly and we've backported 
>> > most fixes).
>> >
>> > Have you seen any problems?  Any other feedback?  The hope is to 
>> > (vastly) simplify deployment.
>> >
>> > Thanks!
>> > sage
>> > ___
>> > ceph-users mailing list
>> > ceph-us...@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
>> in the body of a message to majord...@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html

RE: using radosgw with mod_proxy_fcgi

2015-02-16 Thread Deneau, Tom
So if I have Apache 2.4.9+, would I use
ProxyPass / unix:///tmp/.radosgw.sock|fcgi://localhost:9000/
and in ceph.conf use
rgw socket path = /tmp/.radosgw.sock

If so, how does the radosgw process know about port 9000?
Or do I need a "rgw frontends" statement to specify that?
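
Put differently, the pairing I am asking about is (a sketch; my understanding is
that with the unix: form the fcgi://localhost:9000/ part is only a backend
label, but I may be wrong about that):

    # apache 2.4.9+ site config
    ProxyPass / unix:///tmp/.radosgw.sock|fcgi://localhost:9000/

    # ceph.conf
    [client.radosgw.gateway]
        rgw socket path = /tmp/.radosgw.sock
        rgw print continue = false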

-- Tom

-Original Message-
From: Yehuda Sadeh-Weinraub [mailto:yeh...@redhat.com] 
Sent: Wednesday, February 11, 2015 7:10 PM
To: Deneau, Tom
Cc: ceph-devel@vger.kernel.org
Subject: Re: using radosgw with mod_proxy_fcgi



- Original Message -
> From: "Tom Deneau" 
> To: ceph-devel@vger.kernel.org
> Sent: Wednesday, February 11, 2015 4:21:46 PM
> Subject: using radosgw with mod_proxy_fcgi
> 
> I am a Ceph novice and have the rados and rbd setups working and would 
> like to use the radosgw stack.
> 
> I am running on a platform (aarch64) for which there are no pre-built 
> binaries of the ceph patched apache and the ceph patched mod_fastcgi.  
> But since I gather from the mail lists that the future is just to use 
> mod_proxy_fcgi and since my apache install has mod_proxy_fcgi build in 
> already that seems like the way to go.
> 
> I realize the documentation to explain exactly how to set things up 
> with mod_proxy_fcgi is still in the works but if there is any 
> preliminary documentation on this, I'd be glad to try it out.
> 

That's the apache site config that I was using:


ServerName localhost
DocumentRoot /var/www/html

ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined

LogLevel debug


RewriteEngine On

RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

# RewriteRule ^/(.*) /?%{QUERY_STRING} 
[E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

SetEnv proxy-nokeepalive 1

ProxyPass / fcgi://127.0.0.1:9000/
#ProxyPass / unix:///tmp/.radosgw.sock|fcgi://localhost:9000/


Note that there are two options here, and it really depends on your apache
version. For vanilla apache 2.4.8+ (or maybe 2.4.7+) you can use the regular 
unix domain socket setup (as described in the docs), just modify the path 
appropriately. For older apaches (2.4.0+) you need to use tcp for fastcgi, and 
that requires somewhat different rgw configuration, e.g. have the following in 
your ceph.conf:

rgw frontends = fastcgi socket_port=9000 socket_host=0.0.0.0

For apache 2.2 there's no mod-proxy-fcgi out of the box. I did backport that 
module, and you can find it here: https://github.com/ceph/mod-proxy-fcgi, but 
you'll need to compile it yourself at this point.

Note that mod-proxy-fcgi does not support 100-continue, so you'll need to add 
the following to your ceph.conf:

rgw print continue = false

Yehuda




RE: using radosgw with mod_proxy_fcgi

2015-02-12 Thread Deneau, Tom
Ken --

I am running httpd-2.4.10-9

I was able to get a conf set up that proxies to a PHP-FPM backend with this directive:
   ProxyPassMatch ^/(.*\.php(/.*)?)$ fcgi://127.0.0.1:9000/var/www/mysite/$1

and that worked (executing a phpinfo() script showed the FastCGI connection).
I'm still working on the radosgw setup.
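
For the radosgw side, the analogous pieces (a sketch, following the TCP fastcgi
form Yehuda suggested) would be roughly:

    # apache site config
    ProxyPass / fcgi://127.0.0.1:9000/

    # ceph.conf
    [client.radosgw.gateway]
        rgw frontends = fastcgi socket_port=9000 socket_host=0.0.0.0
        rgw print continue = false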

By the way, the page that talked about setting things up for PHP-FPM also had
this Directory block in the conf file.  Do we need anything of that sort for
radosgw?


<Directory /var/www/mysite>
  Order allow,deny
  Allow from all
  AllowOverride FileInfo All
  # New directive needed in Apache 2.4.3:
  Require all granted
</Directory>


-- Tom

-Original Message-
From: Ken Dreyer [mailto:kdre...@redhat.com] 
Sent: Thursday, February 12, 2015 12:08 PM
To: Deneau, Tom; ceph-devel@vger.kernel.org
Subject: Re: using radosgw with mod_proxy_fcgi

On 02/11/2015 05:21 PM, Deneau, Tom wrote:
> I am running on a platform (aarch64) for which there are no pre-built 
> binaries of the ceph patched apache and the ceph patched mod_fastcgi.

Hi Tom,

The ceph-patched Apache had numerous outstanding CVEs, and I discourage users 
from running it any more. The distro-supplied Apache should be suitable. We're 
still working to update our docs regarding this.

What distribution are you running on aarch64, by the way? Dan or I are probably 
the ones who are going to be packaging mod_proxy_fcgi for Apache 2.2, so I'm 
curious to know what users are running.

- Ken
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


using radosgw with mod_proxy_fcgi

2015-02-11 Thread Deneau, Tom
I am a Ceph novice and have the rados and rbd setups working and would like to 
use the radosgw stack.

I am running on a platform (aarch64) for which there are no pre-built binaries
of the ceph-patched apache and the ceph-patched mod_fastcgi.  But since I
gather from the mailing lists that the future is just to use mod_proxy_fcgi, and
since my apache install has mod_proxy_fcgi built in already, that seems like the
way to go.

I realize the documentation to explain exactly how to set things up with 
mod_proxy_fcgi is still in the works but if there is any preliminary 
documentation on this, I'd be glad to try it out.

-- Tom Deneau, AMD

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: using ceph-deploy on build after make install

2015-02-04 Thread Deneau, Tom
John --

Ah yes, I see that the init.d scripts are not installed by make install.
I tried copying the init.d scripts over manually and still had some problems.
If I really do want to install everything from a new build rather than piece by 
piece, should I just use rpmbuild?

-- Tom

-Original Message-
From: john.sp...@inktank.com [mailto:john.sp...@inktank.com] On Behalf Of John 
Spray
Sent: Wednesday, February 04, 2015 2:35 AM
To: Deneau, Tom
Cc: ceph-devel@vger.kernel.org
Subject: Re: using ceph-deploy on build after make install

I suspect that your clue is "Failed to execute command:
/usr/sbin/service ceph" -- as a rule, service scripts are part of the 
per-distro packaging rather than make install.

Personally, if I'm installing systemwide I always start with some built 
packages, and if I need to substitute a home-built binary for debugging then I 
do so by directly overwriting it in /usr/bin/.

John

On Wed, Feb 4, 2015 at 12:58 AM, Deneau, Tom  wrote:
> New to ceph building but here is my situation...
>
> I have been successfully able to build ceph starting from
>git checkout firefly
> (also successful from git checkout master).  After building, I am able 
> to run vstart.sh from the source directory as ./vstart.sh -d -n -x 
> (or with -X).  I can then do rados commands such as rados bench.
>
> I should also add that when I have installed binaries from rpms (this 
> is a fedora21 aarch64 system), I have been successfully able to deploy 
> a cluster using various ceph-deploy commands.
>
> Now I would like to do make install to install my built version and 
> then use the installed version with my ceph-deploy commands.  In this 
> case I installed ceph-deploy with pip install ceph-deploy which gives 
> me 5.21.
>
> The first ceph-deploy command I use is:
>
> ceph-deploy new myhost
>
> which seems to work fine.
>
> The next command however is
>
>  ceph-deploy mon create-initial
>
> which ends up failing with
>
> [INFO  ] Running command: /usr/sbin/service ceph -c 
> /etc/ceph/ceph.conf start mon.hostname [WARNIN] The service command 
> supports only basic LSB actions (start, stop, restart, try-rest\ art, reload, 
> force-reload, status). For other actions, please try to use systemctl.
> [ERROR ] RuntimeError: command returned non-zero exit status: 2 [ERROR 
> ] Failed to execute command: /usr/sbin/service ceph -c 
> /etc/ceph/ceph.conf start mon.seattle-tdeneau [ERROR ] GenericError: 
> Failed to create 1 monitors
>
> and even the ceph status command fails
>
> #  ceph -c ./ceph.conf status
>  -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for 
> authentication
>   0 librados: client.admin initialization error (2) No such file or 
> directory Error connecting to cluster: ObjectNotFound
>
> Whereas this all worked fine when I used binaries from rpms.
> Is there some install step that I am missing?
>
> -- Tom Deneau, AMD
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majord...@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html


RE: using ceph-deploy on build after make install

2015-02-04 Thread Deneau, Tom
I noticed that an rpmbuild starting from ceph-0.80.8.tar.bz2 got the following error:

  cc1: error: -Wformat-security ignored without -Wformat 
[-Werror=format-security]
  cc1: some warnings being treated as errors
  Makefile:10333: recipe for target 'test/librbd/ceph_test_librbd_fsx-fsx.o' 
failed
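
A possible workaround (a sketch, untested; assuming the distro's hardened
%optflags are what pull in -Wformat-security without -Wformat) would be to
override the flags for the build:

    rpmbuild -ta ceph-0.80.8.tar.bz2 \
        --define 'optflags -O2 -g -Wformat -Wformat-security'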

-Original Message-----
From: Deneau, Tom 
Sent: Wednesday, February 04, 2015 11:54 AM
To: 'John Spray'
Cc: ceph-devel@vger.kernel.org
Subject: RE: using ceph-deploy on build after make install

John --

Ah yes, I see that the init.d scripts are not installed by make install.
I tried copying the init.d scripts over manually and still had some problems.
If I really do want to install everything from a new build rather than piece by 
piece, should I just use rpmbuild?

-- Tom

-Original Message-
From: john.sp...@inktank.com [mailto:john.sp...@inktank.com] On Behalf Of John 
Spray
Sent: Wednesday, February 04, 2015 2:35 AM
To: Deneau, Tom
Cc: ceph-devel@vger.kernel.org
Subject: Re: using ceph-deploy on build after make install

I suspect that your clue is "Failed to execute command:
/usr/sbin/service ceph" -- as a rule, service scripts are part of the 
per-distro packaging rather than make install.

Personally, if I'm installing systemwide I always start with some built 
packages, and if I need to substitute a home-built binary for debugging then I 
do so by directly overwriting it in /usr/bin/.

John

On Wed, Feb 4, 2015 at 12:58 AM, Deneau, Tom  wrote:
> New to ceph building but here is my situation...
>
> I have been successfully able to build ceph starting from
>git checkout firefly
> (also successful from git checkout master).  After building, I am able 
> to run vstart.sh from the source directory as ./vstart.sh -d -n -x 
> (or with -X).  I can then do rados commands such as rados bench.
>
> I should also add that when I have installed binaries from rpms (this 
> is a fedora21 aarch64 system), I have been successfully able to deploy 
> a cluster using various ceph-deploy commands.
>
> Now I would like to do make install to install my built version and 
> then use the installed version with my ceph-deploy commands.  In this 
> case I installed ceph-deploy with pip install ceph-deploy which gives 
> me 5.21.
>
> The first ceph-deploy command I use is:
>
> ceph-deploy new myhost
>
> which seems to work fine.
>
> The next command however is
>
>  ceph-deploy mon create-initial
>
> which ends up failing with
>
> [INFO  ] Running command: /usr/sbin/service ceph -c 
> /etc/ceph/ceph.conf start mon.hostname [WARNIN] The service command 
> supports only basic LSB actions (start, stop, restart, try-rest\ art, reload, 
> force-reload, status). For other actions, please try to use systemctl.
> [ERROR ] RuntimeError: command returned non-zero exit status: 2 [ERROR 
> ] Failed to execute command: /usr/sbin/service ceph -c 
> /etc/ceph/ceph.conf start mon.seattle-tdeneau [ERROR ] GenericError: 
> Failed to create 1 monitors
>
> and even the ceph status command fails
>
> #  ceph -c ./ceph.conf status
>  -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for 
> authentication
>   0 librados: client.admin initialization error (2) No such file or 
> directory Error connecting to cluster: ObjectNotFound
>
> Whereas this all worked fine when I used binaries from rpms.
> Is there some install step that I am missing?
>
> -- Tom Deneau, AMD
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majord...@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html

using ceph-deploy on build after make install

2015-02-03 Thread Deneau, Tom
New to ceph building but here is my situation...

I have been successfully able to build ceph starting from
   git checkout firefly
(also successful from git checkout master).  After building, I am able
to run vstart.sh from the source directory as ./vstart.sh -d -n -x
(or with -X).  I can then do rados commands such as rados bench.

I should also add that when I have installed binaries from rpms (this
is a fedora21 aarch64 system), I have been successfully able to deploy
a cluster using various ceph-deploy commands.

Now I would like to do make install to install my built version and
then use the installed version with my ceph-deploy commands.  In this
case I installed ceph-deploy with pip install ceph-deploy which gives
me 5.21.

The first ceph-deploy command I use is:

ceph-deploy new myhost

which seems to work fine.

The next command however is

 ceph-deploy mon create-initial

which ends up failing with

[INFO  ] Running command: /usr/sbin/service ceph -c /etc/ceph/ceph.conf start 
mon.hostname
[WARNIN] The service command supports only basic LSB actions (start, stop, 
restart, try-rest\
art, reload, force-reload, status). For other actions, please try to use 
systemctl.
[ERROR ] RuntimeError: command returned non-zero exit status: 2
[ERROR ] Failed to execute command: /usr/sbin/service ceph -c 
/etc/ceph/ceph.conf start mon.seattle-tdeneau
[ERROR ] GenericError: Failed to create 1 monitors

and even the ceph status command fails

#  ceph -c ./ceph.conf status
 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for 
authentication
  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound

Whereas this all worked fine when I used binaries from rpms.
Is there some install step that I am missing?

-- Tom Deneau, AMD  

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html