[ceph-users] RBD back
Hi all, the Ceph Object Store service can span geographical locales. Ceph also provides FS and RBD. If our applications need the RBD service, can we provide backup and disaster recovery for it via the gateway through some transformation? In fact the cluster stores RBD data as objects in pools (rbd by default); in other words, can we back up some pools in a Ceph cluster (without S3) via the gateway? lixuehui ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
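The gateway only serves data written through its own S3/Swift APIs and keeps that data in its own pools, so it is not a natural backup path for RBD. A minimal sketch of backing an image up at the RBD level instead, using the export/diff commands available in recent releases (pool, image and snapshot names, backup paths and the remote cluster config are assumptions for illustration):

  # full copy of one image to a file
  rbd export rbd/myimage /backup/myimage.img
  # incremental: keep periodic snapshots and ship only the changes between them
  rbd snap create rbd/myimage@backup-2013-11-07
  rbd export-diff --from-snap backup-2013-11-06 rbd/myimage@backup-2013-11-07 /backup/myimage.diff
  # replay the diff onto a standby copy of the image in a second cluster
  rbd -c /etc/ceph/remote.conf import-diff /backup/myimage.diff rbd/myimage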
Re: [ceph-users] ceph 0.72 with zfs
Any chance this option will be included for future emperor binaries? I don't mind compiling software, but I would like to keep things upgradable via apt-get … Thanks, Dinu On Nov 7, 2013, at 4:05 AM, Sage Weil s...@inktank.com wrote: Hi Dinu, You currently need to compile yourself, and pass --with-zfs to ./configure. Once it is built in, ceph-osd will detect whether the underlying fs is zfs on its own. sage On Wed, 6 Nov 2013, Dinu Vlad wrote: Hello, I'm testing the 0.72 release and thought to give a spin to the zfs support. While I managed to setup a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A coupel of questions: - is 0.72-rc1 package currently in the raring repository compiled with zfs support ? - if yes - how can I inform ceph-osd to use the ZFSFileStoreBackend ? Thanks, Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
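For anyone who wants to try this before packaged builds exist, a rough sketch of the build Sage describes, assuming the libzfs development headers are already installed (the tag is an example; adjust to the release being tested):

  git clone --branch v0.72 https://github.com/ceph/ceph.git
  cd ceph
  ./autogen.sh
  ./configure --with-zfs
  make && sudo make install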
Re: [ceph-users] ceph 0.72 with zfs
The challenge here is that libzfs is currently a build time dependency, which means it needs to be included in the target distro already, or we need to bundle it in the Ceph.com repos. I am currently looking at the possibility of making the OSD back end dynamically linked at runtime, which would allow a separately packaged zfs back end; that may (or may not!) help. sage Dinu Vlad dinuvla...@gmail.com wrote: Any chance this option will be included for future emperor binaries? I don't mind compiling software, but I would like to keep things upgradable via apt-get … Thanks, Dinu On Nov 7, 2013, at 4:05 AM, Sage Weil s...@inktank.com wrote: Hi Dinu, You currently need to compile yourself, and pass --with-zfs to ./configure. Once it is built in, ceph-osd will detect whether the underlying fs is zfs on its own. sage On Wed, 6 Nov 2013, Dinu Vlad wrote: Hello, I'm testing the 0.72 release and thought to give a spin to the zfs support. While I managed to setup a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A coupel of questions: - is 0.72-rc1 package currently in the raring repository compiled with zfs support ? - if yes - how can I inform ceph-osd to use the ZFSFileStoreBackend ? Thanks, Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
I had great results from the older 530 series too. In this case however, the SSDs were only used for journals and I don't know if ceph-osd sends TRIM to the drive in the process of journaling over a block device. They were also under-subscribed, with just 3 x 10G partitions out of 240 GB raw capacity. I did a manual trim, but it hasn't changed anything. I'm still having fun with the configuration so I'll be able to use Mike Dawson's suggested tools to check for latencies. On Nov 6, 2013, at 11:35 PM, ja...@peacon.co.uk wrote: On 2013-11-06 20:25, Mike Dawson wrote: We just fixed a performance issue on our cluster related to spikes of high latency on some of our SSDs used for osd journals. In our case, the slow SSDs showed spikes of 100x higher latency than expected. Many SSDs show this behaviour when 100% provisioned and/or never TRIM'd, since the pool of ready erased cells is quickly depleted under steady write workload, so it has to wait for cells to charge to accommodate the write. The Intel 3700 SSDs look to have some of the best consistency ratings of any of the more reasonably priced drives at the moment, and good IOPS too: http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html Obviously the quoted IOPS numbers are dependent on quite a deep queue mind. There is a big range of performance in the market currently; some Enterprise SSDs are quoted at just 4,000 IOPS yet cost as many pounds! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
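Two quick ways to look for the journal-latency spikes discussed in this thread are to watch the journal device itself and to ask the OSD for its own counters; a sketch, where the device path, OSD id and admin socket path are examples and the exact counter names can vary between releases:

  # per-device latency, refreshed every second; sustained high await on the journal SSD is the symptom
  iostat -x 1 /dev/sdb
  # the OSD's own view, via its admin socket
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump | grep -A 3 journal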
Re: [ceph-users] ceph 0.72 with zfs
Looking forward to it. Tests done so far show some interesting results - so I'm considering it for future production use. On Nov 7, 2013, at 1:01 PM, Sage Weil s...@newdream.net wrote: The challenge here is that libzfs is currently a build time dependency, which means it needs to be included in the target distro already, or we need to bundle it in the Ceph.com repos. I am currently looking at the possibility of making the OSD back end dynamically linked at runtime, which would allow a separately packaged zfs back end; that may (or may not!) help. sage Dinu Vlad dinuvla...@gmail.com wrote: Any chance this option will be included for future emperor binaries? I don't mind compiling software, but I would like to keep things upgradable via apt-get … Thanks, Dinu On Nov 7, 2013, at 4:05 AM, Sage Weil s...@inktank.com wrote: Hi Dinu, You currently need to compile yourself, and pass --with-zfs to ./configure. Once it is built in, ceph-osd will detect whether the underlying fs is zfs on its own. sage On Wed, 6 Nov 2013, Dinu Vlad wrote: Hello, I'm testing the 0.72 release and thought to give a spin to the zfs support. While I managed to setup a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A coupel of questions: - is 0.72-rc1 package currently in the raring repository compiled with zfs support ? - if yes - how can I inform ceph-osd to use the ZFSFileStoreBackend ? Thanks, Dinu ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [ANN] ceph-deploy 1.3 released!
Hi everyone, The version 1.3 of ceph-deploy I installed yesterday from the official repo used: sudo wget ... | apt-key add to install, which failed because the apt-key command was not run with sudo, but the version 1.3.1 I got this morning seems to work (no pipe anymore, it uses a file, and sudo for both commands). The version 1.3 also used Python's os.rename() in a weird way, which triggered errors about cross-device renaming (my root filesystem has separate mountpoints for /tmp and /var), for instance with ceph-deploy config push, or when ceph-deploy osd create would call write_keyring(), and this bug also disappeared in 1.3.1. So, to sum up: all bugs went away, so I am happy. Could you indicate where release notes for ceph-deploy may be found, so I do not have to blindly struggle with that kind of issue again? Best regards, Nicolas Canceill Scalable Storage Systems SURFsara (Amsterdam, NL) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [ANN] ceph-deploy 1.3 released!
On Thu, Nov 7, 2013 at 7:53 AM, nicolasc nicolas.cance...@surfsara.nl wrote: Hi every one, The version 1.3 of ceph-deploy I installed yesterday from official repo used: sudo wget ... | apt-key add to install which failed because the apt-key command was not run with sudo, but the version 1.3.1 I got this morning seems to work (no pipe anymore, it uses a file, and sudo for both commands). The version 1.3 also used Python's os.rename() in a weird way, which triggered errors about cross-device renaming (my root filesystem has separate mountpoints for /tmp and /var), for instance with ceph-deploy config push, or when ceph-deploy osd create would call write_keyring(), and this bug also disappeared in 1.3.1. So, to sum up: all bugs went away, so I am happy Could you indicate where release notes for ceph-deploy may be found, so I do not have to blindly struggle with that kind of issue again? You are correct, all of those issues have been corrected and released as ceph-deploy 1.3.1 The problem here is that you beat me to the punch :) We try to release ceph-deploy as often as possible and even more so when there are bugs that are clearly causing issues for users and preventing them to complete core functionality (like installing). The announcement with the complete changelog (and link) is not sent out immediately however, because the packaging and repository synchronization can take a few hours, so I usually wait until that is complete to make sure the announcement goes out when all the repositories are in sync and there were no issues getting ceph-deploy out. You can find the changelog here: https://github.com/ceph/ceph-deploy/blob/master/docs/source/changelog.rst But I should get the announcement out next. Sorry for all the trouble! Best regards, Nicolas Canceill Scalable Storage Systems SURFsara (Amsterdam, NL) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [ANN] ceph-deploy 1.3.1 released
Hi All, There is a new (bug-fix) release of ceph-deploy, the easy deployment tool for Ceph. A couple of issues related to GPG keys when installing on Debian and Debian-based distros were addressed. A fix was also added to the way temporary files are moved when overwriting files like ceph.conf, which was preventing some OSD operations. The full changelog can be found at: https://github.com/ceph/ceph-deploy/blob/master/docs/source/changelog.rst Make sure you update! Thanks, Alfredo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
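Picking up the fixes is an ordinary package upgrade; roughly, depending on how ceph-deploy was installed:

  # Debian/Ubuntu, from the ceph.com repository
  sudo apt-get update && sudo apt-get install ceph-deploy
  # RPM-based distros
  sudo yum update ceph-deploy
  # installed from PyPI
  sudo pip install --upgrade ceph-deploy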
[ceph-users] computing PG IDs
Hi everyone, I just started to look at the documentation of Ceph and I've hit something I don't understand. It's about something on http://ceph.com/docs/master/architecture/ : use the following steps to compute PG IDs. The client inputs the pool ID and the object ID. (e.g., pool = 'liverpool' and object-id = 'john') CRUSH takes the object ID and hashes it. CRUSH calculates the hash modulo the number of OSDs (e.g., 0x58) to get a PG ID. CRUSH gets the pool ID given the pool name (e.g., 'liverpool' = 4). CRUSH prepends the pool ID to the PG ID (e.g., 4.0x58). Shouldn't this be 'CRUSH calculates the hash modulo the number of PGs to get a PG ID'? But then what happens if you add more PGs to the pool? Then most of the data will be reallocated to another PG? Thanks for your help! Kind Regards, Kenneth Waegeman ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
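The mapping can also be checked on a live cluster. The first command below prints the PG ID (pool number, a dot, then the hashed placement group) together with the acting OSDs, and the second shows the pg_num the hash is reduced against; pool and object names follow the example in the question:

  ceph osd map liverpool john
  ceph osd pool get liverpool pg_num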
[ceph-users] Can't activate OSD with journal and data on the same disk
Hi! I have a question about activating OSD on whole disk. I can't bypass this issue. Conf spec: 8 VMs - ceph-deploy; ceph-admin; ceph-mon0-2 and ceph-node0-2; I started from creating MON - all good . After that I want to prepare and activate 3x OSD with dm-crypt. So I put on ceph.conf this [osd.0] host = ceph-node0 cluster addr = 10.0.0.75:6800 public addr = 10.0.0.75:6801 devs = /dev/sdb Next I use ceph-deploy to activate a OSD and this shows root@ceph-deploy:~/ceph# ceph-deploy osd prepare ceph-node0:/dev/sdb --dmcrypt [ceph_deploy.cli][INFO ] Invoked (1.3.1): /usr/bin/ceph-deploy osd prepare ceph-node0:/dev/sdb --dmcrypt [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-node0:/dev/sdb: [ceph-node0][DEBUG ] connected to host: ceph-node0 [ceph-node0][DEBUG ] detect platform information from remote host [ceph-node0][DEBUG ] detect machine type [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring [ceph_deploy.osd][DEBUG ] Deploying osd to ceph-node0 [ceph-node0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [ceph-node0][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add [ceph_deploy.osd][DEBUG ] Preparing host ceph-node0 disk /dev/sdb journal None activate False [ceph-node0][INFO ] Running command: ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb [ceph-node0][ERROR ] INFO:ceph-disk:Will colocate journal with data on /dev/sdb [ceph-node0][ERROR ] ceph-disk: Error: partition 1 for /dev/sdb does not appear to exist [ceph-node0][DEBUG ] Information: Moved requested sector from 34 to 2048 in [ceph-node0][DEBUG ] order to align on 2048-sector boundaries. [ceph-node0][DEBUG ] The operation has completed successfully. [ceph-node0][DEBUG ] Information: Moved requested sector from 2097153 to 2099200 in [ceph-node0][DEBUG ] order to align on 2048-sector boundaries. [ceph-node0][DEBUG ] Warning: The kernel is still using the old partition table. [ceph-node0][DEBUG ] The new table will be used at the next reboot. [ceph-node0][DEBUG ] The operation has completed successfully. [ceph-node0][ERROR ] Traceback (most recent call last): [ceph-node0][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py, line 68, in run [ceph-node0][ERROR ] reporting(conn, result, timeout) [ceph-node0][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py, line 13, in reporting [ceph-node0][ERROR ] received = result.receive(timeout) [ceph-node0][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py, line 455, in receive [ceph-node0][ERROR ] raise self._getremoteerror() or EOFError() [ceph-node0][ERROR ] RemoteError: Traceback (most recent call last): [ceph-node0][ERROR ] File string, line 806, in executetask [ceph-node0][ERROR ] File , line 35, in _remote_run [ceph-node0][ERROR ] RuntimeError: command returned non-zero exit status: 1 [ceph-node0][ERROR ] [ceph-node0][ERROR ] [ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs It's looks like ceph-disk-prepare can't mount (activate?) the one of disk. So I go to ceph-node0 and listed disk, this shows: root@ceph-node0:~# ls /dev/sd sda sda1 sda2 sda5 sdb sdb2 Ups - there are no sdb1. 
So I printed all partitions on /dev/sdb and there are two: Number Beg End Size Filesystem Name Flags 2 1049kB 1074MB 1073MB ceph journal 1 1075MB 16,1GB 15,0GB ceph data Where sdb1 should be for data and sdb2 for journal. When I restart the VM, /dev/sdb1 starts showing. root@ceph-node0:~# ls /dev/sd sda sda1 sda2 sda5 sdb sdb1 sdb2 But I can't mount it. When I put the journal on a separate file/disk, there is no problem with activating (the journal is on a separate disk, and all the data partition is on sdb1). Here is the log from this action (I put the journal in a file in /mnt/sdb2) root@ceph-deploy:~/ceph# ceph-deploy osd prepare ceph-node0:/dev/sdb:/mnt/sdb2 --dmcrypt [ceph_deploy.cli][INFO ] Invoked (1.3.1): /usr/bin/ceph-deploy osd prepare ceph-node0:/dev/sdb:/mnt/sdb2 --dmcrypt [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-node0:/dev/sdb:/mnt/sdb2 [ceph-node0][DEBUG ] connected to host: ceph-node0 [ceph-node0][DEBUG ] detect platform information from remote host [ceph-node0][DEBUG ] detect machine type [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring [ceph_deploy.osd][DEBUG ] Deploying osd to ceph-node0 [ceph-node0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [ceph-node0][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add [ceph_deploy.osd][DEBUG ] Preparing host ceph-node0 disk /dev/sdb journal /mnt/sdb2
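The 'partition 1 for /dev/sdb does not appear to exist' error lines up with the kernel still using the old partition table. One possible workaround, rather than rebooting, is to force a re-read and retry the activation; the device and host names are taken from the log above, and the activate target may need adjusting for the dm-crypt case:

  sudo partprobe /dev/sdb        # or: sudo partx -a /dev/sdb
  ls /dev/sdb*                   # both /dev/sdb1 and /dev/sdb2 should now be present
  ceph-deploy osd activate ceph-node0:/dev/sdb1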
[ceph-users] please help me.problem with my ceph
1. I have installed ceph with one mon/mds and one osd. When I use 'ceph -s', there is a warning: health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) 2. I mounted a client: '192.168.3.189:/ 100G 1009M 97G 2% /mnt/ceph' but I can't create a file or a directory because of no permission. My conf is listed below. Please tell me how to fix these problems, thanks. ; ; Sample ceph ceph.conf file. ; ; This file defines cluster membership, the various locations ; that Ceph stores data, and any other runtime options. ; If a 'host' is defined for a daemon, the init.d start/stop script will ; verify that it matches the hostname (or else ignore it). If it is ; not defined, it is assumed that the daemon is intended to start on ; the current host (e.g., in a setup with a startup.conf on each ; node). ; The variables $type, $id and $name are available to use in paths ; $type = The type of daemon, possible values: mon, mds and osd ; $id = The ID of the daemon, for mon.alpha, $id will be alpha ; $name = $type.$id ; For example: ; osd.0 ; $type = osd ; $id = 0 ; $name = osd.0 ; mon.beta ; $type = mon ; $id = beta ; $name = mon.beta ; global [global] ; enable secure authentication auth supported = cephx ; allow ourselves to open a lot of files max open files = 131072 ; set log file log file = /var/log/ceph/$name.log ; log_to_syslog = true; uncomment this line to log to syslog ; set up pid files pid file = /var/run/ceph/$name.pid ; If you want to run a IPv6 cluster, set this to true. Dual-stack isn't possible ;ms bind ipv6 = true ; monitors ; You need at least one. You need at least three if you want to ; tolerate any node failures. Always create an odd number. [mon] mon data = /data/$name ; If you are using for example the RADOS Gateway and want to have your newly created ; pools a higher replication level, you can set a default osd pool default size = 1 ; You can also specify a CRUSH rule for new pools ; Wiki: http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH ;osd pool default crush rule = 0 ; Timing is critical for monitors, but if you want to allow the clocks to drift a ; bit more, you can specify the max drift. ;mon clock drift allowed = 1 ; Tell the monitor to backoff from this warning for 30 seconds ;mon clock drift warn backoff = 30 ; logging, for debugging monitor crashes, in order of ; their likelihood of being helpful :) ;debug ms = 1 ;debug mon = 20 ;debug paxos = 20 ;debug auth = 20 [mon.alpha] host = ca189 mon addr = 192.168.3.189:6789 ; mds ; You need at least one. Define two to get a standby. [mds] ; where the mds keeps it's secret encryption keys keyring = /data/keyring.$name ; mds logging to debug issues. ;debug ms = 1 ;debug mds = 20 [mds.alpha] host = ca189 ; osd ; You need at least one. Two if you want data to be replicated. ; Define as many as you like. [osd] ; This is where the osd expects its data osd data = /data/$name ; Ideally, make the journal a separate disk or partition. ; 1-10GB should be enough; more if you have fast or many ; disks. You can use a file under the osd data dir if need be ; (e.g. /data/$name/journal), but it will be slower than a ; separate disk or partition. ; This is an example of a file-based journal. 
osd journal = /data/$name/journal osd journal size = 1000 ; journal size, in megabytes ; If you want to run the journal on a tmpfs (don't), disable DirectIO ;journal dio = false ; You can change the number of recovery operations to speed up recovery ; or slow it down if your machines can't handle it ; osd recovery max active = 3 ; osd logging to debug osd issues, in order of likelihood of being ; helpful ;debug ms = 1 ;debug osd = 20 ;debug filestore = 20 ;debug journal = 20 ; ### The below options only apply if you're using mkcephfs ; ### and the devs options ; The filesystem used on the volumes osd mkfs type = btrfs ; If you want to specify some other mount options, you can do so. ; for other filesystems use 'osd mount options $fstype' osd mount options btrfs = rw,noatime ; The options used to format the filesystem via mkfs.$fstype ; for other filesystems use 'osd mkfs options $fstype' ; osd mkfs options btrfs = [osd.0] host = ca191 ; if 'devs' is not specified, you're responsible for ; setting up the 'osd data' dir. devs =
Re: [ceph-users] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel), Requires: python-sphinx
On Wed, Nov 6, 2013 at 8:25 PM, Eyal Gutkind ey...@mellanox.com wrote: Trying to install ceph on my machines. Using RHEL6.3 I get the following error while invoking ceph-deploy. Tried to install sphinx on ceph-node, seems to be success full and installed. Still, it seems that during the installation there is an unresolved dependency. This looks like you don't have the right repos enabled on your box. I think you need to enable the EPEL repositories to resolve these. [apollo006][INFO ] Running command: sudo yum -y -q install ceph [apollo006][ERROR ] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel) [apollo006][ERROR ]Requires: python-sphinx Below is the deploying command line $ ceph-deploy install apollo006 [ceph_deploy.cli][INFO ] Invoked (1.3): /usr/bin/ceph-deploy install apollo006 [ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster ceph hosts apollo006 [ceph_deploy.install][DEBUG ] Detecting platform for host apollo006 ... [apollo006][DEBUG ] connected to host: apollo006 [apollo006][DEBUG ] detect platform information from remote host [apollo006][DEBUG ] detect machine type [ceph_deploy.install][INFO ] Distro info: Red Hat Enterprise Linux Server 6.3 Santiago [apollo006][INFO ] installing ceph on apollo006 [apollo006][INFO ] Running command: sudo rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc [apollo006][INFO ] Running command: sudo rpm -Uvh --replacepkgs http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm [apollo006][DEBUG ] Retrieving http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm [apollo006][DEBUG ] Preparing... ## [apollo006][DEBUG ] ceph-release ## [apollo006][INFO ] Running command: sudo yum -y -q install ceph [apollo006][ERROR ] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel) [apollo006][ERROR ]Requires: python-sphinx [apollo006][DEBUG ] You could try using --skip-broken to work around the problem [apollo006][DEBUG ] You could try running: rpm -Va --nofiles --nodigest [apollo006][ERROR ] Traceback (most recent call last): [apollo006][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/process.py, line 68, in run [apollo006][ERROR ] reporting(conn, result, timeout) [apollo006][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/log.py, line 13, in reporting [apollo006][ERROR ] received = result.receive(timeout) [apollo006][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py, line 455, in receive [apollo006][ERROR ] raise self._getremoteerror() or EOFError() [apollo006][ERROR ] RemoteError: Traceback (most recent call last): [apollo006][ERROR ] File string, line 806, in executetask [apollo006][ERROR ] File , line 35, in _remote_run [apollo006][ERROR ] RuntimeError: command returned non-zero exit status: 1 [apollo006][ERROR ] [apollo006][ERROR ] [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y -q install ceph Thank you for your help, EyalG ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
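A sketch of enabling EPEL on the target node before re-running the install; the exact epel-release package version for EL6 may have changed, so verify the URL first:

  sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
  sudo yum repolist enabled | grep -i epel
  ceph-deploy install apollo006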
[ceph-users] Havana RBD - a few problems
Hi all we have installed a Havana OpenStack cluster with RBD as the backing storage for volumes, images and the ephemeral images. The code as delivered in https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L498 fails, because the RBD.path it not set. I have patched this to read: @@ -419,10 +419,12 @@ class Rbd(Image): if path: try: self.rbd_name = path.split('/')[1] +self.path = path except IndexError: raise exception.InvalidDevicePath(path=path) else: self.rbd_name = '%s_%s' % (instance['name'], disk_name) +self.path = 'volumes/%s' % self.rbd_name self.snapshot_name = snapshot_name if not CONF.libvirt_images_rbd_pool: raise RuntimeError(_('You should specify' but am not sure this is correct. I have the following problems: 1) can't inject data into image 2013-11-07 16:59:25.251 24891 INFO nova.virt.libvirt.driver [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] [instance: 2fa02e4f-f804-4679-9507-736eeebd9b8d] Injecting key into image fc8179d4-14f3-4f21-a76d-72b03b5c1862 2013-11-07 16:59:25.269 24891 WARNING nova.virt.disk.api [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] Ignoring error injecting data into image (Error mounting volumes/ instance- 0089_disk with libguestfs (volumes/instance-0089_disk: No such file or directory)) possibly the self.path = … is wrong - but what are the correct values? 2) Creating a new instance from an ISO image fails completely - no bootable disk found, says the KVM console. Related? 3) When creating a new instance from an image (non ISO images work), the disk is not resized to the size specified in the flavor (but left at the size of the original image) I would be really grateful, if those people that have Grizzly/Havana running with an RBD backend could pipe in here… thanks Jens-Christian -- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fisc...@switch.ch http://www.switch.ch http://www.switch.ch/socialmedia ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
-Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users- boun...@lists.ceph.com] On Behalf Of Dinu Vlad Sent: Thursday, November 07, 2013 3:30 AM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph cluster performance In this case however, the SSDs were only used for journals and I don't know if ceph-osd sends TRIM to the drive in the process of journaling over a block device. They were also under-subscribed, with just 3 x 10G partitions out of 240 GB raw capacity. I did a manual trim, but it hasn't changed anything. If your SSD capacity is well in excess of your journal capacity requirements you could consider overprovisioning the SSD. Overprovisioning should increase SSD performance and lifetime. This achieves the same effect as trim to some degree (lets the SSD better understand what cells have real data and which can be treated as free). I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Havana RBD - a few problems
Under grizzly we disabled completely the image injection via libvirt_inject_partition = -2 in nova.conf. I'm not sure rbd images can even be mounted that way - but then again, I don't have experience with havana. We're using config disks (which break live migrations) and/or the metadata service (which does not) in combination with cloud-init, to bootstrap instances. On Nov 7, 2013, at 6:15 PM, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote: Hi all we have installed a Havana OpenStack cluster with RBD as the backing storage for volumes, images and the ephemeral images. The code as delivered in https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L498 fails, because the RBD.path it not set. I have patched this to read: @@ -419,10 +419,12 @@ class Rbd(Image): if path: try: self.rbd_name = path.split('/')[1] +self.path = path except IndexError: raise exception.InvalidDevicePath(path=path) else: self.rbd_name = '%s_%s' % (instance['name'], disk_name) +self.path = 'volumes/%s' % self.rbd_name self.snapshot_name = snapshot_name if not CONF.libvirt_images_rbd_pool: raise RuntimeError(_('You should specify' but am not sure this is correct. I have the following problems: 1) can't inject data into image 2013-11-07 16:59:25.251 24891 INFO nova.virt.libvirt.driver [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] [instance: 2fa02e4f-f804-4679-9507-736eeebd9b8d] Injecting key into image fc8179d4-14f3-4f21-a76d-72b03b5c1862 2013-11-07 16:59:25.269 24891 WARNING nova.virt.disk.api [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] Ignoring error injecting data into image (Error mounting volumes/ instance- 0089_disk with libguestfs (volumes/instance-0089_disk: No such file or directory)) possibly the self.path = … is wrong - but what are the correct values? 2) Creating a new instance from an ISO image fails completely - no bootable disk found, says the KVM console. Related? 3) When creating a new instance from an image (non ISO images work), the disk is not resized to the size specified in the flavor (but left at the size of the original image) I would be really grateful, if those people that have Grizzly/Havana running with an RBD backend could pipe in here… thanks Jens-Christian -- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fisc...@switch.ch http://www.switch.ch http://www.switch.ch/socialmedia ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
On 11/07/2013 11:47 AM, Gruher, Joseph R wrote: -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users- boun...@lists.ceph.com] On Behalf Of Dinu Vlad Sent: Thursday, November 07, 2013 3:30 AM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph cluster performance In this case however, the SSDs were only used for journals and I don't know if ceph-osd sends TRIM to the drive in the process of journaling over a block device. They were also under-subscribed, with just 3 x 10G partitions out of 240 GB raw capacity. I did a manual trim, but it hasn't changed anything. If your SSD capacity is well in excess of your journal capacity requirements you could consider overprovisioning the SSD. Overprovisioning should increase SSD performance and lifetime. This achieves the same effect as trim to some degree (lets the SSD better understand what cells have real data and which can be treated as free). I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. over-provisioning is definitely something to consider, especially if you aren't buying SSDs with high write endurance. The more cells you can spread the load out over the better. We've had some interesting conversations on here in the past about whether or not it's more cost effective to buy large capacity consumer grade SSDs with more cells or shell out for smaller capacity enterprise grade drives. My personal opinion is that it's worth paying a bit extra for a drive that employs something like MLC-HET, but there's a lot of enterprise grade drives out there with low write endurance that you really have to watch out for. If you are going to pay extra, at least get something with high write endurance and reasonable write speeds. Mark ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
On 2013-11-07 17:47, Gruher, Joseph R wrote: I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. If it's under-provisioned (so the device knows there are unused cells), the device would simply write to an empty cell and flag the old cell for erasing, so there should be no change. Latency would only rise when the sustained write rate exceeded the device's ability to clear cells, so that the stock of ready cells was eventually depleted. FWIW, I think there is considerable mileage in the larger consumer-grade argument: assuming drives will be half the price in a year's time, selecting devices that can last only a year is preferable to spending 3x the price on one that can survive three. That does, though, open the can of worms of SMART reporting and moving journals at some future point. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Manual Installation steps without ceph-deploy
I've seen this before too. CentOS starts up without networking on by default. In my case, the problem was that the monitors cannot form a quorum and OSDs cannot find each other or monitors. Hence, you get that broken pipe error. You either need to have the networking start on startup before the OSDs, or start ceph after you boot up and ensure the network is running properly. The nodes have to be able to reach each other for Ceph to work. As for Ubuntu, I believe the networking is on by default. On Wed, Nov 6, 2013 at 1:35 PM, Trivedi, Narendra narendra.triv...@savvis.com wrote: Hi All, I did a fresh install of Ceph (this might be like 10th or 11th install) on 4 new VMs (one admin, one MON and two OSDs) built from CentOS 6.4 (x64) .iso , did a yum update on all of them. They are all running on vmware ESXi 5.1.0. I did everything sage et al suggested (i.e. creation of /ceph/osd* and making sure /etc/ceph is present on all nodes. /etc/ceph gets created all the ceph-deploy install and contains rbdmap FYI). Unusually, I ended up with the same problem while activating OSDs (the last 4 lines keep going on and on forever): 2013-11-06 14:37:39,626 [ceph_deploy.cli][INFO ] Invoked (1.3): /usr/bin/ceph-deploy osd activate ceph-node2-osd0-centos-6-4:/ceph/osd0 ceph-node3-osd1-centos-6-4:/ceph/osd1 2013-11-06 14:37:39,627 [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ceph-node2-osd0-centos-6-4:/ceph/osd0: ceph-node3-osd1-centos-6-4:/ceph/osd1: 2013-11-06 14:37:39,901 [ceph-node2-osd0-centos-6-4][DEBUG ] connected to host: ceph-node2-osd0-centos-6-4 2013-11-06 14:37:39,902 [ceph-node2-osd0-centos-6-4][DEBUG ] detect platform information from remote host 2013-11-06 14:37:39,917 [ceph-node2-osd0-centos-6-4][DEBUG ] detect machine type 2013-11-06 14:37:39,925 [ceph_deploy.osd][INFO ] Distro info: CentOS 6.4 Final 2013-11-06 14:37:39,925 [ceph_deploy.osd][DEBUG ] activating host ceph-node2-osd0-centos-6-4 disk /ceph/osd0 2013-11-06 14:37:39,925 [ceph_deploy.osd][DEBUG ] will use init type: sysvinit 2013-11-06 14:37:39,925 [ceph-node2-osd0-centos-6-4][INFO ] Running command: sudo ceph-disk-activate --mark-init sysvinit --mount /ceph/osd0 2013-11-06 14:37:40,145 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 14:37:41.075310 7fac2414c700 0 -- :/1029546 10.12.0.70:6789/0 pipe(0x7fac20024480 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac200246e0).fault 2013-11-06 14:37:43,167 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 14:37:44.071697 7fac1ebfd700 0 -- :/1029546 10.12.0.70:6789/0 pipe(0x7fac14000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14000e60).fault 2013-11-06 14:37:46,140 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 14:37:47.071938 7fac2414c700 0 -- :/1029546 10.12.0.70:6789/0 pipe(0x7fac14003010 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14003270).fault 2013-11-06 14:37:50,165 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 14:37:51.071245 7fac1ebfd700 0 -- :/1029546 10.12.0.70:6789/0 pipe(0x7fac14003a70 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14003cd0).fault It might be bad luck but I want to try a manual installation without ceph-deploy because it seems I am jinxed with ceph-deploy. Could anyone please forward me the steps. I am happy to share the ceph.log with anyone who would like to research on this error but I don’t a have clue. Thanks a lot! Narendra Trivedi | savviscloud This message contains information which may be confidential and/or privileged. 
Unless you are the intended recipient (or authorized to receive for the intended recipient), you may not read, use, copy or disclose to anyone the message or any information contained in the message. If you have received the message in error, please advise the sender by reply e-mail and delete the message and any attachment(s) thereto without retaining any copies. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- John Wilkins Senior Technical Writer Intank john.wilk...@inktank.com (415) 425-9599 http://inktank.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
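On CentOS the corresponding checks are roughly the following; the monitor address is taken from the log above:

  # make sure networking and ceph both start at boot, networking first
  sudo chkconfig network on
  sudo chkconfig ceph on
  # from the OSD node, confirm the monitor is reachable before activating OSDs
  ping -c 3 10.12.0.70
  telnet 10.12.0.70 6789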
Re: [ceph-users] ceph cluster performance
I was under the same impression - using a small portion of the SSD via partitioning (in my case - 30 gigs out of 240) would have the same effect as activating the HPA explicitly. Am I wrong? On Nov 7, 2013, at 8:16 PM, ja...@peacon.co.uk wrote: On 2013-11-07 17:47, Gruher, Joseph R wrote: I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. If it's under-provisioned (so the device knows there are unused cells), the device would simply write to an empty cell and flag the old cell for erasing, so there should be no change. Latency would rise when sustained write rate exceeded the devices' ability to clear cells, so eventually the stock of ready cells would be depleted. FWIW, I think there is considerable mileage in the larger-consumer grade argument. Assuming drives will be half the price in a years time, so selecting devices that can last only a year is preferable to spending 3x the price on one that can survive three. That though opens the tin of worms that is SMART reporting and moving journals at some future point mind. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
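The difference between the two approaches is whether the drive knows the unused space is free: leaving most of the device unpartitioned only acts as extra spare area if those cells have never been written, or have been erased. A rough sketch assuming the journals go onto a freshly secure-erased drive; the device name and partition sizes are examples, and the erase destroys all data on the drive:

  # ATA secure erase returns every cell to the free pool
  sudo hdparm --user-master u --security-set-pass p /dev/sdb
  sudo hdparm --user-master u --security-erase p /dev/sdb
  # create only the journal partitions and leave the rest of the device untouched
  sudo sgdisk --new=1:0:+10G /dev/sdb
  sudo sgdisk --new=2:0:+10G /dev/sdb
  sudo sgdisk --new=3:0:+10G /dev/sdb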
Re: [ceph-users] radosgw questions
For #2, I just wrote a document on setting up a federated architecture. You can view it here: http://ceph.com/docs/master/radosgw/federated-config/ This functionality will be available in the Emperor release. The use case I described involved two zones in a master region talking to the same underlying Ceph Storage Cluster, but with different sets of pools for each zone. You can certainly set up pools for zones on completely different Ceph Storage Clusters. I assumed that was overkill, but you can certainly do it. See http://ceph.com/docs/master/radosgw/federated-config/#configure-a-master-region for configuring a master region. If you want to use separate storage clusters for each zone, you need to: 1. Setup the set of pools for each zone in the respective ceph storage cluster for your data center. 2. http://ceph.com/docs/master/radosgw/federated-config/#create-a-keyring should use different cluster names to ensure that the keyring gets populated in both Ceph Storage Clusters. We assume the default -c /etc/ceph/ceph.conf for simplicity. 3. http://ceph.com/docs/master/radosgw/federated-config/#add-instances-to-ceph-config-file when adding the instances to the Ceph configuration file, you need to note that the storage cluster might be named. For example, instead of ceph.conf, it might be us-west.conf and us-east.conf for the respective zones, assuming you are setting up Ceph clusters specifically to run the gateways--or whatever naming convention you already use. 4. Most of the usage examples omit the Ceph configuration file (-c file/path.conf) and the admin key (-k path/to/admin.keyring). You may need to specify them explicitly when calling radosgw-admin so that you are issuing commands to the right Ceph Storage Cluster. I'd love to get your feedback on the document! For #3. Yes. In fact, if you just setup a master region with one master zone, that works fine. You don't have to respect pool naming. Whatever you create in the storage cluster and map to a zone pool will work. However, I would suggest following the conventions as laid out in the document. You can create a garbage collection pool called lemonade, but you will probably confuse the community when looking for help as they will expect .{region-name}-{zone-name}.rgw.gc. If you just use region-zone.{pool-name-default}, like us-west.rgw.root most people in the community will understand any questions you have and can more readily help you with additional questions. On Wed, Nov 6, 2013 at 3:17 AM, Alessandro Brega alessandro.bre...@gmail.com wrote: Good day ceph users, I'm new to ceph but installation went well so far. Now I have a lot of questions regarding radosgw. Hope you don't mind... 1. To build a high performance yet cheap radosgw storage, which pools should be placed on ssd and which on hdd backed pools? Upon installation of radosgw, it created the following pools: .rgw, .rgw.buckets, .rgw.buckets.index, .rgw.control, .rgw.gc, .rgw.root, .usage, .users, .users.email. 2. In order to have very high availability I like to setup two different ceph clusters, each in its own datacenter. How to configure radowsgw to make use of this layout? Can I have a multi-master setup with having a load balancer (or using geo-dns) which distributes the load to radosgw instances in both datacenters? 3. Is it possible to start with a simple setup now (only one ceph cluster) and later add the multi-datacenter redundancy described above without downtime? Do I have to respect any special pool-naming requirements? 4. 
What replication count would you suggest? In other words, what replication level is needed to achieve 99.9% durability like DreamObjects states? 5. Is it possible to map a FQDN custom domain to buckets, not only subdomains? 6. The command radosgw-admin pool list returns could not list placement set: (2) No such file or directory. But radosgw seems to work as expected anyway? Looking forward to your suggestions. Alessandro Brega ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- John Wilkins Senior Technical Writer Inktank john.wilk...@inktank.com (415) 425-9599 http://inktank.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
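For the separate-clusters case described above, the region and zone commands are simply pointed at the right cluster with -c and -k; a sketch for the us-east side, where the cluster name, region/zone names and JSON infiles follow the document's example and may differ in a real deployment:

  radosgw-admin region set --infile us.json -c /etc/ceph/us-east.conf -k /etc/ceph/us-east.client.admin.keyring
  radosgw-admin zone set --rgw-zone=us-east --infile us-east.json -c /etc/ceph/us-east.conf -k /etc/ceph/us-east.client.admin.keyring
  radosgw-admin regionmap update -c /etc/ceph/us-east.conf -k /etc/ceph/us-east.client.admin.keyring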
Re: [ceph-users] Running on disks that lose their head
Once I know a drive has had a head failure, do I trust that the rest of the drive isn't going to go at an inconvenient moment vs just fixing it right now when it's not 3AM on Christmas morning? (true story) As good as Ceph is, do I trust that Ceph is smart enough to prevent spreading corrupt data all over the cluster if I leave bad disks in place and they start doing terrible things to the data? I have a lot more disks than I have trust in disks. If a drive lost a head then I want it gone. I love the idea of using SMART data but can foresee some implementation issues. We have seen some RAID configurations where polling SMART will halt all RAID operations momentarily. Also, some controllers require you to use their CLI tool to poll for SMART rather than smartmontools. It would be similarly awesome to embed something like an Apdex score against each OSD, especially if it factored in hierarchy to identify poor-performing OSDs, nodes, racks, etc. -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
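On the smartmontools side, a minimal sketch of a one-off check and of periodic polling; the device name is an example, and drives behind a RAID controller usually need an extra -d option or the vendor CLI, as noted above:

  # one-off health and attribute dump
  sudo smartctl -H -A /dev/sdc
  # periodic polling: short self-test daily at 02:00, mail on failing attributes
  echo '/dev/sdc -a -m root -s (S/../.././02)' | sudo tee -a /etc/smartd.conf
  sudo service smartd restart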
Re: [ceph-users] please help me.problem with my ceph
Hi 皓月, You can try ls -al /mnt/ceph , check if the current user have W/R access to the directory. Maybe you need to use chown to change the directory owner. Regards, Kai At 2013-11-06 22:03:31,皓月 suzhenh...@qq.com wrote: 1. I have installed ceph with one mon/mds and one osd.When i use 'ceph -s',there si a warning:health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) 2. i mount a client.'192.168.3.189:/ 100G 1009M 97G 2% /mnt/ceph' but i can't creat a file or a directory because of no permission. my conf is listed bellow.please tell my how to fix these problems,thanks ; ; Sample ceph ceph.conf file. ; ; This file defines cluster membership, the various locations ; that Ceph stores data, and any other runtime options. ; If a 'host' is defined for a daemon, the init.d start/stop script will ; verify that it matches the hostname (or else ignore it). If it is ; not defined, it is assumed that the daemon is intended to start on ; the current host (e.g., in a setup with a startup.conf on each ; node). ; The variables $type, $id and $name are available to use in paths ; $type = The type of daemon, possible values: mon, mds and osd ; $id = The ID of the daemon, for mon.alpha, $id will be alpha ; $name = $type.$id ; For example: ; osd.0 ; $type = osd ; $id = 0 ; $name = osd.0 ; mon.beta ; $type = mon ; $id = beta ; $name = mon.beta ; global [global] ; enable secure authentication auth supported = cephx ; allow ourselves to open a lot of files max open files = 131072 ; set log file log file = /var/log/ceph/$name.log ; log_to_syslog = true; uncomment this line to log to syslog ; set up pid files pid file = /var/run/ceph/$name.pid ; If you want to run a IPv6 cluster, set this to true. Dual-stack isn't possible ;ms bind ipv6 = true ; monitors ; You need at least one. You need at least three if you want to ; tolerate any node failures. Always create an odd number. [mon] mon data = /data/$name ; If you are using for example the RADOS Gateway and want to have your newly created ; pools a higher replication level, you can set a default osd pool default size = 1 ; You can also specify a CRUSH rule for new pools ; Wiki: http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH ;osd pool default crush rule = 0 ; Timing is critical for monitors, but if you want to allow the clocks to drift a ; bit more, you can specify the max drift. ;mon clock drift allowed = 1 ; Tell the monitor to backoff from this warning for 30 seconds ;mon clock drift warn backoff = 30 ; logging, for debugging monitor crashes, in order of ; their likelihood of being helpful :) ;debug ms = 1 ;debug mon = 20 ;debug paxos = 20 ;debug auth = 20 [mon.alpha] host = ca189 mon addr = 192.168.3.189:6789 ; mds ; You need at least one. Define two to get a standby. [mds] ; where the mds keeps it's secret encryption keys keyring = /data/keyring.$name ; mds logging to debug issues. ;debug ms = 1 ;debug mds = 20 [mds.alpha] host = ca189 ; osd ; You need at least one. Two if you want data to be replicated. ; Define as many as you like. [osd] ; This is where the osd expects its data osd data = /data/$name ; Ideally, make the journal a separate disk or partition. ; 1-10GB should be enough; more if you have fast or many ; disks. You can use a file under the osd data dir if need be ; (e.g. /data/$name/journal), but it will be slower than a ; separate disk or partition. ; This is an example of a file-based journal. 
osd journal = /data/$name/journal osd journal size = 1000 ; journal size, in megabytes ; If you want to run the journal on a tmpfs (don't), disable DirectIO ;journal dio = false ; You can change the number of recovery operations to speed up recovery ; or slow it down if your machines can't handle it ; osd recovery max active = 3 ; osd logging to debug osd issues, in order of likelihood of being ; helpful ;debug ms = 1 ;debug osd = 20 ;debug filestore = 20 ;debug journal = 20 ; ### The below options only apply if you're using mkcephfs ; ### and the devs options ; The filesystem used on the volumes osd mkfs type = btrfs ; If you want to specify some other mount options, you can do so. ; for other filesystems use 'osd mount options $fstype' osd mount options btrfs = rw,noatime ; The options used to format the filesystem via mkfs.$fstype ; for other filesystems use 'osd mkfs options $fstype' ; osd mkfs options btrfs = [osd.0] host = ca191 ; if 'devs' is not specified, you're responsible for ; setting up the 'osd data' dir. devs = /dev/mapper/vg_ca191-lv_ceph ___ ceph-users mailing list ceph-users@lists.ceph.com
Re: [ceph-users] please help me.problem with my ceph
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 皓月 Sent: Wednesday, November 06, 2013 10:04 PM To: ceph-users Subject: [ceph-users] please help me.problem with my ceph 1. I have installed ceph with one mon/mds and one osd. When I use 'ceph -s', there is a warning: health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) I would think this is because Ceph defaults to a replication level of 2 and you only have one OSD (nowhere to write a second copy), so you are degraded? You could add a second OSD or perhaps you could set the replication level to 1? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
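A sketch of the second option on a cluster of this vintage, where the default pools are data, metadata and rbd; this is only sensible for a test setup, since a single copy means no redundancy:

  ceph osd pool set data size 1
  ceph osd pool set metadata size 1
  ceph osd pool set rbd size 1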
Re: [ceph-users] radosgw questions
1. To build a high performance yet cheap radosgw storage, which pools should be placed on SSD and which on HDD backed pools? Upon installation of radosgw, it created the following pools: .rgw, .rgw.buckets, .rgw.buckets.index, .rgw.control, .rgw.gc, .rgw.root, .usage, .users, .users.email. There is a lot that goes into high performance; a few questions come to mind: Do you want high performance reads, writes or both? How hot is your data, and can you get better performance from buying more memory for caching? What size objects do you expect to handle, and how many per bucket? 4. What replication count would you suggest? In other words, what replication level is needed to achieve 99.9% durability like DreamObjects states? DreamObjects Engineer here, we used Ceph's durability modeling tools here: https://github.com/ceph/ceph-tools You will need to research your data disks' MTBF numbers and convert them to FITs, measure your OSD backfill MTTR and factor in your replication count. DreamObjects uses 3 replicas on enterprise SAS disks. The durability figures exclude black swan events like fires and other such datacenter or regional disasters, which is why having a second location is important for DR. 5. Is it possible to map a FQDN custom domain to buckets, not only subdomains? You could map a domain's A/AAAA records to an endpoint, but if the endpoint changes you're SOL; using a CNAME at the domain root violates DNS RFCs. Some DNS providers will fake a CNAME by doing a recursive lookup in response to an A/AAAA request as a workaround. -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Running on disks that lose their head
Thanks, Mike Dawson Co-Founder Director of Cloud Architecture Cloudapt LLC 6330 East 75th Street, Suite 170 Indianapolis, IN 46250 On 11/7/2013 2:12 PM, Kyle Bader wrote: Once I know a drive has had a head failure, do I trust that the rest of the drive isn't going to go at an inconvenient moment vs just fixing it right now when it's not 3AM on Christmas morning? (true story) As good as Ceph is, do I trust that Ceph is smart enough to prevent spreading corrupt data all over the cluster if I leave bad disks in place and they start doing terrible things to the data? I have a lot more disks than I have trust in disks. If a drive lost a head then I want it gone. I love the idea of using smart data but can foresee see some implementation issues. We have seen some raid configurations where polling smart will halt all raid operations momentarily. Also, some controllers require you to use their CLI tool to pool for smart vs smartmontools. It would be similarly awesome to embed something like an apdex score against each osd, especially if it factored in hierarchy to identify poor performing osds, nodes, racks, etc.. Kyle, I think you are spot-on here. Apdex or similar scoring for gear performance is important for Ceph, IMO. Due to pseudo-random placement and replication, it can be quite difficult to identify 1) if hardware, software, or configuration are the cause of slowness, and 2) which hardware (if any) is slow. I recently discovered a method that seems address both points built. Zackc, Loicd, and I have been the main participants in a weekly Teuthology call the past few weeks. We've talked mostly about methods to extend Teuthology to capture performance metrics. Would you be willing to join us during the Teuthology and Ceph-Brag sessions at the Firefly Developer Summit? Cheers, Mike ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph Block Storage QoS
Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
ST240FN0021 connected via a SAS2x36 to a LSI 9207-8i. The problem might be SATA transport protocol overhead at the expander. Have you tried directly connecting the SSDs to SATA2/3 ports on the mainboard? -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
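Before blaming the expander it may be worth measuring one journal partition directly with synchronous 4k writes, which is close to the journal's own write pattern. This overwrites the partition, so only run it against a journal that can be recreated; the device name is an example:

  sudo fio --name=journal-test --filename=/dev/sdb1 --rw=write --bs=4k --iodepth=1 --ioengine=libaio --direct=1 --sync=1 --runtime=60 --time_based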
Re: [ceph-users] Running on disks that lose their head
Zackc, Loicd, and I have been the main participants in a weekly Teuthology call the past few weeks. We've talked mostly about methods to extend Teuthology to capture performance metrics. Would you be willing to join us during the Teuthology and Ceph-Brag sessions at the Firefly Developer Summit? I'd be happy to! -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Block Storage QoS
On 11/07/2013 08:42 PM, Gruher, Joseph R wrote: Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? I don't know if OpenStack supports it, but in CloudStack we recently implemented the I/O throttling mechanism of Qemu via libvirt. That might be a solution if OpenStack implements that as well? Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
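For reference, the Qemu throttling mentioned above is expressed per disk in the libvirt domain XML; a hedged sketch, where the pool/volume name, target device and limit values are placeholders rather than anything OpenStack or CloudStack generates verbatim:
  <disk type='network' device='disk'>
    <driver name='qemu' type='raw'/>
    <source protocol='rbd' name='volumes/volume-0001'/>
    <target dev='vdb' bus='virtio'/>
    <iotune>
      <total_iops_sec>1000</total_iops_sec>
      <total_bytes_sec>52428800</total_bytes_sec>
    </iotune>
  </disk>
Because the limit is enforced by Qemu on the client side, it throttles per VM disk rather than providing any cluster-wide QoS inside Ceph itself.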
Re: [ceph-users] Ceph Block Storage QoS
On Thu, Nov 7, 2013 at 11:50 PM, Wido den Hollander w...@42on.com wrote: On 11/07/2013 08:42 PM, Gruher, Joseph R wrote: Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? I don't know if OpenStack supports it, but in CloudStack we recently implemented the I/O throttling mechanism of Qemu via libvirt. That might be a solution if OpenStack implements that as well? Just a side note - current QEMU implements more gentle throttling than the rest of the versions, and it is a very useful thing for NBD I/O burst handling. Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to use Admin Ops API in Ceph Object Storage
I've looked around but could not find it. Can I open a ticket for this issue? Not being able to enumerate users via API is a road block for me and I'd like to work and get it resolved. Thanks. -- Nelson Jeppesen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to use Admin Ops API in Ceph Object Storage
You can do it through the metadata api. Try doing something like: GET /admin/metadata/user Yehuda On Thu, Nov 7, 2013 at 12:06 PM, Nelson Jeppesen nelson.jeppe...@gmail.com wrote: I've looked around but could not find it. Can I open a ticket for this issue? Not being able to enumerate users via API is a road block for me and I'd like to work and get it resolved. Thanks. -- Nelson Jeppesen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Can someone please help me here?
I can't install Ubuntu... I am not sure why it would do this on a new install of CentOS. I wanted to try this to see if I can use it as the RBD/Radosgw backend for OpenStack production, but I can't believe it has taken forever to get it running and I am not there yet! -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Trivedi, Narendra Sent: Wednesday, November 06, 2013 4:45 PM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy Unfortunately, I don't have that luxury. Thanks! Narendra -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ja...@peacon.co.uk Sent: Wednesday, November 06, 2013 4:43 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy I also had some difficulty with ceph-deploy on CentOS. I eventually moved to Ubuntu 13.04 - and haven't looked back. On 2013-11-06 21:35, Trivedi, Narendra wrote: Hi All, I did a fresh install of Ceph (this might be like 10th or 11th install) on 4 new VMs (one admin, one MON and two OSDs) built from CentOS 6.4 (x64)... it seems I am jinxed with ceph-deploy. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
I think this is a great idea. One of the big questions users have is "what kind of hardware should I buy?" An easy way for users to publish information about their setup (hardware, software versions, use-case, performance) when they have successful deployments would be very valuable. Maybe a section of the wiki? It would be interesting to have a site where a Ceph admin can download an API key/package that could be optionally installed and report configuration information to a community API. The admin could then supplement/correct that base information. Having much of the data collection be automated lowers the barrier for contribution. Bonus points if this could be extended to SMART and failed drives so we could have a community-generated report similar to Google's disk population study they presented at FAST'07. -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to use Admin Ops API in Ceph Object Storage
Sweet, thanks! I had to add --caps=metadata=read ,but it worked great. On Thu, Nov 7, 2013 at 12:11 PM, Yehuda Sadeh yeh...@inktank.com wrote: You can do it through the metadata api. Try doing something like: GET /admin/metadata/user Yehuda On Thu, Nov 7, 2013 at 12:06 PM, Nelson Jeppesen nelson.jeppe...@gmail.com wrote: I've looked around but could not find it. Can I open a ticket for this issue? Not being able to enumerate users via API is a road block for me and I'd like to work and get it resolved. Thanks. -- Nelson Jeppesen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Nelson Jeppesen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
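For anyone following along, a sketch of the two pieces involved (the uid is a placeholder for whatever system/admin user you call the admin API with):
  radosgw-admin caps add --uid=admin --caps="metadata=read"
  radosgw-admin metadata list user
The second command is the local equivalent of GET /admin/metadata/user; the REST request itself still has to be signed with that user's access and secret keys.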
Re: [ceph-users] deployment architecture practices / new ideas?
On 06.11.2013 15:05, Gautam Saxena wrote: We're looking to deploy CEPH on about 8 Dell servers to start, each of which typically contains 6 to 8 hard disks with Perc RAID controllers which support write-back cache (~512 MB usually). Most machines have between 32 and 128 GB RAM. Our questions are as follows. Please feel free to comment on even just one of the questions below if that's the area of your expertise/interest. 1. Based on various best practice guides, they suggest putting the OS on a separate disk. But we thought that would not be good because we'd sacrifice a whole disk on each machine (~3 TB) or even two whole disks (~6 TB) if we did a hardware RAID 1 on it. So, do people normally just sacrifice one whole disk? Specifically, we came up with this idea: 1. We set up all hard disks as pass-through in the raid controller, so that the RAID controller's cache is still in effect, but the OS sees just a bunch of disks (6 to 8 in our case) 2. We then do a software-based RAID 1 (using CentOS 6.4) for the OS across all 6 to 8 hard disks 3. We then do a software-based RAID 0 (using CentOS 6.4) for the SWAP space. 4. *Does anyone see any flaws in our idea above? We think that RAID 1 is not computationally expensive for the machines to compute, and most of the time, the OS should be in RAM. Similarly, we think RAID 0 should be easy for the CPU to compute, and hopefully, we won't hit much SWAP if we have enough RAM. And this way, we don't sacrifice 1 or 2 whole disks for just the OS.* Why not simply use smaller disks for the system? That is what we do: use e.g. 500G 2.5" disks (e.g. WD VelociRaptor) for the root system, and if needed put these disks into a RAID1 HW-RAID. I would always prefer hw or sw RAID. You could also use the unneeded space on these disks for the OSD journals if you use HDDs with 10,000 RPM. 2. Based on the performance benchmark blog of Mark Nelson ( http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/), has anything substantially changed since then? Specifically, it suggests that SSDs may not be really necessary if one has raid controllers with write-back cache. Is this still true even though the article was written with a version of CEPH that was over 1 year old? (Mark suggests that things may change with newer versions of CEPH) 3. Based on our understanding, it would seem that CEPH can deliver very high throughput performance (especially for reads) if dozens and dozens of hard disks are being accessed simultaneously across multiple machines. So, we could have several GB/s of throughput, right? (CEPH never advertises the advantage of read throughput with distributed architecture, so I'm wondering if I'm missing something.) If so, then is it reasonable to assume that one common bottleneck is the ethernet? So if we only use 1 NIC card at 1 Gb/s, that'll be a major bottleneck? If so, we're thinking of trying to bond multiple 1 Gb/s ethernet cards to make a bonded ethernet connection of 4 Gb/s (4 * 1 Gb/s). Even these 4 Gb/s could easily be your bottleneck. That depends on your workload. Especially if you use separated networks for the clients and the cluster OSD backend (replication, backfill, recovery). But we didn't see anyone discuss this strategy? Are there any holes in it? Or does CEPH automatically take advantage of multiple NIC cards without us having to deal with the complexity (and expense of buying a new switch which supports bonding) for doing bonding? 
That is, is it possible and a good idea to have CEPH OSDs be set up to use specific NICs, so that we spread the load? (We read through the recommendation of having different NICs for front-end traffic vs back-end traffic, but we're not worried about network attacks -- so we're thinking that just creating a big fat ethernet pipe gives us the most flexibility.) Depending on your budget it may make sense to use 10G cards instead. Separated traffic networks aren't only about DDoS; they're also there to make sure your replication traffic doesn't affect your client traffic and vice versa. Danny ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
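For what it's worth, the public/cluster split is just two settings in ceph.conf rather than something Ceph works out per NIC on its own; a sketch with example subnets:
  [global]
      public network = 192.168.10.0/24
      cluster network = 192.168.20.0/24
OSDs then use the cluster network for replication, backfill and recovery while clients and monitors talk over the public network; bonding or 10G sits below this and is invisible to Ceph.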
[ceph-users] cinder-volume rbd driver assumes that image is raw without checking
It appears that the RBD driver in Cinder only checks that the image is accessible, and if it is, assumes it is cloneable, regardless of its format. I think it would be more useful if the driver also confirmed the image format, and reverted to straight copy instead of copy-on-write if the format is anything else but raw. Background: https://bugs.launchpad.net/fuel/+bug/1246219 Thoughts? -- Dmitry Borodaenko ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
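Until a check like that lands in the driver, a quick manual way to see what an image really is (the image id and file path are placeholders):
  glance image-show <image-id> | grep disk_format
  qemu-img info <local-copy-of-image>
qemu-img will report raw, qcow2, vmdk and so on; only raw images are safe to clone copy-on-write from RBD, anything else needs the straight copy path.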
Re: [ceph-users] ceph cluster performance
I have 2 SSDs (same model, smaller capacity) for / connected on the mainboard. Their sync write performance is also poor - less than 600 iops, 4k blocks. On Nov 7, 2013, at 9:44 PM, Kyle Bader kyle.ba...@gmail.com wrote: ST240FN0021 connected via a SAS2x36 to a LSI 9207-8i. The problem might be SATA transport protocol overhead at the expander. Have you tried directly connecting the SSDs to SATA2/3 ports on the mainboard? -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
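One way to reproduce that kind of number outside of Ceph is to benchmark small synchronous writes directly against the device (destructive, so only on a disk you can wipe; the device name is a placeholder):
  fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based
If the SSD still caps out at a few hundred IOPS when cabled straight to the mainboard SATA ports, the expander is not the only problem; OSD journal writes look a lot like this access pattern.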
Re: [ceph-users] Can someone please help me here?
On Thu, Nov 7, 2013 at 3:25 PM, Trivedi, Narendra narendra.triv...@savvis.com wrote: I can't install Ubuntu... I am not sure why would it do on a new install of CentOS. I wanted to try this to if I can take it as RBD/Radosgw backend for OpenStack production but I can't believe it has taken forever to get it running and I am not there yet! -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Trivedi, Narendra Sent: Wednesday, November 06, 2013 4:45 PM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy Unfortunately, I don't have that luxury. Thanks! Narendra -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ja...@peacon.co.uk Sent: Wednesday, November 06, 2013 4:43 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy I also had some difficulty with ceph-deploy on CentOS. It would be useful to know what didn't work for you so we can improve it, even if it is in the form of better/more docs. I eventually moved to Ubuntu 13.04 - and haven't looked back. On 2013-11-06 21:35, Trivedi, Narendra wrote: Hi All, I did a fresh install of Ceph (this might be like 10th or 11th install) on 4 new VMs (one admin, one MON and two OSDs) built from CentOS 6.4 (x64)... it seems I am jinxed with ceph-deploy. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com This message contains information which may be confidential and/or privileged. Unless you are the intended recipient (or authorized to receive for the intended recipient), you may not read, use, copy or disclose to anyone the message or any information contained in the message. If you have received the message in error, please advise the sender by reply e-mail and delete the message and any attachment(s) thereto without retaining any copies. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com This message contains information which may be confidential and/or privileged. Unless you are the intended recipient (or authorized to receive for the intended recipient), you may not read, use, copy or disclose to anyone the message or any information contained in the message. If you have received the message in error, please advise the sender by reply e-mail and delete the message and any attachment(s) thereto without retaining any copies. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
On 08/11/2013 04:57, Kyle Bader wrote: I think this is a great idea. One of the big questions users have is what kind of hardware should I buy. An easy way for users to publish information about their setup (hardware, software versions, use-case, performance) when they have successful deployments would be very valuable. Maybe a section of wiki? It would be interesting to a site where a Ceph admin can download an API key/package that could be optionally installed and report configuration information to a community API. The admin could then supplement/correct that base information. Having much of the data collection be automated lowers the barrier for contribution. Bonus points if this could be extended to SMART and failed drives so we could have a community generated report similar to Google's disk population study they presented at FAST'07. Would this be something like http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ? Cheers -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Running on disks that lose their head
On 2013-11-06 09:33, Sage Weil wrote: On Wed, 6 Nov 2013, Loic Dachary wrote: Hi Ceph, People from Western Digital suggested ways to better take advantage of the disk error reporting... when one head out of ten fails: disks can keep working with the nine remaining heads. Losing 1/10 of the disk is likely to result in a full re-install of the Ceph osd. But, again, the disk could keep going after that, with 9/10 of its original capacity. And Ceph is good at handling osd failures. Yeah...but if you lose 1/10 of a block device any existing local file system is going to blow up. I suspect this is something that newfangled interfaces like Kinetic will be much better at. I found some info on this at last in the SATA-IO 3.2 Spec; seemingly it's more to do with RAID rebuilds: Rebuild Assist: when a drive in a RAID configuration fails due to excessive data errors, it is possible to reconstruct the data from the failed drive from the remaining drives – this is called a Rebuild. The Rebuild Assist function speeds up the rebuild process by quickly recognizing which data on the failed drive is unreadable. Source: https://www.sata-io.org/sites/default/files/images/SATA-IO%20FAQ%20-%20071813a%20%283%29.pdf There is also some interesting info on SSHDs: SSHD Optimization: a Solid State Hybrid Drive (SSHD) is an HDD that contains some amount of Flash memory, thus increasing the performance of the drive. The Hybrid Information feature provides a mechanism wherein a host can tell the drive which data to cache, further enhancing the performance of the SSHD. In today's SATA drives, reading and writing log data required the use of non-queued commands, impacting overall system performance, especially SSHDs. A new feature in v3.2 allows such commands to be queued, minimizing the impact on performance. But it seems the manufacturers are taking a different path; the new Seagate 3.5" Hybrid drives don't even support the ATA-8 NV Cache feature set unfortunately, according to the product manual: http://www.seagate.com/files/staticfiles/support/docs/manual/desktop%20sshd/100726566.pdf ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
Would this be something like http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ? Something very much like that :) -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
Hi, It looks like there indeed is enough interest to move forward :-) The next action items would be : * Setup a home page somewhere ( should it be a separate web site or could we simply take over http://ceph.com/ ? ) * Create the About page describing the User Committee and get a consensus from interested parties on its goal and scope. * Create an Event section to record meetups / conferences schedules ( http://ceph.com/event/ ? ) * Collect use cases ( publish them under http://ceph.com/community/blog/ ? elsewhere ? ) * Schedule a User Committee session for the next CDS Cloudwatt kindly agreed to assign someone ( starting next week ) to help with logistics to organize events. For instance I'll be talking about Ceph in two weeks in Toulouse http://capitoledulibre.org/ : I will happily distribute goodies, if they are there. But if I have to acquire them ... It will be even more useful to organize a Ceph booth during FOSDEM or even ( too short notice maybe ? ) for the Cloudstack event in Amsterdam. I'm willing to take action on setting up the home page, the about page and collecting use cases. Anyone willing to work on the other two ( event page, CDS session ) ? Or have ideas for even more action items ? Maybe organizing a local meetup ( the Cloudwatt person could help with that too, as long as she is told where and when ) ? Cheers P.S. If you're in Berlin november 30th 2013, we're having a Ceph oriented friendly poker game at http://c-base.org/ starting 7pm. I'm not sure that counts as an event but it's definitely an opportunity to discuss Ceph ;-) On 07/11/2013 01:35, Loic Dachary wrote: Hi Ceph, I would like to open a discussion about organizing a Ceph User Committee. We briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil today during the OpenStack summit. A pad was created and roughly summarizes the idea: http://pad.ceph.com/p/user-committee If there is enough interest, I'm willing to devote one day a week working for the Ceph User Committee. And yes, that includes sitting at the Ceph booth during the FOSDEM :-) And interviewing Ceph users and describing their use cases, which I enjoy very much. But also contribute to a user centric roadmap, which is what ultimately matters for the company I work for. If you'd like to see this happen but don't have time to participate in this discussion, please add your name + email at the end of the pad. What do you think ? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Havana RBD - a few problems
On 11/08/2013 12:15 AM, Jens-Christian Fischer wrote: Hi all we have installed a Havana OpenStack cluster with RBD as the backing storage for volumes, images and the ephemeral images. The code as delivered in https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L498 fails, because the RBD.path is not set. I have patched this to read: Using libvirt_image_type=rbd to replace ephemeral disks is new with Havana, and unfortunately some bug fixes did not make it into the release. I've backported the current fixes on top of the stable/havana branch here: https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd

@@ -419,10 +419,12 @@ class Rbd(Image):
         if path:
             try:
                 self.rbd_name = path.split('/')[1]
+                self.path = path
             except IndexError:
                 raise exception.InvalidDevicePath(path=path)
         else:
             self.rbd_name = '%s_%s' % (instance['name'], disk_name)
+            self.path = 'volumes/%s' % self.rbd_name
         self.snapshot_name = snapshot_name
         if not CONF.libvirt_images_rbd_pool:
             raise RuntimeError(_('You should specify'

but am not sure this is correct. I have the following problems: 1) can't inject data into image 2013-11-07 16:59:25.251 24891 INFO nova.virt.libvirt.driver [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] [instance: 2fa02e4f-f804-4679-9507-736eeebd9b8d] Injecting key into image fc8179d4-14f3-4f21-a76d-72b03b5c1862 2013-11-07 16:59:25.269 24891 WARNING nova.virt.disk.api [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] Ignoring error injecting data into image (Error mounting volumes/instance-0089_disk with libguestfs (volumes/instance-0089_disk: No such file or directory)) possibly the self.path = … is wrong - but what are the correct values? Like Dinu mentioned, I'd suggest disabling file injection and using the metadata service + cloud-init instead. We should probably change nova to log an error about this configuration when ephemeral volumes are rbd. 2) Creating a new instance from an ISO image fails completely - no bootable disk found, says the KVM console. Related? This sounds like a bug in the ephemeral rbd code - could you file it in launchpad if you can reproduce with file injection disabled? I suspect it's not being attached as a cdrom. 3) When creating a new instance from an image (non ISO images work), the disk is not resized to the size specified in the flavor (but left at the size of the original image) This one is fixed in the backports already. I would be really grateful, if those people that have Grizzly/Havana running with an RBD backend could pipe in here… You're seeing some issues in the ephemeral rbd code, which is new in Havana. None of these affect non-ephemeral rbd, or Grizzly. Thanks for reporting them! Josh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
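For completeness, turning file injection off in Havana is a nova.conf change on the compute nodes (option names as they appear in Havana's nova.conf; verify against your packaged sample before relying on them):
  [DEFAULT]
  libvirt_inject_password = false
  libvirt_inject_key = false
  libvirt_inject_partition = -2
With injection disabled, cloud-init fetches the SSH key and user data from the metadata service instead, which avoids the libguestfs mount of the rbd-backed disk entirely.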
Re: [ceph-users] Ceph User Committee
Thanks for floating this out there Loic! A few thoughts inline below: On Thu, Nov 7, 2013 at 6:45 PM, Loic Dachary l...@dachary.org wrote: Hi, It looks like there indeed is enough interest to move forward :-) The next action items would be : * Setup a home page somewhere ( should it be a separate web site or could we simply take over http://ceph.com/ ? ) I think the best idea would be to use Ceph.com. I don't have a problem with giving certain people access to help curate content and clean up site design during their tenure on the user committee. Ceph.com really belongs to the community anyway (and is in need of some love). * Create the About page describing the User Committee and get a consensus from interested parties on its goal and scope. Definitely. I'd like to have a place to define the what, keep a running tab of the who, and document some of the how. * Create an Event section to record meetups / conferences schedules ( http://ceph.com/event/ ? ) This is probably best as a wiki page so that anyone can add meetups. * Collect use cases ( publish them under http://ceph.com/community/blog/ ? elsewhere ? ) I'm guessing this starts as a wiki page for collection and can probably be curated into Ceph.com. * Schedule a User Committee session for the next CDS I can guarantee it. Cloudwatt kindly agreed to assign someone ( starting next week ) to help with logistics to organize events. For instance I'll be talking about Ceph in two weeks in Toulouse http://capitoledulibre.org/ : I will happily distribute goodies, if they are there. But if I have to acquire them ... It will be even more useful to organize a Ceph booth during FOSDEM or even ( too short notice maybe ? ) for the Cloudstack event in Amsterdam. Awesome, logistics are often the most difficult part. I'm guessing part of the user committee is going to be interfacing with corps (like Inktank and others) who are willing to support the efforts of the community. I'm willing to take action on setting up the home page, the about page and collecting use cases. Anyone willing to work on the other two ( event page, CDS session ) ? Or have ideas for even more action items ? Maybe organizing a local meetup ( the Cloudwatt person could help with that too, as long as she is told where and when ) ? I'm happy to make sure the CDS session is set up and can facilitate getting some folks access to ceph.com once we have a few volunteers that raise their hands for web dev/content work. Cheers P.S. If you're in Berlin november 30th 2013, we're having a Ceph oriented friendly poker game at http://c-base.org/ starting 7pm. I'm not sure that counts as an event but it's definitely an opportunity to discuss Ceph ;-) Awesome. Wish I was a wee bit closer. The only other thing I want to add here is that while I think it's important to outline a few things to give it structure (we don't want folks to just flounder around and get frustrated), I also don't want this to be extremely heavy-handed. The more that we can keep this a lightweight group the better off we'll be. For now I'm happy to keep this ad hoc and try to coordinate things around ceph.com; with the expectation being that we come out of CDS with a solid plan for moving forward. Best Regards, Patrick On 07/11/2013 01:35, Loic Dachary wrote: Hi Ceph, I would like to open a discussion about organizing a Ceph User Committee. We briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil today during the OpenStack summit. 
A pad was created and roughly summarizes the idea: http://pad.ceph.com/p/user-committee If there is enough interest, I'm willing to devote one day a week working for the Ceph User Committee. And yes, that includes sitting at the Ceph booth during the FOSDEM :-) And interviewing Ceph users and describing their use cases, which I enjoy very much. But also contribute to a user centric roadmap, which is what ultimately matters for the company I work for. If you'd like to see this happen but don't have time to participate in this discussion, please add your name + email at the end of the pad. What do you think ? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Loïc Dachary, Artisan Logiciel Libre ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Block Storage QoS
On 11/08/2013 03:50 AM, Wido den Hollander wrote: On 11/07/2013 08:42 PM, Gruher, Joseph R wrote: Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? I don't know if OpenStack supports it, but in CloudStack we recently implemented the I/O throttling mechanism of Qemu via libvirt. That might be a solution if OpenStack implements that as well? Indeed, that was implemented in OpenStack Havana. I think the docs haven't been updated yet, but one of the related blueprints is: https://blueprints.launchpad.net/cinder/+spec/pass-ratelimit-info-to-nova Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw-agent failed to sync object
On 11/07/2013 09:48 AM, lixuehui wrote: Hi all: After we built a region with two zones distributed across two ceph clusters and started the agent, it works! But what we find in the radosgw-agent stdout is that it failed to sync objects all the time. Here is the info: (env)root@ceph-rgw41:~/myproject# ./radosgw-agent -c cluster-data-sync.conf -q region map is: {u'us': [u'us-west', u'us-east']} ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error Metadata has already been copied from the master zone. I'd like to know the reason, and what 'state is error' means! This means the destination radosgw failed to fetch the object from the source radosgw. Does the system user from the secondary zone exist in the master zone? If you enable 'debug rgw=30' for both radosgw instances and share the logs we can see why the sync is failing. Josh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
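For reference, the debug settings go into ceph.conf on both gateway hosts under the rgw client section (the section name below is just the conventional one; use whatever your radosgw instances are actually called) and take effect after a radosgw restart:
  [client.radosgw.gateway]
      debug rgw = 30
      debug ms = 1
debug ms = 1 is optional, but it makes the failing fetch between the two gateways much easier to spot in the logs.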
Re: [ceph-users] Ceph Block Storage QoS
On Fri, Nov 8, 2013 at 9:31 AM, Josh Durgin josh.dur...@inktank.com wrote: On 11/08/2013 03:50 AM, Wido den Hollander wrote: On 11/07/2013 08:42 PM, Gruher, Joseph R wrote: Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? I don't know if OpenStack supports it, but in CloudStack we recently implemented the I/O throttling mechanism of Qemu via libvirt. That might be a solution if OpenStack implements that as well? Indeed, that was implemented in OpenStack Havana. I think the docs haven't been updated yet, but one of the related blueprints is: https://blueprints.launchpad.net/cinder/+spec/pass-ratelimit-info-to-nova Yes, it seemed lack of necessary docs to guide users. I just list commands below to help users to understand: cinder qos-create high_read_low_write consumer=front-end read_iops_sec=1000 write_iops_sec=10 cinder type-create type1 cinder qos-associate [qos-spec-id] [type-id] cinder create --display-name high-read-low-write-volume --volume-type type1 100 nova volume-attach vm-1 high-read-low-write-volume /dev/vdb Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Wheat ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
Hi Loic, On 08.11.2013 00:19, Loic Dachary wrote: On 08/11/2013 04:57, Kyle Bader wrote: I think this is a great idea. One of the big questions users have is what kind of hardware should I buy. An easy way for users to publish information about their setup (hardware, software versions, use-case, performance) when they have successful deployments would be very valuable. Maybe a section of wiki? It would be interesting to a site where a Ceph admin can download an API key/package that could be optionally installed and report configuration information to a community API. The admin could then supplement/correct that base information. Having much of the data collection be automated lowers the barrier for contribution. Bonus points if this could be extended to SMART and failed drives so we could have a community generated report similar to Google's disk population study they presented at FAST'07. Would this be something like http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ? It seems that all eyes are looking in the same or very close directions :-) Sage initially said wiki page per reference setup - outlined overview of the context, specifics (e.g. defaults overrides and their reasoning), possibly essential notes on some regular maintenance activities, etc. In summary: the minimal readme or receipt enough for an admin to adapt and replicate a proven setup. Publishing of few concrete deployments in this form doesn't need any development and will generate positive effect immediately - I'm doing setup based on {wiki-page} with ... (differences), but ... You (Loic) are developing on the practical basis for scaling all of this at large: Convenient ceph-brag tool and online service - collecting of detailed snapshot of the setup as it is visible from a Ceph node. Kyle combines the two, saying: application of the collecting tool followed by handcrafted shaping, linking and annotations before/after publishing. Personally, I most like Kyle's workflow - iterations of: tool based collection - results in new version in the tool branch; applying fixes trough the web editor - merging handcrafted defs branch; publishing/communication. Once the working prototype goes live, various derivatives could be considered, e.g.: * Nice, possibly interactive diagrams (visual documentation) of the setup. * Standard reports with anchors for referencing in the mails. * Side projects for build and maintenance artifacts generation for various management platforms - ceph-deploy or different (of course assuming rejoining-back the private bits) * View/Report aiming extracting the essentials, roughly equivalent to the handcrafted Ceph setup receipt for the context. Regards, Alek ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Pool without a name, how to remove it?
I don't remember how this has come up or been dealt with in the past, but I believe it has been. Have you tried just doing it via the ceph or rados CLI tools with an empty pool name? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 6:58 AM, Wido den Hollander w...@42on.com wrote: Hi, On a Ceph cluster I have a pool without a name. I have no idea how it got there, but how do I remove it? pool 14 '' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 158 owner 18446744073709551615 Is there a way to remove a pool by it's ID? I couldn't find anything in librados do to so. -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
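Something along these lines may work; untested against a nameless pool, so treat it as a sketch of the suggestion above rather than a verified fix, and note the flag really does destroy the pool:
  rados rmpool "" "" --yes-i-really-really-mean-it
ceph osd pool delete takes the same name-twice-plus-flag form if the rados variant refuses the empty string.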
Re: [ceph-users] near full osd
It sounds like maybe your PG counts on your pools are too low and so you're just getting a bad balance. If that's the case, you can increase the PG count with ceph osd pool set <name> pg_num <higher value>. OSDs should get data approximately equal to node weight/sum of node weights, so higher weights get more data and all its associated traffic. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 8:30 AM, Kevin Weiler kevin.wei...@imc-chicago.com wrote: All of the disks in my cluster are identical and therefore all have the same weight (each drive is 2TB and the automatically generated weight is 1.82 for each one). Would the procedure here be to reduce the weight, let it rebalance, and then put the weight back to where it was? -- Kevin Weiler IT IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/ Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com From: Aronesty, Erik earone...@expressionanalysis.com Date: Tuesday, November 5, 2013 10:27 AM To: Greg Chavez greg.cha...@gmail.com, Kevin Weiler kevin.wei...@imc-chicago.com Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com Subject: RE: [ceph-users] near full osd If there’s an underperforming disk, why on earth would more data be put on it? You’d think it would be less…. I would think an overperforming disk should (desirably) cause that case, right? From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Greg Chavez Sent: Tuesday, November 05, 2013 11:20 AM To: Kevin Weiler Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] near full osd Kevin, in my experience that usually indicates a bad or underperforming disk, or a too-high priority. Try running ceph osd crush reweight osd.## 1.0. If that doesn't do the trick, you may want to just out that guy. I don't think the crush algorithm guarantees balancing things out in the way you're expecting. --Greg On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler kevin.wei...@imc-chicago.com wrote: Hi guys, I have an OSD in my cluster that is near full at 90%, but we're using a little less than half the available storage in the cluster. Shouldn't this be balanced out? -- Kevin Weiler IT IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/ Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
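To make the pg_num suggestion at the top of this thread concrete (the pool name and target count are placeholders; pick the count based on OSD count and replica size, and note pg_num can only ever be increased):
  ceph osd pool set <pool-name> pg_num 1024
  ceph osd pool set <pool-name> pgp_num 1024
pgp_num has to be raised to match pg_num before any actual rebalancing of data takes place.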
Re: [ceph-users] About memory usage of ceph-mon on arm
I don't think this is anything we've observed before. Normally when a Ceph node is using more memory than its peers it's a consequence of something in that node getting backed up. You might try looking at the perf counters via the admin socket and seeing if something about them is different between your ARM and AMD processors. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 7:21 AM, Yu Changyuan rei...@gmail.com wrote: Finally, my tiny ceph cluster get 3 monitors, newly added mon.b and mon.c both running on cubieboard2, which is cheap but still with enough cpu power(dual-core arm A7 cpu, 1.2G) and memory(1G). But compare to mon.a which running on an amd64 cpu, both mon.b and mon.c easily consume too much memory, so I want to know whether this is caused by memory leak. Below is the output of 'ceph tell mon.a heap stats' and 'ceph tell mon.c heap stats'(mon.c only start 12hr ago, while mon.a already running for more than 10 days) mon.atcmalloc heap stats: MALLOC:5480160 (5.2 MiB) Bytes in use by application MALLOC: + 28065792 ( 26.8 MiB) Bytes in page heap freelist MALLOC: + 15242312 ( 14.5 MiB) Bytes in central cache freelist MALLOC: + 10116608 (9.6 MiB) Bytes in transfer cache freelist MALLOC: + 10432216 (9.9 MiB) Bytes in thread cache freelists MALLOC: + 1667224 (1.6 MiB) Bytes in malloc metadata MALLOC: MALLOC: = 71004312 ( 67.7 MiB) Actual memory used (physical + swap) MALLOC: + 57540608 ( 54.9 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: =128544920 ( 122.6 MiB) Virtual address space used MALLOC: MALLOC: 4655 Spans in use MALLOC: 34 Thread heaps in use MALLOC: 8192 Tcmalloc page size Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). Bytes released to the mon.ctcmalloc heap stats: MALLOC: 175861640 ( 167.7 MiB) Bytes in use by application MALLOC: + 2220032 (2.1 MiB) Bytes in page heap freelist MALLOC: + 1007560 (1.0 MiB) Bytes in central cache freelist MALLOC: + 2871296 (2.7 MiB) Bytes in transfer cache freelist MALLOC: + 4686000 (4.5 MiB) Bytes in thread cache freelists MALLOC: + 2758880 (2.6 MiB) Bytes in malloc metadata MALLOC: MALLOC: =189405408 ( 180.6 MiB) Actual memory used (physical + swap) MALLOC: +0 (0.0 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: =189405408 ( 180.6 MiB) Virtual address space used MALLOC: MALLOC: 3445 Spans in use MALLOC: 14 Thread heaps in use MALLOC: 8192 Tcmalloc page size Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). Bytes released to the The ceph versin is 0.67.4, compiled with tcmalloc enabled, gcc(armv7a-hardfloat-linux-gnueabi-gcc) version 4.7.3 and I also try to dump heap, but I can not find anything useful, below is a recent dump, output by command pprof --text /usr/bin/ceph-mon mon.c.profile.0021.heap. What extra step should I take to make the dump more meaningful? Using local file /usr/bin/ceph-mon. Using local file mon.c.profile.0021.heap. 
Total: 149.3 MB 146.2 97.9% 97.9%146.2 97.9% b6a7ce7c 1.4 0.9% 98.9% 1.4 0.9% std::basic_string::_Rep::_S_create ??:0 1.4 0.9% 99.8% 1.4 0.9% 002dd794 0.1 0.1% 99.9% 0.1 0.1% b6a81170 0.1 0.1% 99.9% 0.1 0.1% b6a80894 0.0 0.0% 100.0% 0.0 0.0% b6a7e2ac 0.0 0.0% 100.0% 0.0 0.0% b6a81410 0.0 0.0% 100.0% 0.0 0.0% 00367450 0.0 0.0% 100.0% 0.0 0.0% 001d4474 0.0 0.0% 100.0% 0.0 0.0% 0028847c 0.0 0.0% 100.0% 0.0 0.0% b6a7e8d8 0.0 0.0% 100.0% 0.0 0.0% 0020c80c 0.0 0.0% 100.0% 0.0 0.0% 0028bd20 0.0 0.0% 100.0% 0.0 0.0% b6a63248 0.0 0.0% 100.0% 0.0 0.0% b6a83478 0.0 0.0% 100.0% 0.0 0.0% b6a806f0 0.0 0.0% 100.0% 0.0 0.0% 002eb8b8 0.0 0.0% 100.0% 0.0 0.0% 0024efb4 0.0 0.0% 100.0% 0.0 0.0% 0027e550 0.0 0.0% 100.0% 0.0 0.0% b6a77104 0.0 0.0% 100.0% 0.0 0.0% _dl_mcount ??:0 0.0 0.0% 100.0% 0.0 0.0% 003673ec 0.0 0.0% 100.0% 0.0 0.0% b6a7a91c 0.0 0.0% 100.0% 0.0 0.0% 00295e44 0.0 0.0% 100.0%
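The perf counters mentioned above are available through the admin socket on each monitor host, for example (socket paths assume the default /var/run/ceph naming):
  ceph --admin-daemon /var/run/ceph/ceph-mon.c.asok perf dump
  ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok perf dump
Diffing the two outputs, together with ceph tell mon.c heap release to see how much of that 180 MiB tcmalloc will actually hand back, should help narrow down whether mon.c is really holding more state or just not releasing freed memory.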
Re: [ceph-users] RBD back
On Thu, Nov 7, 2013 at 1:26 AM, lixuehui lixue...@chinacloud.com.cn wrote: Hi all Ceph Object Store service can spans geographical locals . Now ceph also provides FS and RBD .IF our applications need the RBD service .Can we provide backup and disaster recovery for it via gateway through some transfermation ? In fact the cluster stored RBD data as objects in pools(default rbd), for another words, can we accomplish that backup some pools in a ceph cluster (without s3 )via the gateway . There's not any way to do raw RADOS geo-replication at this time. You *can* do RBD disaster recovery using snapshots and incremental snapshot exports, though. Check out the export-diff/import-diff and associated commands in the rbd tool. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
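A minimal sketch of that snapshot-based flow (pool, image and snapshot names are placeholders; the image on the backup cluster must exist with the same size before the first import-diff). On the source cluster:
  rbd snap create rbd/vm1@base
  rbd export-diff rbd/vm1@base vm1-base.diff
and later, for an incremental:
  rbd snap create rbd/vm1@day1
  rbd export-diff --from-snap base rbd/vm1@day1 vm1-day1.diff
then on the backup cluster:
  rbd create rbd/vm1 --size 10240
  rbd import-diff vm1-base.diff rbd/vm1
  rbd import-diff vm1-day1.diff rbd/vm1
import-diff recreates the end snapshot on the target image, so each incremental picks up exactly where the previous one left off.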
Re: [ceph-users] Kernel Panic / RBD Instability
Well, as you've noted you're getting some slow requests on the OSDs when they turn back on; and then the iSCSI gateway is panicking (probably because the block device write request is just hanging). We've gotten prior reports that iSCSI is a lot more sensitive to a few slow requests than most use cases, and OSDs coming back in can cause some slow requests, but if it's a common case for you then there's probably something that can be done to optimize that recovery. Have you checked into what's blocking the slow operations or why the PGs are taking so long to get ready? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 1:33 AM, James Wilkins james.wilk...@fasthosts.com wrote: Hello, Wondering if anyone else has come over an issue we're having with our POC CEPH Cluster at the moment. Some details about its setup; 6 x Dell R720 (20 x 1TB Drives, 4 xSSD CacheCade), 4 x 10GB Nics 4 x Generic white label server (24 x 2 4TB Disk Raid-0 ), 4 x 10GB Nics 3 x Dell R620 - Acting as ISCSI Heads (targetcli / Linux kernel ISCSI) - 4 x 10GB Nics. An RBD device is mounted and exported via targetcli, this is then mounted on a client device to push backup data. All machines are running Ubuntu 12.04.3 LTS and ceph 0.67.4 Machines are split over two racks (distinct layer 2 domains) using a leaf/spine model and we use ECMP/quagga on the ISCSI heads to reach the CEPH Cluster. Crush map has racks defined to spread data over 2 racks - I've attached the ceph.conf The cluster performs great normally, and we only have issues when simulating rack failure. The issue comes when the following steps are taken o) Initiate load against the cluster (backups going via ISCSI) o) ceph osd set noout o) Reboot 2 x Generic Servers / 3 x Dell Servers (basically all the nodes in 1 Rack) o) Cluster goes degraded, as expected cluster 55dcf929-fca5-49fe-99d0-324a19afd5b4 health HEALTH_WARN 7056 pgs degraded; 282 pgs stale; 2842 pgs stuck unclean; recovery 1286582/2700870 degraded (47.636%); 108/216 in osds are down; noout flag(s) set monmap e3: 5 mons at {fh-ceph01-mon-01=172.17.12.224:6789/0,fh-ceph01-mon-02=172.17.12.225:6789/0,fh-ceph01-mon-03=172.17.11.224:6789/0,fh-ceph01-mon-04=172.17.11.225:6789/0,fh-ceph01-mon-05=172.17.12.226:6789/0}, election epoch 74, quorum 0,1,2,3,4 fh-ceph01-mon-01,fh-ceph01-mon-02,fh-ceph01-mon-03,fh-ceph01-mon-04,fh-ceph01-mon-05 osdmap e4237: 216 osds: 108 up, 216 in pgmap v117686: 7328 pgs: 266 active+clean, 6 stale+active+clean, 6780 active+degraded, 276 stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB / 805 TB avail; 1286582/2700870 degraded (47.636%) mdsmap e1: 0/0/1 up 2013-11-05 08:51:44.830393 mon.0 [INF] pgmap v117685: 7328 pgs: 1489 active+clean, 1289 stale+active+clean, 3215 active+degraded, 1335 stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB / 805 TB avail; 1048742/2700870 degraded (38.830%); recovering 7 o/s, 28969KB/s o) As OSDS start returning 2013-11-05 08:52:42.019295 mon.0 [INF] osd.165 172.17.11.9:6864/6074 boot 2013-11-05 08:52:42.023055 mon.0 [INF] osd.154 172.17.11.9:6828/5943 boot 2013-11-05 08:52:42.024226 mon.0 [INF] osd.159 172.17.11.9:6816/5820 boot 2013-11-05 08:52:42.031996 mon.0 [INF] osd.161 172.17.11.9:6856/6059 boot o) We then see some slow requests; 2013-11-05 08:53:11.677044 osd.153 [WRN] 6 slow requests, 6 included below; oldest blocked for 30.409992 secs 2013-11-05 08:53:11.677052 osd.153 [WRN] slow request 30.409992 seconds old, received at 2013-11-05 08:52:41.266994: osd_op(client.16010.1:13441679 
rb.0.21ec.238e1f29.0012fa28 [write 2854912~4096] 3.516ef071 RETRY=-1 e4240) currently reached pg 2013-11-05 08:53:11.677056 osd.153 [WRN] slow request 30.423024 seconds old, received at 2013-11-05 08:52:41.253962: osd_op(client.15755.1:13437999 rb.0.21ec.238e1f29.0012fa28 [write 0~233472] 3.516ef071 RETRY=1 e4240) v4 currently reached pg o) A few minutes , the ISCSI heads start panicking Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664305] [ cut here ] Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664313] WARNING: at /build/buildd/linux-lts-raring-3.8.0/kernel/watchdog.c:246 wat chdog_overflow_callback+0x9a/0xc0() Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664315] Hardware name: PowerEdge R620 Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664317] Watchdog detected hard LOCKUP on cpu 6 Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664318] Modules linked in: ib_srpt(F) tcm_qla2xxx(F) tcm_loop(F) tcm_fc(F) iscsi_t arget_mod(F) target_core_pscsi(F) target_core_file(F) target_core_iblock(F) target_core_mod(F) rbd(F) libceph(F) ipmi_devintf(F) ipm i_si(F) ipmi_msghandler(F) qla2xxx(F) libfc(F) scsi_transport_fc(F) scsi_tgt(F) configfs(F) dell_rbu(F) ib_iser(F) rdma_cm(F) ib_cm( F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F)
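When it happens, the usual places to look for what is actually blocking are the cluster health detail and the admin socket of the OSD named in the slow request warnings (osd.153 is taken from the log above; socket paths assume the default naming):
  ceph health detail
  ceph --admin-daemon /var/run/ceph/ceph-osd.153.asok dump_ops_in_flight
  ceph --admin-daemon /var/run/ceph/ceph-osd.153.asok dump_historic_ops
The "currently reached pg" state in the slow request messages usually means the op is queued waiting on the PG itself, typically peering after the rebooted OSDs come back, rather than on the journal or the network, which points at tuning recovery/peering rather than the iSCSI side.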
[ceph-users] radosgw-agent sync object:state is error
Hi list We deployed a master zone and a slave zone in two clusters to test multi-location backup. The radosgw-agent syncs buckets successfully. We can find the same bucket info in the slave zone. But the running radosgw-agent throws out error info that objects failed to sync, just like this: ERROR:radosgw_agent.worker:failed to sync object bucket-test4/s3stor.py: state is error At first, I thought it was because I forgot to set the placement_pools param in the zone configuration. After correcting that, it still goes on. Zone configuration: { domain_root: .us-east.rgw.root, control_pool: .us-east.rgw.control, gc_pool: .us-east.rgw.gc, log_pool: .us-east.log, intent_log_pool: .us-east.intent-log, usage_log_pool: .us-east.usage, user_keys_pool: .us-east.users, user_email_pool: .us-east.users.email, user_swift_pool: .us-east.users.swift, user_uid_pool: .us-east.users.uid, system_key: { access_key: PSUXAQBOE0N60C0Y3QJ7, secret_key: l5peNL/nfTkAjl28uLw/WCKk2LSNa4hdS6VheJ6x}, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets} } ] } { domain_root: .us-west.rgw.root, control_pool: .us-west.rgw.control, gc_pool: .us-west.rgw.gc, log_pool: .us-west.log, intent_log_pool: .us-west.intent-log, usage_log_pool: .us-west.usage, user_keys_pool: .us-west.users, user_email_pool: .us-west.users.email, user_swift_pool: .us-west.users.swift, user_uid_pool: .us-west.users.uid, system_key: { access_key: WUHDCDMWBG4GMT9B7QL7, secret_key: RSaYh90tNIdaImcn9QoSyK\/EuIrZSeXdOoa6Fw7o}, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets} } ] } In the slave zone, the contents of .rgw.buckets look like this: .dir.us-east.4513.1 .dir.us-east.4513.2 .dir.us-east.4513.3 .dir.us-east.4513.4 There is nothing from the objects that are stored in the master zone. The .rgw.buckets.index of the slave zone is empty, while that of the master zone contains some content: .dir.default.4647.1 What could the problem be? We would appreciate any suggestions! lixuehui ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Block Storage QoS
On 2013-11-08 03:20, Haomai Wang wrote: On Fri, Nov 8, 2013 at 9:31 AM, Josh Durgin josh.dur...@inktank.com wrote: I just list commands below to help users to understand: cinder qos-create high_read_low_write consumer=front-end read_iops_sec=1000 write_iops_sec=10 Does this have any normalisation of the IO units, for example to 8K or something? In VMware we have had similar controls for ages but they're not useful, as a Windows server will throw out 4MB IOs and skew all the metrics. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Running on disks that lose their head
when one head out of ten fails: disks can keep working with the nine remaining heads... some info on this at last in the SATA-IO 3.2 Spec... Rebuild Assist... Some info on the command set (SAS SATA implementations): http://www.seagate.com/files/staticfiles/docs/pdf/whitepaper/tp620-1-1110us-reducing-raid-recovery.pdf ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Block Storage QoS
On 11/08/2013 03:13 PM, ja...@peacon.co.uk wrote: On 2013-11-08 03:20, Haomai Wang wrote: On Fri, Nov 8, 2013 at 9:31 AM, Josh Durgin josh.dur...@inktank.com wrote: I just list commands below to help users to understand: cinder qos-create high_read_low_write consumer=front-end read_iops_sec=1000 write_iops_sec=10 Does this have any normalisation of the IO units, for example to 8K or something? In VMware we have similar controls for ages but they're not useful, as a Windows server will through out 4MB IO's and skew all the metrics. I don't think it does any normalization, but you could have different limits for different volume types, and use one volume type for windows and one volume type for non-windows. This might not make sense for all deployments, but it may be a usable workaround for that issue. Josh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com